Daily Scream

Logic riddle (October update)

A programmer says “I have 2 kids, and the sum of their ages is 4”. The logician thinks and says “not enough info”. The programmer adds “The eldest likes Bluey”. The logician smiles and replies “Ah! You must be using vibe coding a lot”. Alright, I’ve been working on a new project, Goxy: an OpenAI proxy that tracks and limits spending. You can set an hourly limit (say $1 per hour) and the proxy will return 429 errors once the spend reaches the limit. That lets you release LLM-powered projects while being confident you won’t get hit with a thousand-dollar bill at the end of the month. ...
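To make the hourly-limit idea concrete, here is a minimal sketch of the bookkeeping such a proxy might do. It is not Goxy's actual code: the class name `HourlySpendLimiter`, its methods, and the choice of a fixed clock-hour window (rather than a sliding one) are all assumptions made for illustration.

```python
import time


class HourlySpendLimiter:
    """Hypothetical helper: tracks spend per clock hour and says when to refuse."""

    def __init__(self, limit_usd, clock=time.time):
        self.limit_usd = limit_usd
        self.clock = clock          # injectable so the window can be tested
        self.window = None          # index of the current clock hour
        self.spent = 0.0            # dollars spent in the current window

    def _roll_window(self):
        # Assumption: the budget resets at the top of every hour.
        hour = int(self.clock() // 3600)
        if hour != self.window:
            self.window = hour
            self.spent = 0.0

    def check(self):
        """Return the status the proxy would answer with: 200 to forward, 429 to refuse."""
        self._roll_window()
        return 429 if self.spent >= self.limit_usd else 200

    def record(self, cost_usd):
        """Add the cost of a completed upstream call to the current window."""
        self._roll_window()
        self.spent += cost_usd
```

A proxy built this way would call `check()` before forwarding a request and `record()` once the upstream response reports its token usage, so a single request can still overshoot the limit slightly before the next one is refused.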

September Update

I just came back from a two-week vacation in France with my daughter, and without my laptop, which was pretty nice. As a result, I don’t have much to say this month! After coming back I had an idea for a small project, a “guess who” app that I vibecoded in a few hours with Firebase Studio. It got me a Next.js + Tailwind CSS app pretty quickly, with LLM calls handled by Genkit. Originally it used the Gemini API and was hosted on Firebase, but I’ve switched it to GPT-5 and Fly.io for hosting. ...

Training a small LLM

So in my [last post about Pilish]({% post_url 2025-08-09-pillmish %}) I mentioned that a follow-up to get better results would be to use an LLM with word-level tokenization. It’s actually a bad idea in general because the vocabulary size can be huge, and that’s why most LLMs these days use BPE or another subword tokenization, but I decided to give it a quick try and train an LLM from scratch with word-level tokenization. Back in 2023 I used to be quite into finetuning and all that, but this year I haven’t done much low-level tinkering with LLMs, so it was a good exercise. ...
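For readers unfamiliar with the trade-off, here is a rough sketch of what word-level tokenization looks like and why the vocabulary has to be capped. It is illustrative only and not the code from the post: `build_vocab`, `encode`, and the `<unk>` convention are assumptions.

```python
from collections import Counter


def build_vocab(texts, max_size):
    """Map the most frequent whitespace-separated words to ids; id 0 is <unk>."""
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {"<unk>": 0}
    for word, _ in counts.most_common(max_size - 1):
        vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn text into ids; any word outside the vocabulary becomes <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


texts = ["the cat sat", "the dog sat"]
vocab = build_vocab(texts, max_size=3)
ids = encode("the cat sat", vocab)  # "cat" did not make the cut, so it maps to id 0
```

Every distinct word needs its own embedding row, so on a real corpus the cap is either very large or many words collapse into `<unk>`, which is the problem subword schemes avoid.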

Pillmish

A few days ago, I heard about Pilish for the first time. As Wikipedia puts it, Pilish is a style of constrained writing in which the lengths of consecutive words or sentences match the digits of the number π (pi). My second thought (right after “who would enjoy doing that”) was: this sounds easy for LLMs! So I set out to make a minimal proof of concept. Long story short, the trick is to constrain the output to follow the digits of pi, the same way that structured output (JSON, etc.) works. With Hugging Face Transformers, it’s easy to do with a custom logits_processor. Here, we just constrain the tokens to those matching the desired number of characters at each step. I shared the code on GitHub. ...
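The masking idea can be sketched without any model at all. The toy below is not the code shared on GitHub: it assumes every token is a whole word (real tokenizers emit subwords, which the actual implementation has to handle), and `mask_scores`, `pilish_greedy`, and the scoring function are hypothetical names. With Transformers, the same masking step would live inside a custom `LogitsProcessor`.

```python
import math

PI_DIGITS = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # first ten digits of pi


def mask_scores(scores, vocab, target_len):
    """Set the score of every word whose length differs from target_len to -inf."""
    return [s if len(w) == target_len else -math.inf
            for s, w in zip(scores, vocab)]


def pilish_greedy(vocab, score_fn, n_words):
    """At each step, pick the best-scoring word whose length matches the next digit."""
    out = []
    for digit in PI_DIGITS[:n_words]:
        scores = mask_scores(score_fn(out), vocab, digit)
        best = max(range(len(vocab)), key=lambda i: scores[i])
        if scores[best] == -math.inf:
            raise ValueError(f"no word of length {digit} in the vocabulary")
        out.append(vocab[best])
    return out


# Toy stand-in for a language model: it simply prefers words not used yet.
vocab = ["how", "I", "need", "a", "drink"]
score_fn = lambda prefix: [0.0 if w in prefix else 1.0 for w in vocab]
words = pilish_greedy(vocab, score_fn, 5)
```

Because the mask is applied before the choice is made, the model can only ever pick among words of the right length, whatever it would otherwise have preferred.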

August 2025 Update (GPT-5, Veo 3, Gemini CLI, Open Source)

Lots of things going on this summer, but yesterday’s big news was the release of GPT-5. I don’t have an opinion on it yet, but I guess it’s nice to have a single model (at least as exposed, even if requests are routed under the hood) instead of switching between 4o and o4-mini-high, etc. Recently I’ve been in the market for a “CLI Agent” (synchronous, like Claude Code, not async like Devin and co.). I’ve been using Gemini CLI with good results, as long as I’m within the free Pro requests. As soon as it switches to the “flash” model it becomes useless: it can’t edit a file properly and just loops on itself. I wouldn’t pay for it. Claude Code seems too expensive; I might give OpenAI Codex a try with GPT-5, but I haven’t heard much about it. ...