Software Engineer Blog

Shipping AI isn’t about bigger models. It’s about smaller mistakes.

Production notes from a builder who ships, operates, and occasionally breaks his own AI systems.


Latest

Fresh off the press

next token prediction, how LLMs generate text, autoregressive generation, sampling vs greedy decoding

Next-Token Prediction: How an AI Actually Writes Text (Not Magic — Just Probability)

An LLM never sees a finished sentence. It answers one tiny question, over and over: given everything so far, what's the next token? It builds a probability distribution over the whole vocabulary, samples from it (not always the top score), glues the winner on, and re-runs from scratch. That loop is the entire engine — and it explains why the same prompt can give two different answers, what 'personalization' really is, and where hallucinations come from.
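The loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real model is implemented: `VOCAB`, `toy_logits`, `next_token`, and `generate` are hypothetical names made up for illustration, and `toy_logits` is a hand-written stand-in for the neural network that would normally produce the scores.

```python
import math
import random

# Assumption: a six-word vocabulary. Real models use tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]


def toy_logits(tokens):
    """Hypothetical stand-in for a model: one raw score per vocabulary entry.

    It simply favors whichever word follows the last token in VOCAB order.
    """
    last = VOCAB.index(tokens[-1]) if tokens and tokens[-1] in VOCAB else -1
    preferred = (last + 1) % len(VOCAB)
    return [2.0 if i == preferred else 0.0 for i in range(len(VOCAB))]


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over the vocabulary."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(tokens, rng, greedy=False, temperature=1.0):
    """Answer the one question: given everything so far, what comes next?"""
    probs = softmax(toy_logits(tokens), temperature)
    if greedy:
        # Greedy decoding: always take the highest-probability token.
        return VOCAB[probs.index(max(probs))]
    # Sampling: draw in proportion to probability, so the top score can lose.
    return rng.choices(VOCAB, weights=probs, k=1)[0]


def generate(prompt, max_new_tokens=5, seed=None, greedy=False, temperature=1.0):
    """Autoregressive loop: predict, append the winner, run again."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens, rng, greedy, temperature)
        tokens.append(tok)
        if tok == ".":  # assumption: "." acts as the end-of-text token
            break
    return tokens


if __name__ == "__main__":
    print("greedy :", generate(["the"], greedy=True))
    print("sampled:", generate(["the"], seed=1))
    print("sampled:", generate(["the"], seed=2))
```

Greedy decoding returns the same continuation every time, while sampled runs with different seeds can diverge, which is the mechanism behind the same prompt giving two different answers. Production inference stacks typically cache intermediate state instead of recomputing every step in full; the sketch leaves that out for clarity.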

July 23, 2026

All posts

9 articles

Open source

Things I’ve open-sourced

Production-shaped code you can clone, read, and run — the same patterns I write about.

llm-gateway

OpenAI-compatible gateway over OpenAI, Anthropic & Gemini — one endpoint, cost/latency/token tracking, gateway-issued keys, streaming, retries + fallback.

FastAPI, LiteLLM, SQLAlchemy, Postgres

ai-engineering-series

Clone-and-run code for the "AI Engineering from Scratch" YouTube series — first LLM call, LiteLLM, your own gateway, and beyond.

Python, uv, OpenAI SDK, Gemini

Vahid Aghajani — Applied ML Builder