Production notes from a builder who ships, operates, and occasionally breaks his own AI systems.
Latest
Three ways to hand data to an LLM agent: the Model Context Protocol, a boring REST API with an API key, or a curated Markdown file. Each is right some of the time and wrong a lot of the time. Here's the honest decision tree.
Every MCP server you connect loads its tool schemas into the context window before the first user turn. Here's the arithmetic on how expensive that gets, why most teams never measure it, and how to stop paying for tools the agent will never call.
Gemini 2M and Claude 1M made 'just paste it all' a real engineering option. Here's the cost math, the latency curve, the quiet failure mode of context dilution, and the rule for when stuffing beats RAG — and when it silently hurts.
You tuned the embedding model. You went hybrid. Your RAG still misses. The bug is upstream — in how you split documents. Five chunking strategies, when each wins, and how to actually evaluate them.
Every LLM-powered feature breaks the same way in production: the model returns almost-JSON. Markdown fences, trailing commas, a chatty preamble, a missing closing brace. Here's the 3-layer fix that ships — native structured outputs, Pydantic validation, and json_repair + retry loops.
Every RAG demo shows embeddings and stops there. Real production search almost always mixes keyword and semantic retrieval. Here's what's happening under the hood, why hybrid wins, and a runnable Postgres example in ~40 lines.
Graph databases look like the obvious answer for AI memory: entities, relationships, multi-hop queries. So why did OpenClaw, MemOS, and every shipping system pick flat Markdown instead? A contrarian deep dive into the real tradeoffs.
OpenClaw treats AI as an infrastructure problem. This deep dive covers its 3-tier memory architecture, MemOS, hybrid search, automatic memory flush, and what it means for the future of AI assistants.
A practical guide to building persistent AI memory: memory CRUD operations, post-conversation sweeps, context tree curation, prompt templates, and the unsolved problems nobody talks about.