Guides
Long-form pieces
8 guides, newest first. Opinionated notes on senior ML interviews, system design, and applied practice.
- Designing a RAG system that actually works
RAG fails most often at retrieval, not generation. A practitioner's guide to the architecture, the failure modes, and what production teams actually do in 2026.
- LLM Evals: The hardest part of shipping LLMs, and why most teams get it wrong
Your model is only as good as your eval. Your eval is a product. Treat it like one. The patterns that separate teams that ship from teams that thrash.
- What L5 vs L6 actually means at FAANG ML
Level lines are mostly invisible from the outside but sharp on the inside. A practical calibration of L4 through L7 in ML/Applied Scientist tracks.
- How to think about LLM inference cost
Most teams calculate inference cost by multiplying token price by token count. The actual cost structure has five layers, and most of the optimization wins are in the bottom four.
- System design case study: building personalized search ranking
An end-to-end design of a personalized search ranking system at scale, from problem framing through deployment and monitoring. The same template works for most ML system design interviews.
- The 5 things every applied scientist interview is actually testing for
Strip away the questions and the role-specific jargon. Every senior AS loop is checking the same five things. If you know what they are, the prep gets sharper.
- Applied Scientist vs MLE vs Research Engineer: what these roles actually do
The role taxonomy is confusing because companies use the same titles to mean different things. Here's the actual decomposition, and which one you should target.
- Lessons from Marin 8B: what an open pretraining log actually teaches you
Marin trained the first open-source 8B model to beat Llama 3.1 8B and published every mistake. The transferable lessons aren't about TPUs. They're about how to run pretraining like a science.