One production AI engineering pattern per week. 27 episodes and counting. Each Short covers a real pattern engineers hit in production — the problem, the fix, and the code.
Follow @DPO-AI ↗
Full Playlist ↗
Why This Series
Most AI content explains what something is. This series explains when you need it and why it works. Every episode opens with a concrete failure mode — a real number, a real cost, a real silent bug — then shows the pattern that fixes it.
The format is strict: 60–70 seconds, no fluff, one pattern per episode. If it can’t be explained in under 70 seconds, it goes in a blog post instead.
The Full Series
Retrieval & RAG
| EP | Pattern | Key Stat |
|---|---|---|
| EP36 | RLM Instead of RAG | — |
| drop03 | $29B for a model picker. The brain was never theirs. #Cursor #Claude #Shorts | — |
| drop02 | They built OpenAI. Then they walked out. #Anthropic #AIEngineering #Shorts | — |
| EP28 | MoE Routing | 60% cost cut |
| EP27 | Hybrid Search — BM25 + vectors + RRF (sketch below) | Recall 40% → 80%, 15 lines |
| EP25 | Agentic RAG — 4-tool router | 40% of queries need something other than vector search |
| EP23 | RAG Fusion v2 — multi-query + RRF | Recall 45% → 72% |
| EP22 | Corrective RAG (CRAG) — 3-tier confidence routing | Filters irrelevant chunks before generation |
| EP21 | Self-RAG — retrieval on demand | Reduces hallucination by skipping retrieval when not needed |
| EP14 | Query Decomposition — sub-query fan-out | Handles multi-hop questions single-pass RAG can’t answer |
| EP13 | RAG Fusion — parallel queries + RRF | Original: 5 query variants, 45% → 72% recall |
| EP07 | Prompt Compression — LLMLingua | 512 tokens → 80 tokens, same answer |
| EP02 | Speculative RAG — draft-then-retrieve | Retrieve on the answer, not the question |
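
The RRF step that EP13, EP23, and EP27 all lean on fits in a few lines. A minimal sketch, not the episodes' exact code: the `k = 60` constant comes from the original RRF paper, and the doc IDs are invented.

```python
from collections import defaultdict

def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked lists of doc IDs.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    so docs ranked well by multiple retrievers float to the top."""
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. hybrid search (EP27): fuse a BM25 ranking with a vector ranking
bm25_ids = ["d3", "d1", "d7"]
vector_ids = ["d1", "d9", "d3"]
print(rrf_merge([bm25_ids, vector_ids]))  # ['d1', 'd3', 'd9', 'd7']
```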
Inference Optimization
| EP | Pattern | Key Stat |
|---|---|---|
| EP17 | Disaggregated Inference — prefill/decode split | 3x throughput on long-context workloads |
| EP04 | Speculative Decoding — draft + verify (sketch below) | 2–4x faster generation, same quality |
| EP01 | KV Cache Prefix Optimization | P99 2400ms → 900ms, zero code changes |
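
EP04's draft-and-verify loop, greedy variant, in sketch form. The two callables are assumptions standing in for a small draft model and the large target model; real serving stacks verify against token distributions in one batched forward pass rather than exact matches.

```python
def speculative_decode(prompt, draft_model, target_verify,
                       k: int = 4, max_new: int = 64):
    """Greedy speculative decoding: accept the longest drafted prefix
    the big model agrees with, then take its correction.

    draft_model(tokens, k)        -> k cheap proposed next tokens
    target_verify(tokens, drafts) -> the big model's greedy token at
                                     each drafted position
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        drafts = draft_model(tokens, k)
        checks = target_verify(tokens, drafts)
        n = 0
        while n < len(drafts) and drafts[n] == checks[n]:
            n += 1                    # longest agreed prefix
        tokens += drafts[:n]
        if n < len(drafts):
            tokens.append(checks[n])  # big model's correction: output is
                                      # identical to decoding with it alone
    return tokens[len(prompt):]
```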
Evaluation & Quality
| EP | Pattern | Key Stat |
|---|---|---|
| EP24 | LLM-as-Judge v2 | $0.002/eval, calibrated scoring |
| EP19 | Constitutional Self-Critique | Self-corrects against principles before output |
| EP15 | LLM-as-Judge — original | Structured rubric, GPT-4o-mini at scale |
| EP12 | Structured Output Forcing | Eliminates JSON parse failures in production |
| EP11 | Self-Consistency — majority vote (sketch below) | 67% → 88% on math/reasoning tasks |
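
EP11's trick is small enough to show whole. A sketch, assuming a `sample_answer(prompt)` callable that runs one chain-of-thought sample at temperature > 0 and returns only its final answer:

```python
from collections import Counter

def self_consistent(prompt: str, sample_answer, n: int = 10) -> str:
    """Self-consistency: sample n independent reasoning paths and
    majority-vote their final answers. Wrong paths tend to scatter
    across many answers; right ones tend to agree."""
    votes = Counter(sample_answer(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]
```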
Agent Architecture
| EP | Pattern | Key Stat |
|---|---|---|
| EP39 | The Future of Agents Isn’t Smarter Prompts. It’s Smarter Plumbing. #AIEngineering | — |
| EP38 | Harness Engineering: How OpenAI Shipped 1M Lines Without Writing Them #AIEngineering | — |
| EP33 | Stop Interviewing, Start Acting | — |
| EP32 | LLM Wiki | — |
| EP31 | 519K Lines. 50 Hidden Tools. Inside Claude Code’s Leaked Source #AIEngineering | — |
| EP29 | 688 Stars. Zero Fine | — |
| drop01 | one engineer. no budget. 19,000 views. how? #AIEngineering #Shorts | — |
| EP28 | Agent Skills Explained | — |
| EP26 | Multi-Agent Orchestration | 34% failure → 91% success with specialist agents |
| EP20 | Context Distillation | 16K context → 800 tokens, knowledge preserved |
| EP16 | Context Engineering | What goes in the context window determines everything |
| EP10 | Parallel Tool Calls (sketch below) | 4 sequential calls → 1 parallel batch |
| EP09 | LLM Router | Route by complexity, cut costs 60% |
| EP08 | Agent Checkpointing | Zero lost work on agent failure |
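
The core of EP10 is a single `asyncio.gather`. A sketch with three stand-in tools; the tool names and latencies are invented.

```python
import asyncio

# stand-in tools; in production each wraps a real API or DB call
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.3)
    return f"{city}: 18°C"

async def get_calendar(day: str) -> str:
    await asyncio.sleep(0.3)
    return f"{day}: 2 meetings"

async def get_traffic(route: str) -> str:
    await asyncio.sleep(0.3)
    return f"{route}: 25 min"

async def main() -> list[str]:
    # awaited one by one these cost ~0.9s; gathered they cost ~0.3s,
    # the latency of the slowest call instead of the sum of all calls
    return await asyncio.gather(
        get_weather("Berlin"),
        get_calendar("today"),
        get_traffic("home->office"),
    )

print(asyncio.run(main()))
```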
Reliability & Cost
| EP | Pattern | Key Stat |
|---|---|---|
| EP34 | Tool Result Caching | — |
| EP30 | 3 Cheap Models Beat GPT | — |
| EP06 | Semantic Caching | 40% cost reduction on real workloads |
| EP05 | Circuit Breaker for LLMs | Stop cascading failures at the LLM layer |
| EP03 | Hedged Requests — P99 killer (sketch below) | P99 collapses to ~P50 of slower backend |
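
EP03's hedged request in miniature. A sketch under assumptions: `primary` and `backup` are async callables that return equivalent answers, and the 200 ms hedge delay is an illustrative number, tuned in practice to roughly the primary's P95.

```python
import asyncio

async def hedged(primary, backup, hedge_after: float = 0.2):
    """Hedged request: fire the primary; if it hasn't answered within
    hedge_after seconds, fire a backup and take whichever finishes
    first. Tail latency collapses because both rarely straggle."""
    t1 = asyncio.create_task(primary())
    done, _ = await asyncio.wait({t1}, timeout=hedge_after)
    if done:
        return t1.result()            # fast path: no hedge needed
    t2 = asyncio.create_task(backup())
    done, pending = await asyncio.wait(
        {t1, t2}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()                 # don't pay for the loser
    return done.pop().result()
```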
Safety & Capability
| EP | Pattern | Key Stat |
|---|---|---|
| EP35 | Anthropic Nerfed Claude On Purpose | — |
Inference & Serving
| EP | Pattern | Key Stat |
|---|---|---|
| EP37 | TurboQuant: 6x KV Cache Compression at 1M Tokens #AIEngineering (toy sketch below) | — |
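
To make the compression idea concrete (this is not TurboQuant's actual algorithm, only the baseline it improves on): storing cached keys and values in int8 with one scale per channel already shrinks a float32 KV cache 4x; lower-bit schemes like the one in EP37 push toward the 6x in the title.

```python
import numpy as np

def quantize_kv(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Toy per-channel int8 quantization of a (seq_len, head_dim)
    KV tensor: 1 byte per value instead of 4, plus tiny scale overhead."""
    scale = np.abs(x).max(axis=0) / 127.0 + 1e-8  # one scale per channel
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

kv = np.random.randn(4096, 128).astype(np.float32)  # fake cached keys
q, s = quantize_kv(kv)
print(np.abs(kv - dequantize_kv(q, s)).max())       # small reconstruction error
```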
What’s Coming
- EP28 — MoE Routing (mixture of experts, when to use which expert)
- EP29 — Tool Call Caching (cache tool results, not just LLM outputs; sketch below)
- EP30 — Streaming Structured Output (token-by-token JSON validation)
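
A sketch of the tool-result caching idea flagged above (and covered in EP34): key the cache on a deterministic hash of tool name plus arguments. A real version adds TTLs and invalidation; this is only the core.

```python
import hashlib
import json

_cache: dict[str, object] = {}

def cached_tool_call(tool, name: str, **kwargs):
    """Memoize tool results, not just LLM outputs: an identical
    tool-name + args pair serves the stored result instead of
    re-hitting the (slow, billable) tool."""
    key = hashlib.sha256(
        json.dumps({"tool": name, "args": kwargs}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = tool(**kwargs)
    return _cache[key]
```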
A new episode every week. Subscribe so you don’t miss one.
Subscribe ↗