27 AI Engineering Patterns in 60 Seconds Each

One production AI engineering pattern per week, 27 episodes and counting. Each Short covers a real failure mode engineers hit in production — the problem, the fix, and the code.
Follow @DPO-AI ↗ Full Playlist ↗

Why This Series

Most AI content explains what something is. This series explains when you need it and why it works. Every episode opens with a concrete failure mode — a real number, a real cost, a real silent bug — then shows the pattern that fixes it.

The format is strict: 60–70 seconds, no fluff, one pattern per episode. If it can’t be explained in under 70 seconds it goes in a blog post instead.


The Full Series

Retrieval & RAG

| EP | Pattern | Key Stat |
| --- | --- | --- |
| EP36 | RLM Instead of RAG | |
| drop03 | $29B for a model picker. The brain was never theirs. #Cursor #Claude #Shorts | |
| drop02 | They built OpenAI. Then they walked out. #Anthropic #AIEngineering #Shorts | |
| EP28 | MoE Routing | 60% cost cut |
| EP27 | Hybrid Search — BM25 + vectors + RRF | Recall 40% → 80%, 15 lines |
| EP25 | Agentic RAG — 4-tool router | 40% of queries need something other than vector search |
| EP23 | RAG Fusion v2 — multi-query + RRF | Recall 45% → 72% |
| EP22 | Corrective RAG (CRAG) — 3-tier confidence routing | Filters irrelevant chunks before generation |
| EP21 | Self-RAG — retrieval on demand | Reduces hallucination by skipping retrieval when not needed |
| EP14 | Query Decomposition — sub-query fan-out | Handles multi-hop questions single-pass RAG can’t answer |
| EP13 | RAG Fusion — parallel queries + RRF | Original: 5 query variants, 45% → 72% recall |
| EP07 | Prompt Compression — LLMLingua | 512 tokens → 80 tokens, same answer |
| EP02 | Speculative RAG — draft-then-retrieve | Retrieve on the answer, not the question |
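EP27's hybrid search merges a BM25 ranking and a vector ranking with Reciprocal Rank Fusion (RRF), and the fusion step really is only a handful of lines. A minimal sketch, with hard-coded ranked lists standing in for real BM25 and embedding retrievers (`rrf_fuse` is an illustrative helper name, not from the episode):

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears
    in; k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]    # keyword ranking (stand-in)
vector_hits = ["d1", "d4", "d3"]  # embedding ranking (stand-in)
fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)  # → ['d1', 'd3', 'd4', 'd7']
```

Documents that rank well in both lists (here `d1` and `d3`) float to the top, which is exactly the recall win the hybrid approach is after.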

Inference Optimization

| EP | Pattern | Key Stat |
| --- | --- | --- |
| EP17 | Disaggregated Inference — prefill/decode split | 3x throughput on long-context workloads |
| EP04 | Speculative Decoding — draft + verify | 2–4x faster generation, same quality |
| EP01 | KV Cache Prefix Optimization | P99 2400ms → 900ms, zero code changes |
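EP04's draft-and-verify loop can be sketched with toy stand-ins: a cheap draft model proposes several tokens, the expensive target model checks them in one pass, and generation keeps the longest agreed prefix plus one corrected token from the target. The two lookup tables below are hypothetical "models", not real LLMs; only the accept/verify logic is the point:

```python
DRAFT = {"the": "cat", "cat": "sat", "sat": "on", "on": "a"}     # fast, sometimes wrong
TARGET = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}  # slow, authoritative

def draft_k(token, k):
    """Cheaply propose the next k tokens from the draft model."""
    out = []
    for _ in range(k):
        token = DRAFT.get(token, "<eos>")
        out.append(token)
    return out

def speculative_step(prefix, k=4):
    """Draft k tokens, keep the prefix the target agrees with, then
    append the target's own next token (the standard correction step)."""
    proposed = draft_k(prefix[-1], k)
    accepted = []
    cur = prefix[-1]
    for tok in proposed:
        if TARGET.get(cur) == tok:  # one cheap 'verify' check per token
            accepted.append(tok)
            cur = tok
        else:
            break
    accepted.append(TARGET.get(cur, "<eos>"))  # target supplies the fix
    return prefix + accepted

print(speculative_step(["the"]))  # → ['the', 'cat', 'sat', 'on', 'the']
```

Because the draft agreed on three tokens before diverging, one target pass produced four tokens of output: that amortization is where the 2–4x speedup comes from, with quality unchanged since the target has the final word.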

Evaluation & Quality

| EP | Pattern | Key Stat |
| --- | --- | --- |
| EP24 | LLM-as-Judge v2 | $0.002/eval, calibrated scoring |
| EP19 | Constitutional Self-Critique | Self-corrects against principles before output |
| EP15 | LLM-as-Judge — original | Structured rubric, GPT-4o-mini at scale |
| EP12 | Structured Output Forcing | Eliminates JSON parse failures in production |
| EP11 | Self-Consistency — majority vote | 67% → 88% on math/reasoning tasks |
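EP11's self-consistency is the simplest pattern here to sketch: sample the same prompt several times at nonzero temperature and keep the majority answer. The sample list below stands in for real completions, and `majority_vote` is an illustrative helper name:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common final answer across samples
    (ties break toward the earliest-seen answer)."""
    return Counter(answers).most_common(1)[0][0]

# Five hypothetical sampled final answers to the same math question.
samples = ["42", "41", "42", "42", "40"]
print(majority_vote(samples))  # → 42
```

The intuition behind the 67% → 88% jump on reasoning tasks: wrong chains of thought tend to scatter across many different wrong answers, while correct chains converge on the same one, so the mode is more reliable than any single sample.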

Agent Architecture

| EP | Pattern | Key Stat |
| --- | --- | --- |
| EP39 | The Future of Agents Isn’t Smarter Prompts. It’s Smarter Plumbing. #AIEngineering | |
| EP38 | Harness Engineering: How OpenAI Shipped 1M Lines Without Writing Them #AIEngineering | |
| EP33 | Stop Interviewing, Start Acting | |
| EP32 | LLM Wiki | |
| EP31 | 519K Lines. 50 Hidden Tools. Inside Claude Code’s Leaked Source #AIEngineering | |
| EP29 | 688 Stars. Zero Fine | |
| drop01 | one engineer. no budget. 19,000 views. how? #AIEngineering #Shorts | |
| EP28 | Agent Skills Explained | |
| EP26 | Multi-Agent Orchestration | 34% failure → 91% success with specialist agents |
| EP20 | Context Distillation | 16K context → 800 tokens, knowledge preserved |
| EP16 | Context Engineering | What goes in the context window determines everything |
| EP10 | Parallel Tool Calls | 4 sequential calls → 1 parallel batch |
| EP09 | LLM Router | Route by complexity, cut costs 60% |
| EP08 | Agent Checkpointing | Zero lost work on agent failure |
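EP10's parallel tool calls come down to one change: when the model requests several independent tools, fan them out concurrently instead of awaiting them one by one, so total latency is roughly the slowest call rather than the sum. A sketch with `asyncio`; the two tools are hypothetical async stubs standing in for real API calls:

```python
import asyncio

async def weather(city: str) -> str:
    await asyncio.sleep(0.05)  # stands in for network latency
    return f"{city}: 18C"

async def stock(ticker: str) -> str:
    await asyncio.sleep(0.05)
    return f"{ticker}: $101"

async def run_tool_calls(calls):
    """Dispatch every requested (tool, argument) pair at once and
    gather results in order."""
    return await asyncio.gather(*(fn(arg) for fn, arg in calls))

results = asyncio.run(run_tool_calls([(weather, "Oslo"), (stock, "ACME")]))
print(results)  # → ['Oslo: 18C', 'ACME: $101']
```

Run sequentially these two 50ms calls cost ~100ms; gathered, ~50ms. That is the "4 sequential calls → 1 parallel batch" win from the episode, generalized to any batch of independent tool requests.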

Reliability & Cost

| EP | Pattern | Key Stat |
| --- | --- | --- |
| EP34 | Tool Result Caching | |
| EP30 | 3 Cheap Models Beat GPT | |
| EP06 | Semantic Caching | 40% cost reduction on real workloads |
| EP05 | Circuit Breaker for LLMs | Stop cascading failures at the LLM layer |
| EP03 | Hedged Requests — P99 killer | P99 collapses to ~P50 of slower backend |
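EP03's hedged request is easy to sketch with `asyncio`: send the call to the primary backend, and if it hasn't answered within a hedge delay, fire a duplicate at a backup and take whichever finishes first. The backends and latencies below are hypothetical stubs, with the primary stuck in its tail latency so the hedge visibly wins:

```python
import asyncio

async def call_backend(name: str, latency: float) -> str:
    await asyncio.sleep(latency)  # stands in for a real LLM/API call
    return name

async def hedged(primary, backup, hedge_delay=0.05):
    """Return the first result to arrive; cancel the loser."""
    t1 = asyncio.create_task(primary())
    done, _ = await asyncio.wait({t1}, timeout=hedge_delay)
    if done:                       # primary answered fast: no hedge needed
        return t1.result()
    t2 = asyncio.create_task(backup())  # hedge fires after the delay
    done, pending = await asyncio.wait(
        {t1, t2}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    return done.pop().result()

winner = asyncio.run(hedged(
    lambda: call_backend("primary", 0.5),   # stuck at P99
    lambda: call_backend("backup", 0.01),
))
print(winner)  # → backup
```

Setting the hedge delay near the primary's P50 means the duplicate only fires on tail-latency requests, which is how P99 collapses toward the backup's typical latency at small extra cost.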


Safety & Capability

| EP | Pattern | Key Stat |
| --- | --- | --- |
| EP35 | Anthropic Nerfed Claude On Purpose | |

Inference & Serving

| EP | Pattern | Key Stat |
| --- | --- | --- |
| EP37 | TurboQuant: 6x KV Cache Compression at 1M Tokens #AIEngineering | |

What’s Coming

  • EP28 — MoE Routing (mixture of experts, when to use which expert)
  • EP29 — Tool Call Caching (cache tool results, not just LLM outputs)
  • EP30 — Streaming Structured Output (token-by-token JSON validation)

One new episode every week. Subscribe so you don’t miss them.

Subscribe ↗