One production AI engineering pattern per week. 27 episodes and counting. Each Short covers a real pattern engineers hit in production — the problem, the fix, and the code.
Vector RAG retrieves documents. Graph RAG retrieves relationships. When your agent needs to reason across entities, timelines, and decisions, the graph wins. The Problem I Was Trying to Solve # My AI agent PostSingular, running on OpenClaw, talks to me every day. It helps me build Luminar, manages my YouTube channel, and tracks infrastructure decisions across sessions.
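The "graph retrieves relationships" claim can be sketched in a few lines. This is a minimal, hypothetical in-memory graph of triples, not the post's actual setup (a real deployment would use a graph database such as Neo4j); the entity and relation names are made up for illustration:

```python
from collections import defaultdict

# Hypothetical knowledge-graph triples: (subject, relation, object).
triples = [
    ("auth-decision", "MADE_ON", "2024-03-01"),
    ("auth-decision", "CHOSE", "JWT"),
    ("auth-decision", "REJECTED", "server-side sessions"),
    ("auth-decision", "BECAUSE", "stateless scaling"),
]

# Adjacency list: entity -> outgoing (relation, object) edges.
graph = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))

def explain(entity):
    """Walk an entity's outgoing edges to reconstruct a decision and its context."""
    return {rel: obj for rel, obj in graph[entity]}

print(explain("auth-decision"))
```

A vector store could retrieve the *text* of this decision by similarity, but joining the date, the rejected alternative, and the rationale in one query is exactly the relationship traversal the graph gives you for free.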
The DPO channel (@DPO-AI) publishes AI/ML technical Shorts. 7 videos uploaded so far, covering agent memory, HNSW indexing, and agentic protocols. The entire production pipeline costs less than a coffee per video. Why Build This # I wanted to publish technical AI content that goes beyond surface-level explanations — real system architecture, real algorithms, real trade-offs. And I wanted it to be visually compelling, not just a talking head.
The default state of a language model is amnesia. Every session, it wakes up fresh with no memory of what happened before. I built a memory system that fixes this — and somewhere in the process, the agent got a name, a personality, and an opinion about font choices. The Problem # Every LLM session is stateless by design. You can inject previous conversation history, but:
Luminar has 173 source files, 21,586 lines of production code, 43 API endpoints, and 155+ tests. It was built almost entirely by AI agents. Here’s the team structure, the workflow, and the honest truth about what breaks. The Team # I didn’t want generic agents. I wanted specialists — each with a clear domain, sharp ownership boundaries, and a persona that shapes how they approach problems.
Vector databases are fast and convenient. But they can’t answer “what did I decide about the auth system 3 weeks ago and why?” For that, you need relationships — and that means a knowledge graph. The Problem with Pure Vector Memory # Most AI memory systems work like this: embed text, store in ChromaDB, retrieve by cosine similarity. It works well for “find things similar to this query.”
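The embed, store, retrieve-by-cosine-similarity loop described above can be sketched with toy vectors. The vectors and document names here are invented stand-ins for real embeddings, and the dict stands in for ChromaDB, which does the same ranking at scale with approximate nearest-neighbour indexes:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" standing in for model output.
store = {
    "note on the auth system": [0.9, 0.1, 0.2],
    "grocery list":            [0.1, 0.8, 0.3],
}

def retrieve(query_vec, k=1):
    """Return the k stored documents most similar to the query vector."""
    ranked = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve([0.85, 0.15, 0.25]))  # similarity search finds the auth note
```

Note what is missing: nothing in the store encodes *when* the auth decision was made or *why*. That relational context is precisely what the knowledge graph adds on top of similarity search.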
Premise: use F5-TTS to clone a voice from a short reference clip and generate high-quality narration for AI content. Reality: mediocre output, weird artifacts, wrong prosody. Here’s the honest post-mortem. The Setup # F5-TTS is a non-autoregressive TTS model that uses flow matching for zero-shot voice cloning. You give it:
Cloud is convenient. But when you already own a gaming PC with a 2080 Ti collecting dust, the math changes fast. Here’s how I turned mine into a production AI server running everything from Neo4j to GPU rendering — at €25/month. The Hardware # I didn’t buy anything new. This is the machine I had:
Introduction # Optical Character Recognition (OCR) has evolved from simple pattern matching to sophisticated vision-language models that can understand context, preserve formatting, and handle complex documents. DeepSeek-OCR represents the cutting edge of this evolution — and the best part? You can run it entirely offline on your own hardware.