Running a YouTube Channel for $0: Manim + Local TTS + FFmpeg + YouTube API

Table of Contents

The DPO channel (@DPO-AI) publishes AI/ML technical Shorts. 7 videos uploaded so far, covering agent memory, HNSW indexing, and agentic protocols. The entire production pipeline costs less than a coffee per video.

Why Build This
#

I wanted to publish technical AI content that goes beyond surface-level explanations — real system architecture, real algorithms, real trade-offs. And I wanted it to be visually compelling, not just a talking head.

The constraint: I have zero budget for production software and zero patience for manual video editing.

The solution: automate the entire pipeline from script to upload.

The Pipeline
#

flowchart TD
    A["💡 Topic"] --> B["🤖 Research
Kimi K2.5 (~$0.001)"]
    B --> C["✍️ Script
Claude Opus (~$0.05)
~500 chars = 35s speech"]
    C --> D["🎬 Animation
Manim on RTX 2080 Ti
OptiX GPU rendering"]
    D --> E["🎙️ Narration
Kokoro TTS local
am_michael voice"]
    E --> F["✂️ Edit
FFmpeg: sync + speed
max 1.2x"]
    F --> G["📤 Upload
YouTube Data API v3
OAuth2 token"]
    G --> H["📱 Short published
< 60s, vertical 9:16"]
    style D fill:#1e3a5f,color:#fff
    style E fill:#10b981,color:#fff

Step 1: Research (Kimi K2.5)
#

Kimi K2.5 is cheap and has a massive context window. I use it to research the topic, pull together relevant papers, and draft a content brief. A full research pass costs literally fractions of a cent.

The brief includes: core concept, key visual metaphors, 3-5 key points to hit, suggested Manim scene structure.

Step 2: Script (Claude Opus)
#

The script is the most important part. The constraint is strict:

~500 characters = ~35 seconds of natural speech. YouTube Shorts must be under 60 seconds. That gives you roughly 800 characters max, and you want breathing room.

The script has to be dense but not rushed — every word has to earn its place. Claude Opus at ~$0.05 per script is worth it for the quality jump over cheaper models. Sonnet works too but needs more prompting.

Step 3: Manim Animation
#

Manim is the same library 3Blue1Brown uses for his math videos. It’s pure Python — you describe animations programmatically and render them.

class AgentMemoryScene(ThreeDScene):
    def construct(self):
        # Create 3D surface representing embedding space
        surface = Surface(
            lambda u, v: np.array([u, v, np.sin(u)*np.cos(v)]),
            u_range=[-3, 3], v_range=[-3, 3],
            resolution=(30, 30)
        )
        self.play(Create(surface))
        # ... animate memory retrieval path

The RTX 2080 Ti renders with OptiX — GPU-accelerated ray tracing. A 30-second 720p animation renders in about 45 seconds. Without GPU it would take 10+ minutes.

For the DPO channel, scenes include: HNSW graph traversal, agent memory hierarchy, protocol stack diagrams, gradient descent visualization.

Step 4: Kokoro TTS (Local Voice)
#

Kokoro is a local TTS model that runs on the gaming server. The am_michael voice is clear, natural, and works well for technical content.

import kokoro

pipeline = kokoro.KPipeline(lang_code='a')  # American English
audio, sample_rate = pipeline(script, voice='am_michael')

No API calls, no rate limits, no cost. The RTX 2080 Ti generates speech faster than real-time.

The only limitation: no emotional range. Kokoro is clear and neutral — great for explanation content, not for hype. For the occasional premium video, I switch to ElevenLabs Lily.

Step 5: FFmpeg Editing
#

The sync step is simple but fussy:

# Merge audio + video, extend if needed
ffmpeg -i animation.mp4 -i narration.wav \
  -c:v copy -c:a aac \
  -shortest output.mp4

# Speed up slightly if over 60s (max 1.2x sounds natural)
ffmpeg -i output.mp4 -filter:v "setpts=0.85*PTS" \
  -filter:a "atempo=1.18" final.mp4

1.2x is the ceiling for natural-sounding audio speedup. Above that, speech starts sounding chipmunk-ish. If the script is too long, cut it — don’t speed it up.

Step 6: YouTube Data API v3
#

// node upload.js --file final.mp4 --title "..." --description "..."
const youtube = google.youtube({ version: 'v3', auth: oauth2Client });
await youtube.videos.insert({
  part: ['snippet', 'status'],
  requestBody: { snippet: { title, description, tags }, 
                 status: { privacyStatus: 'public' } },
  media: { body: fs.createReadStream('final.mp4') }
});

OAuth2 token stored at x-autopilot/yt-token.json. The same Node.js infrastructure that handles X/Twitter posting handles YouTube uploads — shared auth layer.

Cost Per Video
#

Step	Tool	Cost
Research	Kimi K2.5	~$0.001
Script	Claude Opus	~$0.05
Animation	Manim (local)	$0
Voice	Kokoro TTS (local)	$0
Editing	FFmpeg (local)	$0
Upload	YouTube API	$0
Total		~$0.05

Seven Shorts published. Total cost: about 35 cents in API calls.

The channel is still small — it’s more of a proof-of-concept than a growth vehicle right now. The real value is the pipeline: once a video format works, producing more is nearly free.

Why Build This#

The Pipeline#

Step 1: Research (Kimi K2.5)#

Step 2: Script (Claude Opus)#

Step 3: Manim Animation#

Step 4: Kokoro TTS (Local Voice)#

Step 5: FFmpeg Editing#

Step 6: YouTube Data API v3#

Cost Per Video#