## The Team
I didn’t want generic agents. I wanted specialists — each with a clear domain, sharp ownership boundaries, and a persona that shapes how they approach problems.
| Agent | Role | Model | What They Own |
|---|---|---|---|
| 🏗️ Forge | Platform Architect | Opus | ADRs, system topology, tech decisions |
| 🧠 Sage | AI Engineer | Opus | LiteLLM gateway, model routing, evals |
| ⚡ Bolt | Backend Engineer | Opus | FastAPI, PostgreSQL, Stripe, integrations |
| 🔌 Wire | Frontend Engineer | Opus | Next.js 16, console UI, Vercel deploy |
| ⚡ Flux | QA Engineer | Sonnet | Test suites, E2E tests, documentation |
| 🎨 Muse | Design Lead | Opus | Design system, component specs |
| 🌊 Drift | DevOps Engineer | Sonnet | CI/CD, Docker, GitHub Actions |
| 💎 Prism | Business Architect | Opus | Pricing, GTM, investor narrative |
| 🔭 Scout | Researcher | Sonnet | Protocol deep-dives, library evaluations |
The key was the ownership model. Forge doesn't write code. Sage doesn't touch UI. Bolt doesn't make architecture decisions. When an agent hits a boundary, it defers to the owner instead of freelancing outside its domain.
## The Dispatch Workflow
```mermaid
flowchart LR
    A["📋 Linear Issue created"] --> B["🏷️ Label: agent:bolt"]
    B --> C["🔔 Webhook fires<br/>/opt/linear-webhook/handler.py"]
    C --> D{"Route by label"}
    D --> E["⚡ Bolt spawned<br/>on Lumi's server"]
    E --> F["🔨 Implement + commit"]
    F --> G["📬 PR opened → develop"]
    G --> H["👁️ Lumi reviews"]
    H --> I["✅ Merged"]
    style C fill:#1e3a5f,color:#fff
    style E fill:#10b981,color:#fff
```
Each agent is triggered by a Linear label. The webhook handler runs on my AWS EC2 instance, the same box that hosts Lumi (the project-manager AI); it reads the label and spawns the right agent via OpenClaw sessions. The agent reads the Linear issue, checks the existing codebase on GitHub, implements the change, and opens a PR to develop.
```python
# Label → agent routing table (from /opt/linear-webhook/handler.py)
AGENT_MAP = {
    "agent:bolt": "bolt",
    "agent:sage": "sage",
    "agent:wire": "wire",
    "agent:flux": "flux",
    "agent:forge": "forge",
    "agent:drift": "drift",
}
```
No manual dispatch. Add the label, the agent starts working.
## What's Been Built
After 14 sprints:
- Backend: FastAPI + LangGraph + Pydantic AI, 9 PostgreSQL tables
- Protocols: UCP (Google), ACP (OpenAI/Stripe), A2A (Google), MCP (Anthropic), TAP (Visa), AP2
- Payments: Stripe Connect Express with Separate Charges and Transfers
- Search: Pydantic AI buyer agent with A2A protocol
- Auth: API keys + OAuth 2.0 for Shopify
- LLM: AWS Bedrock Claude Sonnet via LiteLLM gateway
- Tests: 155+ tests, 23 production evals as launch gate
The codebase grew from 0 to 21k lines over roughly 6 weeks. Each sprint lasted 2-3 days.
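The "evals as launch gate" idea can be sketched as a simple CI check. The function name and result shape below are hypothetical, not the project's actual harness:

```python
def launch_gate(results: dict[str, bool], required: int = 23) -> bool:
    """Block deployment unless every production eval passes.

    `results` maps eval name → pass/fail. Requiring a minimum count
    guards against a silently shrinking eval suite (23 in this project).
    """
    if len(results) < required:
        return False          # missing evals count as a failure
    return all(results.values())

evals = {f"eval_{i:02d}": True for i in range(23)}
print(launch_gate(evals))  # True
evals["eval_07"] = False
print(launch_gate(evals))  # False
```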
## What Actually Works
ADR-first architecture saved us multiple times. Forge writes an Architecture Decision Record before any major feature. When an agent makes a wrong assumption, the ADR is the source of truth. Agents can reference ADR-017 and understand why LiteLLM was chosen over direct Bedrock calls.
Specialist context beats generalist context. A "senior backend engineer" prompt that owns auth and integrations knows to check the existing API-key middleware before adding a new one. A generic "code assistant" prompt won't.
## What Actually Breaks
Context resets are the main pain point. When Lumi’s server restarts mid-sprint, agents lose their session state. They restart from scratch, sometimes re-implementing work that was already done, sometimes conflicting with merged PRs.
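A generic mitigation is to checkpoint agent progress outside the session, so a restarted agent can resume instead of re-implementing. A minimal sketch, assuming one JSON file per issue; this is illustrative, not how OpenClaw sessions actually persist state, and the issue ID is made up:

```python
import json
import tempfile
from pathlib import Path

def save_checkpoint(base: Path, issue_id: str, state: dict) -> None:
    """Persist an agent's progress (completed steps, last commit) per issue."""
    base.mkdir(parents=True, exist_ok=True)
    (base / f"{issue_id}.json").write_text(json.dumps(state))

def load_checkpoint(base: Path, issue_id: str) -> dict:
    """Return saved progress, or an empty state after a cold start."""
    path = base / f"{issue_id}.json"
    if path.exists():
        return json.loads(path.read_text())
    return {"completed_steps": [], "last_commit": None}

base = Path(tempfile.mkdtemp())
save_checkpoint(base, "ENG-142", {"completed_steps": ["schema"], "last_commit": "abc123"})
print(load_checkpoint(base, "ENG-142")["completed_steps"])  # ['schema']
```

On spawn, the agent would read the checkpoint first and skip any step already marked done, which also avoids conflicting with PRs merged during the outage.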
Webhook timing is fragile. A label applied while the handler is down (during a restart, say) becomes a lost event. I've had to re-trigger dispatch manually, removing and re-adding labels, more than once.
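One way to harden this is a reconcile pass on handler startup: re-dispatch any agent-labeled issue that has no live session. The three callables below are injected placeholders, not Linear's real API; the logic is the point:

```python
def reconcile_on_startup(list_labeled_issues, has_active_session, dispatch):
    """Re-dispatch label events missed while the handler was down.

    The callables are placeholders: fetch issues carrying an agent:* label
    from Linear's API, check whether an agent session is already running,
    and spawn one. Returns the IDs that had to be recovered.
    """
    recovered = []
    for issue in list_labeled_issues():
        if not has_active_session(issue["id"]):  # nobody picked it up
            dispatch(issue)                      # lost webhook → re-fire
            recovered.append(issue["id"])
    return recovered

issues = [{"id": "ENG-1"}, {"id": "ENG-2"}]
active = {"ENG-2"}
print(reconcile_on_startup(lambda: issues, lambda i: i in active, print))
# ['ENG-1']
```

Because dispatch only fires for issues without a session, running the reconcile pass on every restart is idempotent.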
Agents don’t review each other’s work well. Bolt reviewing Wire’s frontend PR doesn’t catch CSS bugs. I eventually gave Lumi the review responsibility — she has broader context about what “done” looks like.
## The Economics
| Cost | Monthly |
|---|---|
| Anthropic API (Opus + Sonnet) | ~$40-80 depending on sprint activity |
| AWS t3.medium (Lumi’s server) | ~$30 |
| Bedrock (production LLM) | $0 so far (credits) |
| Everything else | $0 |
Per feature, the cost is roughly $5-15 in API calls. A full sprint (8-10 features) costs about $60-80. That’s a few hours of a junior developer’s time — for a week of parallel work across 9 specialists.
The ROI isn't mainly cost; it's speed and scale: a team of specialists working in parallel, 24/7, with no context-switching overhead.