The Problem with Pure Vector Memory#
Most AI memory systems work like this: embed text, store in ChromaDB, retrieve by cosine similarity. It works well for “find things similar to this query.”
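That baseline looks something like this (an illustrative snippet, not this project's actual code):

```python
# The usual baseline: a persistent Chroma collection, queried by
# cosine similarity. Paths and IDs here are illustrative.
import chromadb

client = chromadb.PersistentClient(path="memory/chroma")
col = client.get_or_create_collection("memory", metadata={"hnsw:space": "cosine"})

col.add(
    ids=["2026-03-02-p1"],
    documents=["Sprint 14 kicked off. Bolt is working on LUM-97."],
)
hits = col.query(query_texts=["What is Bolt working on?"], n_results=5)
```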
It fails for:
- Entity tracking: “What do I know about Katarina?” → ChromaDB returns chunks, not a coherent entity
- Temporal reasoning: “What changed in the codebase this month?” → No native timeline
- Relationship queries: “Which decisions depend on the LiteLLM choice?” → No graph traversal
- Contradiction detection: “Did I say X before?” → No structured fact store
The missing piece is a knowledge graph — structured facts about entities and their relationships over time.
The Stack#
```mermaid
graph LR
    DAILY["📝 Daily Notes<br/>memory/YYYY-MM-DD.md"]
    MEMORY["🧠 MEMORY.md<br/>long-term facts"]
    CRON["⏰ Cron @ 2 AM<br/>daily-ingest.sh"]
    DAILY --> CRON
    MEMORY --> CRON
    CRON --> GRAPHITI["🔗 Graphiti v0.28.1<br/>entity + edge extraction"]
    GRAPHITI --> KIMI["🤖 Kimi K2 Turbo<br/>LLM extraction"]
    GRAPHITI --> NEO4J["🗄️ Neo4j 5<br/>Docker, port 7687"]
    GRAPHITI --> EMBED["📐 all-MiniLM-L6-v2<br/>384-dim, CPU"]
    QUERY["🔍 Query Time"] --> RRF["RRF Scoring<br/>KG + ChromaDB + grep"]
    NEO4J --> RRF
    CHROMA["ChromaDB"] --> RRF
    RAW["Raw files"] --> RRF
    style NEO4J fill:#1e3a5f,color:#fff
    style RRF fill:#10b981,color:#fff
```
Graphiti is the key piece. It’s a Python library that takes free-form text, extracts entities and relationships using an LLM, and writes them to Neo4j with temporal metadata. Think of it as a structured fact extractor that builds your knowledge graph automatically.
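In code, an ingest boils down to one call. A sketch with placeholder credentials and paths; the `reference_time` is what gives extracted facts their validity window:

```python
import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType

async def ingest_note(path: str, day: datetime) -> None:
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")  # placeholder creds
    await graphiti.add_episode(
        name=f"daily-note-{day:%Y-%m-%d}",
        episode_body=open(path).read(),
        source=EpisodeType.text,           # free-form text, not JSON or messages
        source_description="daily note",
        reference_time=day,                # anchors the temporal metadata
    )

asyncio.run(ingest_note("memory/2026-03-02.md", datetime(2026, 3, 2, tzinfo=timezone.utc)))
```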
What Gets Extracted#
From a daily note like:
```
Sprint 14 kicked off. Bolt is working on LUM-97 (API key auth).
Sage owns LiteLLM. Lumi's server keeps running out of RAM with 5 concurrent agents.
Added 2GB swap as a safety net.
```
Graphiti extracts:
- Entities: Sprint 14, Bolt, Sage, Lumi’s server
- Relationships: Bolt → WORKS_ON → LUM-97, LUM-97 → TYPE → API key auth
- Facts: Lumi’s server has memory pressure, 2GB swap added
- Temporal: these facts valid from 2026-03-02
After 6 weeks of daily ingestion, the graph has:
- 133 entities
- 117 relationships
- 23 episodes
- Every significant decision and event from the project
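You can spot-check what landed by querying Neo4j directly. A sketch, assuming Graphiti's default schema of `Entity` nodes joined by `RELATES_TO` edges that carry a `fact` string:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    rows = session.run(
        "MATCH (a:Entity)-[r:RELATES_TO]->(b:Entity) "
        "WHERE a.name CONTAINS $name "
        "RETURN a.name AS a, r.fact AS fact, b.name AS b",
        name="Bolt",
    )
    for row in rows:
        print(f"{row['a']} -> {row['b']}: {row['fact']}")
```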
The LLM Choice: Kimi K2 Turbo#
The catch: Graphiti's default OpenAI client requests structured output via `response_format: json_object`, and Kimi's API doesn't support this parameter. The fix: inject a JSON schema requirement into the system prompt via a custom client wrapper. I replaced the default OpenAI client in Graphiti with a KimiClient wrapper that intercepts calls and adds the JSON schema instruction to the system prompt. This took about 50 lines of Python and now runs cleanly.
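The core of the wrapper looks roughly like this (a minimal sketch; the class name, model ID, and base URL are illustrative, not the actual 50 lines):

```python
import json
from openai import AsyncOpenAI

class KimiClient:
    """Chat client that emulates response_format via the system prompt."""

    def __init__(self, api_key: str, base_url: str):
        self._client = AsyncOpenAI(api_key=api_key, base_url=base_url)

    async def create(self, messages: list[dict], response_format: dict | None = None, **kwargs):
        if response_format is not None:
            # Kimi rejects response_format, so demand JSON in the system prompt instead.
            note = "Respond with valid JSON only, matching: " + json.dumps(response_format)
            messages = [{"role": "system", "content": note}, *messages]
        resp = await self._client.chat.completions.create(
            model="kimi-k2-turbo-preview",  # illustrative model ID
            messages=messages,
            **kwargs,
        )
        return json.loads(resp.choices[0].message.content)
```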
Why Kimi? It’s cheap (~$0.01 per 1K tokens for K2 Turbo) and the graph extraction quality is good enough. The embedder is all-MiniLM-L6-v2 running on CPU — fast, free, and sufficient for 384-dim embeddings.
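The embedding side is equally plain (illustrative snippet):

```python
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 produces 384-dim vectors and runs comfortably on CPU.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cpu")
vec = model.encode("Lumi's server keeps running out of RAM")
print(vec.shape)  # (384,)
```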
The Hybrid Query: RRF#
At query time, I don’t rely on just the KG or just ChromaDB. I use Reciprocal Rank Fusion:
```python
from typing import List  # Result is the project's own result type

async def search_all(query: str) -> List[Result]:
    # 1. KG semantic + BFS traversal (Graphiti)
    kg_results = await graphiti.search(query)
    # 2. Vector semantic search (ChromaDB collection)
    chroma_results = chroma.query(query_texts=[query], n_results=5)
    # 3. Grep over raw memory files
    grep_results = grep_memory_files(query)
    # Combine with RRF scoring
    return rrf_combine([kg_results, chroma_results, grep_results])
```
This gives you the best of all three systems:
- KG: structured facts, entity relationships, temporal context
- ChromaDB: semantic similarity across all memory chunks
- Grep: exact matches, recent notes not yet ingested
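The `rrf_combine` step itself is small. A minimal sketch, assuming each result exposes a stable `id` and using the conventional k=60 damping constant:

```python
from collections import defaultdict

def rrf_combine(ranked_lists, k: int = 60):
    # Each result scores 1/(k + rank) in every list it appears in;
    # scores are summed across lists, then sorted descending.
    scores = defaultdict(float)
    items = {}
    for results in ranked_lists:
        for rank, result in enumerate(results, start=1):
            scores[result.id] += 1.0 / (k + rank)
            items[result.id] = result
    return [items[i] for i in sorted(scores, key=scores.get, reverse=True)]
```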
The Nightly Ingest#
```bash
# Runs at 2 AM via cron
0 2 * * * /home/amine/.openclaw/workspace/kg/daily-ingest.sh
```
The script:
- Ingests yesterday’s daily notes file (`memory/2026-03-02.md`)
- Re-ingests `MEMORY.md` if it changed (tracked via `mtime`)
- Logs everything to `kg/ingest.log`
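The interesting bit is the mtime check. A sketch of the driver logic, reusing the `ingest_note` sketch from earlier (the paths and stamp-file scheme are assumptions, not the actual script):

```python
from datetime import date, datetime, timedelta, timezone
from pathlib import Path

STAMP = Path("kg/.memory-md.mtime")  # hypothetical stamp file
MEMORY_MD = Path("MEMORY.md")

async def nightly() -> None:
    # 1. Always ingest yesterday's daily note
    yesterday = date.today() - timedelta(days=1)
    note = Path(f"memory/{yesterday:%Y-%m-%d}.md")
    if note.exists():
        await ingest_note(str(note), datetime(yesterday.year, yesterday.month,
                                              yesterday.day, tzinfo=timezone.utc))
    # 2. Re-ingest MEMORY.md only if its mtime moved since the last run
    last = float(STAMP.read_text()) if STAMP.exists() else 0.0
    mtime = MEMORY_MD.stat().st_mtime
    if mtime > last:
        await ingest_note(str(MEMORY_MD), datetime.now(timezone.utc))
        STAMP.write_text(str(mtime))
```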
Each ingest chunk takes ~3-5 seconds (one LLM call per paragraph for entity extraction). A full MEMORY.md ingest with 18 chunks takes about 5-7 minutes — fast enough for nightly cron.
What It Enables#
The question “what do I know about Lumi’s server issues?” now returns:
```
Entity: Lumi's server (your-ec2-instance)
- t3.medium, 4GB RAM (downgraded from t3.large 2026-02-10)
- Runs OpenClaw gateway PID 985
- Memory pressure: zombie gateway processes at 137% CPU (2026-02-28, 2026-03-03)
- Fix: cleared delivery queue, added 2GB swap, set CPU burst to unlimited
- Role: agent dispatch server for Sprint 14
```
That’s a coherent entity with history — not just a list of chunks sorted by embedding distance.
For a personal AI agent that needs to maintain context across sessions, weeks, and projects, this architecture is the right foundation. Vector alone is a search engine. Graph + vector is memory.