Engineering Deep Dive

Graph RAG
in Practice

How I wired Neo4j into my AI agent's memory — and why vector search alone wasn't enough.

Amine El Farssi  ·  March 2026  ·  8 min read

Vector Memory Has a Blind Spot

My AI agent PostSingular, running on OpenClaw, talks to me every day. It helps me build Luminar, manages my YouTube channel, and tracks infrastructure decisions across sessions. Memory was working, but in a subtly broken way.

"Which decision did we make about the auth system last month, and why?"

ChromaDB returned 6 chunks. All semantically similar. None connected to each other. No timeline. No causal chain. Just floating text.

The problem isn't retrieval quality — cosine similarity was fine. The problem is structural. Real knowledge has relationships. Facts connect to other facts. A vector store doesn't model that.

Vector RAG Fails At
  • Entity tracking: "What do I know about X?" → chunks, not entities
  • Temporal reasoning: "What changed this month?" → no timeline
  • Relationship queries: "What depends on decision Y?" → no traversal
  • Contradiction detection: "Did I say X before?" → no fact store
Graph RAG Handles
  • Entity graph: Persons, Projects, Decisions as nodes
  • Causal chains: Decision → caused_by → Bug → resolved_by → Fix
  • Timeline queries: All decisions in February, ordered
  • Multi-hop: "Everything linked to the auth system"

Standard RAG vs Graph RAG

Standard RAG

Query → Embed → Cosine Search → Top-K Chunks → LLM Answer

Graph RAG

Query → Extract Entities → BFS Traversal → Ranked Subgraph → LLM Answer

The difference is what you're searching. Vectors find text that looks like your query. Graphs find entities and facts that are structurally connected to your query.
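The traversal step is plain BFS over typed edges. A toy sketch of the idea, with illustrative entity names rather than my real graph:

```python
from collections import deque

# Toy entity graph (illustrative names): each edge is (relation, target).
graph = {
    "AuthSystem": [("decided_in", "Decision[jwt-rotation]")],
    "Decision[jwt-rotation]": [("caused_by", "Bug[token-replay]")],
    "Bug[token-replay]": [("resolved_by", "Fix[short-lived-tokens]")],
}

def bfs_subgraph(start, max_hops=3):
    """Collect every (source, relation, target) fact reachable
    from `start` within `max_hops` edges."""
    seen, facts = {start}, []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for relation, target in graph.get(node, []):
            facts.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return facts

# "Everything linked to the auth system" is one traversal:
for fact in bfs_subgraph("AuthSystem"):
    print(fact)
```

No embedding ever sees "token replay"; the fix still comes back because it is two hops from the entity you asked about.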

Three Layers, One Answer

My setup runs three retrieval layers, fused with RRF (Reciprocal Rank Fusion):

Layer       | Tool              | When It Wins
Neo4j BFS   | Graphiti v0.28.1  | Entity relationships, causal chains, timelines
ChromaDB    | all-MiniLM-L6-v2  | Semantically similar chunks, topic recall
grep        | ripgrep           | Exact strings, IDs, issue numbers, code

RRF gives each result a score based on its rank across all sources, not its raw similarity. A result ranking 3rd in KG and 2nd in ChromaDB beats one ranking 1st in only one source.
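RRF itself is a few lines. A minimal sketch with the conventional k=60 smoothing constant (result names are illustrative):

```python
def rrf(rankings, k=60):
    """Fuse ranked result lists: each source contributes
    1 / (k + rank) for every result it returned."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "auth-decision" ranks 3rd in the KG list and 2nd in ChromaDB;
# "ssh-guide" ranks 1st in one source only. Fusion prefers the former:
kg = ["tailscale", "docker", "auth-decision"]
chroma = ["ssh-guide", "auth-decision"]
print(rrf([kg, chroma])[0])  # → auth-decision
```

With k=60, two mid-list appearances (1/63 + 1/62 ≈ 0.032) outscore a single first place (1/61 ≈ 0.016), which is exactly the cross-source agreement you want rewarded.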

Current graph: 133 entities · 117 relationships · 23 episodes · 162 semantic chunks

Nightly Cron to Neo4j

Every night at 2 AM, a cron job ingests the day's notes into Neo4j via Graphiti. Graphiti extracts entities and relationships from raw markdown using an LLM, then stores them with timestamps and embeddings.

Daily Notes (memory/YYYY-MM-DD.md) → Graphiti (entity extraction) → Kimi K2 Turbo (LLM) → Neo4j (graph store)

// daily-ingest.sh

#!/bin/bash
YESTERDAY=$(date -d "yesterday" +%Y-%m-%d)
NOTES="memory/${YESTERDAY}.md"

if [ -f "$NOTES" ]; then
    .venv/bin/python memory.py ingest "$NOTES"
fi

# Also ingest MEMORY.md changes
.venv/bin/python memory.py ingest MEMORY.md
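Scheduling is a single crontab entry (the workspace path and log location here are illustrative, not my actual paths):

```shell
# Run ingestion at 2 AM daily; adjust the cd path to your agent workspace
0 2 * * * cd /home/amine/agent && ./daily-ingest.sh >> logs/ingest.log 2>&1
```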

The Patch That Took 4 Hours

Graphiti's default OpenAI client passes a JSON schema via response_format — which Moonshot's API doesn't support. I had to patch it with a custom client:

// KimiClient — inject schema into system prompt

import json

class KimiClient(OpenAIClient):
    """Moonshot-compatible client: moves the JSON schema out of
    response_format and into the system prompt."""

    async def generate_response(
        self, messages, response_model=None, **kwargs
    ):
        if response_model:
            schema = response_model.model_json_schema()
            schema_str = json.dumps(schema, indent=2)

            # Inject schema into the system message instead of response_format
            injection = (
                f"\n\nRespond with valid JSON matching:\n"
                f"```json\n{schema_str}\n```\n"
                f"Return ONLY the JSON, no other text."
            )
            for msg in messages:
                if msg["role"] == "system":
                    msg["content"] += injection
                    break

        kwargs.pop("response_format", None)  # Moonshot rejects it
        return await super().generate_response(messages, **kwargs)
This is the kind of thing you only find by reading the source code. The fix is 20 lines. The debugging is 4 hours.

Vector vs Graph: A Real Query

Query
"What infrastructure decisions did we make in February, and why?"

ChromaDB returns 5 chunks about infrastructure. Some from February, some not. No ordering. No causal relationships between decisions. No "why."

  • → Tailscale gateway config (Feb 19)
  • → Docker migration notes (Jan 31)
  • → SSH setup guide (Feb 8)

Neo4j returns a traversal with timestamps and a causal chain:

  • InfraDecision[Tailscale-fix]
  • → caused_by: BugReport[ws-security-block] (Feb 19)
  • → resolved_by: Fix[serve-https-proxy] (Feb 23)
  • InfraDecision[Gaming-server-migration]
  • → motivation: Mac-latency-issue (Feb 10)
  • → hardware: i5-9600K, 64GB, RTX 2080 Ti

The graph knows why the decision was made and what it connects to. The vector store just knows it's semantically close to "infrastructure."

Wired Into the Agent Runtime

The whole thing runs inside OpenClaw as my primary agent runtime. Memory search is a first-class tool — called automatically before every response:

// OpenClaw tool definition

{
  "tool": "memory_search",
  "description": "Neo4j KG + ChromaDB + grep. Returns ranked snippets.",
  "parameters": {
    "query": "string",
    "maxResults": "number",
    "minScore": "number"
  }
}
KG handles
  • What did we decide and why
  • Relationship traversal
  • Timeline queries
ChromaDB handles
  • What did we say about this topic
  • Semantic similarity
  • Broad recall
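The tool handler behind that definition is mostly fan-out plus fusion. A sketch with stubbed backends (the real layers call Neo4j, ChromaDB, and ripgrep; all names here are illustrative):

```python
def memory_search(query, max_results=5, min_score=0.0):
    """Fan the query out to all three layers, then fuse with RRF."""
    rankings = [
        kg_search(query),      # Neo4j BFS over entities (stubbed below)
        vector_search(query),  # ChromaDB similarity (stubbed below)
        grep_search(query),    # ripgrep exact match (stubbed below)
    ]
    scores = {}
    for ranked in rankings:
        for rank, snippet in enumerate(ranked, start=1):
            scores[snippet] = scores.get(snippet, 0.0) + 1.0 / (60 + rank)
    fused = sorted(scores, key=scores.get, reverse=True)
    return [s for s in fused if scores[s] >= min_score][:max_results]

# Stub layers standing in for the real backends:
def kg_search(q):     return ["decision-a", "fix-b"]
def vector_search(q): return ["note-c", "decision-a"]
def grep_search(q):   return ["fix-b"]

print(memory_search("auth", max_results=3))
```

Each layer only needs to return an ordered list of snippets; RRF makes the scores from three very different retrievers comparable without any weight tuning.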

What I'd Do Differently

Start with a schema
  • Graphiti auto-extracts but defining entity types (Person, Project, Decision) upfront gives cleaner traversals
Use a strong LLM for extraction
  • Tried Qwen3:8b first — noticeably worse quality. Kimi K2 Turbo is worth the cost for this step
Ingest daily, not in bulk
  • Bulk ingesting 3 months = rate limits + duplicate edges. Daily cron is the right cadence
Plan for deduplication
  • "Luminar" vs "luminar-labs" vs "LuminarLabs" become 3 separate entities without a dedup pass
Graph RAG isn't a silver bullet. It's extra infrastructure, extra latency, and you need an LLM good enough to extract entities cleanly. But for an agent that's your co-builder — not just a chatbot — it's the difference between short-term and long-term memory.