How I wired Neo4j into my AI agent's memory — and why vector search alone wasn't enough.
My AI agent PostSingular, running on OpenClaw, talks to me every day. It helps me build Luminar, manages my YouTube channel, and tracks infrastructure decisions across sessions. Memory was working, but in a subtly broken way.
"Which decision did we make about the auth system last month, and why?"
ChromaDB returned 6 chunks. All semantically similar. None connected to each other. No timeline. No causal chain. Just floating text.
The problem isn't retrieval quality — cosine similarity was fine. The problem is structural. Real knowledge has relationships. Facts connect to other facts. A vector store doesn't model that.
The difference is what you're searching. Vectors find text that looks like your query. Graphs find entities and facts that are structurally connected to your query.
My setup runs three retrieval layers, fused with RRF (Reciprocal Rank Fusion):
| Layer | Tool | When It Wins |
|---|---|---|
| Neo4j BFS | Graphiti v0.28.1 | Entity relationships, causal chains, timelines |
| ChromaDB | all-MiniLM-L6-v2 | Semantically similar chunks, topic recall |
| grep | ripgrep | Exact strings, IDs, issue numbers, code |
RRF gives each result a score based on its rank across all sources, not its raw similarity. A result ranked 3rd in the knowledge graph and 2nd in ChromaDB beats one ranked 1st in only a single source.
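Here's a minimal sketch of the fusion step. This is my own toy version to show the scoring math, not the production code; the source names and doc IDs are made up:

```python
from collections import defaultdict


def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score each doc by summing
    1 / (k + rank) over every source that returned it."""
    scores: dict[str, float] = defaultdict(float)
    for source, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# "c" ranks 3rd in the KG and 2nd in ChromaDB; with the conventional
# k=60 it beats every doc that topped only one source.
fused = rrf_fuse({
    "neo4j":    ["a", "b", "c"],
    "chromadb": ["d", "c", "e"],
    "grep":     ["f"],
})
```

The constant `k` (60 by convention, from the original RRF paper) dampens the advantage of a single first-place hit, which is exactly what makes cross-source agreement win.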
Every night at 2 AM, a cron job ingests the day's notes into Neo4j via Graphiti. Graphiti extracts entities and relationships from raw markdown using an LLM, then stores them with timestamps and embeddings.
```bash
#!/bin/bash
# daily-ingest.sh
YESTERDAY=$(date -d "yesterday" +%Y-%m-%d)
NOTES="memory/${YESTERDAY}.md"

if [ -f "$NOTES" ]; then
  .venv/bin/python memory.py ingest "$NOTES"
fi

# Also ingest MEMORY.md changes
.venv/bin/python memory.py ingest MEMORY.md
```
Graphiti's default OpenAI client requests structured output by passing a JSON schema via `response_format`, which Moonshot's API doesn't support. I had to patch it with a custom client:
```python
import json

from graphiti_core.llm_client import OpenAIClient


class KimiClient(OpenAIClient):
    """Moonshot-compatible client: inject the schema into the
    system prompt instead of using response_format."""

    async def generate_response(self, messages, response_model=None, **kwargs):
        if response_model:
            schema = response_model.model_json_schema()
            schema_str = json.dumps(schema, indent=2)
            # Inject schema into the system message instead of response_format
            injection = (
                f"\n\nRespond with valid JSON matching:\n"
                f"```json\n{schema_str}\n```\n"
                f"Return ONLY the JSON, no other text."
            )
            for msg in messages:
                if msg["role"] == "system":
                    msg["content"] += injection
                    break
        kwargs.pop("response_format", None)  # Moonshot rejects it
        return await super().generate_response(messages, **kwargs)
```
This is the kind of thing you only find by reading the source code. The fix is 20 lines. The debugging is 4 hours.
The vector store returns 5 chunks about infrastructure. Some from February, some not. No ordering. No causal relationships between decisions. No "why."

The graph returns a traversal with timestamps and a causal chain.
The graph knows why the decision was made and what it connects to. The vector store just knows it's semantically close to "infrastructure."
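To make that difference concrete, here's a toy traversal over a hand-built graph. The entities, relations, and dates are illustrative, not my real data, and the real thing is Graphiti's BFS over Neo4j, not an in-memory dict:

```python
from collections import deque

# Toy knowledge graph: (subject, relation, object, timestamp).
# Illustrative facts only.
FACTS = [
    ("auth-decision", "MOTIVATED_BY", "token-leak-incident", "2026-01-12"),
    ("auth-decision", "RESULTED_IN", "switch-to-oauth", "2026-01-15"),
    ("switch-to-oauth", "REQUIRED", "new-gateway-config", "2026-01-20"),
]


def causal_chain(start: str) -> list[tuple[str, str, str, str]]:
    """BFS out from an entity, returning edges in traversal order:
    a timeline plus the 'why' that a flat chunk list can't give you."""
    edges: dict[str, list[tuple[str, str, str]]] = {}
    for s, r, o, t in FACTS:
        edges.setdefault(s, []).append((r, o, t))

    chain, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for r, o, t in edges.get(node, []):
            chain.append((node, r, o, t))
            if o not in seen:
                seen.add(o)
                queue.append(o)
    return chain


chain = causal_chain("auth-decision")  # motivation, decision, consequence
```

Each edge carries a timestamp and a relation type, so the answer to "which decision, and why?" falls out of the traversal order instead of being reassembled from disconnected chunks.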
The whole thing runs inside OpenClaw as my primary agent runtime. Memory search is a first-class tool — called automatically before every response:
```json
{
  "tool": "memory_search",
  "description": "Neo4j KG + ChromaDB + grep. Returns ranked snippets.",
  "parameters": {
    "query": "string",
    "maxResults": "number",
    "minScore": "number"
  }
}
```
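A call from the agent side looks something like this (the parameter values here are illustrative):

```json
{
  "tool": "memory_search",
  "query": "auth system decision",
  "maxResults": 8,
  "minScore": 0.3
}
```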
Graph RAG isn't a silver bullet. It's extra infrastructure, extra latency, and you need an LLM good enough to extract entities cleanly. But for an agent that's your co-builder — not just a chatbot — it's the difference between short-term and long-term memory.