Inside OpenClaw: The Architecture That Turns LLMs Into Autonomous Agents

I’ve been obsessed with a question: Why can’t AI just… do things? ChatGPT can write a perfect email, but you still copy-paste it. Claude can explain how to automate your workflow, but you implement it. Then I found OpenClaw — and everything clicked.

The Problem With Chatbots

Traditional AI: Smart brain, no body. Limited to generating text.

Agentic AI: Smart brain + hands + eyes + memory. Can accomplish tasks.

Most AI interactions look like this:

flowchart LR
    A[🧠 AI Brain] --> B[📝 Text Output]
    B --> C[😩 You Do The Work]
    style C fill:#ef4444,color:#fff

With an agent orchestration gateway like OpenClaw, it becomes:

flowchart LR
    A[🧠 AI Brain] --> B[🦞 OpenClaw Gateway]
    B --> C[📱 WhatsApp]
    B --> D[💻 Files & Shell]
    B --> E[🌐 Browser]
    B --> F[📅 Calendar]
    style B fill:#10b981,color:#fff

The Big Picture

OpenClaw is an agent orchestration gateway — a single long-lived process that connects AI brains to the real world.

flowchart TB
    subgraph WORLD["🌍 YOUR WORLD"]
        WA[📱 WhatsApp]
        TG[💬 Telegram]
        DC[🎮 Discord]
        SL[💼 Slack]
        SG[📨 Signal]
        IM[🍎 iMessage]
    end
    
    subgraph GATEWAY["🦞 OPENCLAW GATEWAY"]
        direction TB
        INBOX[Inbox Router]
        SESSIONS[Session Manager]
        AGENT[Agent Loop]
        TOOLS[Tool Executor]
        MEMORY[Memory Search]
    end
    
    subgraph PROVIDERS["🧠 AI PROVIDERS"]
        CL[Claude]
        GP[GPT-4]
        GE[Gemini]
        LL[Llama]
    end
    
    WORLD --> GATEWAY
    GATEWAY --> PROVIDERS
    
    style GATEWAY fill:#1e3a5f,color:#fff

The Gateway is model-agnostic. Plug in Claude, GPT-4, Gemini, or local models. The magic isn’t in the AI — it’s in the infrastructure that lets the AI act.


The Agent Loop: Where Messages Become Actions

Here’s the core cycle that makes agents work:

  1. **Message Arrives** (Input): WhatsApp/Telegram/CLI → the Gateway receives your message and routes it to the right session.
  2. **Context Assembly** (Prepare): The Gateway loads conversation history, user preferences (SOUL.md, USER.md), available tools, and relevant skills.
  3. **AI Thinks** (LLM): The model receives everything and decides what to do. It might respond directly, or decide to use tools.
  4. **Tool Execution** (Action): If tools are needed, the Gateway executes them (send message, read file, run command, browse the web).
  5. **Loop Continues** (Iterate): The AI sees the tool results and decides whether more actions are needed. This can repeat multiple times per request.
  6. **Response Delivered** (Output): The final response is sent back through the original channel (WhatsApp → WhatsApp, etc.).

In Code Terms

// What happens when you say "Send a project update to Alexander"

// 1. AI receives context + tools
// 2. AI outputs:
{
  "tool_calls": [{
    "name": "message",
    "arguments": {
      "action": "send",
      "channel": "whatsapp",
      "target": "+32498022391",
      "message": "Hey Alexander, here's the project update..."
    }
  }]
}

// 3. Gateway executes, returns result
// 4. AI sees success, responds: "Done! Sent the update ✅"
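The loop itself can be sketched in a few lines of TypeScript. This is an illustration, not OpenClaw's actual code; `callModel` and `executeTool` are hypothetical stand-ins for the provider call and the tool executor:

```typescript
// Hypothetical sketch of the agent loop; callModel/executeTool are stand-ins.
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ModelReply = { text?: string; tool_calls?: ToolCall[] };

function runAgentLoop(
  message: string,
  callModel: (context: string[]) => ModelReply,
  executeTool: (call: ToolCall) => string,
  maxIterations = 5,
): string {
  const context = [message];
  for (let i = 0; i < maxIterations; i++) {
    const reply = callModel(context);
    if (!reply.tool_calls?.length) {
      return reply.text ?? ""; // no tools requested: this is the final answer
    }
    // Execute each requested tool and feed the results back into context.
    for (const call of reply.tool_calls) {
      context.push(`tool:${call.name} -> ${executeTool(call)}`);
    }
  }
  return "max iterations reached";
}
```

The key detail is the feedback edge: tool results go back into the context, so the model can chain actions until it decides it's done.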

The Tool System: AI Superpowers

Tools are functions the AI can call to interact with the world. This is what transforms a chatbot into an agent.

Core Tools

| Tool | What It Does | Example |
| --- | --- | --- |
| `exec` | Run any shell command | `git status`, `npm install`, deploy scripts |
| `read`/`write`/`edit` | File system access | Read configs, write code, edit docs |
| `browser` | Full Chrome control | Click buttons, fill forms, screenshot |
| `message` | Multi-platform messaging | WhatsApp, Telegram, Discord, Slack |
| `web_search` | Search the internet | Research, find docs, check facts |
| `web_fetch` | Extract web content | Scrape pages, read articles |
| `cron` | Schedule future tasks | Reminders, daily briefings, monitoring |
| `memory_search` | Search agent memory | Find past decisions, preferences |
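To make the tool idea concrete, here's a minimal registry sketch in TypeScript. The `registerTool`/`dispatch` names are hypothetical, not OpenClaw's real interfaces:

```typescript
// Hypothetical tool registry sketch; OpenClaw's real interfaces may differ.
type ToolHandler = (args: Record<string, unknown>) => string;

const tools = new Map<string, ToolHandler>();

function registerTool(name: string, handler: ToolHandler): void {
  tools.set(name, handler);
}

// The gateway looks up the tool the model asked for and runs it.
function dispatch(name: string, args: Record<string, unknown>): string {
  const handler = tools.get(name);
  if (!handler) return `unknown tool: ${name}`;
  return handler(args);
}

// Toy registrations mirroring the table above.
registerTool("exec", (args) => `ran: ${args.command}`);
registerTool("web_search", (args) => `results for: ${args.query}`);
```

The point of the abstraction: the model only ever emits a name and arguments; everything dangerous or platform-specific lives behind the handler.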

Browser Automation: The Cool Part

sequenceDiagram
    participant A as Agent
    participant G as Gateway
    participant B as Browser
    
    A->>G: browser.snapshot()
    G->>B: Get page structure
    B-->>G: Accessibility tree
    G-->>A: Structured elements [ref=1,2,3...]
    
    A->>G: browser.click(ref=12)
    G->>B: Click element #12
    B-->>G: Success
    
    A->>G: browser.type(ref=15, "hello@email.com")
    G->>B: Type into element #15
    B-->>G: Success
    
    A->>G: browser.screenshot()
    G->>B: Capture screen
    B-->>A: Image data

The agent sees a structured representation of the page (accessibility tree), not raw HTML. This makes navigation way more reliable than traditional scraping.
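A rough sketch of why refs work: the snapshot numbers each interactive element, and the agent addresses elements by those stable numbers instead of brittle CSS selectors. All names below are illustrative, not the actual browser tool API:

```typescript
// Illustrative sketch: elements get stable refs from a snapshot,
// so the model can say "click ref 2" instead of guessing selectors.
type ElementRef = { ref: number; role: string; name: string };

function snapshot(pageElements: { role: string; name: string }[]): ElementRef[] {
  // Number each element so later actions can target it by ref.
  return pageElements.map((el, i) => ({ ref: i + 1, ...el }));
}

function findRef(tree: ElementRef[], role: string, name: string): number | undefined {
  return tree.find((el) => el.role === role && el.name === name)?.ref;
}

const tree = snapshot([
  { role: "textbox", name: "Email" },
  { role: "button", name: "Sign in" },
]);
```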


Session Management: How It Remembers

Every conversation gets a session key that tracks its state:

agent:main:main                    → Primary DM session
agent:main:whatsapp:group:abc123   → A WhatsApp group
agent:main:telegram:dm:user456     → A Telegram DM
cron:daily-briefing                → Scheduled task

flowchart TB
    subgraph SESSIONS["Session Keys"]
        M[agent:main:main]
        W[agent:main:whatsapp:group:123]
        T[agent:main:telegram:dm:456]
        C[cron:daily-report]
    end
    
    subgraph STORAGE["Persistence"]
        JSON["sessions.json<br/>metadata"]
        JSONL["*.jsonl<br/>transcripts"]
    end
    
    SESSIONS --> STORAGE
    
    style M fill:#10b981,color:#fff

Session Features

- **Daily Resets:** Sessions expire at a configurable hour (default 4 AM) to prevent context bloat.
- **Compaction:** When nearing token limits, old context is summarized and compressed.
- **JSONL Transcripts:** Full conversation history is persisted as append-only logs.
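A sketch of how session keys and daily resets might be handled, assuming the key format shown above and a configurable reset hour. The parsing and boundary logic here are illustrative, not OpenClaw's implementation:

```typescript
// Sketch: session-key parsing plus daily-reset check.
// Key format follows the examples above; logic is illustrative.
type SessionKey = { kind: "agent" | "cron"; parts: string[] };

function parseSessionKey(key: string): SessionKey {
  const [kind, ...parts] = key.split(":");
  if (kind !== "agent" && kind !== "cron") throw new Error(`bad key: ${key}`);
  return { kind, parts };
}

function shouldReset(lastActive: Date, now: Date, resetHour = 4): boolean {
  // Find the most recent reset boundary (resetHour o'clock, today or yesterday),
  // then reset any session last active before that boundary.
  const boundary = new Date(now);
  boundary.setHours(resetHour, 0, 0, 0);
  if (now < boundary) boundary.setDate(boundary.getDate() - 1);
  return lastActive < boundary;
}
```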

The Soul Files: Personality & Memory

This is what makes agents feel *continuous* across sessions.

OpenClaw uses plain Markdown files to define personality and store memories:

flowchart TB
    subgraph WORKSPACE["~/.openclaw/workspace"]
        SOUL["📜 SOUL.md<br/>Who the agent is"]
        USER["👤 USER.md<br/>Who the human is"]
        MEMORY["🧠 MEMORY.md<br/>Long-term memories"]
        DAILY["📅 memory/YYYY-MM-DD.md<br/>Daily notes"]
        TOOLS["🔧 TOOLS.md<br/>Local tool config"]
    end
    
    SOUL -->|"Always loaded"| AGENT[Agent Context]
    USER -->|"Always loaded"| AGENT
    MEMORY -->|"Main session only"| AGENT
    DAILY -->|"Today + yesterday"| AGENT
    
    style MEMORY fill:#f59e0b,color:#000

Example: SOUL.md

# SOUL.md - Who You Are

**Be genuinely helpful, not performatively helpful.** 
Skip the "Great question!" — just help.

**Have opinions.** You're allowed to disagree.

**Be resourceful before asking.** Try to figure it out first.

**Earn trust through competence.** Be careful with external 
actions (emails, tweets). Be bold with internal ones (reading, organizing).

Why MEMORY.md is Main Session Only

Privacy: MEMORY.md contains personal context that shouldn’t leak into group chats or shared sessions. It’s only loaded when you’re in a direct, private conversation with the agent.
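The loading rules from the diagram reduce to a simple selection function. The file names come from the article; the function itself is a hypothetical sketch:

```typescript
// Illustrative sketch of the context-file selection rules described above.
function contextFiles(sessionKey: string, today: string, yesterday: string): string[] {
  const files = ["SOUL.md", "USER.md"]; // always loaded
  if (sessionKey === "agent:main:main") {
    files.push("MEMORY.md"); // private long-term memory: main session only
  }
  files.push(`memory/${today}.md`, `memory/${yesterday}.md`); // daily notes
  return files;
}
```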

Protocols: How Everything Connects

Gateway Protocol (WebSocket)

All clients communicate with the Gateway over WebSocket:

sequenceDiagram
    participant C as Client (CLI/TUI/App)
    participant G as Gateway
    participant A as Agent
    
    C->>G: connect (auth token)
    G-->>C: hello-ok (health snapshot)
    
    C->>G: req:agent {message: "Hello"}
    G->>A: Run agent loop
    A-->>G: Streaming chunks
    G-->>C: event:agent (streaming)
    G-->>C: res:agent (final)

// Request
{"type": "req", "id": "1", "method": "agent", "params": {"message": "Hello"}}

// Response
{"type": "res", "id": "1", "ok": true, "payload": {...}}

// Server-push event
{"type": "event", "event": "agent", "payload": {"stream": "assistant", "chunk": "Hi!"}}
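A minimal sketch of the framing logic, assuming only the frame shapes shown above. No real socket is involved; `makeRequest` and `matchResponse` are illustrative helpers, not the actual client API:

```typescript
// Sketch of the req/res/event framing shown above, without a real socket.
type Frame =
  | { type: "req"; id: string; method: string; params: unknown }
  | { type: "res"; id: string; ok: boolean; payload: unknown }
  | { type: "event"; event: string; payload: unknown };

function makeRequest(id: string, method: string, params: unknown): string {
  return JSON.stringify({ type: "req", id, method, params });
}

// Match an incoming frame to a pending request by id; events pass through elsewhere.
function matchResponse(raw: string, pendingId: string): unknown {
  const frame = JSON.parse(raw) as Frame;
  if (frame.type === "res" && frame.id === pendingId && frame.ok) {
    return frame.payload;
  }
  return undefined;
}
```

Correlating by `id` is what lets one WebSocket carry many concurrent requests while streaming `event` frames interleave freely.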

Multi-Channel Architecture

flowchart LR
    subgraph CHANNELS["Channel Connectors"]
        BA["Baileys<br/>WhatsApp"]
        GR["grammY<br/>Telegram"]
        DJ["discord.js<br/>Discord"]
        BO["Bolt<br/>Slack"]
        SC["signal-cli<br/>Signal"]
    end
    
    subgraph GW["Gateway"]
        UR[Unified Router]
    end
    
    BA -->|WebSocket| UR
    GR -->|Long-poll/Webhook| UR
    DJ -->|WebSocket| UR
    BO -->|Socket Mode| UR
    SC -->|dbus| UR
    
    UR --> AGENT[Agent Loop]

Each channel maintains its own connection to the respective service, but they all feed into the same unified router and agent loop.
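One way to picture the router's input: each connector translates platform-specific events into a common shape before they reach the agent loop. The field names below are assumptions for illustration, not OpenClaw's actual types:

```typescript
// Illustrative sketch: connectors normalize platform events into one shape.
type InboundMessage = {
  channel: "whatsapp" | "telegram" | "discord" | "slack" | "signal" | "imessage";
  sessionKey: string;
  sender: string;
  text: string;
};

// Hypothetical Telegram normalizer; the update shape is simplified.
function normalizeTelegram(update: { chat_id: number; from: string; text: string }): InboundMessage {
  return {
    channel: "telegram",
    sessionKey: `agent:main:telegram:dm:${update.chat_id}`,
    sender: update.from,
    text: update.text,
  };
}
```

Once everything is an `InboundMessage`, the router and agent loop never need to know which platform a message came from.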


Skills: On-Demand Expertise

Skills are modular knowledge packages loaded only when relevant:

github-skill/
├── SKILL.md         # Instructions for using GitHub
├── scripts/         # Helper scripts
└── references/      # Documentation

flowchart TB
    Q["User: Create a PR for this fix"]

    Q --> SCAN[Scan available skills]
    SCAN --> MATCH{Matches github skill?}
    MATCH -->|Yes| LOAD[Load SKILL.md]
    LOAD --> EXEC[Execute with skill knowledge]

    MATCH -->|No| DEFAULT[Use base knowledge]

This keeps the base prompt small while enabling deep expertise when needed.
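A naive keyword matcher illustrates the scan-and-match step; OpenClaw's actual matching may be more sophisticated, and the skill shape below is hypothetical:

```typescript
// Naive keyword-based skill matching, for illustration only.
type Skill = { name: string; keywords: string[]; instructions: string };

function matchSkill(message: string, skills: Skill[]): Skill | undefined {
  const lower = message.toLowerCase();
  // Load the first skill whose keywords appear in the message.
  return skills.find((s) => s.keywords.some((k) => lower.includes(k)));
}

const skills: Skill[] = [
  { name: "github", keywords: ["pull request", "pr", "repo"], instructions: "SKILL.md contents" },
];
```

Only the matched skill's `instructions` get injected into the prompt, which is what keeps the base context small.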


Why This Architecture Matters

The architecture enables true agency through:

  1. Unified Gateway — One process handles all channels, sessions, and tools
  2. Tool Abstraction — Complex actions become simple function calls
  3. Persistent Memory — Sessions and personality survive restarts
  4. Plugin System — Extend without modifying core code
  5. Multi-Protocol Support — WebSocket, ACP, HTTP, and more

Getting Started

npm install -g openclaw
openclaw setup
openclaw gateway

Scan a QR code to connect WhatsApp, and you’ve got an AI assistant in your pocket.


Final Thoughts

The future isn’t AI that answers questions. It’s AI that gets things done.

After digging through the codebase, I’m convinced this is where AI is heading. Not smarter chatbots — but AI that participates in your digital life.

The architecture is clean, extensible, and open source. Whether you want to use it, contribute to it, or just understand how agentic AI works under the hood, OpenClaw is worth exploring.


P.S. — I wrote this article with the help of an OpenClaw-powered agent. It read the codebase, helped me understand the architecture, and even sent me WhatsApp reminders to finish writing. Very meta. 🤖

Written by Amine El Farssi — Exploring the future of AI agents