## The Problem With Chatbots
Traditional AI: Smart brain, no body. Limited to generating text.
Agentic AI: Smart brain + hands + eyes + memory. Can accomplish tasks.
Most AI interactions look like this:
```mermaid
flowchart LR
    A[🧠 AI Brain] --> B[📝 Text Output]
    B --> C[😩 You Do The Work]
    style C fill:#ef4444,color:#fff
```
With an agent orchestration gateway like OpenClaw, it becomes:
```mermaid
flowchart LR
    A[🧠 AI Brain] --> B[🦞 OpenClaw Gateway]
    B --> C[📱 WhatsApp]
    B --> D[💻 Files & Shell]
    B --> E[🌐 Browser]
    B --> F[📅 Calendar]
    style B fill:#10b981,color:#fff
```
## The Big Picture
OpenClaw is an agent orchestration gateway — a single long-lived process that connects AI brains to the real world.
```mermaid
flowchart TB
    subgraph WORLD["🌍 YOUR WORLD"]
        WA[📱 WhatsApp]
        TG[💬 Telegram]
        DC[🎮 Discord]
        SL[💼 Slack]
        SG[📨 Signal]
        IM[🍎 iMessage]
    end
    subgraph GATEWAY["🦞 OPENCLAW GATEWAY"]
        direction TB
        INBOX[Inbox Router]
        SESSIONS[Session Manager]
        AGENT[Agent Loop]
        TOOLS[Tool Executor]
        MEMORY[Memory Search]
    end
    subgraph PROVIDERS["🧠 AI PROVIDERS"]
        CL[Claude]
        GP[GPT-4]
        GE[Gemini]
        LL[Llama]
    end
    WORLD --> GATEWAY
    GATEWAY --> PROVIDERS
    style GATEWAY fill:#1e3a5f,color:#fff
```
The Gateway is model-agnostic. Plug in Claude, GPT-4, Gemini, or local models. The magic isn’t in the AI — it’s in the infrastructure that lets the AI act.
## The Agent Loop: Where Messages Become Actions
Here’s the core cycle that makes agents work:
1. **Message Arrives** (Input): WhatsApp/Telegram/CLI → the Gateway receives your message and routes it to the right session.
2. **Context Assembly** (Prepare): the Gateway loads conversation history, user preferences (SOUL.md, USER.md), available tools, and relevant skills.
3. **AI Thinks** (LLM): the model receives everything and decides what to do. It might respond directly, or decide to use tools.
4. **Tool Execution** (Action): if tools are needed, the Gateway executes them (send message, read file, run command, browse web).
5. **Loop Continues** (Iterate): the AI sees tool results and decides if more actions are needed. This can repeat multiple times per request.
6. **Response Delivered** (Output): the final response is sent back through the original channel (WhatsApp → WhatsApp, etc.).
### In Code Terms
```js
// What happens when you say "Send a project update to Alexander"
// 1. AI receives context + tools
// 2. AI outputs:
{
  "tool_calls": [{
    "name": "message",
    "arguments": {
      "action": "send",
      "channel": "whatsapp",
      "target": "+32498022391",
      "message": "Hey Alexander, here's the project update..."
    }
  }]
}
// 3. Gateway executes, returns result
// 4. AI sees success, responds: "Done! Sent the update ✅"
```
## The Tool System: AI Superpowers
Tools are functions the AI can call to interact with the world. This is what transforms a chatbot into an agent.
### Core Tools
| Tool | What It Does | Example |
|---|---|---|
| `exec` | Run any shell command | `git status`, `npm install`, deploy scripts |
| `read`/`write`/`edit` | File system access | Read configs, write code, edit docs |
| `browser` | Full Chrome control | Click buttons, fill forms, screenshot |
| `message` | Multi-platform messaging | WhatsApp, Telegram, Discord, Slack |
| `web_search` | Search the internet | Research, find docs, check facts |
| `web_fetch` | Extract web content | Scrape pages, read articles |
| `cron` | Schedule future tasks | Reminders, daily briefings, monitoring |
| `memory_search` | Search agent memory | Find past decisions, preferences |
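A common way to wire a tool set like this together is a registry keyed by name, with a description the model sees and a handler the gateway runs. A sketch, with invented names (`Tool`, `ToolRegistry`) rather than OpenClaw's real interfaces:

```typescript
// Illustrative tool registry. Shapes and names are assumptions,
// not OpenClaw's actual implementation.
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

interface Tool {
  name: string;        // what the model calls, e.g. "web_search"
  description: string; // shown to the model so it knows when to use it
  handler: ToolHandler; // what the gateway actually executes
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  // Dispatch a model-issued tool call to its handler; unknown tools
  // return an error string the model can see and recover from.
  async execute(name: string, args: Record<string, unknown>): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) return `error: unknown tool "${name}"`;
    return tool.handler(args);
  }
}
```

Returning errors as strings (instead of throwing) matters: the model gets the failure back as a tool result and can retry or apologize, rather than crashing the loop.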
### Browser Automation: The Cool Part
```mermaid
sequenceDiagram
    participant A as Agent
    participant G as Gateway
    participant B as Browser
    A->>G: browser.snapshot()
    G->>B: Get page structure
    B-->>G: Accessibility tree
    G-->>A: Structured elements [ref=1,2,3...]
    A->>G: browser.click(ref=12)
    G->>B: Click element #12
    B-->>G: Success
    A->>G: browser.type(ref=15, "hello@email.com")
    G->>B: Type into element #15
    B-->>G: Success
    A->>G: browser.screenshot()
    G->>B: Capture screen
    B-->>A: Image data
```
The agent sees a structured representation of the page (accessibility tree), not raw HTML. This makes navigation way more reliable than traditional scraping.
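To make that concrete, here is a rough sketch of how an accessibility tree could be flattened into the ref-numbered element list the agent works with. The node shape and numbering scheme are my assumptions, not OpenClaw's actual format:

```typescript
// Hypothetical accessibility-tree node and its flattened form.
interface AxNode {
  role: string;        // "button", "textbox", "link", ...
  name: string;        // accessible label, e.g. "Sign in"
  children?: AxNode[];
}

interface RefElement { ref: number; role: string; name: string }

// Depth-first walk assigning sequential refs, so the agent can say
// "click ref 12" instead of wrestling with selectors or raw HTML.
function flatten(root: AxNode): RefElement[] {
  const out: RefElement[] = [];
  const walk = (n: AxNode) => {
    out.push({ ref: out.length + 1, role: n.role, name: n.name });
    n.children?.forEach(walk);
  };
  walk(root);
  return out;
}
```

A login page might flatten to `[{ref: 1, role: "document", ...}, {ref: 2, role: "textbox", name: "Email"}, {ref: 3, role: "button", name: "Sign in"}]`, which is far easier for a model to reason over than the page's HTML.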
## Session Management: How It Remembers
Every conversation gets a session key that tracks its state:
```
agent:main:main                    → Primary DM session
agent:main:whatsapp:group:abc123   → A WhatsApp group
agent:main:telegram:dm:user456     → A Telegram DM
cron:daily-briefing                → Scheduled task
```
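Working purely from the examples above, a parser for this key scheme might look like the following. The field names (`kind`, `channel`, `scope`) are my guesses from the samples, not a documented spec:

```typescript
// Hypothetical session-key parser, inferred from examples like
// "agent:main:whatsapp:group:abc123" and "cron:daily-briefing".
interface SessionKey {
  kind: "agent" | "cron";
  channel?: string; // whatsapp, telegram, ...
  scope?: string;   // group, dm, or "main" for the primary session
  id?: string;      // chat/user id, or cron job name
}

function parseSessionKey(key: string): SessionKey {
  const parts = key.split(":");
  if (parts[0] === "cron") return { kind: "cron", id: parts[1] };
  // agent:<agentId>:main  or  agent:<agentId>:<channel>:<scope>:<id>
  const [, , channel, scope, id] = parts;
  if (channel === "main") return { kind: "agent", scope: "main" };
  return { kind: "agent", channel, scope, id };
}
```

Keys like these make routing trivial: every inbound message maps deterministically to exactly one session and its transcript.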
```mermaid
flowchart TB
    subgraph SESSIONS["Session Keys"]
        M[agent:main:main]
        W[agent:main:whatsapp:group:123]
        T[agent:main:telegram:dm:456]
        C[cron:daily-report]
    end
    subgraph STORAGE["Persistence"]
        JSON["sessions.json<br/>metadata"]
        JSONL["*.jsonl<br/>transcripts"]
    end
    SESSIONS --> STORAGE
    style M fill:#10b981,color:#fff
```
### Session Features
## The Soul Files: Personality & Memory
This is what makes agents feel *continuous* across sessions. OpenClaw uses plain Markdown files to define personality and store memories:
```mermaid
flowchart TB
    subgraph WORKSPACE["~/.openclaw/workspace"]
        SOUL["📜 SOUL.md<br/>Who the agent is"]
        USER["👤 USER.md<br/>Who the human is"]
        MEMORY["🧠 MEMORY.md<br/>Long-term memories"]
        DAILY["📅 memory/YYYY-MM-DD.md<br/>Daily notes"]
        TOOLS["🔧 TOOLS.md<br/>Local tool config"]
    end
    SOUL -->|"Always loaded"| AGENT[Agent Context]
    USER -->|"Always loaded"| AGENT
    MEMORY -->|"Main session only"| AGENT
    DAILY -->|"Today + yesterday"| AGENT
    style MEMORY fill:#f59e0b,color:#000
```
Example: SOUL.md#
# SOUL.md - Who You Are
**Be genuinely helpful, not performatively helpful.**
Skip the "Great question!" — just help.
**Have opinions.** You're allowed to disagree.
**Be resourceful before asking.** Try to figure it out first.
**Earn trust through competence.** Be careful with external
actions (emails, tweets). Be bold with internal ones (reading, organizing).
### Why MEMORY.md is Main Session Only
Long-term memory can hold private details about you, so it's loaded only in the trusted main session; shared contexts like group chats never see it.
## Protocols: How Everything Connects
### Gateway Protocol (WebSocket)
All clients communicate with the Gateway over WebSocket:
```mermaid
sequenceDiagram
    participant C as Client (CLI/TUI/App)
    participant G as Gateway
    participant A as Agent
    C->>G: connect (auth token)
    G-->>C: hello-ok (health snapshot)
    C->>G: req:agent {message: "Hello"}
    G->>A: Run agent loop
    A-->>G: Streaming chunks
    G-->>C: event:agent (streaming)
    G-->>C: res:agent (final)
```
```js
// Request
{"type": "req", "id": "1", "method": "agent", "params": {"message": "Hello"}}

// Response
{"type": "res", "id": "1", "ok": true, "payload": {...}}

// Server-push event
{"type": "event", "event": "agent", "payload": {"stream": "assistant", "chunk": "Hi!"}}
```
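The three message shapes above are natural to model as a tagged union discriminated on `type`. A sketch in TypeScript, following only the sample JSON (the helper functions are mine, not part of the protocol):

```typescript
// Wire-message shapes, transcribed from the protocol samples above.
// Anything beyond the field names is an assumption.
type GatewayMsg =
  | { type: "req"; id: string; method: string; params: unknown }
  | { type: "res"; id: string; ok: boolean; payload: unknown }
  | { type: "event"; event: string; payload: unknown };

// Responses are correlated to requests by id; events carry no id,
// which is how server pushes stay distinct from replies.
function isReplyTo(msg: GatewayMsg, reqId: string): boolean {
  return msg.type === "res" && msg.id === reqId;
}

function encode(msg: GatewayMsg): string {
  return JSON.stringify(msg);
}

function decode(raw: string): GatewayMsg {
  return JSON.parse(raw) as GatewayMsg;
}
```

A client keeps a map of pending request ids; anything that fails `isReplyTo` for every pending id is dispatched as an event (for example, streaming `agent` chunks).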
### Multi-Channel Architecture
```mermaid
flowchart LR
    subgraph CHANNELS["Channel Connectors"]
        BA["Baileys<br/>WhatsApp"]
        GR["grammY<br/>Telegram"]
        DJ["discord.js<br/>Discord"]
        BO["Bolt<br/>Slack"]
        SC["signal-cli<br/>Signal"]
    end
    subgraph GW["Gateway"]
        UR[Unified Router]
    end
    BA -->|WebSocket| UR
    GR -->|Long-poll/Webhook| UR
    DJ -->|WebSocket| UR
    BO -->|Socket Mode| UR
    SC -->|dbus| UR
    UR --> AGENT[Agent Loop]
```
Each channel maintains its own connection to the respective service, but they all feed into the same unified router and agent loop.
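One way to picture that unification: each connector adapts its platform's native event into a single envelope before it reaches the router. A hedged sketch, where the `InboundMessage` shape is invented and the update shape loosely mirrors the Telegram Bot API's `message` object:

```typescript
// Hypothetical common envelope every connector normalizes into.
interface InboundMessage {
  channel: "whatsapp" | "telegram" | "discord" | "slack" | "signal";
  senderId: string; // who wrote it, as a string regardless of platform
  chatId: string;   // which conversation, used to build the session key
  text: string;
}

// Example adapter for a Telegram-style update. The input shape follows
// the Bot API's Update/Message objects; the output shape is assumed.
function fromTelegram(update: {
  message: { from: { id: number }; chat: { id: number }; text: string };
}): InboundMessage {
  return {
    channel: "telegram",
    senderId: String(update.message.from.id),
    chatId: String(update.message.chat.id),
    text: update.message.text,
  };
}
```

Once every platform speaks `InboundMessage`, the agent loop never needs to know whether a request came from Slack or Signal.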
## Skills: On-Demand Expertise
Skills are modular knowledge packages loaded only when relevant:
```
github-skill/
├── SKILL.md       # Instructions for using GitHub
├── scripts/       # Helper scripts
└── references/    # Documentation
```
```mermaid
flowchart TB
    Q["User: Create a PR for this fix"]
    Q --> SCAN[Scan available skills]
    SCAN --> MATCH{Matches github skill?}
    MATCH -->|Yes| LOAD[Load SKILL.md]
    LOAD --> EXEC[Execute with skill knowledge]
    MATCH -->|No| DEFAULT[Use base knowledge]
```
This keeps the base prompt small while enabling deep expertise when needed.
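A minimal version of that matching step could be simple keyword overlap. This is purely illustrative; OpenClaw's actual selection logic may differ:

```typescript
// Toy skill matcher: pick the first skill whose trigger keywords
// appear in the user's request. Names and shapes are assumptions.
interface Skill {
  name: string;       // e.g. "github"
  keywords: string[]; // triggers that cause SKILL.md to be loaded
}

function matchSkill(request: string, skills: Skill[]): Skill | undefined {
  const words = request.toLowerCase().split(/\W+/);
  return skills.find((s) => s.keywords.some((k) => words.includes(k)));
}
```

When a match fires, only that skill's SKILL.md is injected into context; when nothing matches, the agent falls back to base knowledge and the prompt stays lean.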
## Why This Architecture Matters
The architecture enables true agency through:
- Unified Gateway — One process handles all channels, sessions, and tools
- Tool Abstraction — Complex actions become simple function calls
- Persistent Memory — Sessions and personality survive restarts
- Plugin System — Extend without modifying core code
- Multi-Protocol Support — WebSocket, ACP, HTTP, and more
## Getting Started
```bash
npm install -g openclaw
openclaw setup
openclaw gateway
```
Scan a QR code to connect WhatsApp, and you’ve got an AI assistant in your pocket.
## Resources
- 📚 Docs: docs.openclaw.ai
- 💻 GitHub: github.com/openclaw/openclaw
- 💬 Discord: discord.com/invite/clawd
- 🎯 Skills Hub: clawdhub.com
## Final Thoughts
After digging through the codebase, I’m convinced this is where AI is heading. Not smarter chatbots — but AI that participates in your digital life.
The architecture is clean, extensible, and open source. Whether you want to use it, contribute to it, or just understand how agentic AI works under the hood, OpenClaw is worth exploring.
P.S. — I wrote this article with the help of an OpenClaw-powered agent. It read the codebase, helped me understand the architecture, and even sent me WhatsApp reminders to finish writing. Very meta. 🤖
Written by Amine El Farssi — Exploring the future of AI agents