The Agent Model
The problem
The naive way to build an agent is a function you call directly:
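A minimal sketch of that naive pattern follows. The names call_model and naive_agent are hypothetical stand-ins, not part of Ravi, and the model call is faked so the snippet runs on its own.

```python
def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return f"answer to: {prompt}"


def naive_agent(prompt: str) -> str:
    # The caller blocks here until the agent is done, and all state lives
    # in this stack frame: if the process dies, the run dies with it.
    return call_model(prompt)


result = naive_agent("What is 2 + 2?")
print(result)
```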
This breaks the moment anything real happens. If the agent needs to wait three hours for a human to approve a payment, your thread is stuck. If the process crashes, the run is gone. If you want the agent to live on another machine, you have to rewrite the call. And if one agent needs to ask another agent something, you have a tangle of direct references.
The idea: agents are addresses, not objects
In Ravi, you never hold a reference to a running agent. You hold its address — an AgentId — and you send it a message. The runtime is responsible for delivering that message, starting a run, and routing the reply back.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '14px'}}}%%
graph LR
classDef caller fill:#FCE4EC,stroke:#880E4F,color:#880E4F,font-weight:bold
classDef runtime fill:#E3F2FD,stroke:#1565C0,color:#0D47A1,font-weight:bold
classDef agent fill:#E8EAF6,stroke:#3949AB,color:#1A237E,font-weight:bold
C([Caller]):::caller -->|"submit(AgentId, Message)"| RT[Runtime]:::runtime
RT -->|"enqueue run"| SCH[Scheduler]:::runtime
SCH -->|"lease"| WK[Worker]:::runtime
WK -->|"run(ctx, inbox)"| AG[Agent]:::agent
AG -.->|"reply"| RT
RT -.->|"result"| C

Because the caller talks to an address, the call site never changes. Whether the agent runs in the same process, in another service, or in a Kubernetes pod is a deployment decision, not a code change. This is the actor model, applied to LLM agents.
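To make the address idea concrete, here is a toy in-memory sketch. ToyRuntime, its register and submit methods, and the fields of AgentId are assumptions made for illustration; they are not Ravi's implementation, which also involves a scheduler and workers.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class AgentId:
    """Hypothetical address: agent type plus key, used for routing only."""
    type: str
    key: str


Handler = Callable[[str], Awaitable[str]]


class ToyRuntime:
    """Illustrative only: maps addresses to handlers the caller never sees."""

    def __init__(self) -> None:
        self._registry: dict[AgentId, Handler] = {}

    def register(self, agent_id: AgentId, handler: Handler) -> None:
        self._registry[agent_id] = handler

    async def submit(self, agent_id: AgentId, message: str) -> str:
        # Resolve the address and deliver the message. A real runtime could
        # forward to another process here without changing the call site.
        handler = self._registry[agent_id]
        return await handler(message)


async def echo(message: str) -> str:
    return f"echo: {message}"


async def main() -> str:
    rt = ToyRuntime()
    helper = AgentId("echo", "helper")
    rt.register(helper, echo)
    return await rt.submit(helper, "hello")


reply = asyncio.run(main())
print(reply)
```

The caller only ever touches helper, the address. Swapping the registry lookup for a network hop would not change the submit call.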
The three identities
Ravi separates three things that beginners often conflate:
| Identity | What it is | Lifetime |
|---|---|---|
| AgentId | The agent's address: type + key. Routing only. | As long as the agent is registered |
| session_id | A conversation thread. History is keyed by this. | Long-lived; spans many runs |
| run_id | One execution of run(). Scopes budget, the event log, supervision, and the progress channel. | Short-lived; one run |
A single agent (AgentId) can hold many conversations (session_ids), and each conversation is made of many runs (run_ids). Keeping these separate is what lets the same agent serve thousands of users without their histories leaking into each other.
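The relationship between the three identities can be sketched with plain data structures. Everything below (the AgentId fields, the history and runs dictionaries, the start_run helper) is hypothetical and only illustrates the keying scheme, not how Ravi stores anything.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentId:
    type: str
    key: str


# History is keyed by session_id, so two users of one agent never share it.
history: dict[str, list[str]] = defaultdict(list)
# Each run remembers which agent and which session it belongs to.
runs: dict[str, tuple[AgentId, str]] = {}


def start_run(agent_id: AgentId, session_id: str, text: str) -> str:
    """Record one turn and return a fresh run_id for this execution."""
    run_id = uuid.uuid4().hex  # short-lived: one per execution
    runs[run_id] = (agent_id, session_id)
    history[session_id].append(text)
    return run_id


helper = AgentId("react", "helper")
run_a1 = start_run(helper, "session-alice", "hi")
run_a2 = start_run(helper, "session-alice", "what's my balance?")
run_b1 = start_run(helper, "session-bob", "hello")

print(len(history["session-alice"]), len(history["session-bob"]))
```

One agent address, two sessions, three runs: Alice's two turns accumulate under her session while Bob's history stays separate.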
What an agent actually is
An agent is any object satisfying the kernel Agent Protocol — essentially two things:
class Agent(Protocol):
    id: AgentId

    async def run(self, ctx: RunContext, inbox: list[Message]) -> None: ...
That's it. id is the address. run is called by the Worker with a ctx (the journaled execution context) and the inbox (messages waiting for this agent). Everything else — tools, memory, guardrails — is configuration the concrete agent reads.
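As a rough illustration of what satisfying the protocol looks like, the sketch below defines local stand-ins for AgentId, Message, and RunContext and a trivial EchoAgent. The stand-in types and the ctx.reply method are invented for this example; the real kernel types will differ.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class AgentId:  # stand-in for the kernel type
    type: str
    key: str


@dataclass
class Message:  # stand-in for the kernel type
    text: str


@dataclass
class RunContext:  # stand-in: just collects replies
    replies: list[str] = field(default_factory=list)

    async def reply(self, text: str) -> None:  # hypothetical method
        self.replies.append(text)


@runtime_checkable
class Agent(Protocol):
    id: AgentId

    async def run(self, ctx: RunContext, inbox: list[Message]) -> None: ...


class EchoAgent:
    """Satisfies Agent structurally; no base class is required."""

    def __init__(self, key: str) -> None:
        self.id = AgentId("echo", key)

    async def run(self, ctx: RunContext, inbox: list[Message]) -> None:
        for msg in inbox:
            await ctx.reply(f"echo: {msg.text}")


agent = EchoAgent("helper")
ctx = RunContext()
# Called directly here only to exercise the sketch; in Ravi the Worker calls run().
asyncio.run(agent.run(ctx, [Message("ping")]))
print(isinstance(agent, Agent), ctx.replies)
```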
The standard concrete agent is ReActAgent: it runs the Reason + Act loop.
The ReAct loop
ReActAgent implements the classic pattern: the model reasons, decides to act by calling a tool, sees the result, and reasons again — until it produces a final answer or hits its iteration cap.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E3F2FD','primaryTextColor': '#0D47A1','primaryBorderColor': '#1565C0','lineColor': '#546E7A','fontSize': '13px'}}}%%
flowchart TD
classDef process fill:#E8EAF6,stroke:#3949AB,color:#1A237E
classDef decision fill:#FFF3E0,stroke:#E65100,color:#BF360C,font-weight:bold
classDef terminal fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20,font-weight:bold
classDef error fill:#FFEBEE,stroke:#C62828,color:#B71C1C
START([Message arrives]):::terminal --> LOAD["Load history + compact"]:::process
LOAD --> LLM["Call the model — ctx.llm()"]:::process
LLM --> CHECK{"Tool calls in response?"}:::decision
CHECK -->|"No"| FINAL["Persist turns, reply to sender"]:::process
FINAL --> DONE([Run complete]):::terminal
CHECK -->|"Yes"| TOOL["Invoke each tool — ctx.tool()"]:::process
TOOL --> APPEND["Append tool results to messages"]:::process
APPEND --> CAP{"Hit max_iterations?"}:::decision
CAP -->|"No"| LLM
CAP -->|"Yes"| BUDGET["Raise BudgetExhaustedError"]:::error

A few details that matter in practice:
- Compaction runs before every model call, not just at the start. Tool results can balloon the context across iterations, so the compaction pipeline trims a view of the messages while the full list is kept for persistence.
- ctx.llm() and ctx.tool() are journaled. That's what makes the loop crash-safe; see Durability.
- The iteration cap is a real budget, not a guess. Hitting it raises BudgetExhaustedError rather than looping forever.
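The control flow of the loop can be sketched in a few lines of plain Python. This is not ReActAgent's code: the scripted fake_model, the TOOLS table, the ModelReply type, and the local BudgetExhaustedError class are all stand-ins, and compaction and journaling are left out.

```python
from dataclasses import dataclass, field


class BudgetExhaustedError(RuntimeError):
    """Local stand-in for the error named in the text."""


@dataclass
class ModelReply:
    text: str
    tool_calls: list[tuple[str, str]] = field(default_factory=list)  # (tool, argument)


def fake_model(messages: list[str]) -> ModelReply:
    """Scripted model: asks for the calculator once, then answers."""
    if not any(m.startswith("tool:") for m in messages):
        return ModelReply("need to compute", [("calc", "2+3")])
    return ModelReply(f"The answer is {messages[-1].removeprefix('tool:')}")


# Toy tool table: a calculator that only understands "a+b+c" sums.
TOOLS = {"calc": lambda expr: str(sum(int(x) for x in expr.split("+")))}


def react_loop(user_text: str, model=fake_model, max_iterations: int = 4) -> str:
    messages = [f"user:{user_text}"]
    for _ in range(max_iterations):
        reply = model(messages)              # reason
        if not reply.tool_calls:
            return reply.text                # final answer ends the run
        for name, arg in reply.tool_calls:   # act, then feed results back
            messages.append(f"tool:{TOOLS[name](arg)}")
    raise BudgetExhaustedError(f"no final answer after {max_iterations} iterations")


print(react_loop("What is 2 + 3?"))
```

A model that never stops requesting tools exhausts the cap and raises instead of looping forever, which is the behavior the last bullet describes.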
A minimal agent
from ravi.agents import ReActAgent, Runtime
from ravi.agents.context import ContextConfig, InMemoryHistoryProvider
from ravi.integrations.llm import LLMFactory
from ravi.capabilities.tools import CalculatorTool

# api_key and boot_message are assumed to be defined by the caller.
model = LLMFactory("gpt-4o", api_key).build()

agent = ReActAgent(
    "helper",
    model=model,
    tools=[CalculatorTool()],
    context=ContextConfig(InMemoryHistoryProvider()),
    system_instructions="You are a helpful assistant.",
)

# Run this block inside a coroutine: async with is only valid in async code.
async with Runtime() as rt:
    await rt.register(agent)
    run_id = await rt.submit(agent.id, boot_message)  # fire a message at the address
Notice the flow: you build the agent, register it with the runtime (so the runtime knows the address), then submit a message. You never call agent.run() yourself — the Worker does, when it leases the run.
Beyond a single agent
The same model scales to teams. An OrchestratorAgent holds a roster of sub-agents and exposes each as a delegation tool. When the model decides to delegate, the orchestrator spawns the sub-agent (a new run) via ctx.spawn() and awaits its reply via ctx.ask() — the same message-passing primitives, one level up. See Supervision & Budgets for how spawning is bounded.
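The spawn-then-ask shape can be illustrated with a toy built on asyncio tasks. ToyContext and its spawn and ask methods are hypothetical; the real ctx.spawn() and ctx.ask() signatures are not shown in this page and may well differ.

```python
import asyncio
import itertools
from typing import Awaitable, Callable

SubAgent = Callable[[str], Awaitable[str]]


class ToyContext:
    """Hypothetical stand-in for the run context; not Ravi's API."""

    def __init__(self, roster: dict[str, SubAgent]) -> None:
        self._roster = roster
        self._runs: dict[int, asyncio.Task] = {}
        self._ids = itertools.count(1)

    def spawn(self, name: str, text: str) -> int:
        # Start the named sub-agent as a new run and hand back its run id.
        run_id = next(self._ids)
        self._runs[run_id] = asyncio.create_task(self._roster[name](text))
        return run_id

    async def ask(self, run_id: int) -> str:
        # Wait for that run's reply.
        return await self._runs.pop(run_id)


async def researcher(text: str) -> str:
    return f"notes on {text}"


async def writer(text: str) -> str:
    return f"draft from {text}"


async def orchestrate(topic: str) -> str:
    ctx = ToyContext({"researcher": researcher, "writer": writer})
    notes = await ctx.ask(ctx.spawn("researcher", topic))
    return await ctx.ask(ctx.spawn("writer", notes))


result = asyncio.run(orchestrate("actors"))
print(result)
```

Each delegation is a fresh run with its own id, and the orchestrator only ever waits on replies, never on sub-agent objects.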
Where this lives
| Piece | Location |
|---|---|
| Agent Protocol, AgentId | kernel/runtime/agent.py, kernel/core/identity.py |
| ReActAgent | agents/core/react.py |
| OrchestratorAgent | agents/core/orchestrator.py |
| Runtime facade + Worker | agents/runtime/ |
| RunContext (the ctx) | agents/runtime/context.py |
Next: Durability — how a run survives a crash.