Agent Policy: Supervision, Context, Middleware¶
What this is¶
An agent that can spawn other agents, run for minutes, and burn real money needs rules wrapped around it: who it reports to, how many helpers the whole job may hire, how much each helper is allowed to spend, when to give up, and what to do around every model call. None of those rules are the agent's reasoning — they are policy. The kernel (layer L0) defines that policy as frozen contracts: a handful of dataclasses, two enums, and three Protocols, with zero I/O.
Think of one agent run as a small company spun up to finish one project:
| Real-world thing | Kernel type |
|---|---|
| Your spot in the org chart (who's your manager, what project) | Supervision |
| Company-wide hiring cap ("no more than 50 people on this project") | SpawnBudget |
| Each employee's spending allowance (tokens, dollars, hours) | ExecutionBudget |
| How important your branch of work is | Priority |
| Whether your notes are shredded, kept for the project, or filed forever | HistoryRetention |
| The clerk who trims a fat email thread before you read it | CompactionStrategy |
| Your employee ID badge with a shift deadline | RunMeta |
| Airport-style security layers wrapping every model call | Middleware |
This is the contract-level companion to two higher-level pages
For the story of how these are enforced at runtime — the SpawnTracker, the ExecutionTracker, the MiddlewarePipeline — read Supervision & Budgets and Middleware. This page stays inside kernel/agent/ and documents only the frozen types. Everything that actually tracks a budget or runs a pipeline lives one layer up, in agents/.
The kernel ships four small files, and we cover each:
- supervision.py: the org chart (Supervision) and the two budgets.
- runtime_context.py: the ID badge with a deadline (RunMeta) and the stop button (CancellationToken).
- context.py: how an agent's prompt window is assembled (AgentContextProtocol, CompactionStrategy).
- middleware.py: the interceptor contract (Middleware).
Supervision — the org chart¶
A Supervision node is one agent's formal position in an execution hierarchy: who its manager is, which project (run) and conversation (session) it belongs to, what level it sits at, and what resource limits the project operates under. The same policy is passed top-down when spawning children, so every agent in the tree shares the same identifiers and the same hiring cap.
@dataclass(frozen=True, slots=True)
class Supervision:
    run_id: str                               # one execution tree (one run() call)
    session_id: str                           # one conversation thread (many runs)
    root_id: AgentId                          # the top of this tree
    parent_id: AgentId | None                 # your manager (None at the root)
    depth: int = 0                            # informational only (UI nesting)
    spawn_budget: SpawnBudget = ...           # shared across the WHOLE tree
    execution_budget: ExecutionBudget = ...   # per-agent resource limits
    retention: HistoryRetention = HistoryRetention.RUN
    priority: Priority = Priority.NORMAL
Two of those fields are IDs that are easy to confuse, so pin them down on first use:
- session_id: the conversation thread. Long-lived; one session spans many runs. History is always keyed by session_id.
- run_id: one execution tree, the result of a single run() call. Short-lived. It scopes the budget, the supervision tree, resume/replay, and the progress pub/sub topic.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
flowchart TD
classDef root fill:#E3F2FD,stroke:#1565C0,color:#0D47A1,font-weight:bold
classDef child fill:#E8EAF6,stroke:#3949AB,color:#1A237E
ROOT["root agent<br/>depth 0 — parent_id None<br/>is_root True"]:::root
ROOT --> A["researcher<br/>depth 1 — parent_id root"]:::child
ROOT --> B["analyst<br/>depth 1 — parent_id root"]:::child
A --> A1["sub-fetcher<br/>depth 2 — parent_id researcher"]:::child
ROOT -.->|"same run_id, session_id,<br/>SAME SpawnBudget object"| A
ROOT -.->|"shared"| B
A -.->|"shared"| A1
Building the tree: root() then spawn_child()¶
You never hand-build a Supervision. You call Supervision.root() once on the top-level agent, then thread the result into every child via spawn_child().
# Called ONCE on the user-facing agent; mints a fresh run_id.
sup = Supervision.root(
    agent_id,
    session_id=None,                       # None → fresh uuid; pass an id to resume a session
    spawn_budget=None,                     # None → SpawnBudget() (50 agents, preempt on)
    execution_budget=None,                 # None → ExecutionBudget() (all unlimited)
    retention=HistoryRetention.PERMANENT,  # default for the root
    priority=Priority.NORMAL,
)
# Called by a parent for each helper it hires.
child_sup = sup.spawn_child(
    parent_id=sup.root_id,
    retention=HistoryRetention.RUN,        # default for children
    priority=Priority.HIGH,
    execution_budget=None,                 # None → inherit the parent's budget
)
A child created by spawn_child() inherits:
- the same run_id (one execution tree) and session_id (one conversation),
- the same SpawnBudget instance, so the headcount cap is global to the tree, not per-branch,
- the parent's execution_budget by default (pass an override to give a child tighter limits),
- depth + 1, which is purely informational for UI indentation. There is no depth limit; SpawnBudget is the single structural constraint.
Two handy accessors
- sup.is_root → True when parent_id is None.
- sup.progress_topic → TopicId("agent.progress", run_id). Every agent in the run publishes progress to this one topic, so a UI subscribes once and watches the entire tree.
Supervision is frozen
Every supervision type on this page is frozen=True. You do not mutate a node to add a child — you derive a new node with spawn_child(). The mutable counters that actually enforce the budgets live one layer up, in the agents/ trackers.
Two orthogonal budgets¶
Ravi deliberately separates "how many agents may exist" from "how much each agent may spend." They are different questions enforced by different trackers, so they live in two different dataclasses.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
flowchart TB
classDef budget fill:#FFF3E0,stroke:#E65100,color:#BF360C,font-weight:bold
classDef scope fill:#FAFAFA,stroke:#BDBDBD,color:#424242
SB["SpawnBudget<br/>the hiring cap"]:::budget
EB["ExecutionBudget<br/>the spending allowance"]:::budget
SB --- SB1["ONE shared object for the whole run tree<br/>max_agents, allow_preempt"]:::scope
EB --- EB1["a SEPARATE object per agent<br/>max_tokens, max_cost_usd, max_turns, deadline_s"]:::scope
| Budget | Scope | Fields | Default | A field of None means |
|---|---|---|---|---|
| SpawnBudget | Run-wide: one shared instance for the entire tree | max_agents, allow_preempt | max_agents=50, allow_preempt=True | (no nullable fields) |
| ExecutionBudget | Per-agent: each agent carries its own | max_tokens, max_cost_usd, max_turns, deadline_s | all None (unlimited) | unlimited for that one dimension |
SpawnBudget — the hiring cap¶
@dataclass(frozen=True, slots=True)
class SpawnBudget:
    max_agents: int = 50        # total agents allowed in the run (root counts as 1)
    allow_preempt: bool = True  # may HIGH/CRITICAL agents pause lower ones for a slot?
It is a project-wide limit. The same SpawnBudget object is propagated to every node, so a fork bomb is impossible: when the count hits max_agents, no branch can hire. If allow_preempt is on, an important agent that needs a slot can cooperatively pause a lower-priority one to claim it instead of being denied outright (the full preemption flow lives in the Supervision concept page).
ExecutionBudget — the per-employee allowance¶
@dataclass(frozen=True, slots=True)
class ExecutionBudget:
    max_tokens: int | None = None      # total LLM tokens (prompt + completion)
    max_cost_usd: float | None = None  # cumulative LLM spend in USD
    max_turns: int | None = None       # LLM round-trips (one tool call = one turn)
    deadline_s: float | None = None    # wall-clock seconds from run start
This is one employee's allowance. Each agent gets its own. The agents/-layer ExecutionTracker counts real consumption against it and raises BudgetExhaustedError the instant any limit is breached. The deadline_s dimension is the only one enforced at the kernel boundary — by the deadline on RunMeta (next section).
Priority and HistoryRetention — two small enums¶
Priority — how important this branch is¶
An integer weight used for proportional pool allocation and to decide who gets preempted. Under proportional allocation, a CRITICAL agent (weight 8) gets four times the share of a default NORMAL agent (weight 2); BACKGROUND (weight 0) is best-effort.
| Priority | Weight | Plain English |
|---|---|---|
| BACKGROUND | 0 | best-effort, first to be paused |
| LOW | 1 | nice-to-have |
| NORMAL | 2 | the default lane |
| HIGH | 4 | jump the queue, can preempt lower lanes |
| CRITICAL | 8 | top of the food chain |
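To make the weights concrete, here is a small sketch of proportional allocation. Only the enum values come from the table; the allocate() helper and the pool of 1400 units are hypothetical, and the real allocator may round or cap shares differently.

```python
from enum import IntEnum


class Priority(IntEnum):
    BACKGROUND = 0
    LOW = 1
    NORMAL = 2
    HIGH = 4
    CRITICAL = 8


def allocate(pool: int, priorities: list[Priority]) -> list[float]:
    """Split `pool` units in proportion to each agent's weight."""
    total = sum(priorities)
    if total == 0:
        # Only BACKGROUND agents are present: nobody has a guaranteed share.
        return [0.0] * len(priorities)
    return [pool * p / total for p in priorities]


shares = allocate(
    1400,
    [Priority.CRITICAL, Priority.HIGH, Priority.NORMAL, Priority.BACKGROUND],
)
# shares == [800.0, 400.0, 200.0, 0.0]
```

With weights 8, 4, 2 and 0 the total is 14, so the CRITICAL agent receives 8/14 of the pool, four times what the NORMAL agent gets, and the BACKGROUND agent receives whatever is left over in practice, which in this sketch is nothing.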
HistoryRetention — how long the notes survive¶
How long an agent's conversation history is kept after the run ends.
class HistoryRetention(str, Enum):
    NONE = "none"            # stateless worker: nothing persisted
    RUN = "run"              # kept for this run (scoped by run_id), then deleted
    PERMANENT = "permanent"  # kept forever, for top-level user-facing agents
| Policy | Lifetime | Use it for |
|---|---|---|
| NONE | not persisted at all | a throwaway stateless sub-task |
| RUN | scoped to run_id, deleted after the run | most spawned helpers (the spawn_child() default) |
| PERMANENT | kept forever | the top-level, user-facing agent (the root() default) |
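Because HistoryRetention mixes in str, its members compare equal to their string values and serialise with the standard json module without a custom encoder. A short sketch; keep_after_run() is a hypothetical helper used only to illustrate the table, not part of the kernel.

```python
import json
from enum import Enum


class HistoryRetention(str, Enum):
    NONE = "none"
    RUN = "run"
    PERMANENT = "permanent"


def keep_after_run(retention: HistoryRetention) -> bool:
    """True when history must survive once the run has ended."""
    return retention is HistoryRetention.PERMANENT


parsed = HistoryRetention("run")  # look a member up from a config string
encoded = json.dumps({"retention": HistoryRetention.PERMANENT})
```

Here `parsed` is the RUN member, and `encoded` contains the plain string "permanent", so a policy can travel through config files and JSON payloads unchanged.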
The supervision classes at a glance¶
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
classDiagram
class Supervision {
+str run_id
+str session_id
+AgentId root_id
+AgentId parent_id
+int depth
+SpawnBudget spawn_budget
+ExecutionBudget execution_budget
+HistoryRetention retention
+Priority priority
+root(agent_id)$ Supervision
+spawn_child(parent_id) Supervision
+progress_topic() TopicId
+is_root() bool
}
class SpawnBudget {
+int max_agents
+bool allow_preempt
}
class ExecutionBudget {
+int max_tokens
+float max_cost_usd
+int max_turns
+float deadline_s
}
class Priority {
<<enum>>
BACKGROUND 0
LOW 1
NORMAL 2
HIGH 4
CRITICAL 8
}
class HistoryRetention {
<<enum>>
NONE
RUN
PERMANENT
}
Supervision *-- SpawnBudget : shared tree-wide
Supervision *-- ExecutionBudget : per-agent
Supervision --> Priority : carries
Supervision --> HistoryRetention : carries
RunMeta: the ID badge with a shift deadline¶
Where Supervision is the org chart, RunMeta is the employee ID badge carried into every kernel call: it says which run you're on, lets anyone hit a stop button, names your shift deadline, and carries the tracing and tenant tags so observability and multi-tenant scoping work without adding a parameter to every function.
@dataclass(frozen=True, slots=True)
class RunMeta:
    run_id: str                             # globally unique id for this run
    cancellation: CancellationToken         # the cooperative stop button
    supervision: Supervision | None = None  # org-chart position (None if standalone)
    deadline: datetime | None = None        # wall-clock expiry; agents/tools honour it
    trace_id: str = ...                     # distributed-trace id (auto-generated)
    tenant_id: str | None = None            # tenant namespace (None = single-tenant)
RunMeta is immutable — you thread it down the call stack rather than mutating it. Two ways to get one:
- RunMeta.standalone(...): a fresh badge with a brand-new CancellationToken, for a run that has no supervision tree.
- Built per-run by the runtime, where run_id is populated from supervision.run_id.
The stop button: CancellationToken¶
A CancellationToken is a pure-asyncio cooperative cancellation signal — no threads, no global state. Someone on the outside (an orchestrator, a timeout handler, the user clicking "stop") calls token.cancel(); code on the inside politely checks at safe points.
token.cancel("user stopped") # from outside — idempotent
token.check() # inside: raises CancellationError if cancelled
await token.wait() # inside: block until cancelled
child = token.child() # a child token cancelled when this one is
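The four calls above can be backed by a single asyncio.Event. The sketch below is one possible shape, not the kernel's implementation: the private attributes, the "first reason wins" rule, and the way children are linked are all assumptions, and CancellationError is a local stand-in.

```python
import asyncio


class CancellationError(Exception):
    """Stand-in for the error exported from kernel/core/errors.py."""


class CancellationToken:
    def __init__(self) -> None:
        self._event = asyncio.Event()
        self._reason: str | None = None
        self._children: list["CancellationToken"] = []

    def cancel(self, reason: str = "cancelled") -> None:
        if self._event.is_set():
            return  # idempotent: a second cancel() is a no-op
        self._reason = reason
        self._event.set()
        for child in self._children:
            child.cancel(reason)  # cancellation flows parent -> child only

    def check(self) -> None:
        if self._event.is_set():
            raise CancellationError(self._reason)

    async def wait(self) -> None:
        await self._event.wait()

    def child(self) -> "CancellationToken":
        token = CancellationToken()
        if self._event.is_set():
            token.cancel(self._reason or "cancelled")
        else:
            self._children.append(token)
        return token


async def demo() -> str:
    parent = CancellationToken()
    child = parent.child()
    # Cancel from "outside" shortly after the worker starts waiting.
    asyncio.get_running_loop().call_later(0.01, parent.cancel, "user stopped")
    await child.wait()
    try:
        child.check()
    except CancellationError as exc:
        return str(exc)
    return "not cancelled"
```

Running `asyncio.run(demo())` returns "user stopped": cancelling the parent wakes the child's wait(), and the child's check() then raises with the parent's reason.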
The one method you call in the loop: meta.check()¶
This is how the per-agent deadline_s budget and a manual cancel both get enforced. Call it at every cooperative yield point — before an LLM call, before a tool runs, between loop iterations:
def check(self) -> None:
    self.cancellation.check()   # cancelled? -> CancellationError
    # now() is shorthand for the current wall-clock time
    if self.deadline and now() > self.deadline:
        raise CancellationError("deadline exceeded")
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E3F2FD','primaryTextColor': '#0D47A1','primaryBorderColor': '#1565C0','lineColor': '#546E7A','fontSize': '13px'}}}%%
flowchart TD
classDef agent fill:#E8EAF6,stroke:#3949AB,color:#1A237E,font-weight:bold
classDef decision fill:#FFF3E0,stroke:#E65100,color:#BF360C,font-weight:bold
classDef ok fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20,font-weight:bold
classDef stop fill:#FFEBEE,stroke:#C62828,color:#B71C1C,font-weight:bold
LOOP["loop iteration begins"]:::agent --> CHECK["meta.check()"]:::agent
CHECK --> C{"cancelled?"}:::decision
C -->|"yes"| RAISE["raise CancellationError"]:::stop
C -->|"no"| D{"past deadline?"}:::decision
D -->|"yes"| RAISE
D -->|"no"| GO["proceed: LLM call or tool"]:::ok
Deadlines compose with the budget
ExecutionBudget.deadline_s is seconds from run start. The runtime turns that into the absolute RunMeta.deadline, and meta.check() is what actually trips it. Same idea, two layers: the kernel states the policy, the runtime enforces it.
CompactionStrategy and AgentContextProtocol — assembling the prompt¶
A long conversation does not fit in a model's context window, and stuffing the whole transcript in would be slow and expensive. CompactionStrategy is the clerk who trims the fat email thread before you read it: it turns the full raw history into a manageable window. The kernel only fixes the shape — the actual trimming (sliding window, token truncation, LLM summarisation) is an implementation up in agents/.
class CompactionStrategy(Protocol):
    async def compact(self, raw_history: list[ChatMessage]) -> list[ChatMessage]:
        """Return the optimised sequence ready for LLM generation."""
        ...
Both input and output are list[ChatMessage] — the same type LLMClient.generate already consumes, so compaction slots in transparently.
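As an illustration of the kind of implementation that can satisfy this Protocol, here is a minimal sliding-window strategy. It is a sketch only: ChatMessage is a two-field stand-in for the kernel type, and SlidingWindow with its keep_last parameter is a hypothetical class, not one of the strategies shipped in agents/.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ChatMessage:
    """Stand-in: the real ChatMessage lives in kernel/core/content.py."""
    role: str
    text: str


class CompactionStrategy(Protocol):
    async def compact(self, raw_history: list[ChatMessage]) -> list[ChatMessage]: ...


class SlidingWindow:
    """Keep every system message plus the most recent `keep_last` others."""

    def __init__(self, keep_last: int) -> None:
        self.keep_last = keep_last

    async def compact(self, raw_history: list[ChatMessage]) -> list[ChatMessage]:
        system = [m for m in raw_history if m.role == "system"]
        rest = [m for m in raw_history if m.role != "system"]
        if self.keep_last <= 0:
            return system
        # System messages are moved to the front; the tail keeps its order.
        return system + rest[-self.keep_last:]


async def demo() -> list[str]:
    history = [ChatMessage("system", "be brief")] + [
        ChatMessage("user", f"msg {i}") for i in range(10)
    ]
    strategy: CompactionStrategy = SlidingWindow(keep_last=3)
    window = await strategy.compact(history)
    return [m.text for m in window]
```

`asyncio.run(demo())` yields the system prompt followed by the last three user messages. SlidingWindow never inherits from CompactionStrategy; matching the method signature is enough, which is the point of defining the contract as a Protocol.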
AgentContextProtocol is the minimal surface the agent loop sees of its own runtime context. It exposes only two things: the agent's own id, and a way to get the already-compacted prompt window for a session. Storage details (which history provider, which compaction strategy) are deliberately hidden behind it.
class AgentContextProtocol(Protocol):
    @property
    def agent_id(self) -> AgentId: ...
    async def get_prompt_window(self, session_id: str) -> list[ChatMessage]: ...
Why so small?
The agent loop should not know how its history is stored or trimmed — only that it can ask for "the messages I should send to the model right now." Keeping the Protocol tiny is what lets the storage and compaction internals change without touching agent code.
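A toy implementation shows how little the loop needs to know. Everything concrete here is hypothetical: InMemoryContext stores history in a dict and "compacts" by keeping the last few messages, AgentId is a str alias, and ChatMessage is a two-field stand-in.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol

AgentId = str  # stand-in: the real AgentId lives in kernel/core/identity.py


@dataclass(frozen=True)
class ChatMessage:
    role: str
    text: str


class AgentContextProtocol(Protocol):
    @property
    def agent_id(self) -> AgentId: ...
    async def get_prompt_window(self, session_id: str) -> list[ChatMessage]: ...


class InMemoryContext:
    """Hypothetical context: a dict as the store, last-N as the compaction."""

    def __init__(
        self,
        agent_id: AgentId,
        histories: dict[str, list[ChatMessage]],
        keep_last: int = 2,
    ) -> None:
        self._agent_id = agent_id
        self._histories = histories
        self._keep_last = keep_last

    @property
    def agent_id(self) -> AgentId:
        return self._agent_id

    async def get_prompt_window(self, session_id: str) -> list[ChatMessage]:
        return self._histories.get(session_id, [])[-self._keep_last:]


async def next_prompt(ctx: AgentContextProtocol, session_id: str) -> list[str]:
    # The loop sees only the protocol: no storage or compaction details.
    window = await ctx.get_prompt_window(session_id)
    return [f"{m.role}: {m.text}" for m in window]
```

Swapping InMemoryContext for a database-backed context with LLM summarisation would leave next_prompt() untouched, since it depends only on the two members the Protocol declares.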
Middleware — the airport-security layers¶
Plenty of things should happen around a model call that are not the agent's reasoning: cache identical requests, retry on a blip, validate the output, redact PII, log for audit, rate-limit a tenant. Middleware pulls each of those into a composable layer — like the layered checkpoints at airport security: each layer can inspect you on the way in, wave you through to the next, then inspect you again on the way out.
The kernel defines one generic interceptor Protocol. All three middleware levels (agent-run, chat, tool) share this exact shape:
CtxT = TypeVar("CtxT")
class Middleware(Protocol[CtxT]):
    async def process(
        self,
        context: CtxT,
        call_next: Callable[[], Awaitable[None]],
    ) -> None: ...
The pattern is always: do work before, await call_next() to go inward, do work after as control unwinds. Skip call_next() to short-circuit (a cache hit). Raise to abort — typically MiddlewareTermination for a guardrail block.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
flowchart TB
classDef mw fill:#E8EAF6,stroke:#3949AB,color:#1A237E,font-weight:bold
classDef core fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20,font-weight:bold
subgraph L1["RateLimiter — before/after"]
subgraph L2["Cache — may short-circuit"]
subgraph L3["Retry — before/after"]
CORE["the real model call"]:::core
end
end
end
The kernel also defines three minimal context protocols, one per level, each describing only the attributes a middleware at that level actually reads. They live in the kernel (not agents/) so the pipeline can be type-checked without importing any concrete context class; the agents/ layer narrows CtxT to its real dataclasses when wiring a pipeline.
| Context protocol | Wraps | Readable fields |
|---|---|---|
| AgentRunContextProtocol | one agent.run() call | agent_name, run_id, session_id |
| ChatContextProtocol | each model.generate() call | agent_name, run_id, system_instructions |
| FunctionContextProtocol | each tool.execute() call | agent_name, run_id, function_name, arguments |
The module also exports three aliases — AgentMiddleware, ChatMiddleware, FunctionMiddleware — purely for documentation clarity; each is Middleware[Any] whose runtime context is the matching protocol above.
Contracts here, behaviour next door
This page documents only the interceptor shape. The MiddlewarePipeline that threads layers together, and the built-ins (Cache, Retry, RateLimiter, guardrails …), live in agents/middleware/ — see Middleware for the full onion and the catalogue.
Summary¶
| Type | Kind | One-liner |
|---|---|---|
| Supervision | frozen dataclass | an agent's position in the run tree; threads ids + policy down via spawn_child() |
| SpawnBudget | frozen dataclass | run-wide headcount cap (max_agents, allow_preempt); one shared object |
| ExecutionBudget | frozen dataclass | per-agent spend cap (max_tokens, max_cost_usd, max_turns, deadline_s) |
| Priority | int enum | branch weight BACKGROUND(0)…CRITICAL(8) for allocation + preemption |
| HistoryRetention | str enum | NONE / RUN / PERMANENT history lifetime |
| RunMeta | frozen dataclass | per-run badge: run_id, cancellation, deadline, trace, tenant; check() enforces stop + deadline |
| CancellationToken | class (pure asyncio) | cooperative stop button: cancel(), check(), wait(), child() |
| CompactionStrategy | Protocol | trims raw history into a prompt window |
| AgentContextProtocol | Protocol | minimal context the loop sees: agent_id + get_prompt_window() |
| Middleware | Protocol | generic process(context, call_next) interceptor for all three levels |
The one rule to remember
The kernel only states policy — frozen shapes with no behaviour. The agents/ layer holds the mutable trackers and pipelines that enforce it. That split is why you can reason about budgets, retention, and cancellation as plain data, and swap the enforcement machinery without touching the contracts.
Where this lives¶
| Piece | Location |
|---|---|
| Supervision, SpawnBudget, ExecutionBudget, Priority, HistoryRetention | kernel/agent/supervision.py |
| RunMeta, CancellationToken | kernel/agent/runtime_context.py |
| CompactionStrategy, AgentContextProtocol | kernel/agent/context.py |
| Middleware + per-level context protocols + aliases | kernel/agent/middleware.py |
| AgentId, TopicId (ids carried by Supervision) | kernel/core/identity.py |
| ChatMessage (compaction payload) | kernel/core/content.py |
| CancellationError, BudgetExhaustedError, MiddlewareTermination | kernel/core/errors.py |
| SpawnTracker (headcount + preemption) | agents/supervision/budget.py |
| ExecutionTracker (per-agent spend) | agents/resources/budget.py |
| MiddlewarePipeline + built-in middlewares | agents/middleware/ |
Next: The Durable Runtime — the contracts that make a run survivable: the inbox, the scheduler, the event log, and the journal that lets a crashed run resume without re-paying for what it already did.