Tools (the agents layer)¶
What this is¶
The kernel wrote the rules of tools — what a tool is, how risky it is, what a result looks like. This page is the agents layer (L1) that finally puts those rules to work. Two concrete classes live here:
- Toolbox: a name-keyed bag that holds every tool an agent owns.
- ToolInvoker: the one place a chained (code-mode) tool call must pass through before and after it runs.
Picture a workshop. Toolbox is the labelled toolbox on the bench: every tool has a slot with its name on it, you reach in by name and pull one out. ToolInvoker is the safety inspector standing at the door: before any tool is used inside a script it checks the permit, the danger rating, and the budget — and after the tool runs it inspects what came back, shrinking anything too big to carry by hand. No tool gets used in a chain without walking past the inspector.
This is the runtime, not the contract
The frozen types these classes consume — Tool, ToolRisk, ToolExecutionResult, ChainPolicy, InvocationResult — are documented in Kernel: Tool Contracts. The story-level tour of how a model picks a tool and gets a result is in Concepts: Tools, which already walks the invoker's seven enforcement steps at a high level. This page stays inside agents/tools/ and shows the real code that implements all of it.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
flowchart LR
classDef agent fill:#E8EAF6,stroke:#3949AB,color:#1A237E,font-weight:bold
classDef runtime fill:#E3F2FD,stroke:#1565C0,color:#0D47A1,font-weight:bold
classDef tool fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20,font-weight:bold
classDef store fill:#F3E5F5,stroke:#6A1B9A,color:#4A148C
AG["ReActAgent"]:::agent -->|"owns"| TB["Toolbox<br/>(name to tool)"]:::runtime
SCRIPT["Code-mode script<br/>calls a tool by name"]:::agent -->|"each call"| INV["ToolInvoker<br/>(the inspector)"]:::runtime
INV -->|"looks up by name"| TB
INV -->|"execute(**args)"| TOOL["Real tool"]:::tool
INV -->|"large result offloaded"| STORE[("BlobStore")]:::store
TOOL -->|"ToolExecutionResult"| INV
INV -->|"InvocationResult"| SCRIPT
Toolbox — the labelled toolbox¶
What it is: the concrete implementation of the kernel ToolRegistry Protocol — an in-memory dict[str, AnyTool] keyed by tool name. Why it exists: something has to hold the tools and let the rest of the framework look one up by name. Toolbox is that something.
AnyTool is the kernel union Tool | HostedTool | ProviderDefinedTool — so a Toolbox happily stores all three kinds side by side (see the taxonomy).
The core surface is three small methods:
class Toolbox:
    def __init__(self) -> None:
        self._tools: dict[str, AnyTool] = {}

    def add(self, tool: AnyTool) -> None:         # put a tool in its slot
        self._tools[tool.name] = tool

    def get(self, name: str) -> AnyTool | None:   # reach in by name
        return self._tools.get(name)

    def all(self) -> list[AnyTool]:               # tip the whole box out
        return list(self._tools.values())
That is the whole registry. add is keyed by tool.name, so adding two tools with the same name keeps the last one. Beyond the three core methods there are a few conveniences: names(), by_risk(risk), __len__, and `name in toolbox` membership. There are also schema helpers (schemas(), schema_for(), deferred_schemas()) that turn each tool into the wire shape an LLM expects: a function schema for local tools, and the first provider_specs entry for hosted/provider-defined ones. The helpers respect each tool's defer_loading setting.
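The last-one-wins behaviour can be checked with a small sketch. The `Toolbox` below is a stand-in that copies the three core methods and guesses at `by_risk`. `FakeTool`, its `version` field, and the enum string values are hypothetical, not the kernel types.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ToolRisk(str, Enum):  # stand-in for the kernel enum; string values are assumed
    SAFE = "safe"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass(frozen=True)
class FakeTool:  # hypothetical tool: only the fields this sketch needs
    name: str
    risk: ToolRisk = ToolRisk.SAFE
    version: int = 1


class Toolbox:  # stand-in mirroring the core surface shown above
    def __init__(self) -> None:
        self._tools: dict[str, FakeTool] = {}

    def add(self, tool: FakeTool) -> None:
        self._tools[tool.name] = tool  # same name overwrites the old slot

    def get(self, name: str) -> FakeTool | None:
        return self._tools.get(name)

    def all(self) -> list[FakeTool]:
        return list(self._tools.values())

    def by_risk(self, risk: ToolRisk) -> list[FakeTool]:  # assumed behaviour
        return [t for t in self._tools.values() if t.risk == risk]


box = Toolbox()
box.add(FakeTool("calculator", version=1))
box.add(FakeTool("send_email", risk=ToolRisk.HIGH))
box.add(FakeTool("calculator", version=2))  # replaces version 1 silently
```

Because the overwrite is silent, a caller that wants to detect name collisions has to check `get(name)` before calling `add`.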
A list of tools becomes a Toolbox for free¶
You rarely build a Toolbox by hand. When you hand a plain list of tools to a ReActAgent, the constructor wraps it for you:
# in ReActAgent.__init__
if isinstance(tools, list):
    from ravi.agents.tools.toolbox import Toolbox
    tb = Toolbox()
    for t in tools:
        tb.add(t)
    self.tools = tb
else:
    self.tools = tools  # already a ToolRegistry, used as-is
So both of these are valid, and end up identical inside the agent:
agent = ReActAgent("bot", model=model, tools=[CalculatorTool(), WebSearchTool()])
box = Toolbox(); box.add(CalculatorTool()); box.add(WebSearchTool())
agent = ReActAgent("bot", model=model, tools=box)
When to build the box yourself
Pass a list for one-off agents. Build a shared Toolbox once (e.g. in the monolith lifespan) when several agents must share the same tool collection — wire once, reuse everywhere.
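One consequence of the pass-through branch is that agents given the same box hold the same object, so a tool added later is visible to all of them. A minimal sketch of that, assuming the wrapping logic shown earlier: `as_registry`, `Agent`, and `NamedTool` are hypothetical stand-ins, not framework names.

```python
from __future__ import annotations


class Toolbox:  # stand-in with the same add/get surface as above
    def __init__(self) -> None:
        self._tools: dict[str, object] = {}

    def add(self, tool) -> None:
        self._tools[tool.name] = tool

    def get(self, name: str):
        return self._tools.get(name)


class NamedTool:  # hypothetical tool carrying only a name
    def __init__(self, name: str) -> None:
        self.name = name


def as_registry(tools):
    """Mirror the constructor branch: wrap a list, pass a registry through."""
    if isinstance(tools, list):
        box = Toolbox()
        for tool in tools:
            box.add(tool)
        return box
    return tools  # already a registry: the same object is reused


class Agent:  # hypothetical stand-in for ReActAgent
    def __init__(self, name: str, tools) -> None:
        self.name = name
        self.tools = as_registry(tools)


shared = Toolbox()
shared.add(NamedTool("calculator"))
planner = Agent("planner", shared)
worker = Agent("worker", shared)
shared.add(NamedTool("web_search"))  # visible to both agents at once

solo_a = Agent("solo_a", [NamedTool("calculator")])
solo_b = Agent("solo_b", [NamedTool("calculator")])  # gets its own private box
```

Sharing is convenient for wiring, but it also means a late `add` changes what every attached agent can call.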
ToolInvoker — the safety inspector¶
What it is: the single enforcement chokepoint for programmatic / chained tool calls. Why it exists: when the model writes a small Python script that calls tools as functions (code-mode chaining), those calls bypass the normal agent loop. The ToolInvoker is the gate that makes them obey exactly the same risk, approval, timeout, and budget rules the loop enforces — no shortcuts.
Scope: the invoker is for chains, not every tool call
A tool the model calls directly in a normal turn is dispatched by the agent loop. The ToolInvoker governs the other path: tool calls that come from inside a sandboxed chain script via ToolChainTool. Both paths honour the same kernel ToolRisk / ApprovalHandler contracts.
One invoker is built per process lifespan and shared across all chains. Its constructor takes five collaborators: four kernel-typed ones, plus an optional HookManager from agents/hooks/:
class ToolInvoker:
    def __init__(
        self,
        registry: ToolRegistry,                           # where tools are looked up (a Toolbox)
        approval_handler: ApprovalHandler | None = None,  # HITL; absent = deny all non-SAFE
        artifact_store: BlobStore | None = None,          # large-data backend; absent = always inline
        policy: ChainPolicy | None = None,                # timeouts, budget, inline threshold
        hooks: HookManager | None = None,                 # lifecycle hooks (TOOL_START / TOOL_END)
    ) -> None: ...
Two design choices are worth reading twice. With no approval_handler, any tool whose risk exceeds policy.max_risk_unapproved (every non-SAFE tool when that threshold is SAFE) is denied outright: the inspector with no manager to call says "no". With no artifact_store, every result is returned inline: there is nowhere to offload large data, so nothing gets offloaded.
The enforcement pipeline¶
Every chained call enters through invoke(), which is a thin wrapper:
async def invoke(self, call, *, session, ctx=None, progress_sink=None) -> InvocationResult:
    start_ms = ...
    if self._hooks:                                  # (a) hooks: TOOL_START
        await self._hooks.dispatch(HookEvent.TOOL_START, {"tool_name": call.name})
    try:
        result = await self._invoke_inner(call, session=session, ctx=ctx, ...)
        status = result.status
        return result
    except Exception as exc:                         # any blow-up becomes a clean error result
        return InvocationResult(status="error", text=f"Invoker error: ...")
    finally:
        if self._hooks:                              # (b) hooks: TOOL_END (always)
            await self._hooks.dispatch(HookEvent.TOOL_END, {...})
        session._trace.append(ChainCallRecord(...))  # (c) trace: always recorded
invoke() guarantees two things on every outcome (success, denial, crash, timeout): the TOOL_END hook fires whenever a HookManager is configured, and a ChainCallRecord is appended to the session trace. That trace is what gives at-most-once safety: on a retry, the model can see that step 2 already sent its email and won't send it twice.
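The try/except/finally shape behind that guarantee can be exercised in isolation. This is a sketch only: `Result`, `Session`, and `guarded_invoke` are hypothetical stand-ins, and the trace entries are plain tuples rather than `ChainCallRecord`s.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Result:  # hypothetical stand-in for InvocationResult
    status: str
    text: str = ""


@dataclass
class Session:  # hypothetical: holds only the trace
    trace: list = field(default_factory=list)


async def guarded_invoke(name, inner, session: Session) -> Result:
    status = "error"  # what gets recorded if inner() raises
    try:
        result = await inner()
        status = result.status
        return result
    except Exception as exc:  # a crash becomes a clean error result
        return Result("error", f"Invoker error: {exc}")
    finally:
        # Runs after either return above, so every outcome is traced.
        session.trace.append((name, status))


async def ok() -> Result:
    return Result("ok", "42")


async def denied() -> Result:
    return Result("denied", "approval refused")


async def crash() -> Result:
    raise RuntimeError("boom")


async def demo():
    session = Session()
    results = [
        await guarded_invoke("calc", ok, session),
        await guarded_invoke("send_email", denied, session),
        await guarded_invoke("flaky", crash, session),
    ]
    return results, session.trace
```

Calling `asyncio.run(demo())` yields three results and three trace entries, one per call, including the call that crashed.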
The real work is _invoke_inner(). Here is its pipeline, in execution order:
| # | Gate | What happens | Outcome on failure |
|---|---|---|---|
| 1 | Budget | session._call_count >= policy.max_tool_calls? then increment the counter | status="error" — budget exhausted |
| 2 | Registry lookup | tool = registry.get(name) | status="error" — unknown tool |
| 3 | Type gate | reject is_hosted_tool, is_provider_defined_tool, and the recursive tool_chain name | status="error" — can't call from a chain |
| 4 | Risk / approval | if tool.risk exceeds policy.max_risk_unapproved, ask the ApprovalHandler with a bounded approval_timeout_s | status="denied" — no handler, timeout, or refused |
| 5 | Inbound ref resolution | replace any {"$artifact": "<ref>"} arg with real bytes fetched from the BlobStore | warn and keep the ref on resolve failure |
| 6 | ctx check | ctx.check() to honour cancellation before dispatch | raises if the run was cancelled |
| 7 | Progress | emit a TOOL_CALL step to the progress_sink so UIs see inside the chain | — |
| 8 | Execute | tool.execute(ctx=ctx, **args) wrapped in asyncio.wait_for(call_timeout_s) | status="error" — timed out |
| 9 | Progress | emit a TOOL_RESULT step | — |
| 10 | Result shaping | inline small results, offload large ones, turn media into files | always returns an InvocationResult |
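Gates 1 to 3 in the table can be sketched as one small function. It assumes the table's order is literal, so the budget counter moves before the lookup. `SketchTool`, its `kind` field, `SketchSession`, and `pre_dispatch_gates` are hypothetical names; the real code uses `is_hosted_tool` / `is_provider_defined_tool` instead of a `kind` string.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchTool:  # hypothetical: `kind` replaces the kernel type predicates
    name: str
    kind: str = "local"  # "local", "hosted", or "provider_defined"


@dataclass
class SketchSession:  # hypothetical: only the budget counter
    call_count: int = 0


def pre_dispatch_gates(name, registry, session, max_tool_calls) -> str | None:
    """Return an error message, or None when the call may proceed."""
    # Gate 1: budget. The counter increments before the lookup, so a call
    # that fails a later gate still spends one unit of budget.
    if session.call_count >= max_tool_calls:
        return "budget exhausted"
    session.call_count += 1
    # Gate 2: registry lookup.
    tool = registry.get(name)
    if tool is None:
        return "unknown tool"
    # Gate 3: type gate, including the recursive tool_chain name.
    if tool.kind != "local" or name == "tool_chain":
        return "not callable from a chain"
    return None


registry = {
    "calc": SketchTool("calc"),
    "web": SketchTool("web", kind="hosted"),
    "tool_chain": SketchTool("tool_chain"),
}
```

Under this reading, a script that repeatedly calls a misspelled tool name burns through its budget just as fast as one making valid calls.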
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E3F2FD','primaryTextColor': '#0D47A1','primaryBorderColor': '#1565C0','lineColor': '#546E7A','fontSize': '13px'}}}%%
flowchart TD
classDef process fill:#E3F2FD,stroke:#1565C0,color:#0D47A1
classDef decision fill:#FFF3E0,stroke:#E65100,color:#BF360C,font-weight:bold
classDef deny fill:#FFEBEE,stroke:#C62828,color:#B71C1C,font-weight:bold
classDef tool fill:#E8F5E9,stroke:#2E7D32,color:#1B5E20,font-weight:bold
classDef store fill:#F3E5F5,stroke:#6A1B9A,color:#4A148C
START(["invoke(call, session, ctx)"]) --> BUD{"call_count reached<br/>max_tool_calls?"}:::decision
BUD -->|"yes"| E1["error — budget exhausted"]:::deny
BUD -->|"no, increment"| LOOK{"registry.get(name)<br/>found?"}:::decision
LOOK -->|"no"| E2["error — unknown tool"]:::deny
LOOK -->|"yes"| TYPE{"hosted / provider-defined<br/>/ recursive tool_chain?"}:::decision
TYPE -->|"yes"| E3["error — not callable<br/>from a chain"]:::deny
TYPE -->|"no"| RISK{"risk over<br/>max_risk_unapproved?"}:::decision
RISK -->|"yes"| APP{"ApprovalHandler<br/>within timeout?"}:::decision
APP -->|"denied / none / timeout"| E4["denied"]:::deny
APP -->|"approved"| REF["resolve dollar-artifact refs<br/>from BlobStore"]:::store
RISK -->|"no"| REF
REF --> CHK["ctx.check() — cancellation"]:::process
CHK --> EXEC["execute(**args)<br/>within call_timeout_s"]:::tool
EXEC -->|"timeout"| E5["error — timed out"]:::deny
EXEC --> SHAPE["shape result<br/>(inline / offload / files)"]:::process
SHAPE --> OK(["InvocationResult"]):::process
Step 4 in detail — the approval gate with a bounded wait¶
Risk is compared with an integer ordering (SAFE 0, HIGH 1, CRITICAL 2) against the policy's max_risk_unapproved:
tool_risk = ToolRisk(getattr(tool, "risk", ToolRisk.SAFE))
if _RISK_ORDER[tool_risk] > _RISK_ORDER[policy.max_risk_unapproved]:
    if self._approval is None:
        return InvocationResult(status="denied", text="... no ApprovalHandler ...")
    try:
        decision = await asyncio.wait_for(
            self._approval.request(ApprovalRequest(call=call, risk=tool_risk, ...)),
            timeout=policy.approval_timeout_s,
        )
    except TimeoutError:  # asyncio.wait_for raises the builtin TimeoutError on Python 3.11+
        return InvocationResult(status="denied", text="... timed out ...")
    if decision != ApprovalDecision.APPROVED:
        return InvocationResult(status="denied", text="... approval denied ...")
A slow human yields 'denied', never a hang
approval_timeout_s (55 s default) is deliberately below call_timeout_s (60 s). A human who takes too long simply produces status="denied" with guidance to call the tool directly outside the chain — the sandbox is never blocked indefinitely. This is the same ChainPolicy invariant noted in the kernel chain contracts.
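The bounded wait is easy to reproduce with `asyncio.wait_for` alone. In this sketch the approver is a plain coroutine function returning a bool, the timeouts are shrunk to fractions of a second, and `approval_gate`, `quick_yes`, and `slow_yes` are hypothetical names. The enum values are assumed; only the 0/1/2 ordering comes from the text above.

```python
import asyncio
from enum import Enum


class ToolRisk(Enum):  # stand-in for the kernel enum; values are assumed
    SAFE = "safe"
    HIGH = "high"
    CRITICAL = "critical"


RISK_ORDER = {ToolRisk.SAFE: 0, ToolRisk.HIGH: 1, ToolRisk.CRITICAL: 2}


async def approval_gate(risk, max_unapproved, approver, timeout_s: float) -> str:
    if RISK_ORDER[risk] <= RISK_ORDER[max_unapproved]:
        return "pass"  # at or below the threshold: no approval needed
    if approver is None:
        return "denied: no handler"
    try:
        approved = await asyncio.wait_for(approver(), timeout=timeout_s)
    except asyncio.TimeoutError:  # same class as builtin TimeoutError on 3.11+
        return "denied: timed out"
    return "pass" if approved else "denied: refused"


async def quick_yes() -> bool:
    return True


async def slow_yes() -> bool:
    await asyncio.sleep(0.2)  # slower than the timeout used below
    return True


async def demo() -> list:
    return [
        await approval_gate(ToolRisk.SAFE, ToolRisk.SAFE, None, 0.05),
        await approval_gate(ToolRisk.HIGH, ToolRisk.SAFE, None, 0.05),
        await approval_gate(ToolRisk.HIGH, ToolRisk.SAFE, quick_yes, 0.05),
        await approval_gate(ToolRisk.CRITICAL, ToolRisk.HIGH, slow_yes, 0.05),
    ]
```

The slow approver would have said yes, but the gate still reports a denial. That is the point: the wait is bounded, so the outcome is decided by the clock, not by the eventual answer.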
Step 5 — inbound ref resolution (big data never enters the sandbox)¶
When one chained tool returns something large, it comes back as an artifact_ref rather than the bytes themselves. When the next tool is called with that ref, _resolve_inbound_refs() swaps the ref for real bytes on the server side, before execute():
for k, v in arguments.items():
    if isinstance(v, dict) and "$artifact" in v:
        ref = str(v["$artifact"])
        resolved[k] = await self._store.resolve(ref)  # real bytes, fetched server-side
    else:
        resolved[k] = v
A 200 MB CSV, for example, never round-trips through the model or the sandbox heap; only the short ref does.
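A self-contained version of the swap, including the "warn and keep the ref" fallback from the pipeline table, might look like this. `MemoryStore` is a hypothetical in-memory stand-in for the `BlobStore`. Only the `resolve(ref)` call and the `{"$artifact": ...}` shape come from the text above.

```python
import asyncio
import logging

log = logging.getLogger(__name__)


class MemoryStore:  # hypothetical in-memory stand-in for a BlobStore
    def __init__(self, blobs: dict) -> None:
        self._blobs = dict(blobs)

    async def resolve(self, ref: str) -> bytes:
        return self._blobs[ref]  # raises KeyError for an unknown ref


async def resolve_inbound_refs(arguments: dict, store: MemoryStore) -> dict:
    resolved = {}
    for key, value in arguments.items():
        if isinstance(value, dict) and "$artifact" in value:
            ref = str(value["$artifact"])
            try:
                resolved[key] = await store.resolve(ref)
            except Exception:
                # Fallback assumed from the table: warn, keep the ref as-is.
                log.warning("could not resolve artifact ref %r", ref)
                resolved[key] = value
        else:
            resolved[key] = value  # ordinary arguments pass through untouched
    return resolved


store = MemoryStore({"blob-1": b"id,name\n1,Ada\n"})
call_args = {
    "data": {"$artifact": "blob-1"},
    "sep": ",",
    "extra": {"$artifact": "gone"},
}
```

Running `asyncio.run(resolve_inbound_refs(call_args, store))` replaces `data` with the stored bytes, leaves `sep` alone, and keeps the unresolved `extra` ref so the tool can report the problem itself.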
Step 10 — result shaping (inline, offload, or file)¶
_shape_result() decides how the result travels back to the script:
- Media blocks (ImageBlock with data) → stored in the BlobStore, pinned, recorded on the session, and surfaced as a ChainFile at a real workspace path like /workspace/media/<tool>_0.png.
- Small text (len(text_bytes) <= max_inline_result_bytes, or no store) → returned inline in InvocationResult.text.
- Large text → stored in the BlobStore, pinned, recorded on the session, and returned as a preview plus an artifact_ref the next call can pass along.
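The two text branches can be sketched as a single size check on the encoded bytes. Everything named here is an assumption for illustration: `Shaped` loosely imitates `InvocationResult`, `MemoryStore.put_pinned` is a made-up method, and the preview length is arbitrary. Media handling is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Shaped:  # hypothetical, loosely modelled on InvocationResult
    text: str
    preview: str | None = None
    artifact_ref: str | None = None


class MemoryStore:  # hypothetical store that pins whatever it saves
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.pinned: set[str] = set()

    def put_pinned(self, data: bytes) -> str:
        ref = f"blob-{len(self.blobs)}"
        self.blobs[ref] = data
        self.pinned.add(ref)
        return ref


def shape_text(text: str, store: MemoryStore | None,
               max_inline_result_bytes: int, preview_chars: int = 40) -> Shaped:
    data = text.encode("utf-8")  # the threshold is in bytes, not characters
    if store is None or len(data) <= max_inline_result_bytes:
        return Shaped(text=text)  # small, or nowhere to offload: inline
    ref = store.put_pinned(data)  # large: offload and hand back a ref
    return Shaped(text="", preview=text[:preview_chars], artifact_ref=ref)


store = MemoryStore()
small = shape_text("ok", store, max_inline_result_bytes=16)
large = shape_text("x" * 100, store, max_inline_result_bytes=16)
no_store = shape_text("x" * 100, None, max_inline_result_bytes=16)
```

Note the third case: without a store, even an oversized result comes back inline, which matches the "absent = always inline" rule on the constructor.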
InvocationResult statuses¶
Every call comes back as exactly one of these (the kernel InvocationResult only defines three; ChainCallRecord in the trace can additionally read timeout):
| status | Means | Set when |
|---|---|---|
| "ok" | Tool ran, result returned | execute() succeeded and is_error was false |
| "error" | The call failed | unknown / wrong-type tool, budget exhausted, execution timeout, is_error result, or an invoker crash |
| "denied" | Policy refused the call | risk above threshold with no handler, approval timeout, or an explicit DENIED/SKIPPED decision |
InvokerSession — per-chain scratchpad¶
What it is: the small bundle of mutable state for one chain run. Why it exists: the ToolInvoker is shared by every chain, so it must stay stateless. All the per-chain counters live in the session instead. You get one from open_session() and pass it into every invoke() of that chain:
class InvokerSession:
    def __init__(self, invoker: ToolInvoker) -> None:
        self._invoker = invoker
        self._call_count: int = 0                # drives the budget gate
        self._trace: list[ChainCallRecord] = []  # one record per call, every outcome
        self._pinned_refs: list[str] = []        # artifacts to unpin at the end
It exposes read-only views (session.trace, session.call_count) and — most importantly — it is an async context manager. On exit, close() unpins every artifact the chain pinned, so large intermediate results don't leak:
session = invoker.open_session()
async with session:
    r1 = await invoker.invoke(call_1, session=session, ctx=ctx)
    r2 = await invoker.invoke(call_2, session=session, ctx=ctx)
    # ... budget counted across both, trace holds both records
# on exit: every pinned artifact is unpinned
Always use async with for the session
Forgetting to close() leaks pinned artifacts in the BlobStore. The async with form guarantees the unpin even if the chain raises.
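The unpin-on-exit guarantee comes from the async context manager protocol itself: `__aexit__` runs whether the body finishes or raises. A minimal sketch, with a hypothetical `FakeStore` (its `pin` / `unpin` methods are assumed) and a pared-down `SketchSession` in place of `InvokerSession`:

```python
import asyncio


class FakeStore:  # hypothetical store exposing pin / unpin
    def __init__(self) -> None:
        self.pinned: set = set()

    async def pin(self, ref: str) -> None:
        self.pinned.add(ref)

    async def unpin(self, ref: str) -> None:
        self.pinned.discard(ref)


class SketchSession:  # pared-down stand-in for InvokerSession
    def __init__(self, store: FakeStore) -> None:
        self._store = store
        self._pinned_refs: list = []

    async def track(self, ref: str) -> None:
        await self._store.pin(ref)
        self._pinned_refs.append(ref)

    async def close(self) -> None:
        for ref in self._pinned_refs:
            await self._store.unpin(ref)
        self._pinned_refs.clear()

    async def __aenter__(self) -> "SketchSession":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.close()  # runs on normal exit and on exceptions
        # Returning None lets any exception keep propagating.


async def demo():
    store = FakeStore()
    try:
        async with SketchSession(store) as session:
            await session.track("blob-1")
            await session.track("blob-2")
            pinned_during = set(store.pinned)
            raise RuntimeError("chain failed halfway")
    except RuntimeError:
        pass
    return pinned_during, set(store.pinned)
```

The chain fails midway, yet both refs are released by the time the `async with` block has unwound.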
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#E8EAF6','primaryTextColor': '#1A237E','primaryBorderColor': '#3949AB','lineColor': '#546E7A','fontSize': '13px'}}}%%
stateDiagram-v2
[*] --> Open: open_session()
Open --> Open: invoke() — count plus 1, append trace, maybe pin ref
Open --> Closed: __aexit__ — unpin all refs
Closed --> [*]
note right of Open
call_count grows toward max_tool_calls
trace holds one ChainCallRecord per call
pinned_refs lists artifacts to release
end note
A full chained call, end to end¶
%%{init: {'theme': 'base', 'themeVariables': {'actorBkg': '#E8EAF6','actorBorder': '#3949AB','actorTextColor': '#1A237E','signalColor': '#546E7A','noteBkgColor': '#FFFDE7','noteBorderColor': '#F57F17','fontSize': '12px'}}}%%
sequenceDiagram
autonumber
participant S as Sandbox script
participant I as ToolInvoker
participant H as ApprovalHandler
participant B as BlobStore
participant T as Real tool
S->>I: invoke(call, session, ctx)
Note over I: budget plus type gate pass
alt risk above threshold
I->>H: request(ApprovalRequest) within approval_timeout_s
H-->>I: APPROVED
end
I->>B: resolve dollar-artifact refs into bytes
B-->>I: raw bytes
I->>T: execute(**args) within call_timeout_s
T-->>I: ToolExecutionResult
alt result is large
I->>B: store plus pin — session tracks the ref
B-->>I: artifact_ref
end
I-->>S: InvocationResult(status, text, artifact_ref, files)
Note over I,S: ChainCallRecord appended to session trace — always
How this connects to the rest of the framework¶
- Kernel contracts (L0). Everything the invoker touches is a frozen kernel type: it gates on ToolRisk, builds ApprovalRequests for the ApprovalHandler Protocol, returns the InvocationResult value type, and records ChainCallRecords, all defined in kernel/tools/. Toolbox is the concrete implementation of the kernel ToolRegistry Protocol.
- Capabilities (L2): ToolChainTool. Code-mode chaining is driven by ToolChainTool in capabilities/tools/chain/. It runs the model's script in a sandbox and routes every in-script tool call back across a bridge to this invoker. That round-trip is why a script can't dodge risk, approval, timeout, or budget: the inspector is unavoidable. The kernel only describes the chain value types; the running machinery is ToolChainTool (L2) talking to ToolInvoker (L1).
InvokerSession lives in agents, not in the kernel
The kernel chain.py deliberately stays pure data (ChainPolicy, InvocationResult, ChainFile, ChainCallRecord, ChainRunResult). The session abstraction — the mutable counter, trace, and pin list — is a runtime concern, so it lives here in agents/tools/invoker.py, not in kernel/tools/chain.py. The kernel side stays replayable and shippable; the moving parts stay one layer up.
Where this lives¶
| Piece | Location |
|---|---|
| Toolbox (the ToolRegistry impl) | agents/tools/toolbox.py |
| ToolInvoker, InvokerSession | agents/tools/invoker.py |
| ChainPolicy, InvocationResult, ChainFile, ChainCallRecord (contracts) | kernel/tools/chain.py |
| ToolRisk, ToolRegistry, is_hosted_tool, is_provider_defined_tool | kernel/tools/tools.py |
| ApprovalHandler, ApprovalRequest, ApprovalDecision | kernel/tools/approval.py |
| HookManager, HookEvent (TOOL_START / TOOL_END) | agents/hooks/ |
| ToolChainTool + bridge + prelude (the chain driver) | capabilities/tools/chain/ |
| List-to-Toolbox wrapping in the agent | agents/core/react.py |
Next: Supervision, Budgets & Hooks — the headcount, spend, and lifecycle-hook machinery that wraps an agent run (and that feeds the HookManager the invoker fires TOOL_START / TOOL_END into).