
tools/ — What Agents Can Do

Source: kernel/tools/tools.py · kernel/tools/chain.py · kernel/tools/approval.py · kernel/tools/skills.py

Defines the complete tool taxonomy: three execution modes across two axes, wire declarations for LLM providers, sandboxed code chaining, human-in-the-loop approval, and prompt skills.


The Tool Taxonomy

Three Protocols govern all tools. Which one applies depends on who declares the tool (the framework or the provider) and who executes the call:

| Tool Protocol | Execution Mode | Key Fields | Core Execution Method | Examples |
| --- | --- | --- | --- | --- |
| Tool | Local | name, description, input_schema | execute(ctx, **kwargs) | WebSearchTool, CalculatorTool |
| HostedTool | Provider | name, description, provider_specs: dict | Executed by provider | OpenAI web_search_preview, Anthropic web_search |
| ProviderDefinedTool | Provider-declared, local execution | name, description, provider_specs: dict, call_types | handle_call(call, ctx) | OpenAI computer_use, apply_patch |

Core Data & Specs

  • ToolExecutionResult: Represents execution outputs. Fields:
    • content: list[ContentBlock]: The multimodal output blocks.
    • is_error: bool: Flag indicating execution failure.
    • structured_content: dict: Parsed dictionary response.
  • FunctionSpec: Local schema specification: name, description, parameters: dict, lazy_schema: bool.
  • ProviderSpec: Provider-specific spec: name, provider: str, spec: dict.
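
To make the local Tool shape concrete, here is a minimal sketch of a tool that returns a ToolExecutionResult. The dataclasses below are stand-ins written for illustration, not the definitions in kernel/tools/tools.py; TextBlock and AddTool are hypothetical names, and only the field names (content, is_error, structured_content, name, description, input_schema) come from the text above.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical stand-in for one ContentBlock variant."""
    text: str


@dataclass
class ToolExecutionResult:
    """Stand-in mirroring the three fields listed above."""
    content: list[TextBlock]
    is_error: bool = False
    structured_content: dict[str, Any] = field(default_factory=dict)


class AddTool:
    """Illustrative local tool exposing name, description, input_schema, execute()."""
    name = "add"
    description = "Add two numbers."
    input_schema = {
        "type": "object",
        "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
        "required": ["a", "b"],
    }

    async def execute(self, ctx: Any, **kwargs: Any) -> ToolExecutionResult:
        try:
            total = kwargs["a"] + kwargs["b"]
        except (KeyError, TypeError) as exc:
            # Report failures through is_error instead of raising.
            return ToolExecutionResult([TextBlock(f"bad arguments: {exc!r}")], is_error=True)
        return ToolExecutionResult(
            [TextBlock(str(total))],
            structured_content={"sum": total},
        )


# ctx is unused in this sketch, so None is passed.
result = asyncio.run(AddTool().execute(None, a=2, b=3))
```

Returning is_error=True rather than raising is one possible design: it lets the caller hand the failure text back to the LLM as an ordinary tool result.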

Dispatch pattern at runtime

flowchart TD
    LLM["LLM returns tool call"] --> CHECK1{"is_provider_defined_tool(tool)?"}
    CHECK1 -->|"yes"| PROV_RUN["await tool.handle_call(call, ctx)<br/>&rarr; ToolExecutionResult"]
    CHECK1 -->|"no"| CHECK2{"is_hosted_tool(tool)?"}
    CHECK2 -->|"yes"| SKIP["Skip local execution<br/>(Result arrives in next LLM turn)"]
    CHECK2 -->|"no — plain Tool"| LOCAL_RUN["await tool.execute(ctx, **args)<br/>&rarr; ToolExecutionResult"]

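Always check is_provider_defined_tool before is_hosted_tool: both protocols carry provider_specs, but only ProviderDefinedTool has handle_call, so testing for a hosted tool first could misclassify a provider-defined tool and skip its local execution.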
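
The check order can be sketched in a few lines. The two predicates below are hypothetical duck-typed versions written for this example (the real helpers may be implemented differently), the three tool classes are illustrative, and plain strings stand in for ToolExecutionResult to keep the sketch short.

```python
import asyncio
from typing import Any, Optional


class CalculatorTool:
    """Plain local tool: executed by the framework."""
    name = "calculator"

    async def execute(self, ctx: Any, **kwargs: Any) -> str:
        return str(kwargs["a"] + kwargs["b"])


class WebSearchHosted:
    """Hosted tool: declared to the provider, never executed locally."""
    name = "web_search"
    provider_specs = {"openai": {}}  # placeholder spec


class ComputerUseTool:
    """Provider-defined tool: provider declares it, framework executes it."""
    name = "computer_use"
    provider_specs = {"openai": {}}  # placeholder spec
    call_types = ("computer_call",)

    async def handle_call(self, call: dict, ctx: Any) -> str:
        return f"handled {call['type']}"


def is_provider_defined_tool(tool: Any) -> bool:
    # Assumption: detected by provider_specs plus a callable handle_call.
    return hasattr(tool, "provider_specs") and callable(getattr(tool, "handle_call", None))


def is_hosted_tool(tool: Any) -> bool:
    # Assumption: a loose check that is also true for provider-defined tools,
    # which is why it must run second.
    return hasattr(tool, "provider_specs")


async def dispatch(tool: Any, call: dict, ctx: Any = None) -> Optional[str]:
    if is_provider_defined_tool(tool):
        return await tool.handle_call(call, ctx)
    if is_hosted_tool(tool):
        return None  # result arrives in the next LLM turn
    return await tool.execute(ctx, **call.get("args", {}))
```

With these loose predicates, swapping the two checks would send ComputerUseTool down the hosted branch and its handle_call would never run.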


ToolRisk and Approval

Each tool declares a risk level (ToolRisk). SAFE tools run immediately; HIGH and CRITICAL tools pause execution until a human approves or denies the call.

sequenceDiagram
    autonumber
    participant Agent
    participant Invoker as ToolInvoker (L1)
    participant Handler as ApprovalHandler
    participant Human

    Agent->>Invoker: execute tool "send_email" (ToolRisk.HIGH)
    Invoker->>Invoker: check tool.risk

    alt risk == SAFE
        Invoker->>Invoker: execute immediately
    else risk == HIGH or CRITICAL
        Invoker->>Handler: request(ApprovalRequest)
        Handler->>Human: "send_email wants to run. Approve?"
        Human-->>Handler: decision
        Handler-->>Invoker: ApprovalDecision

        alt APPROVED
            Invoker->>Invoker: execute tool
        else DENIED
            Invoker-->>Agent: ToolExecutionResult(is_error=True, "denied by user")
        end
    end

    Invoker-->>Agent: ToolExecutionResult

ApprovalRequest is immutable and fully serializable, so it can be stored in Postgres and resumed after a restart. Available ApprovalHandler implementations are WebApprovalHandler (the ravi-ui HITL card), AutoApprovalHandler (tests), and CliApprovalHandler (terminal).
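
The gate in the diagram can be sketched as follows. Everything here is a stand-in under stated assumptions: the ApprovalRequest fields, the enum values, FixedApprovalHandler, and the invoke helper are hypothetical, and a plain dict replaces ToolExecutionResult. Only the names ToolRisk, ApprovalRequest, ApprovalDecision, and the request(...) call come from the text above.

```python
import asyncio
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Callable


class ToolRisk(str, Enum):
    # Only the levels named in the diagram; the real enum may define more.
    SAFE = "safe"
    HIGH = "high"
    CRITICAL = "critical"


class ApprovalDecision(str, Enum):
    APPROVED = "approved"
    DENIED = "denied"


@dataclass(frozen=True)  # frozen gives immutability
class ApprovalRequest:
    # Hypothetical fields, all JSON-friendly so the request can be persisted.
    tool: str
    risk: str
    args_json: str


def dump_request(req: ApprovalRequest) -> str:
    return json.dumps(asdict(req))


def load_request(raw: str) -> ApprovalRequest:
    return ApprovalRequest(**json.loads(raw))


class FixedApprovalHandler:
    """Test-style handler that always returns the same decision."""

    def __init__(self, decision: ApprovalDecision) -> None:
        self.decision = decision

    async def request(self, req: ApprovalRequest) -> ApprovalDecision:
        return self.decision


async def invoke(tool: str, risk: ToolRisk, args: dict,
                 run: Callable[..., str], handler: Any) -> dict:
    if risk in (ToolRisk.HIGH, ToolRisk.CRITICAL):
        req = ApprovalRequest(tool, risk.value, json.dumps(args, sort_keys=True))
        decision = await handler.request(req)
        if decision is not ApprovalDecision.APPROVED:
            # The tool body never runs on denial.
            return {"is_error": True, "text": "denied by user"}
    return {"is_error": False, "text": run(**args)}
```

Because the request is a frozen dataclass of plain strings, dump_request and load_request round-trip it through JSON, which is the property that makes storing and resuming a pending approval possible.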


Sandboxed Code-Mode Chaining

Tool chaining lets the LLM write a Python script that calls multiple tools and pipes results between them. The script runs in a Firecracker/K8s sandbox; each tool call crosses the bridge back to the framework-side ToolInvoker.

sequenceDiagram
    autonumber
    participant LLM
    participant Chain as ToolChainTool (L2)
    participant Sandbox as Firecracker Sandbox
    participant Invoker as ToolInvoker (L1)
    participant Tool

    LLM->>Chain: execute(code="...")
    Chain->>Sandbox: run(code, prelude_with_bridge)

    loop Each tool call in script (max 50)
        Sandbox->>Invoker: bridge call: {tool, args}
        Invoker->>Tool: execute(ctx, **args)
        Tool-->>Invoker: ToolExecutionResult

        alt result <= 4096 bytes
            Invoker-->>Sandbox: InvocationResult(text, structured)
        else large result
            Invoker->>Invoker: store in ArtifactStore
            Invoker-->>Sandbox: InvocationResult(artifact_ref, preview)
        end
    end

    Sandbox-->>Chain: ChainRunResult(output_text, call_trace)
    Chain-->>LLM: ToolExecutionResult(content)

    Note over Chain: call_trace lists every tool that ran<br/>even on crash — LLM avoids re-sending emails

Key types from chain.py:

| Type | Purpose |
| --- | --- |
| ChainPolicy | Limits: max_tool_calls=50, call_timeout_s=60, total_timeout_s=300, max_inline_result_bytes=4096 |
| InvocationResult | What the sandbox receives back: status, text, structured, artifact_ref, files |
| ChainRunResult | Final outcome: status, output_text, call_trace, duration_ms |
| ChainCallRecord | One entry per bridged call in the trace: tool, args_digest, status, duration_ms |
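
The inline-versus-artifact branch from the diagram can be sketched like this. The ChainPolicy defaults are the limits listed above; the rest is assumed for illustration: InvocationResult carries only a subset of its fields, the "ok" status value, InMemoryArtifactStore, to_invocation_result, and the 200-character preview are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChainPolicy:
    max_tool_calls: int = 50
    call_timeout_s: int = 60
    total_timeout_s: int = 300
    max_inline_result_bytes: int = 4096


@dataclass
class InvocationResult:
    # Subset of the documented fields, enough for this sketch.
    status: str
    text: str
    artifact_ref: Optional[str] = None


class InMemoryArtifactStore:
    """Hypothetical stand-in for ArtifactStore."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        ref = f"artifact-{len(self._blobs)}"
        self._blobs[ref] = data
        return ref


def to_invocation_result(text: str, policy: ChainPolicy,
                         store: InMemoryArtifactStore,
                         preview_chars: int = 200) -> InvocationResult:
    raw = text.encode("utf-8")
    if len(raw) <= policy.max_inline_result_bytes:
        # Small enough: hand the full text back to the sandbox.
        return InvocationResult("ok", text)
    # Too large: store the bytes, return a reference plus a short preview.
    ref = store.put(raw)
    return InvocationResult("ok", text[:preview_chars], artifact_ref=ref)
```

Measuring the encoded byte length rather than the character count keeps the limit meaningful for non-ASCII output.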

Skills — Prompt Packages

A Skill is a prompt package attached to an agent. It injects custom instructions and constrains the tools available to the agent:

| Field | Type | Description |
| --- | --- | --- |
| name | str | Unique skill name. |
| instructions | str | Prompt text appended to the agent's system prompt during runs. |
| description | str | Short explanation of the skill's purpose. |
| allowed_tools | tuple[str, ...] | Tuple of tool names that the agent is restricted to using while this skill is active. |
| path | str \| None | Filepath if the skill was loaded dynamically from disk. |
| version | str | Version string (e.g. '1.0.0'). |

A Skill extends an agent's behaviour without modifying its code: while the skill is attached, its instructions are appended to the effective system prompt and allowed_tools limits which tools the agent can use.
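
A minimal sketch of how a skill might be applied to an agent's prompt and tool list. The Skill dataclass mirrors the fields in the table, but the default values, the apply_skill helper, and the rule that an empty allowed_tools means "no restriction" are assumptions made for this example, not behaviour confirmed by skills.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Skill:
    name: str
    instructions: str
    description: str = ""
    allowed_tools: tuple[str, ...] = ()
    path: Optional[str] = None
    version: str = "1.0.0"


def apply_skill(system_prompt: str, tool_names: list[str],
                skill: Skill) -> tuple[str, list[str]]:
    # Instructions are appended to the effective system prompt.
    prompt = f"{system_prompt}\n\n{skill.instructions}"
    # Assumption: an empty allowed_tools leaves the tool list unrestricted.
    if not skill.allowed_tools:
        return prompt, list(tool_names)
    allowed = set(skill.allowed_tools)
    # Preserve the agent's original tool order.
    return prompt, [name for name in tool_names if name in allowed]


research = Skill(
    name="research",
    instructions="Cite a source for every claim.",
    allowed_tools=("web_search",),
)
prompt, tools = apply_skill(
    "You are a helpful agent.", ["web_search", "send_email"], research
)
```

Here tools contains only web_search, and prompt ends with the skill's instructions.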