Guardrails

Guardrails are checks that run at three points in every agent turn: on the user input, before each tool call, and on the final output. They protect against prompt injection, PII leakage, runaway tool calls, and toxic output before any damage is done.


Three injection points

flowchart LR
    IN([User input]) --> IG["🛡 INPUT\nguardrails"]
    IG -->|pass| LLM["LLM call"]
    IG -->|tripwire| STOP1(["GuardrailTripwireError\nrun stops immediately"])

    LLM -->|tool_calls| TG["🛡 TOOL_CALL\nguardrails"]
    TG -->|pass| EX["tool.run()"]
    TG -->|tripwire| STOP2(["Tool denied"])

    EX --> LLM
    LLM -->|text answer| OG["🛡 OUTPUT\nguardrails"]
    OG -->|pass| OUT([Result returned])
    OG -->|tripwire| STOP3(["GuardrailTripwireError"])

    style IG   fill:#3b1a1a,stroke:#f87171,color:#e2ecff
    style TG   fill:#3b1a1a,stroke:#f87171,color:#e2ecff
    style OG   fill:#3b1a1a,stroke:#f87171,color:#e2ecff
    style STOP1 fill:#2b0d0d,stroke:#f87171,color:#e2ecff
    style STOP2 fill:#2b0d0d,stroke:#f87171,color:#e2ecff
    style STOP3 fill:#2b0d0d,stroke:#f87171,color:#e2ecff

pass / fail / tripwire

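Every guardrail returns a GuardrailResult. When a check fails, the tripwire field decides what happens next: with tripwire=True the run stops with a GuardrailTripwireError; with tripwire=False the failure is logged as a warning and the run continues.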

flowchart LR
    CHECK["guardrail.check(ctx)"] --> R["GuardrailResult\n{passed, tripwire, message}"]
    R --> P{passed?}
    P -->|true| OK(["Continue"])
    P -->|false + tripwire=False| WARN(["Log warning\nContinue"])
    P -->|false + tripwire=True| ERR(["GuardrailTripwireError\nStop run"])

    style OK   fill:#1a2b1a,stroke:#4ade80,color:#e2ecff
    style WARN fill:#2b2710,stroke:#facc15,color:#e2ecff
    style ERR  fill:#3b1a1a,stroke:#f87171,color:#e2ecff
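
The branching in the diagram can be sketched in plain Python. This is a minimal stand-in, not the library's runner: GuardrailResult and GuardrailTripwireError below are local re-creations that use only the field names shown in the diagram, and handle_result is a hypothetical helper introduced for illustration.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("guardrails_sketch")


class GuardrailTripwireError(Exception):
    """Local stand-in for the library's error of the same name."""


@dataclass
class GuardrailResult:
    """Local stand-in carrying the three fields shown in the diagram."""
    passed: bool
    tripwire: bool = False
    message: str = ""


def handle_result(result: GuardrailResult) -> str:
    """Hypothetical helper: map a result to one of the three outcomes."""
    if result.passed:
        return "continue"
    if result.tripwire:
        # Failed check with the tripwire set: stop the run.
        raise GuardrailTripwireError(result.message)
    # Failed check without a tripwire: log a warning and keep going.
    logger.warning("guardrail failed: %s", result.message)
    return "warned"
```

Only the combination of passed=False and tripwire=True raises; a failed check without the tripwire is recorded but does not interrupt the run.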

Prebuilt guardrails

PromptInjectionGuardrail

Blocks input that matches any of 13 built-in patterns, including DAN prompts, role-override attempts, jailbreaks, and "ignore previous instructions".

from ravi.core.guardrails.prebuilt import PromptInjectionGuardrail

guard = PromptInjectionGuardrail(
    tripwire=True,               # raise on detection
    extra_patterns=["my_org_secret"],   # add your own regex patterns
)
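
To make the role of extra_patterns concrete, here is a rough sketch of how caller-supplied regexes might be merged with built-in ones. It does not reproduce the library's implementation: the two DEFAULT_PATTERNS entries are illustrative guesses (the actual 13 patterns are not listed here), and build_detector and first_match are hypothetical names.

```python
import re

# Illustrative defaults only; the library's 13 built-in patterns are not shown here.
DEFAULT_PATTERNS = [
    r"ignore\s+(all\s+)?previous\s+instructions",
    r"you\s+are\s+now\s+DAN",
]


def build_detector(extra_patterns=None):
    """Compile the defaults plus caller-supplied regexes, case-insensitively."""
    patterns = DEFAULT_PATTERNS + list(extra_patterns or [])
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def first_match(text, detector):
    """Return the pattern string that matched, or None if nothing matched."""
    for regex in detector:
        if regex.search(text):
            return regex.pattern
    return None


detector = build_detector(extra_patterns=["my_org_secret"])
```

Because the extra entries are treated as regexes in this sketch, a plain string such as "my_org_secret" behaves like a literal substring match.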

PIIDetectionGuardrail

Detects email, US phone, SSN, credit card, and IP address by default. pii_types=None activates all patterns.

from ravi.core.guardrails.prebuilt import PIIDetectionGuardrail

guard = PIIDetectionGuardrail(
    pii_types=["email", "credit_card"],  # None = all
    custom_patterns={"employee_id": r"EMP-\d{6}"},
    tripwire=True,
)
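
The interaction between pii_types and custom_patterns can be illustrated with a small stand-alone function. This is a sketch under assumptions: the two regexes in PII_PATTERNS are simplified examples rather than the library's own, and detect_pii is a hypothetical helper.

```python
import re

# Simplified example patterns; the library's own regexes may differ.
PII_PATTERNS = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
}


def detect_pii(text, pii_types=None, custom_patterns=None):
    """Return the sorted names of every selected pattern that matches `text`.

    pii_types=None selects all built-in patterns; custom patterns are always added.
    """
    selected = {
        name: pattern
        for name, pattern in PII_PATTERNS.items()
        if pii_types is None or name in pii_types
    }
    selected.update(custom_patterns or {})
    return sorted(name for name, pattern in selected.items() if re.search(pattern, text))
```

With pii_types=["email"] and the employee_id pattern from the snippet above, a message containing an address and a badge such as EMP-123456 would report both names, while an SSN in the same message would go unreported because that type was not selected.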

ContentFilterGuardrail

Blocks messages matching regex patterns or keyword lists. Works at INPUT or OUTPUT.

from ravi.core.guardrails.prebuilt import ContentFilterGuardrail
from ravi.core.guardrails.base_guardrail import GuardrailType

# As output filter
guard = ContentFilterGuardrail(
    guardrail_type=GuardrailType.OUTPUT,
    blocked_keywords=["internal_project_name", "api_key_here"],
    blocked_patterns=[r"sk-[A-Za-z0-9]{48}"],  # OpenAI key pattern
    tripwire=True,
)

MaxTokenGuardrail

Rejects input that's too long before it ever reaches the LLM. Uses tiktoken when available.

from ravi.core.guardrails.prebuilt import MaxTokenGuardrail

guard = MaxTokenGuardrail(
    max_tokens=4096,
    model="gpt-4o",
    tripwire=True,
)
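
"Uses tiktoken when available" suggests an optional-dependency pattern along the following lines. This is a sketch, not the library's code: load_encoder, count_tokens, and within_limit are hypothetical names, and the four-characters-per-token fallback is a common rule of thumb assumed here, not a documented behaviour of MaxTokenGuardrail.

```python
try:
    import tiktoken  # optional dependency
except ImportError:
    tiktoken = None


def load_encoder(model="gpt-4o"):
    """Return a tiktoken encoder for `model`, or None if one cannot be loaded."""
    if tiktoken is None:
        return None
    try:
        return tiktoken.encoding_for_model(model)
    except Exception:
        # Unknown model name or encoding data unavailable: use the fallback.
        return None


def count_tokens(text, encoder=None):
    """Count tokens with an encoder if given, else estimate ~4 characters per token."""
    if encoder is not None:
        return len(encoder.encode(text))
    return (len(text) + 3) // 4  # rough estimate, rounded up


def within_limit(text, max_tokens=4096, encoder=None):
    """True when the (exact or estimated) token count fits the budget."""
    return count_tokens(text, encoder) <= max_tokens
```

A caller would typically load the encoder once, for example encoder = load_encoder("gpt-4o"), and pass it to within_limit for each incoming message.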

ToolCallValidationGuardrail

Checks which tools the LLM is allowed to call and validates argument patterns before execution.

from ravi.core.guardrails.prebuilt import ToolCallValidationGuardrail

guard = ToolCallValidationGuardrail(
    allowed_tools=["web_search", "code_interpreter"],  # None = all allowed
    blocked_tools=["delete_file"],
    blocked_argument_patterns={
        "run_sql": {
            "query": [r"DROP\s+TABLE", r"DELETE\s+FROM"],  # block destructive SQL
        },
    },
    tripwire=True,
)
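
The three kinds of rule (allow-list, block-list, argument patterns) could be evaluated roughly as follows. This is an illustrative sketch, not the library's logic: validate_tool_call is a hypothetical function, and both the order of the checks and the case-insensitive matching are assumptions made for the example.

```python
import re


def validate_tool_call(name, arguments, allowed_tools=None, blocked_tools=(),
                       blocked_argument_patterns=None):
    """Return (ok, reason) for one proposed tool call."""
    if name in blocked_tools:
        return False, f"tool '{name}' is blocked"
    if allowed_tools is not None and name not in allowed_tools:
        return False, f"tool '{name}' is not in the allow-list"
    rules = (blocked_argument_patterns or {}).get(name, {})
    for arg_name, patterns in rules.items():
        value = str(arguments.get(arg_name, ""))
        for pattern in patterns:
            # Case-insensitive search is a choice made in this sketch.
            if re.search(pattern, value, re.IGNORECASE):
                return False, f"argument '{arg_name}' matches blocked pattern {pattern!r}"
    return True, "ok"
```

Under these assumptions, a run_sql call whose query contains "drop table users" is rejected by the argument rule, while "SELECT 1" passes.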

LLMJudgeGuardrail

Uses a second LLM call to evaluate the content. The judge must respond with {"safe": bool, "reason": str}.

from ravi.core.guardrails.prebuilt import LLMJudgeGuardrail
from ravi.core.guardrails.base_guardrail import GuardrailType

guard = LLMJudgeGuardrail(
    model_client=client,
    judge_prompt="Is this response safe, factual, and non-toxic? Respond with {\"safe\": bool, \"reason\": str}",
    guardrail_type=GuardrailType.OUTPUT,
    tripwire=True,
)
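
Since the judge is itself an LLM, its reply may not always be well-formed JSON. One way to handle the {"safe": bool, "reason": str} contract defensively is sketched below. parse_judge_reply is a hypothetical helper, and treating malformed replies as unsafe is a design choice of this sketch, not a documented behaviour of LLMJudgeGuardrail.

```python
import json


def parse_judge_reply(raw):
    """Parse a judge reply into (safe, reason); anything malformed counts as unsafe."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return False, "judge reply was not valid JSON"
    if not isinstance(data, dict) or not isinstance(data.get("safe"), bool):
        return False, "judge reply is missing a boolean 'safe' field"
    return data["safe"], str(data.get("reason", ""))
```

Failing closed means a chatty or truncated judge reply blocks the content instead of silently letting it through; a more lenient deployment could log and pass instead.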

Full example — hardened agent

graph TB
    IN([Input]) --> G1["PromptInjectionGuardrail"] --> G2["PIIDetectionGuardrail"] --> G3["MaxTokenGuardrail"]
    G3 --> LLM["LLM call"]
    LLM --> G4["ToolCallValidationGuardrail"]
    G4 --> EX["tool.run()"]
    EX --> LLM
    LLM --> G5["ContentFilterGuardrail (output)"]
    G5 --> OUT([Result])

    style G1 fill:#3b1a1a,stroke:#f87171,color:#e2ecff
    style G2 fill:#3b1a1a,stroke:#f87171,color:#e2ecff
    style G3 fill:#3b1a1a,stroke:#f87171,color:#e2ecff
    style G4 fill:#3b1a1a,stroke:#f87171,color:#e2ecff
    style G5 fill:#3b1a1a,stroke:#f87171,color:#e2ecff

from ravi.core.agents.react_agent import ReActAgent
from ravi.core.guardrails.prebuilt import (
    PromptInjectionGuardrail,
    PIIDetectionGuardrail,
    MaxTokenGuardrail,
    ToolCallValidationGuardrail,
    ContentFilterGuardrail,
)
from ravi.core.guardrails.base_guardrail import GuardrailType

agent = ReActAgent(
    name="safe_agent",
    model_client=client,
    model_context=ctx,
    tools=[WebSearchTool(), CodeInterpreterTool()],
    input_guardrails=[
        PromptInjectionGuardrail(tripwire=True),
        PIIDetectionGuardrail(pii_types=["email", "ssn", "credit_card"]),
        MaxTokenGuardrail(max_tokens=4096),
    ],
    output_guardrails=[
        ContentFilterGuardrail(
            guardrail_type=GuardrailType.OUTPUT,
            blocked_patterns=[r"sk-[A-Za-z0-9]{48}"],
        ),
    ],
    # TOOL_CALL guardrails go inside ReActAgent via tools_requiring_approval
    # or use ToolCallValidationGuardrail via a custom config
)

Writing a custom guardrail

Subclass BaseGuardrail, set name and guardrail_type at the class level, and implement the async check() method, returning self._pass() or self._fail(...).

from ravi.core.guardrails.base_guardrail import (
    BaseGuardrail, GuardrailContext, GuardrailResult, GuardrailType
)

class LengthGuardrail(BaseGuardrail):
    name           = "length_check"
    guardrail_type = GuardrailType.INPUT

    def __init__(self, max_chars: int = 2000, tripwire: bool = True):
        self.max_chars = max_chars
        self.tripwire  = tripwire

    async def check(self, ctx: GuardrailContext) -> GuardrailResult:
        text = ctx.input_text or ""
        if len(text) > self.max_chars:
            return self._fail(
                f"Input too long: {len(text)} chars (max {self.max_chars})",
                tripwire=self.tripwire,
                length=len(text),
            )
        return self._pass()

Source

| File | What it owns |
| --- | --- |
| core/guardrails/base_guardrail.py | BaseGuardrail, GuardrailContext, GuardrailResult, GuardrailType |
| core/guardrails/prebuilt.py | All six prebuilt guardrails |
| core/guardrails/runner.py | Guardrail pipeline runner (used by agents internally) |