def evaluate(
    self,
    tool_name: str,
    *,
    session_id: str,
    agent_id: str | None = None,
    metadata: EvaluationMetadata | None = None,
    framework: str | None = None,
    trace: TraceContext | None = None,
) -> EvaluationResult:

Parameters

Param       Type                       Required  Notes
tool_name   str                        yes       Match against tool_name in your policies.
session_id  str                        yes       Groups events in the audit log. Pick a granularity (per-conversation is common).
agent_id    str | None                 no        Locked to the JWT’s sub. Mismatched values are silently ignored.
metadata    EvaluationMetadata | None  no        Tool input + framework-specific extras. See below.
framework   str | None                 no        Tag for the audit event. Adapters set this.
trace       TraceContext | None        no        If set, uploaded and attached to the audit event.

EvaluationMetadata

class EvaluationMetadata(BaseModel):
    input: dict[str, Any] | None = None
    args: list[Any] | None = None
    kwargs: dict[str, Any] | None = None
    tool_use_id: str | None = None
Pick the field that matches how the call’s arguments are structured:
  • MCP, Claude Agent: input={...} — the dict the tool was invoked with.
  • LangChain: args=[...], kwargs={...} — positional and keyword.
  • Raw SDK: whatever shape your tools take.
These flow into the policy evaluator’s request as input.<key>, args.<index>, kwargs.<key> respectively.
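To make the path mapping concrete, here is a minimal sketch of how the three argument shapes could be flattened into the dotted paths a policy refers to. flatten_metadata is a hypothetical helper written for illustration, not part of the Rubric SDK, and the real evaluator may resolve paths differently (for example, for nested keys).

```python
from typing import Any, Optional


def flatten_metadata(
    input: Optional[dict[str, Any]] = None,  # name mirrors the metadata field
    args: Optional[list[Any]] = None,
    kwargs: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: map metadata fields to dotted policy paths."""
    paths: dict[str, Any] = {}
    # input={...} becomes input.<key>
    for key, value in (input or {}).items():
        paths[f"input.{key}"] = value
    # args=[...] becomes args.<index>
    for index, value in enumerate(args or []):
        paths[f"args.{index}"] = value
    # kwargs={...} becomes kwargs.<key>
    for key, value in (kwargs or {}).items():
        paths[f"kwargs.{key}"] = value
    return paths


# MCP-style call: a single input dict.
mcp_paths = flatten_metadata(input={"command": "ls"})
# LangChain-style call: positional and keyword arguments.
lc_paths = flatten_metadata(args=["/tmp/report.txt"], kwargs={"force": True})
```

Under these assumptions a policy condition on args.0 would see "/tmp/report.txt", and one on kwargs.force would see True.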

Returns: EvaluationResult

class EvaluationResult(BaseModel):
    decision: Literal["allow", "deny"]
    reason: str | None
    code: str | None
    matchedPolicyId: str | None
    matchedPolicyVersion: int | None
    matchedRuleId: str | None
    latencyMs: float
When decision == "allow" and no policy matched, all the matched* fields are None. On a deny, the matched rule’s id is returned as code and its description as reason.
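A caller will usually branch on decision and surface reason and code for denials. The sketch below uses a stand-in dataclass with the same field names as EvaluationResult so that it runs without the SDK installed; ResultStub and explain are illustrative names, not Rubric APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultStub:
    """Stand-in mirroring the EvaluationResult fields shown above."""
    decision: str
    reason: Optional[str] = None
    code: Optional[str] = None
    matchedPolicyId: Optional[str] = None
    matchedPolicyVersion: Optional[int] = None
    matchedRuleId: Optional[str] = None
    latencyMs: float = 0.0


def explain(result: ResultStub) -> str:
    """Build a short log line from a result (hypothetical helper)."""
    if result.decision == "deny":
        return f"denied [{result.code}]: {result.reason}"
    if result.matchedPolicyId is None:
        # Allow with no matching policy: all matched* fields are None.
        return "allowed (no policy matched)"
    return f"allowed by policy {result.matchedPolicyId} v{result.matchedPolicyVersion}"
```

Checking matchedPolicyId separately lets a log distinguish "a policy explicitly allowed this" from "nothing matched".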

Examples

Plain

result = gov.evaluate("delete_file", session_id="s1")
if result.decision == "deny":
    raise PermissionError(result.reason)

With input

from rubric import EvaluationMetadata

result = gov.evaluate(
    "shell.exec",
    session_id="s1",
    metadata=EvaluationMetadata(input={"command": "rm -rf /tmp"}),
)
The policy can match on input.command:
- field: input.command
  operator: contains
  value: "rm -rf"
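As a rough mental model of how such a condition could be checked, the sketch below resolves the dotted field path against the request and applies a substring test. It assumes contains means "substring of a string value"; resolve and matches_contains are hypothetical names, and the actual evaluator may behave differently.

```python
from typing import Any


def resolve(path: str, request: dict[str, Any]) -> Any:
    """Walk a dotted path such as "input.command" through nested dicts/lists."""
    current: Any = request
    for part in path.split("."):
        if isinstance(current, list):
            # Numeric segments index into lists, e.g. args.0
            if not part.isdigit() or int(part) >= len(current):
                return None
            current = current[int(part)]
        elif isinstance(current, dict):
            if part not in current:
                return None
            current = current[part]
        else:
            return None
    return current


def matches_contains(condition: dict[str, str], request: dict[str, Any]) -> bool:
    """Assumed semantics: `contains` is a substring test on string values."""
    actual = resolve(condition["field"], request)
    return isinstance(actual, str) and condition["value"] in actual


condition = {"field": "input.command", "operator": "contains", "value": "rm -rf"}
request = {"input": {"command": "rm -rf /tmp"}}
is_match = matches_contains(condition, request)
```

With the request from the example above, is_match is True; a command such as "ls -la" would not match.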

With trace

from rubric import EvaluationMetadata, TraceContext, UserMessage

trace = TraceContext()
trace.append(UserMessage(content="Delete the staging logs"))

result = gov.evaluate(
    "delete_file",
    session_id="s1",
    metadata=EvaluationMetadata(input={"path": "/var/log/old.log"}),
    trace=trace,
)
The trace is uploaded to Rubric and the returned traceId is attached to the audit event. See Traces.

Performance

Synchronous portion (blocks the caller until evaluate() returns):
  • Pure-Python evaluator: ~50–200µs.
  • Native (Rust) evaluator: ~5–20µs (pip install rubric-app[runtime]).
  • DLP scan: ~50µs (regex), ~1ms (Presidio).
  • Trace upload (if attached): 2–5ms.
Asynchronous (does not block the caller):
  • Audit event flush: batched, runs every 1s or every 100 events.
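The "every 1s or every 100 events" rule is a common size-or-interval batching pattern. The sketch below shows the idea with an injectable clock so it can be exercised deterministically. AuditBatcher is an illustrative class, not Rubric's implementation; a real flusher would typically also call maybe_flush from a background timer so that idle buffers drain without a new event arriving.

```python
import time
from typing import Any, Callable


class AuditBatcher:
    """Illustrative size-or-interval batcher (not Rubric's implementation)."""

    def __init__(
        self,
        flush: Callable[[list[Any]], None],
        max_events: int = 100,
        max_interval_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_events = max_events
        self._max_interval_s = max_interval_s
        self._clock = clock
        self._buffer: list[Any] = []
        self._last_flush = clock()

    def add(self, event: Any) -> None:
        """Queue an event, then flush if either threshold has been reached."""
        self._buffer.append(event)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush when the buffer is full or the interval has elapsed."""
        full = len(self._buffer) >= self._max_events
        stale = self._clock() - self._last_flush >= self._max_interval_s
        if self._buffer and (full or stale):
            batch, self._buffer = self._buffer, []
            self._last_flush = self._clock()
            self._flush(batch)
```

Because events are only buffered on the calling thread, the cost added to the synchronous path is a list append; the network write happens in the flush callback.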

Error handling

evaluate() does not raise. If anything goes wrong internally, you get an EvaluationResult(decision="allow", ...) and the error is logged. This is deliberate — a misconfigured Rubric should never block your agent. To enforce default-deny, set spec.defaultEffect: deny in your policy.
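The fail-open behaviour described here can be pictured as a wrapper that turns any internal exception into an allow after logging it. This is a simplified sketch using plain strings for the decision; fail_open, broken_check, and strict_check are hypothetical names, and the SDK's internals are not shown in this document.

```python
import logging
from typing import Callable

logger = logging.getLogger("governance-sketch")


def fail_open(check: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a check so internal errors are logged and turned into "allow"."""

    def wrapped(tool_name: str) -> str:
        try:
            return check(tool_name)
        except Exception:
            # Log the failure, then fall back to allowing the call.
            logger.exception("policy check failed for %s", tool_name)
            return "allow"

    return wrapped


def broken_check(tool_name: str) -> str:
    """Simulates a misconfigured evaluator."""
    raise RuntimeError("policy bundle could not be loaded")


def strict_check(tool_name: str) -> str:
    """Simulates a working evaluator with one deny rule."""
    return "deny" if tool_name == "delete_file" else "allow"
```

Note that only errors are converted to allow in this sketch: an explicit deny from a working check still reaches the caller unchanged.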