Passing a trace to evaluate() means every audit row in the dashboard has a one-click drawer showing exactly what the model was thinking when it asked to do something.
This is what turns “we denied a deletion at 14:32” from a line in a log into a meaningful incident review.
TraceContext
The SDK ships a TraceContext that you build up alongside your model loop.
evaluate() records the ToolCallMessage automatically when you pass trace=trace, so you only need to handle the user/assistant turns and the tool result.
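The flow described above can be sketched as follows. This is a self-contained illustration, not the SDK's actual API: the `Message` and `TraceContext` classes, the `add` method, and the `evaluate` function below are local stand-ins, and their names, signatures, and the toy deny rule are all assumptions. Check the SDK reference for the real ones.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


# Local stand-ins so the sketch runs on its own. The real SDK classes
# may have different constructors and method names.
@dataclass
class Message:
    role: str
    content: Any

    def model_dump(self) -> dict:
        return {"role": self.role, "content": self.content}


@dataclass
class TraceContext:
    messages: list = field(default_factory=list)

    def add(self, message: Message) -> None:  # hypothetical method name
        self.messages.append(message)


def evaluate(tool_name: str, args: dict,
             trace: Optional[TraceContext] = None) -> str:
    """Stand-in for the SDK call: records the tool call, then decides."""
    if trace is not None:
        # Mirrors the documented behaviour: the tool call is added for you.
        trace.add(Message("tool_use", {"toolName": tool_name, "args": args}))
    # Toy policy, for illustration only.
    return "deny" if tool_name.startswith("delete") else "allow"


trace = TraceContext()
trace.add(Message("user", "Clean up the staging bucket."))
trace.add(Message("assistant", "I'll delete the old objects."))

decision = evaluate("delete_objects", {"bucket": "staging"}, trace=trace)

# The result turn is yours to record, whatever the decision was.
trace.add(Message("tool_result", f"policy decision: {decision}"))

roles = [m.model_dump()["role"] for m in trace.messages]
print(roles)
```

The point of the pattern is ordering: user and assistant turns go in as they happen, the tool call is recorded by the evaluation step, and the result is appended afterwards, so the transcript reads in the same order the model experienced it.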
What you’ll see in the dashboard
Every audit row with a trace shows a small file icon. Click anywhere on the row and a drawer slides in with:
- The full transcript: user messages, assistant replies, tool calls and results, in order.
- The tool call highlighted: the message that this audit row was emitted for.
- DLP highlights inline: secret/PII matches are visually flagged in the message body.
When to attach traces
Always, in production agents. The cost is small and the operational value is large:
- Cost. One extra trace upload per evaluate() (2–5 ms, in parallel with the audit batch). Storage: a few KB per call, kept for 30 days.
- Value. When a deny happens at 3 AM, you want to know why the model wanted to do it. The trace is your only evidence.
Skip trace= for low-stakes paths (synthetic load tests, throwaway batch jobs).
Roll-your-own message types
TraceContext accepts any object with a model_dump() returning { "role": ..., "content": ... }-shaped data. The four built-in types cover the common case:
| Type | Role | Content |
|---|---|---|
| UserMessage(content="…") | user | string |
| AssistantMessage(content="…") | assistant | string |
| ToolCallMessage(toolName, args, toolUseId?) | tool_use | structured |
| ToolResultMessage(content, isError, toolUseId?) | tool_result | string + flag |
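A custom message type only needs a `model_dump()` method that returns role/content-shaped data. The sketch below illustrates that shape under stated assumptions: `RetrievalMessage` and its `retrieval` role are hypothetical, and this page does not say whether the dashboard renders roles outside the four built-ins. How the object is handed to TraceContext depends on the SDK's real API, which is not shown here.

```python
from dataclasses import dataclass


@dataclass
class RetrievalMessage:
    """Hypothetical custom message: documents fetched for the model."""
    query: str
    doc_ids: list

    def model_dump(self) -> dict:
        # Return the role/content shape described above.
        return {
            "role": "retrieval",  # hypothetical role name
            "content": {"query": self.query, "doc_ids": self.doc_ids},
        }


msg = RetrievalMessage(query="refund policy", doc_ids=["kb-12", "kb-40"])
dumped = msg.model_dump()
print(dumped["role"], dumped["content"]["doc_ids"])
```

A plain dataclass is used here to keep the example dependency-free; a Pydantic model would provide `model_dump()` without writing it by hand.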
Privacy
Traces include user messages, which may contain PII. Two protections:
- Per-org isolation. Trace blobs are scoped to your workspace and only readable by signed-in operators through your dashboard. There is no other access path.
- DLP redaction (opt-in). When DLP is enabled, Rubric redacts detected secrets/PII before persisting. The full unredacted trace never touches disk.
To opt out of traces entirely, don't pass trace= to evaluate(). You'll keep the audit log; you'll lose the conversational context.