Behavioral Tracing
See exactly what happened, step by step.
Behavioral Tracing captures a detailed record of every action your agent takes during a test case. Every user input, tool call, parameter, result, and final output is logged with timestamps — giving you a full audit trail for debugging and analysis.
Behavioral Tracing data models exist in the platform, but the dashboard trace visualization is coming soon. Currently, trace data is captured from your synced results and stored for future analysis.
Why tracing?
When a test fails, you need to know more than just “it failed.” You need to know:
- What did the agent do first?
- Which tools did it call, and with what parameters?
- What did those tools return?
- How did the agent interpret the results?
- Where exactly did the behavior diverge from what was expected?
Behavioral Tracing answers all of these questions. It is the difference between “the agent hallucinated” and “the agent called search_kb with the right query, got zero results, and then fabricated an answer instead of saying it didn’t know.”
What is captured
Each trace records a sequence of timestamped events:
| Event type | What it captures |
|---|---|
| user_input | The message or trigger sent to the agent |
| agent_reasoning | The agent’s internal reasoning or chain-of-thought (when available) |
| tool_call | The tool name, parameters passed, and the raw result returned |
| tool_error | Any errors from tool execution, including error type and message |
| agent_response | The final response the agent produced |
| metadata | Timing information, token counts, model used |
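If you work with raw trace JSON directly, a quick first step is to tally events by type. A minimal sketch, assuming only the event shape documented above (the sample trace fragment is illustrative):

```python
from collections import Counter

def count_event_types(trace: dict) -> Counter:
    """Tally how many events of each type a trace contains."""
    return Counter(event["type"] for event in trace.get("events", []))

# Illustrative trace fragment following the documented shape.
trace = {
    "trace_id": "tr_abc123",
    "events": [
        {"type": "user_input", "data": {"message": "Hi"}},
        {"type": "tool_call", "data": {"tool": "search_knowledge_base"}},
        {"type": "tool_call", "data": {"tool": "search_knowledge_base"}},
        {"type": "agent_response", "data": {"message": "Hello!"}},
    ],
}

print(dict(count_event_types(trace)))
# {'user_input': 1, 'tool_call': 2, 'agent_response': 1}
```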
Trace format
Traces are stored as JSON with the following structure:
```json
{
  "trace_id": "tr_abc123",
  "scenario_id": "sc_xyz789",
  "agent_name": "customer-support-agent",
  "timestamp": "2026-03-13T14:30:00Z",
  "duration_ms": 2340,
  "events": [
    {
      "type": "user_input",
      "timestamp": "2026-03-13T14:30:00.000Z",
      "data": {
        "message": "What is your refund policy for digital products?"
      }
    },
    {
      "type": "tool_call",
      "timestamp": "2026-03-13T14:30:00.450Z",
      "data": {
        "tool": "search_knowledge_base",
        "parameters": {
          "query": "refund policy digital products"
        },
        "result": {
          "articles": [
            {
              "title": "Refund Policy",
              "content": "Digital products are eligible for a refund within 14 days of purchase..."
            }
          ]
        },
        "duration_ms": 120
      }
    },
    {
      "type": "agent_response",
      "timestamp": "2026-03-13T14:30:02.340Z",
      "data": {
        "message": "Our refund policy for digital products allows refunds within 14 days of purchase. You can request a refund by contacting support.",
        "tokens_used": 245
      }
    }
  ]
}
```
Reading a trace
Successful trace
A successful trace shows a clean path from input to output:
- user_input — The test scenario’s message
- tool_call (one or more) — The agent correctly calls the appropriate tools
- agent_response — The agent synthesizes tool results into a correct answer
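That happy-path shape is easy to check mechanically. A sketch over the documented event types (the helper name is ours, not an Invarium API):

```python
def is_clean_trace(events: list[dict]) -> bool:
    """Check the happy-path shape: user_input first, at least one
    tool_call in between, agent_response last, and no tool errors."""
    if len(events) < 3:
        return False
    types = [e["type"] for e in events]
    return (
        types[0] == "user_input"
        and types[-1] == "agent_response"
        and "tool_call" in types[1:-1]
        and "tool_error" not in types
    )

events = [
    {"type": "user_input"},
    {"type": "tool_call"},
    {"type": "agent_response"},
]
print(is_clean_trace(events))  # True
```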
Failed trace
A failed trace reveals where things went wrong. Common patterns:
Hallucination pattern:
- user_input -> tool_call (returns empty results) -> agent_response (fabricates an answer)
The trace shows that the tool returned no data, but the agent responded as if it had information.
Tool misuse pattern:
- user_input -> tool_call (wrong tool or wrong parameters) -> agent_response (based on irrelevant results)
The trace shows the exact parameters the agent chose and why they were wrong.
Missing tool call pattern:
- user_input -> agent_response (no tool calls at all)
The trace shows the agent skipped tool calls entirely and answered from its training data.
Loop pattern:
- user_input -> tool_call -> tool_call -> tool_call (same tool, same params, repeated)
The trace shows repeated identical tool calls, making the loop obvious.
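These patterns can also be spotted programmatically. The heuristics below are illustrative sketches over the documented event shape, not built-in Invarium checks:

```python
import json

def find_failure_patterns(events: list[dict]) -> list[str]:
    """Flag two of the failure shapes above. The heuristics are
    deliberately simple (e.g. an empty result means a falsy value)."""
    findings = []
    calls = [e["data"] for e in events if e["type"] == "tool_call"]
    answered = any(e["type"] == "agent_response" for e in events)

    # Hallucination pattern: every tool call returned an empty result,
    # yet the agent still produced an answer.
    if calls and answered and all(not c.get("result") for c in calls):
        findings.append("hallucination: empty tool results but the agent answered anyway")

    # Loop pattern: the same tool called with identical parameters 3+ times.
    counts: dict[tuple, int] = {}
    for c in calls:
        key = (c["tool"], json.dumps(c.get("parameters", {}), sort_keys=True))
        counts[key] = counts.get(key, 0) + 1
    if any(n >= 3 for n in counts.values()):
        findings.append("loop: identical tool call repeated 3+ times")

    return findings

# A trace exhibiting both patterns: three identical empty calls, then an answer.
bad_call = {"type": "tool_call",
            "data": {"tool": "search_kb", "parameters": {"query": "refunds"}, "result": []}}
events = [{"type": "user_input", "data": {"message": "Refunds?"}},
          bad_call, bad_call, bad_call,
          {"type": "agent_response", "data": {"message": "Refunds take 14 days."}}]
print(find_failure_patterns(events))
```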
How to use traces
View in the dashboard
Once the trace viewer launches, traces will appear in the Test Runs section of the dashboard:
- Navigate to a test run
- Click on any individual test case
- The trace view shows the full sequence of events with a timeline
Each event is displayed on a timeline with:
- Elapsed time from the start
- Event type with an icon
- Expandable details showing parameters, results, and metadata
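Until the dashboard timeline ships, you can compute the same elapsed-time offsets yourself from the timestamps in the trace JSON. A small sketch:

```python
from datetime import datetime

def elapsed_ms(events: list[dict]) -> list[tuple[str, int]]:
    """Pair each event type with whole milliseconds elapsed since the first event."""
    def parse(ts: str) -> datetime:
        # fromisoformat does not accept a trailing "Z" on older
        # Pythons, so swap it for an explicit UTC offset.
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))
    start = parse(events[0]["timestamp"])
    out = []
    for e in events:
        delta = parse(e["timestamp"]) - start
        ms = delta.days * 86_400_000 + delta.seconds * 1_000 + delta.microseconds // 1_000
        out.append((e["type"], ms))
    return out

# Timestamps from the sample trace above.
events = [
    {"type": "user_input", "timestamp": "2026-03-13T14:30:00.000Z"},
    {"type": "tool_call", "timestamp": "2026-03-13T14:30:00.450Z"},
    {"type": "agent_response", "timestamp": "2026-03-13T14:30:02.340Z"},
]
print(elapsed_ms(events))
# [('user_input', 0), ('tool_call', 450), ('agent_response', 2340)]
```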
Compare traces
When debugging a regression, compare traces across test runs:
- Find the same test scenario in two different runs
- Open both traces side by side
- Look for the point where behavior diverges
Traces from different runs of the same scenario are linked by scenario_id, making it easy to track how your agent’s behavior changes over time.
Export for analysis
Trace export from the dashboard is coming soon. Until then, trace data from your synced results is stored for later analysis.
Trace and test results
Traces are connected to test results. When you sync results with invarium_sync_results, the trace data enriches the test result:
```json
{
  "scenario_id": "sc_xyz789",
  "user_message": "What is your refund policy for digital products?",
  "agent_response": "Our refund policy allows refunds within 14 days...",
  "tools_called": [
    {
      "name": "search_knowledge_base",
      "parameters": { "query": "refund policy digital products" }
    }
  ],
  "passed": true
}
```
The tools_called field in your synced results maps directly to tool_call events in the trace, allowing Invarium to verify that the correct tools were used with the correct parameters.
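Because tools_called mirrors the trace's tool_call events, you can run the same ordering check locally before syncing. A sketch; the expected-tool list is an input you supply, not part of the Invarium API:

```python
def tools_match(result: dict, expected: list[str]) -> bool:
    """Check that the synced result called exactly the expected tools, in order."""
    called = [t["name"] for t in result.get("tools_called", [])]
    return called == expected

# A synced result shaped like the documented example.
result = {
    "scenario_id": "sc_xyz789",
    "tools_called": [
        {"name": "search_knowledge_base",
         "parameters": {"query": "refund policy digital products"}},
    ],
    "passed": True,
}
print(tools_match(result, ["search_knowledge_base"]))  # True
```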
Privacy and data handling
Traces may contain sensitive data from your agent’s interactions. Invarium applies PII redaction to trace data before storage. Review the data handling section in your workspace settings to configure redaction rules.
Trace data handling:
- PII redaction — Email addresses, phone numbers, and other PII patterns are automatically redacted in stored traces
- Retention — Traces are retained for the duration of your plan’s retention period
- Access — Only workspace members with the appropriate role can view traces
- Deletion — You can delete traces from the dashboard at any time
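As a rough picture of what redaction does, here is a regex-based sketch. The patterns are illustrative only; Invarium's actual redaction rules are configured in your workspace settings and will differ:

```python
import re

# Illustrative patterns only, not Invarium's production rules.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]

def redact(text: str) -> str:
    """Replace each PII match with a placeholder token."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

print(redact("Contact jane.doe@example.com or +1 (555) 123-4567"))
# Contact [EMAIL] or [PHONE]
```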
FAQ
Q: Are traces generated automatically?
A: Traces are generated from the data you provide in invarium_sync_results. The more detail you include (especially tools_called), the richer the trace.
Q: Can I trace production interactions?
A: Behavioral Tracing is designed for test scenarios run through Invarium. For production monitoring, integrate your agent’s logging system and use Invarium for periodic regression testing.
Q: How much data do traces store?
A: Trace size depends on the number of events and the size of tool results. Invarium truncates very large tool results (over 10KB per event) to keep traces manageable.
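You can mimic that 10KB cap client-side to predict how a large tool result will be stored. The truncation strategy here (keep a short preview) is a guess at the general idea, not Invarium's exact behavior:

```python
import json

MAX_RESULT_BYTES = 10_240  # the documented 10KB per-event cap

def truncate_result(result: object, limit: int = MAX_RESULT_BYTES) -> object:
    """Serialize a tool result and cut it down if it exceeds the cap.
    The replacement shape (flag plus preview) is illustrative."""
    payload = json.dumps(result)
    if len(payload.encode("utf-8")) <= limit:
        return result
    return {"truncated": True, "preview": payload[:200]}

big = {"articles": ["x" * 20_000]}      # ~20KB once serialized
small = {"articles": ["ok"]}

print(truncate_result(small) == small)   # True
print(truncate_result(big)["truncated"]) # True
```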
Q: Can I add custom events to a trace?
A: Not currently. Traces are built from the standard event types (user_input, tool_call, agent_response). Custom event support is on the roadmap.