Behavioral Tracing
See exactly what happened, step by step.
Behavioral Tracing captures a detailed record of every action your agent takes during a test case. Every user input, tool call, parameter, result, and final output is logged with timestamps — giving you a full audit trail for debugging and analysis.
Behavioral Tracing data models exist in the platform, but the dashboard trace visualization is coming soon. Currently, trace data is captured from your synced results and stored for future analysis.
Why tracing?
When a test fails, you need to know more than just “it failed.” You need to know:
- What did the agent do first?
- Which tools did it call, and with what parameters?
- What did those tools return?
- How did the agent interpret the results?
- Where exactly did the behavior diverge from what was expected?
Behavioral Tracing answers all of these questions. It is the difference between “the agent hallucinated” and “the agent called search_kb with the right query, got zero results, and then fabricated an answer instead of saying it didn’t know.”
What is captured
Each trace records a sequence of timestamped events:
| Event type | What it captures |
|---|---|
| user_input | The message or trigger sent to the agent |
| agent_reasoning | The agent’s internal reasoning or chain-of-thought (when available) |
| tool_call | The tool name, parameters passed, and the raw result returned |
| tool_error | Any errors from tool execution, including error type and message |
| agent_response | The final response the agent produced |
| metadata | Timing information, token counts, model used |
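If you work with raw trace JSON directly, a quick first step is to tally events by type. A minimal sketch, assuming only the event shape documented above (the sample trace fragment is illustrative):

```python
from collections import Counter

def count_event_types(trace: dict) -> Counter:
    """Tally how many events of each type a trace contains."""
    return Counter(event["type"] for event in trace.get("events", []))

# Illustrative trace fragment following the documented shape.
trace = {
    "trace_id": "tr_abc123",
    "events": [
        {"type": "user_input", "data": {"message": "Hi"}},
        {"type": "tool_call", "data": {"tool": "search_knowledge_base"}},
        {"type": "tool_call", "data": {"tool": "search_knowledge_base"}},
        {"type": "agent_response", "data": {"message": "Hello!"}},
    ],
}

print(dict(count_event_types(trace)))
# {'user_input': 1, 'tool_call': 2, 'agent_response': 1}
```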
Trace format
Traces are stored as JSON with the following structure:
```json
{
  "trace_id": "tr_abc123",
  "scenario_id": "sc_xyz789",
  "agent_name": "customer-support-agent",
  "timestamp": "2026-03-13T14:30:00Z",
  "duration_ms": 2340,
  "events": [
    {
      "type": "user_input",
      "timestamp": "2026-03-13T14:30:00.000Z",
      "data": {
        "message": "What is your refund policy for digital products?"
      }
    },
    {
      "type": "tool_call",
      "timestamp": "2026-03-13T14:30:00.450Z",
      "data": {
        "tool": "search_knowledge_base",
        "parameters": {
          "query": "refund policy digital products"
        },
        "result": {
          "articles": [
            {
              "title": "Refund Policy",
              "content": "Digital products are eligible for a refund within 14 days of purchase..."
            }
          ]
        },
        "duration_ms": 120
      }
    },
    {
      "type": "agent_response",
      "timestamp": "2026-03-13T14:30:02.340Z",
      "data": {
        "message": "Our refund policy for digital products allows refunds within 14 days of purchase. You can request a refund by contacting support.",
        "tokens_used": 245
      }
    }
  ]
}
```
Reading a trace
Successful trace
A successful trace shows a clean path from input to output:
- user_input — The test scenario’s message
- tool_call (one or more) — The agent correctly calls the appropriate tools
- agent_response — The agent synthesizes tool results into a correct answer
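That happy-path shape is easy to check mechanically. A sketch over the documented event types (the helper name is ours, not an Invarium API):

```python
def is_clean_trace(events: list[dict]) -> bool:
    """Check the happy-path shape: user_input first, at least one
    tool_call in between, agent_response last, and no tool errors."""
    if len(events) < 3:
        return False
    types = [e["type"] for e in events]
    return (
        types[0] == "user_input"
        and types[-1] == "agent_response"
        and "tool_call" in types[1:-1]
        and "tool_error" not in types
    )

events = [
    {"type": "user_input"},
    {"type": "tool_call"},
    {"type": "agent_response"},
]
print(is_clean_trace(events))  # True
```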
Failed trace
A failed trace reveals where things went wrong. Common patterns:
Hallucination pattern:
- user_input -> tool_call (returns empty results) -> agent_response (fabricates an answer)
The trace shows that the tool returned no data, but the agent responded as if it had information.
Tool misuse pattern:
- user_input -> tool_call (wrong tool or wrong parameters) -> agent_response (based on irrelevant results)
The trace shows the exact parameters the agent chose and why they were wrong.
Missing tool call pattern:
- user_input -> agent_response (no tool calls at all)
The trace shows the agent skipped tool calls entirely and answered from its training data.
Loop pattern:
- user_input -> tool_call -> tool_call -> tool_call (same tool, same params, repeated)
The trace shows repeated identical tool calls, making the loop obvious.
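These patterns can also be spotted programmatically. The heuristics below are illustrative sketches over the documented event shape, not built-in Invarium checks:

```python
import json

def find_failure_patterns(events: list[dict]) -> list[str]:
    """Flag two of the failure shapes above. The heuristics are
    deliberately simple (e.g. an empty result means a falsy value)."""
    findings = []
    calls = [e["data"] for e in events if e["type"] == "tool_call"]
    answered = any(e["type"] == "agent_response" for e in events)

    # Hallucination pattern: every tool call returned an empty result,
    # yet the agent still produced an answer.
    if calls and answered and all(not c.get("result") for c in calls):
        findings.append("hallucination: empty tool results but the agent answered anyway")

    # Loop pattern: the same tool called with identical parameters 3+ times.
    counts: dict[tuple, int] = {}
    for c in calls:
        key = (c["tool"], json.dumps(c.get("parameters", {}), sort_keys=True))
        counts[key] = counts.get(key, 0) + 1
    if any(n >= 3 for n in counts.values()):
        findings.append("loop: identical tool call repeated 3+ times")

    return findings

# A trace exhibiting both patterns: three identical empty calls, then an answer.
bad_call = {"type": "tool_call",
            "data": {"tool": "search_kb", "parameters": {"query": "refunds"}, "result": []}}
events = [{"type": "user_input", "data": {"message": "Refunds?"}},
          bad_call, bad_call, bad_call,
          {"type": "agent_response", "data": {"message": "Refunds take 14 days."}}]
print(find_failure_patterns(events))
```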
How to use traces
View in the dashboard
Once the trace viewer launches, traces will appear in the Test Runs section of the dashboard:
- Navigate to a test run
- Click on any individual test case
- The trace view shows the full sequence of events with a timeline
Each event is displayed on a timeline with:
- Elapsed time from the start
- Event type with an icon
- Expandable details showing parameters, results, and metadata
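Until the dashboard timeline ships, you can compute the same elapsed-time offsets yourself from the timestamps in the trace JSON. A small sketch:

```python
from datetime import datetime

def elapsed_ms(events: list[dict]) -> list[tuple[str, int]]:
    """Pair each event type with whole milliseconds elapsed since the first event."""
    def parse(ts: str) -> datetime:
        # fromisoformat does not accept a trailing "Z" on older
        # Pythons, so swap it for an explicit UTC offset.
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))
    start = parse(events[0]["timestamp"])
    out = []
    for e in events:
        delta = parse(e["timestamp"]) - start
        ms = delta.days * 86_400_000 + delta.seconds * 1_000 + delta.microseconds // 1_000
        out.append((e["type"], ms))
    return out

# Timestamps from the sample trace above.
events = [
    {"type": "user_input", "timestamp": "2026-03-13T14:30:00.000Z"},
    {"type": "tool_call", "timestamp": "2026-03-13T14:30:00.450Z"},
    {"type": "agent_response", "timestamp": "2026-03-13T14:30:02.340Z"},
]
print(elapsed_ms(events))
# [('user_input', 0), ('tool_call', 450), ('agent_response', 2340)]
```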
Compare traces
When debugging a regression, compare traces across test runs:
- Find the same test scenario in two different runs
- Open both traces side by side
- Look for the point where behavior diverges
Traces from different runs of the same scenario are linked by scenario_id, making it easy to track how your agent’s behavior changes over time.
Export for analysis
Trace export from the dashboard is coming soon. Until then, trace data from your synced results is stored for later analysis.
Trace and test results
Traces are connected to test results. When you sync results with invarium_sync_results, the trace data enriches the test result:
```json
{
  "scenario_id": "sc_xyz789",
  "user_message": "What is your refund policy for digital products?",
  "agent_response": "Our refund policy allows refunds within 14 days...",
  "tools_called": [
    {
      "name": "search_knowledge_base",
      "parameters": { "query": "refund policy digital products" }
    }
  ],
  "passed": true
}
```
The tools_called field in your synced results maps directly to tool_call events in the trace, allowing Invarium to verify that the correct tools were used with the correct parameters.
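Because tools_called mirrors the trace's tool_call events, you can run the same ordering check locally before syncing. A sketch; the expected-tool list is an input you supply, not part of the Invarium API:

```python
def tools_match(result: dict, expected: list[str]) -> bool:
    """Check that the synced result called exactly the expected tools, in order."""
    called = [t["name"] for t in result.get("tools_called", [])]
    return called == expected

# A synced result shaped like the documented example.
result = {
    "scenario_id": "sc_xyz789",
    "tools_called": [
        {"name": "search_knowledge_base",
         "parameters": {"query": "refund policy digital products"}},
    ],
    "passed": True,
}
print(tools_match(result, ["search_knowledge_base"]))  # True
```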
Privacy and data handling
Traces may contain sensitive data from your agent’s interactions. Invarium applies PII redaction to trace data before storage. Review the data handling section in your workspace settings to configure redaction rules.
Trace data handling:
- PII redaction — Email addresses, phone numbers, and other PII patterns are automatically redacted in stored traces
- Retention — Traces are retained for the duration of your plan’s retention period
- Access — Only workspace members with the appropriate role can view traces
- Deletion — You can delete traces from the dashboard at any time
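As a rough picture of what redaction does, here is a regex-based sketch. The patterns are illustrative only; Invarium's actual redaction rules are configured in your workspace settings and will differ:

```python
import re

# Illustrative patterns only, not Invarium's production rules.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]

def redact(text: str) -> str:
    """Replace each PII match with a placeholder token."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

print(redact("Contact jane.doe@example.com or +1 (555) 123-4567"))
# Contact [EMAIL] or [PHONE]
```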
FAQ
Q: Are traces generated automatically?
A: Traces are generated from the data you provide in invarium_sync_results. The more detail you include (especially tools_called), the richer the trace.
Q: Can I trace production interactions?
A: Behavioral Tracing is designed for test scenarios run through Invarium. For production monitoring, integrate your agent’s logging system and use Invarium for periodic regression testing.
Q: How much data do traces store?
A: Trace size depends on the number of events and the size of tool results. Invarium truncates very large tool results (over 10KB per event) to keep traces manageable.
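You can mimic that 10KB cap client-side to predict how a large tool result will be stored. The truncation strategy here (keep a short preview) is a guess at the general idea, not Invarium's exact behavior:

```python
import json

MAX_RESULT_BYTES = 10_240  # the documented 10KB per-event cap

def truncate_result(result: object, limit: int = MAX_RESULT_BYTES) -> object:
    """Serialize a tool result and cut it down if it exceeds the cap.
    The replacement shape (flag plus preview) is illustrative."""
    payload = json.dumps(result)
    if len(payload.encode("utf-8")) <= limit:
        return result
    return {"truncated": True, "preview": payload[:200]}

big = {"articles": ["x" * 20_000]}      # ~20KB once serialized
small = {"articles": ["ok"]}

print(truncate_result(small) == small)   # True
print(truncate_result(big)["truncated"]) # True
```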
Q: Can I add custom events to a trace?
A: Not currently. Traces are built from the standard event types (user_input, tool_call, agent_response). Custom event support is on the roadmap.