Behavioral Traces
An audit trail of every action your agent takes during a test.
- ✓ Traces capture the full picture: tool calls, LLM interactions, timing, and decision paths
- ✓ Auto-patching supports OpenAI, Anthropic, Google Gemini, and LangChain, with no code changes needed
- ✓ Traces power failure classification, quality scoring, and regression detection
Why It Matters
A behavioral trace is a detailed audit trail of every action your AI agent takes during a test — enabling you to understand exactly how your agent reached its answer, not just what it answered.
Traditional testing checks outputs. Behavioral tracing captures the process — which tools were called, in what order, how long each step took, and which tools were available but not used. This is the data that powers Invarium’s failure classification, quality scoring, and regression detection.
What Traces Capture
For every test case execution, Invarium records the following (a sketch of what a captured trace might contain appears after the list):
- Tool calls — which tools were called, with what arguments, what they returned, and how long each took
- LLM interactions — which model was used, tokens consumed, and which tools the model chose to invoke
- Tool call sequence — the ordered path your agent took, compared against the expected sequence
- Timing — total duration and per-step timing for performance analysis
- Cost estimate — estimated LLM cost based on token usage across supported models
- Expected vs. actual comparison — highlights missing steps, extra steps, and reordered operations
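To make this concrete, a captured trace might be organized along these lines. This is a hypothetical sketch; the field names are assumptions for illustration, not Invarium's documented schema.

```python
# Hypothetical shape of a captured trace. Field names are illustrative
# assumptions, not Invarium's documented schema.
trace = {
    "test_case": "refund_request_lookup",
    "duration_ms": 4210,
    "estimated_cost_usd": 0.0031,
    "llm_calls": [
        {
            "model": "gpt-4o",
            "input_tokens": 812,
            "output_tokens": 96,
            "tools_available": ["lookup_order", "issue_refund", "escalate"],
            "tools_invoked": ["lookup_order"],
        },
    ],
    "tool_calls": [
        {
            "name": "lookup_order",
            "arguments": {"order_id": "A-1042"},
            "result": {"status": "delivered"},
            "duration_ms": 230,
        },
    ],
    "expected_tools": ["lookup_order", "issue_refund"],
    "actual_tools": ["lookup_order"],  # comparison flags the missing step
}
```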
Supported Frameworks
The trace library auto-patches your agent’s LLM SDK with no code changes required:
- OpenAI (sync and async)
- Anthropic (sync and async)
- Google Gemini (sync and async)
- LangChain (via callback handler)
For LangChain agents, a callback handler is attached during test setup. Your coding agent handles this automatically when you set up tracing via MCP.
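As a rough sketch of what setup can look like, the snippet below uses assumed names (`invarium_trace`, `patch_all`, and `InvariumCallbackHandler` are illustrative, not a documented API); the `config={"callbacks": [...]}` pattern is standard LangChain usage.

```python
# Assumed module and function names, shown for illustration only.
import invarium_trace

# Auto-patch whichever supported SDKs (OpenAI, Anthropic, Gemini)
# are installed, so calls are traced with no code changes.
invarium_trace.patch_all()

# For LangChain agents, a callback handler is attached instead,
# typically during test setup:
# from invarium_trace.langchain import InvariumCallbackHandler
# agent.invoke(inputs, config={"callbacks": [InvariumCallbackHandler()]})
```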
Privacy and PII Redaction
Before any trace data leaves your machine, Invarium automatically scrubs sensitive information — including Social Security numbers, credit card numbers, email addresses, phone numbers, and common sensitive field names (passwords, tokens, API keys). Redaction is enabled by default.
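As an illustration of the kind of scrubbing involved, here is a minimal pattern-based sketch covering a subset of the categories above; it shows the approach, not Invarium's actual redaction implementation.

```python
import re

# A subset of the patterns described above, for illustration only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}
SENSITIVE_KEYS = {"password", "token", "api_key"}

def redact(text: str) -> str:
    """Replace matches of each pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

def scrub(obj):
    """Recursively redact strings and mask values under sensitive keys."""
    if isinstance(obj, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else scrub(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [scrub(v) for v in obj]
    if isinstance(obj, str):
        return redact(obj)
    return obj
```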
Viewing Traces
Navigate to a test run and click on any individual test result. The trace viewer shows:
- The full tool call sequence as a timeline
- Each tool call with arguments, result, and duration
- LLM call details with token counts and model
- The expected vs. actual comparison (if the scenario defines expected tools)
- Cost breakdown per LLM call and total
Understanding Trace Results
Traces reveal distinct patterns that help you diagnose issues:
- Successful execution — tool call sequence matches the expected sequence, no errors, reasonable timing
- Hallucination — agent responds with factual-sounding content but the tool call sequence is empty or missing the expected lookup tool
- Tool misuse — agent called the wrong tool, skipped a required step, or passed incorrect parameters
- Loops — tool call sequence contains repeated tool names, indicating the agent got stuck in a cycle
These patterns map directly to the failure taxonomy categories, helping you prioritize which issues to fix first.
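As a rough illustration, the helpers below show how two of these patterns (loops and missing or extra steps) can be spotted in a trace, assuming the tool call sequence is available as a plain list of tool names (an assumption for the example).

```python
from collections import Counter

def find_loops(tool_sequence: list[str], threshold: int = 3) -> list[str]:
    """Tools invoked more than `threshold` times suggest a stuck cycle."""
    counts = Counter(tool_sequence)
    return [name for name, n in counts.items() if n > threshold]

def diff_steps(expected: list[str], actual: list[str]) -> dict:
    """Missing steps hint at hallucination or skipped lookups;
    extra steps hint at tool misuse."""
    return {
        "missing": [t for t in expected if t not in actual],
        "extra": [t for t in actual if t not in expected],
    }

# An agent that answered without calling any tools (a hallucination pattern):
print(diff_steps(["lookup_order", "issue_refund"], []))
# {'missing': ['lookup_order', 'issue_refund'], 'extra': []}
```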
FAQ
Does tracing add latency to my agent?
The tracer adds negligible overhead. It records data without blocking your agent’s execution flow.
What if my agent uses an unsupported framework?
You can still sync results manually. The trace data will include whatever tool call information you provide in the result objects.
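For example, a manually synced result object might carry tool call information shaped like this (field names are hypothetical, shown for illustration):

```python
# Hypothetical result object for a manual sync; field names are
# illustrative, not a documented schema.
result = {
    "test_case": "refund_request_lookup",
    "passed": False,
    "output": "Your refund has been issued.",
    "tool_calls": [
        # include whatever tool call info your framework exposes
        {"name": "lookup_order", "arguments": {"order_id": "A-1042"}},
    ],
}
```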
How long are traces retained?
During the beta period, traces are retained for 90 days.