Behavioral Traces
An audit trail of every action your agent takes during a test.
- ✓ Traces capture the full picture: tool calls, LLM interactions, timing, and decision paths
- ✓ Auto-patching supports OpenAI, Anthropic, Google Gemini, and LangChain, with no code changes needed
- ✓ Traces power failure classification, quality scoring, and regression detection
Why It Matters
A behavioral trace is a detailed audit trail of every action your AI agent takes during a test — enabling you to understand exactly how your agent reached its answer, not just what it answered.
Traditional testing checks outputs. Behavioral tracing captures the process — which tools were called, in what order, how long each step took, and which tools were available but not used. This is the data that powers Invarium’s failure classification, quality scoring, and regression detection.
What Traces Capture
For every test case execution, Invarium records the following (a sketch of what a captured trace might contain appears after the list):
- Tool calls — which tools were called, with what arguments, what they returned, and how long each took
- LLM interactions — which model was used, tokens consumed, and which tools the model chose to invoke
- Tool call sequence — the ordered path your agent took, compared against the expected sequence
- Timing — total duration and per-step timing for performance analysis
- Cost estimate — estimated LLM cost based on token usage across supported models
- Expected vs. actual comparison — highlights missing steps, extra steps, and reordered operations
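To make this concrete, a captured trace might be organized along these lines. This is a hypothetical sketch; the field names are assumptions for illustration, not Invarium's documented schema.

```python
# Hypothetical shape of a captured trace. Field names are illustrative
# assumptions, not Invarium's documented schema.
trace = {
    "test_case": "refund_request_lookup",
    "duration_ms": 4210,
    "estimated_cost_usd": 0.0031,
    "llm_calls": [
        {
            "model": "gpt-4o",
            "input_tokens": 812,
            "output_tokens": 96,
            "tools_available": ["lookup_order", "issue_refund", "escalate"],
            "tools_invoked": ["lookup_order"],
        },
    ],
    "tool_calls": [
        {
            "name": "lookup_order",
            "arguments": {"order_id": "A-1042"},
            "result": {"status": "delivered"},
            "duration_ms": 230,
        },
    ],
    "expected_tools": ["lookup_order", "issue_refund"],
    "actual_tools": ["lookup_order"],  # comparison flags the missing step
}
```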
Supported Frameworks
The trace library auto-patches your agent’s LLM SDK with no code changes required:
- OpenAI (sync and async)
- Anthropic (sync and async)
- Google Gemini (sync and async)
- LangChain (via callback handler)
For LangChain agents, a callback handler is attached during test setup. Your coding agent handles this automatically when you set up tracing via MCP.
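As a rough sketch of what setup can look like, the snippet below uses assumed names (`invarium_trace`, `patch_all`, and `InvariumCallbackHandler` are illustrative, not a documented API); the `config={"callbacks": [...]}` pattern is standard LangChain usage.

```python
# Assumed module and function names, shown for illustration only.
import invarium_trace

# Auto-patch whichever supported SDKs (OpenAI, Anthropic, Gemini)
# are installed, so calls are traced with no code changes.
invarium_trace.patch_all()

# For LangChain agents, a callback handler is attached instead,
# typically during test setup:
# from invarium_trace.langchain import InvariumCallbackHandler
# agent.invoke(inputs, config={"callbacks": [InvariumCallbackHandler()]})
```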
Privacy and PII Redaction
Before any trace data leaves your machine, Invarium automatically scrubs sensitive information — including Social Security numbers, credit card numbers, email addresses, phone numbers, and common sensitive field names (passwords, tokens, API keys). Redaction is enabled by default.
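As an illustration of the kind of scrubbing involved, here is a minimal pattern-based sketch covering a subset of the categories above; it shows the approach, not Invarium's actual redaction implementation.

```python
import re

# A subset of the patterns described above, for illustration only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}
SENSITIVE_KEYS = {"password", "token", "api_key"}

def redact(text: str) -> str:
    """Replace matches of each pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

def scrub(obj):
    """Recursively redact strings and mask values under sensitive keys."""
    if isinstance(obj, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else scrub(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [scrub(v) for v in obj]
    if isinstance(obj, str):
        return redact(obj)
    return obj
```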
Viewing Traces
Navigate to a test run and click on any individual test result. The trace viewer shows:
- The full tool call sequence as a timeline
- Each tool call with arguments, result, and duration
- LLM call details with token counts and model
- The expected vs. actual comparison (if the scenario defines expected tools)
- Cost breakdown per LLM call and total
Understanding Trace Results
Traces reveal distinct patterns that help you diagnose issues:
- Successful execution — tool call sequence matches the expected sequence, no errors, reasonable timing
- Hallucination — agent responds with factual-sounding content but the tool call sequence is empty or missing the expected lookup tool
- Tool misuse — agent called the wrong tool, skipped a required step, or passed incorrect parameters
- Loops — tool call sequence contains repeated tool names, indicating the agent got stuck in a cycle
These patterns map directly to the failure taxonomy categories, helping you prioritize which issues to fix first.
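As a rough illustration, the helpers below show how two of these patterns (loops and missing or extra steps) can be spotted in a trace, assuming the tool call sequence is available as a plain list of tool names (an assumption for the example).

```python
from collections import Counter

def find_loops(tool_sequence: list[str], threshold: int = 3) -> list[str]:
    """Tools invoked more than `threshold` times suggest a stuck cycle."""
    counts = Counter(tool_sequence)
    return [name for name, n in counts.items() if n > threshold]

def diff_steps(expected: list[str], actual: list[str]) -> dict:
    """Missing steps hint at hallucination or skipped lookups;
    extra steps hint at tool misuse."""
    return {
        "missing": [t for t in expected if t not in actual],
        "extra": [t for t in actual if t not in expected],
    }

# An agent that answered without calling any tools (a hallucination pattern):
print(diff_steps(["lookup_order", "issue_refund"], []))
# {'missing': ['lookup_order', 'issue_refund'], 'extra': []}
```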
FAQ
Does tracing add latency to my agent?
The tracer adds negligible overhead. It records data without blocking your agent’s execution flow.
What if my agent uses an unsupported framework?
You can still sync results manually. The trace data will include whatever tool call information you provide in the result objects.
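For example, a manually synced result object might carry tool call information shaped like this (field names are hypothetical, shown for illustration):

```python
# Hypothetical result object for a manual sync; field names are
# illustrative, not a documented schema.
result = {
    "test_case": "refund_request_lookup",
    "passed": False,
    "output": "Your refund has been issued.",
    "tool_calls": [
        # include whatever tool call info your framework exposes
        {"name": "lookup_order", "arguments": {"order_id": "A-1042"}},
    ],
}
```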
How long are traces retained?
During the beta period, traces are retained for 90 days.