Agent Intelligence Graph
Understand not just IF your agent fails, but WHERE in its decision tree.
The Agent Intelligence Graph is a visual map of your agent’s behavioral patterns across test scenarios. It represents your agent’s tools, decision paths, and failure clusters as an interactive graph — giving you a structural view of how your agent works and where it breaks.
The Agent Intelligence Graph is currently being developed. The concepts described on this page reflect the planned functionality. Dashboard visualization is coming soon.
Why a graph?
Pass/fail metrics tell you that something is wrong. The Agent Intelligence Graph tells you where in your agent’s behavior the problem lives.
Consider an agent with three tools: `search_kb`, `create_ticket`, and `escalate`. A traditional test report might say “3 out of 10 tests failed.” The graph shows you:
- All three failures involved the `search_kb -> create_ticket` path
- The `search_kb -> escalate` path has never been tested
- The agent has a strong preference for `search_kb` over direct responses
This structural insight is impossible to get from a flat list of test results.
How it works
The graph is built from your agent’s blueprint and test results:
Nodes
Nodes represent behavioral states in your agent’s decision tree:
| Node type | What it represents | Visual indicator |
|---|---|---|
| Tool node | A tool your agent can call | Rounded rectangle |
| Decision node | A branching point in the agent’s logic | Diamond |
| Input node | The user’s message or trigger | Circle (left edge) |
| Output node | The agent’s final response | Circle (right edge) |
| Failure node | A point where failures cluster | Red highlight |
Edges
Edges represent transitions between states:
| Edge type | What it represents | Visual indicator |
|---|---|---|
| Transition | A path the agent takes between states | Solid line |
| Tested path | A transition that has been exercised by tests | Thick line |
| Untested path | A transition that has not been tested | Dashed line |
| Failure edge | A transition where failures occur | Red line |
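As an illustration, the node and edge types above could be modeled as a small data structure. This is a hypothetical sketch (the `Node` and `Edge` names and fields are assumptions for illustration, not Invarium's published schema), using the three-tool example agent from earlier:

```python
from dataclasses import dataclass

@dataclass
class Node:
    id: str
    type: str  # "tool", "decision", "input", "output", or "failure"

@dataclass
class Edge:
    src: str
    dst: str
    tested: bool = False  # thick (tested) vs. dashed (untested) line
    failures: int = 0     # rendered as a red edge when > 0

# The three-tool example agent from above
nodes = [
    Node("input", "input"),
    Node("search_kb", "tool"),
    Node("create_ticket", "tool"),
    Node("escalate", "tool"),
    Node("output", "output"),
]
edges = [
    Edge("input", "search_kb", tested=True),
    Edge("search_kb", "create_ticket", tested=True, failures=3),
    Edge("search_kb", "escalate"),  # dashed: never exercised by tests
    Edge("create_ticket", "output", tested=True),
]

untested = [e for e in edges if not e.tested]   # rendered dashed
red = [e for e in edges if e.failures > 0]      # rendered red
```

Selecting edges by their flags, as in the last two lines, is all the dashboard needs to decide how each edge is drawn.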
Metrics on the graph
Each node and edge carries metrics:
- Hit count — How many test cases exercised this node or edge
- Failure rate — Percentage of tests that failed at this point
- Dominant failure type — The most common failure category at this node
- Coverage — Whether this node has been tested at all
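For intuition, here is one way the per-node metrics above could be aggregated from test traces. The trace format (`path`, `failed_at`, `category` keys) is an assumption for illustration, not Invarium's actual result schema:

```python
from collections import Counter, defaultdict

def node_metrics(tests):
    """Aggregate hit count, failure rate, and dominant failure type per node."""
    hits, fails = Counter(), Counter()
    fail_types = defaultdict(Counter)
    for t in tests:
        for node in t["path"]:
            hits[node] += 1
        failed_at = t.get("failed_at")
        if failed_at:
            fails[failed_at] += 1
            fail_types[failed_at][t.get("category", "unknown")] += 1
    return {
        n: {
            "hit_count": hits[n],                # tests exercising this node
            "failure_rate": fails[n] / hits[n],  # fraction that failed here
            "dominant_failure": (fail_types[n].most_common(1) or [(None, 0)])[0][0],
            "covered": True,                     # appears in at least one test
        }
        for n in hits
    }

tests = [
    {"path": ["input", "search_kb", "create_ticket"],
     "failed_at": "create_ticket", "category": "wrong_arguments"},
    {"path": ["input", "search_kb", "output"]},
]
metrics = node_metrics(tests)
```

In this sample, `search_kb` is hit by both tests with no failures, while `create_ticket` fails every time it is reached, with `wrong_arguments` as its dominant failure type.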
Reading the graph
Identify failure clusters
Failure clusters are groups of nodes where failures concentrate. They appear as red-highlighted areas on the graph. Common patterns:
- Tool-specific cluster — Failures concentrate around one tool (e.g., `search_kb`). This usually means the tool description or parameters need improvement.
- Path-specific cluster — Failures occur on a specific path (e.g., `search_kb -> create_ticket`). This means the agent struggles with a particular workflow sequence.
- Complexity-specific cluster — Failures appear at decision nodes that involve multi-step reasoning. This suggests the agent needs better planning or context management.
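Conceptually, a failure cluster is a connected group of nodes whose failure rates exceed some threshold. A minimal sketch of that detection (the 0.5 threshold, function name, and input shapes are illustrative assumptions):

```python
from collections import defaultdict

def failure_clusters(failure_rates, edges, threshold=0.5):
    """Group adjacent high-failure nodes into clusters by graph traversal."""
    hot = {n for n, rate in failure_rates.items() if rate >= threshold}
    adj = defaultdict(set)
    for src, dst in edges:
        if src in hot and dst in hot:  # only connect hot nodes to each other
            adj[src].add(dst)
            adj[dst].add(src)
    clusters, seen = [], set()
    for start in hot:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk of one connected component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            seen.add(node)
            stack.extend(adj[node] - component)
        clusters.append(component)
    return clusters

rates = {"search_kb": 0.6, "create_ticket": 0.8, "escalate": 0.0, "output": 0.1}
edges = [("search_kb", "create_ticket"), ("search_kb", "escalate"),
         ("create_ticket", "output")]
clusters = failure_clusters(rates, edges)
```

Here `search_kb` and `create_ticket` both exceed the threshold and share an edge, so they form a single path-specific cluster.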
Find coverage gaps
Dashed edges indicate paths that have never been tested. These are blind spots in your testing:
- A tool that exists in the blueprint but was never called in any test
- A decision path that is theoretically possible but was never exercised
- An edge case transition that tests have not covered
Coverage gaps are not necessarily problems — they are unknowns. Generate tests with higher complexity to exercise deeper paths in the graph.
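In graph terms, a coverage gap is a blueprint edge that no test trace has ever traversed. A hypothetical sketch of that computation (the edge and trace formats are assumptions for illustration):

```python
def coverage_gaps(blueprint_edges, tests):
    """Return blueprint edges never traversed by any test path."""
    traversed = set()
    for t in tests:
        path = t["path"]
        traversed.update(zip(path, path[1:]))  # consecutive node pairs
    return sorted(set(blueprint_edges) - traversed)

blueprint_edges = [
    ("input", "search_kb"),
    ("search_kb", "create_ticket"),
    ("search_kb", "escalate"),  # defined in the blueprint...
]
tests = [
    {"path": ["input", "search_kb", "create_ticket"]},
]
gaps = coverage_gaps(blueprint_edges, tests)  # ...but never exercised
```

The `search_kb -> escalate` edge exists structurally but appears in no test, so it would be rendered as a dashed line.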
Trace decision paths
Click on any node to see the full decision path leading to it. This shows you the sequence of steps the agent took to arrive at a particular state, including:
- Which tool was called before this one
- What input triggered this path
- Whether the agent considered alternative paths
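The trace view above can be sketched as a prefix lookup on a test path: given the node you clicked, return every step that led to it. A simplified illustration (real traces would also carry the triggering input and any alternative paths the agent considered):

```python
def trace_to(path, node):
    """Return the sequence of steps leading to `node`, inclusive."""
    idx = path.index(node)  # raises ValueError if node is not on this path
    prefix = path[: idx + 1]
    previous_step = prefix[-2] if len(prefix) > 1 else None
    return {"path": prefix, "previous_step": previous_step}

trace = ["input", "search_kb", "create_ticket", "output"]
info = trace_to(trace, "create_ticket")
```

Clicking `create_ticket` in this trace reveals that `search_kb` was the step immediately before it.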
How to use the graph
Navigate the dashboard
The Agent Intelligence Graph is available in the dashboard for each agent:
- Select an agent from the sidebar
- Click Agent Graph in the navigation
- The graph loads with your most recent test run data
Interact with the graph
- Pan — Click and drag the background to move around the graph
- Zoom — Scroll to zoom in and out. Zoom into failure clusters for detail.
- Select node — Click a node to see its metrics, failure types, and connected test cases
- Filter — Use the toolbar to filter by failure category, severity, or complexity
- Highlight path — Click an edge to highlight the full path it belongs to
Use the graph to prioritize fixes
- Start with red clusters — These are your highest-impact areas. Fix failures here first.
- Check dashed edges — Generate tests targeting untested paths.
- Compare across runs — Toggle between test runs to see if failure clusters are shrinking.
Graph data sources
The graph is built from two data sources:
- Blueprint — Provides the structure: tools, workflows, constraints, and their relationships. This defines the possible nodes and edges.
- Test results — Provides the metrics: which paths were tested, where failures occurred, and at what severity. This adds color and weight to the structure.
The graph is only as complete as your blueprint. If your agent has tools or workflows not described in the blueprint, they will not appear in the graph. Keep your blueprint up to date.
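Putting the two sources together, the construction can be sketched as follows: the blueprint defines which nodes and edges can exist, and test results overlay metrics onto them. This is an illustrative assumption about the process, not Invarium's actual implementation; note how a node absent from the blueprint is simply dropped:

```python
def build_graph(blueprint, results):
    """Blueprint supplies structure; test results supply metrics."""
    graph = {
        "nodes": {n: {"hit_count": 0, "failures": 0} for n in blueprint["nodes"]},
        "edges": {e: {"tested": False} for e in blueprint["edges"]},
    }
    for t in results:
        path = t["path"]
        for n in path:
            # Nodes and edges outside the blueprint are skipped:
            # the graph is only as complete as the blueprint itself.
            if n in graph["nodes"]:
                graph["nodes"][n]["hit_count"] += 1
        for e in zip(path, path[1:]):
            if e in graph["edges"]:
                graph["edges"][e]["tested"] = True
        failed_at = t.get("failed_at")
        if failed_at in graph["nodes"]:
            graph["nodes"][failed_at]["failures"] += 1
    return graph

blueprint = {
    "nodes": ["input", "search_kb", "create_ticket", "output"],
    "edges": [("input", "search_kb"), ("search_kb", "create_ticket")],
}
results = [
    {"path": ["input", "search_kb", "undocumented_tool"],
     "failed_at": "undocumented_tool"},
]
graph = build_graph(blueprint, results)
```

The call to `undocumented_tool` vanishes from the output because the blueprint never declared it, which is exactly why an out-of-date blueprint produces a misleading graph.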
FAQ
Q: Is the graph updated automatically?
A: Yes. The graph updates every time you sync new test results with `invarium_sync_results`. Each test run adds new coverage and failure data to the existing graph.
Q: Can I export the graph?
A: Currently the graph is viewable in the dashboard only. Export functionality is on the roadmap.
Q: What if my agent has no tools?
A: Agents without tools still produce a graph, but it will be simpler — primarily showing input/output nodes and decision paths based on constraint handling and response patterns.
Q: How does the graph handle multi-agent systems?
A: Each agent gets its own graph. If you have agents that call other agents, each one is graphed independently based on its own blueprint.
Q: Does the graph show real-time agent behavior?
A: No. The graph is built from test results synced through Invarium, not from live production traces. It shows how your agent behaved during testing.