Agent Intelligence Graph

Understand not just IF your agent fails, but WHERE in its decision tree.

The Agent Intelligence Graph is a visual map of your agent’s behavioral patterns across test scenarios. It represents your agent’s tools, decision paths, and failure clusters as an interactive graph — giving you a structural view of how your agent works and where it breaks.

⚠️

The Agent Intelligence Graph is currently being developed. The concepts described on this page reflect the planned functionality. Dashboard visualization is coming soon.


Why a graph?

Pass/fail metrics tell you that something is wrong. The Agent Intelligence Graph tells you where in your agent’s behavior the problem lives.

Consider an agent with three tools: search_kb, create_ticket, and escalate. A traditional test report might say “3 out of 10 tests failed.” The graph shows you:

  • All three failures involved the search_kb -> create_ticket path
  • The search_kb -> escalate path has never been tested
  • The agent has a strong preference for search_kb over direct responses

This structural insight is impossible to get from a flat list of test results.
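To make this concrete, here is a minimal sketch (in Python, with purely hypothetical names and data, not the platform's API) of how counting failures per transition, rather than per test, surfaces this kind of structural insight:

```python
from collections import Counter

# Hypothetical test traces, loosely based on the three-tool example above:
# the sequence of tools each test exercised and whether the test passed.
test_traces = [
    {"path": ["search_kb", "create_ticket"], "passed": False},
    {"path": ["search_kb", "create_ticket"], "passed": False},
    {"path": ["search_kb", "create_ticket"], "passed": False},
    {"path": ["search_kb"], "passed": True},
    {"path": ["search_kb"], "passed": True},
]

# Count failures per tool-to-tool transition instead of per test.
failures_per_edge = Counter(
    edge
    for trace in test_traces
    if not trace["passed"]
    for edge in zip(trace["path"], trace["path"][1:])
)

print(failures_per_edge)
# Counter({('search_kb', 'create_ticket'): 3}) -> all failures share one path
```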


How it works

The graph is built from your agent’s blueprint and test results:

Nodes

Nodes represent behavioral states in your agent’s decision tree:

| Node type | What it represents | Visual indicator |
| --- | --- | --- |
| Tool node | A tool your agent can call | Rounded rectangle |
| Decision node | A branching point in the agent's logic | Diamond |
| Input node | The user's message or trigger | Circle (left edge) |
| Output node | The agent's final response | Circle (right edge) |
| Failure node | A point where failures cluster | Red highlight |

Edges

Edges represent transitions between states:

| Edge type | What it represents | Visual indicator |
| --- | --- | --- |
| Transition | A path the agent takes between states | Solid line |
| Tested path | A transition that has been exercised by tests | Thick line |
| Untested path | A transition that has not been tested | Dashed line |
| Failure edge | A transition where failures occur | Red line |
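The two tables translate naturally into a small data model. The sketch below is illustrative only; the actual graph schema is not documented here, and the class and field names are assumptions:

```python
from dataclasses import dataclass
from enum import Enum

class NodeType(Enum):
    TOOL = "tool"          # rounded rectangle
    DECISION = "decision"  # diamond
    INPUT = "input"        # circle, left edge
    OUTPUT = "output"      # circle, right edge
    FAILURE = "failure"    # red highlight

class EdgeType(Enum):
    TRANSITION = "transition"  # solid line
    TESTED = "tested"          # thick line
    UNTESTED = "untested"      # dashed line
    FAILURE = "failure"        # red line

@dataclass
class Node:
    id: str
    type: NodeType
    label: str

@dataclass
class Edge:
    source: str    # id of the node the transition starts from
    target: str    # id of the node the transition ends at
    type: EdgeType
```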

Metrics on the graph

Each node and edge carries metrics:

  • Hit count — How many test cases exercised this node or edge
  • Failure rate — Percentage of tests that failed at this point
  • Dominant failure type — The most common failure category at this node
  • Coverage — Whether this node has been tested at all
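As an illustration, the sketch below derives those four metrics for a single node from hypothetical test records; field names like `nodes` and `failure` are assumptions, not the actual result format:

```python
from collections import Counter

# Hypothetical per-test records: which nodes the test touched and the
# failure category, if any.
results = [
    {"nodes": ["search_kb", "create_ticket"], "failure": "wrong_parameters"},
    {"nodes": ["search_kb", "create_ticket"], "failure": "wrong_parameters"},
    {"nodes": ["search_kb"], "failure": None},
    {"nodes": ["search_kb"], "failure": None},
]

def node_metrics(node_id, results):
    hits = [r for r in results if node_id in r["nodes"]]
    failures = [r["failure"] for r in hits if r["failure"]]
    return {
        "hit_count": len(hits),
        "failure_rate": len(failures) / len(hits) if hits else 0.0,
        "dominant_failure": Counter(failures).most_common(1)[0][0] if failures else None,
        "coverage": bool(hits),
    }

print(node_metrics("create_ticket", results))
# {'hit_count': 2, 'failure_rate': 1.0, 'dominant_failure': 'wrong_parameters', 'coverage': True}
```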

Reading the graph

Identify failure clusters

Failure clusters are groups of nodes where failures concentrate. They appear as red-highlighted areas on the graph. Common patterns:

  • Tool-specific cluster — Failures concentrate around one tool (e.g., search_kb). This usually means the tool description or parameters need improvement.
  • Path-specific cluster — Failures occur on a specific path (e.g., search_kb -> create_ticket). This means the agent struggles with a particular workflow sequence.
  • Complexity-specific cluster — Failures appear at decision nodes that involve multi-step reasoning. This suggests the agent needs better planning or context management.
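A rough way to tell the first two patterns apart is to check whether failures pile up on a single node or on a single transition. The heuristic and thresholds below are purely illustrative, not how the dashboard classifies clusters:

```python
from collections import Counter

# Hypothetical tool-call sequences from failed tests only.
failing_paths = [
    ["search_kb", "create_ticket"],
    ["search_kb", "create_ticket"],
    ["escalate", "create_ticket"],
]

node_failures = Counter(node for path in failing_paths for node in path)
edge_failures = Counter(edge for path in failing_paths for edge in zip(path, path[1:]))

top_node, top_node_count = node_failures.most_common(1)[0]
top_edge, top_edge_count = edge_failures.most_common(1)[0]

total = len(failing_paths)
if top_edge_count / total >= 0.8:
    print(f"Path-specific cluster on {top_edge}")      # one transition dominates
elif top_node_count / total >= 0.8:
    print(f"Tool-specific cluster around {top_node}")  # one tool dominates
else:
    print("Failures are spread out; no single cluster")
```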

Find coverage gaps

Dashed edges indicate paths that have never been tested. These are blind spots in your testing:

  • A tool that exists in the blueprint but was never called in any test
  • A decision path that is theoretically possible but was never exercised
  • An edge case transition that tests have not covered

Coverage gaps are not necessarily problems — they are unknowns. Generate tests with higher complexity to exercise deeper paths in the graph.
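In data terms, a coverage gap is simply a blueprint-declared transition that no test trace ever exercised. A minimal sketch, assuming hypothetical path sets:

```python
# Transitions declared possible by the blueprint.
blueprint_paths = {
    ("input", "search_kb"),
    ("search_kb", "create_ticket"),
    ("search_kb", "escalate"),
    ("search_kb", "output"),
}

# Transitions actually observed in test traces.
tested_paths = {
    ("input", "search_kb"),
    ("search_kb", "create_ticket"),
}

coverage_gaps = blueprint_paths - tested_paths
print(coverage_gaps)
# {('search_kb', 'escalate'), ('search_kb', 'output')} -> rendered as dashed edges
```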

Trace decision paths

Click on any node to see the full decision path leading to it. This shows you the sequence of steps the agent took to arrive at a particular state, including:

  • Which tool was called before this one
  • What input triggered this path
  • Whether the agent considered alternative paths
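Conceptually, tracing works by walking backwards from the selected node through the steps recorded for one test trace. The sketch below shows the idea with a hypothetical single-trace predecessor map; the dashboard's actual trace data will look different:

```python
# Hypothetical per-trace predecessor map: which state came immediately
# before each state in a single test trace.
predecessors = {
    "output": "create_ticket",
    "create_ticket": "search_kb",
    "search_kb": "input",
    "input": None,
}

def trace_path(node):
    """Walk backwards from a selected node to the input node."""
    path = [node]
    while predecessors.get(path[-1]) is not None:
        path.append(predecessors[path[-1]])
    return list(reversed(path))

print(trace_path("create_ticket"))  # ['input', 'search_kb', 'create_ticket']
```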

How to use the graph

The Agent Intelligence Graph is available in the dashboard for each agent:

  1. Select an agent from the sidebar
  2. Click Agent Graph in the navigation
  3. The graph loads with your most recent test run data

Interact with the graph

  • Pan — Click and drag the background to move around the graph
  • Zoom — Scroll to zoom in and out. Zoom into failure clusters for detail.
  • Select node — Click a node to see its metrics, failure types, and connected test cases
  • Filter — Use the toolbar to filter by failure category, severity, or complexity
  • Highlight path — Click an edge to highlight the full path it belongs to

Use the graph to prioritize fixes

  1. Start with red clusters — These are your highest-impact areas. Fix failures here first.
  2. Check dashed edges — Generate tests targeting untested paths.
  3. Compare across runs — Toggle between test runs to see if failure clusters are shrinking.
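One simple way to order that work (purely illustrative; the dashboard may weigh things differently) is to score each node by its failure rate weighted by how often tests actually reach it:

```python
# Hypothetical per-node metrics, same shape as the earlier metrics sketch.
node_stats = {
    "create_ticket": {"hit_count": 8, "failure_rate": 0.75},
    "search_kb": {"hit_count": 10, "failure_rate": 0.10},
    "escalate": {"hit_count": 2, "failure_rate": 0.50},
}

def impact(stats):
    # Weight the failure rate by how often tests actually reach the node.
    return stats["failure_rate"] * stats["hit_count"]

for node, stats in sorted(node_stats.items(), key=lambda kv: impact(kv[1]), reverse=True):
    print(f"{node}: impact={impact(stats):.1f}")
# create_ticket: impact=6.0  <- fix here first
# search_kb: impact=1.0
# escalate: impact=1.0
```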

Graph data sources

The graph is built from two data sources:

  • Blueprint — Provides the structure: tools, workflows, constraints, and their relationships. This defines the possible nodes and edges.
  • Test results — Provides the metrics: which paths were tested, where failures occurred, and at what severity. This adds color and weight to the structure.

⚠️

The graph is only as complete as your blueprint. If your agent has tools or workflows not described in the blueprint, they will not appear in the graph. Keep your blueprint up to date.


FAQ

Q: Is the graph updated automatically?
A: Yes. The graph updates every time you sync new test results with invarium_sync_results. Each test run adds new coverage and failure data to the existing graph.

Q: Can I export the graph?
A: Currently the graph is viewable in the dashboard only. Export functionality is on the roadmap.

Q: What if my agent has no tools?
A: Agents without tools still produce a graph, but it will be simpler — primarily showing input/output nodes and decision paths based on constraint handling and response patterns.

Q: How does the graph handle multi-agent systems?
A: Each agent gets its own graph. If you have agents that call other agents, each one is graphed independently based on its own blueprint.

Q: Does the graph show real-time agent behavior?
A: No. The graph is built from test results synced through Invarium, not from live production traces. It shows how your agent behaved during testing.
