Glossary

A reference of terms used throughout the Invarium platform and documentation.

Key Takeaways
  • Definitions for all Invarium-specific terms, scores, and concepts
  • Terms are listed alphabetically for quick lookup
  • Each entry links to the relevant documentation page where applicable

Agent Intelligence Graph

An interactive visualization of an AI agent’s architecture, auto-discovered via runtime introspection. Node types: Tool, Chain, Guard, External Service, Policy Constraint. Edge types: CAN_INVOKE, CHAINS_TO, GUARDED_BY, READS, WRITES. Used to identify unguarded paths, untested paths, and failure clusters. The graph auto-updates after each test run with runtime-discovered tools and dead path detection. See Agent Intelligence Graph.
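
For orientation, here is a minimal sketch of how such a graph could be modeled in Python. The node and edge types come from this entry; the class shapes and the unguarded-path check are illustrative only, not Invarium's internal model:

```python
from dataclasses import dataclass, field

# Node and edge types from this entry; the classes and the unguarded-path
# check below are illustrative, not Invarium's internal model.
NODE_TYPES = {"Tool", "Chain", "Guard", "External Service", "Policy Constraint"}
EDGE_TYPES = {"CAN_INVOKE", "CHAINS_TO", "GUARDED_BY", "READS", "WRITES"}

@dataclass
class Edge:
    source: str
    target: str
    kind: str  # one of EDGE_TYPES

@dataclass
class AgentGraph:
    nodes: dict = field(default_factory=dict)   # node name -> node type
    edges: list = field(default_factory=list)   # list of Edge

    def unguarded_tools(self) -> list:
        """Tool nodes with no GUARDED_BY edge: candidate unguarded paths."""
        guarded = {e.source for e in self.edges if e.kind == "GUARDED_BY"}
        return [n for n, t in self.nodes.items()
                if t == "Tool" and n not in guarded]
```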

Agent Quality Score (AQS)

A 0-100 score that quantifies how reliably an AI agent handles behavioral edge cases. Evaluates four dimensions: pass rate, failure severity, coverage breadth, and consistency. Ranges: 90-100 Excellent, 70-89 Good, 50-69 Degraded, 0-49 Critical. See Agent Quality Score.
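
A hedged sketch of the score in code: the band thresholds are exactly as documented in this entry, but the component weights in the second function are invented for illustration, since the actual weighting is not documented here:

```python
def aqs_band(score: float) -> str:
    """Band thresholds exactly as documented in this entry."""
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Degraded"
    return "Critical"

def aqs(pass_rate: float, severity: float, coverage: float,
        consistency: float, weights=(0.4, 0.3, 0.2, 0.1)) -> float:
    """Hypothetical weighted combination; each component is assumed to be
    normalized to [0, 1], and the weights are invented for illustration."""
    components = (pass_rate, severity, coverage, consistency)
    return 100 * sum(w * c for w, c in zip(weights, components))
```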

Agent Readiness Score (ARS)

A 0-100 score produced by the Static Reliability Audit that summarizes your agent’s architectural reliability posture. Computed before any tests are run, based on 11 audit check categories across security, reliability, system design, and tool quality. Higher scores indicate safer architecture. Unlike AQS (which measures actual test behavior), ARS measures what could go wrong based on your agent’s design. Ranges: 0-30 (needs attention), 31-60 (moderate), 61-100 (production-ready). See Agent Readiness Audit.

Behavioral Fingerprint

A unique signature derived from an agent’s behavioral patterns across test runs. Captures the agent’s characteristic tool call sequences, response patterns, and decision paths. Used to detect behavioral drift — when an agent starts behaving differently from its established patterns even if individual tests still pass.

Behavioral Trace

A timestamped record of every action an AI agent takes during a test case, captured via the unified TraceEvent protocol. Events include LLM_START/END, TOOL_START/END, RETRIEVAL, DECISION_POINT, MEMORY_READ/WRITE, and ERROR — each with trace_id, span_id, and parent_span_id for full span hierarchy. PII is redacted locally before traces leave your environment. See Behavioral Traces.
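
A minimal sketch of what a TraceEvent might look like: the event type names and the trace_id/span_id/parent_span_id fields come from this entry, while the remaining field names and the helper function are assumptions, not the real protocol:

```python
import time
import uuid
from dataclasses import dataclass
from typing import Optional

# Event types from this entry; the dataclass shape beyond the three
# ID fields is an illustrative guess, not the real TraceEvent protocol.
EVENT_TYPES = {"LLM_START", "LLM_END", "TOOL_START", "TOOL_END", "RETRIEVAL",
               "DECISION_POINT", "MEMORY_READ", "MEMORY_WRITE", "ERROR"}

@dataclass
class TraceEvent:
    event_type: str                # one of EVENT_TYPES
    trace_id: str                  # groups all events in one test case
    span_id: str                   # unique per span
    parent_span_id: Optional[str]  # links child spans to their parent
    timestamp: float
    payload: dict

def tool_start(trace_id: str, parent: Optional[str],
               tool: str, args: dict) -> TraceEvent:
    """Hypothetical helper that opens a TOOL_START span."""
    return TraceEvent("TOOL_START", trace_id, uuid.uuid4().hex, parent,
                      time.time(), {"tool": tool, "args": args})
```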

Blueprint

A YAML document that describes an AI agent to Invarium — its name, framework, tools, constraints, and workflows. Invarium uses the blueprint to generate targeted behavioral test cases. Blueprints have a maximum size of 500 KB. See Create and Upload a Blueprint.
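
A minimal illustrative blueprint, parsed here with PyYAML (assumed available). The top-level keys mirror the fields named above, but the exact schema is a sketch, not Invarium's published spec:

```python
import yaml  # PyYAML, assumed to be installed

# Minimal illustrative blueprint; top-level keys mirror the fields named
# in this entry, but the exact schema is a sketch, not Invarium's spec.
blueprint = yaml.safe_load("""
name: support-agent
framework: langchain
tools:
  - name: lookup_order
    description: Fetch an order by ID
constraints:
  - "Never fabricate information"
  - "Always cite sources"
workflows:
  - name: refund
    trigger: "user requests a refund"
    steps: [lookup_order, check_policy, issue_refund]
""")

# Blueprints have a documented maximum size of 500 KB.
assert len(yaml.dump(blueprint).encode()) < 500_000
```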

Complexity Level

A classification applied to test scenarios that indicates difficulty: simple (single-step, clear intent), moderate (multi-step, some ambiguity), complex (multi-step with constraints and edge cases), adversarial (designed to exploit weaknesses), or edge_case (unusual or boundary conditions).

Constraint

A rule that an AI agent must follow, defined in the blueprint. Constraints describe what the agent should not do (e.g., “Never fabricate information”) or what it must always do (e.g., “Always cite sources”). Used to generate guardrail violation tests.

Coverage Analysis

A breakdown of which failure categories have been tested and which remain untested. Coverage analysis feeds into the AQS coverage breadth component; see Coverage Breadth.

Coverage Breadth

A component of AQS that measures how many of the nine failure categories have been tested. An agent tested across all categories has broader coverage and a more meaningful AQS score than one tested on only one or two categories.

Deduplication

The automatic detection and removal of duplicate or near-duplicate test scenarios. When generating tests, Invarium checks for semantic overlap with existing scenarios to avoid redundant coverage. Deduplicated scenarios are flagged but not deleted, allowing you to review and confirm the deduplication decision.
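
For intuition, one common way to detect semantic overlap between scenarios is embedding similarity. This generic sketch is not Invarium's implementation; the embed function and threshold are placeholders:

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def flag_duplicates(scenarios, embed, threshold=0.9):
    """Flag (not delete) scenarios whose embeddings nearly match an
    earlier one, mirroring the flag-for-review behavior in this entry."""
    kept, flagged = [], []
    for s in scenarios:
        v = embed(s)  # embed() is a placeholder for any embedding model
        if any(cosine(v, kv) >= threshold for _, kv in kept):
            flagged.append(s)   # kept for review, not deleted
        else:
            kept.append((s, v))
    return kept, flagged
```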

Failure Taxonomy

A structured classification system for AI agent failures. Nine categories: Knowledge, Reasoning, Context, Instruction, Tool Usage, Safety, Communication, Operational, and Coordination. Each category has subtypes and severity levels. See Failure Taxonomy.

Guardrail

A constraint or safety mechanism that prevents an AI agent from producing unwanted behavior. In Invarium, guardrails are defined as constraints in the blueprint and tested through guardrail violation scenarios.

MCP (Model Context Protocol)

An open protocol for connecting AI assistants to external tools and data sources. Invarium uses an MCP server to integrate with MCP clients such as Claude Desktop, Cursor, and Claude Code. See Installation and Setup.

Multi-Layer Discovery

Invarium’s agent architecture detection system with three layers: Layer 1 (Runtime Memory Introspection) — inspects live Python objects for known frameworks like LangChain with 100% accuracy. Layer 2 (LLM Payload Interceptor) — captures the tools array from outgoing LLM API calls for custom agents with 95% accuracy. Layer 3 (IDE LLM Analysis) — fallback that uses the IDE’s LLM to analyze code structure.
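
A toy illustration of the Layer 1 idea: scanning live Python objects for known framework types. The AgentExecutor class name and tools attribute are assumptions about LangChain's object shape, not Invarium's actual probe:

```python
import gc

def discover_langchain_tools() -> list:
    """Scan live objects for LangChain-style agents and collect tool names.
    Purely illustrative; the attribute names are assumptions, not
    Invarium's actual Layer 1 probe."""
    tools = []
    for obj in gc.get_objects():
        if type(obj).__name__ == "AgentExecutor" and hasattr(obj, "tools"):
            tools.extend(getattr(t, "name", repr(t)) for t in obj.tools)
    return tools
```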

Path Comparison

A visualization showing the expected behavioral path versus the actual path your agent took during a test. Deviations are classified as: skipped guard, unexpected tool call, wrong sequence, or missing step. Displayed as an overlay on the Agent Intelligence Graph with expected steps in green and actual steps in red.
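
The four deviation types lend themselves to a simple classifier. This sketch compares flat step lists for illustration, whereas the real comparison is rendered on the Agent Intelligence Graph:

```python
def classify_deviations(expected: list, actual: list, guards: set) -> list:
    """Illustrative classifier using the four deviation types named in
    this entry; not Invarium's actual comparison logic."""
    deviations = []
    for step in expected:
        if step not in actual:
            kind = "skipped guard" if step in guards else "missing step"
            deviations.append((kind, step))
    for step in actual:
        if step not in expected:
            deviations.append(("unexpected tool call", step))
    # Shared steps appearing in a different relative order -> wrong sequence.
    shared_actual = [s for s in actual if s in expected]
    shared_expected = [s for s in expected if s in actual]
    if shared_actual != shared_expected:
        deviations.append(("wrong sequence", shared_actual))
    return deviations
```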

Persona

A simulated user archetype used during test generation to create more realistic and varied test scenarios. Available personas: novice (unfamiliar with the domain), expert (technically sophisticated), frustrated (impatient or upset), confused (unclear about what they need), and adversarial (actively trying to break the agent). Specifying a persona changes the tone, vocabulary, and complexity of generated test inputs.

PII Redaction

Automatic scrubbing of personally identifiable information from behavioral traces before they leave your local environment. Configurable regex patterns for SSNs, credit card numbers, emails, plus field-level redaction for passwords, secrets, tokens, and API keys. Applied in TraceEvent.to_json() before any data is shipped to the cloud.
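
A hedged sketch of the redaction idea. The patterns and field names below are illustrative defaults, since this entry notes the real rules are configurable:

```python
import re

# Illustrative defaults only -- the real patterns are configurable.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}
SENSITIVE_FIELDS = {"password", "secret", "token", "api_key"}

def redact(event: dict) -> dict:
    """Scrub an event dict before it leaves the local environment."""
    out = {}
    for key, value in event.items():
        if key.lower() in SENSITIVE_FIELDS:
            out[key] = "[REDACTED]"            # field-level redaction
        elif isinstance(value, str):
            for pattern in PATTERNS.values():  # pattern-based redaction
                value = pattern.sub("[REDACTED]", value)
            out[key] = value
        else:
            out[key] = value
    return out
```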

Regression Alert

A notification that tests which previously passed are now failing. Invarium automatically flags regressions when comparing current test runs against previous baselines, helping you catch reliability degradation early.

Scenario

A specific situation an AI agent might encounter, expressed as a test. Each scenario includes a user message, the tools the agent is expected to call, and the expected behavior. Scenarios are generated by Invarium based on the agent’s blueprint or Agent Intelligence Graph.

Severity Level

A classification of the potential impact of a failure: Critical (immediate harm, 4x penalty), High (significant incorrect behavior, 3x), Medium (incorrect but limited impact, 2x), Low (cosmetic issues, 1x). Severity contributes to the AQS severity weighting component.
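
The multipliers translate directly into a weighting table. The values are from this entry; how they enter the AQS computation is an assumption:

```python
# Multipliers from this entry; how they feed the AQS severity weighting
# component is a sketch, not the documented computation.
SEVERITY_MULTIPLIER = {"critical": 4, "high": 3, "medium": 2, "low": 1}

def weighted_penalty(failure_severities: list) -> int:
    """Sum severity-weighted penalties over a run's failures."""
    return sum(SEVERITY_MULTIPLIER[s] for s in failure_severities)
```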

Share Link

A read-only URL for any test run result. Share it with team leads, VP Eng, or security reviewers — they see the full report without needing an Invarium account.

Static Reliability Audit

An automatic analysis of your agent’s architecture that runs the moment you register an agent. Checks 11 categories: tool definitions, tool permissions, error handling, system prompt quality, guardrail instructions, output constraints, input validation, secret exposure, timeout configuration, retry logic, and fallback behavior. Each finding includes severity and a specific recommendation. Produces an Agent Readiness Score (ARS). See Agent Readiness Audit.

Streaming Mode

A test execution mode where results are streamed incrementally rather than returned as a single batch. When using invarium_sync_results, results are processed and scored as they arrive, allowing you to see partial results before the full sync completes. Useful for large test runs where waiting for all results would introduce significant delay.
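
For intuition, a consumer of streamed results might look like this. The batch and case shapes here are placeholders, not the invarium_sync_results wire format:

```python
def stream_scores(batches):
    """Illustrative consumer: score results as batches arrive instead of
    waiting for the full sync. Batch/case shapes are placeholders."""
    total = passed = 0
    for batch in batches:            # e.g. incremental result chunks
        for case in batch:
            total += 1
            passed += bool(case.get("passed"))
        yield passed / total         # running pass rate after each batch
```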

Test Case

A single unit of behavioral testing. Consists of a scenario (user message + expected behavior), the agent’s actual response, and a pass/fail determination. Multiple test cases make up a test run.

Test Run

A collection of test case results synced to Invarium at one time. Each call to invarium_sync_results creates a test run. Test runs are tracked over time and used to calculate AQS scores and populate the Agent Intelligence Graph.

Tool (Blueprint)

A function or API that an AI agent can call, as described in the blueprint. Each tool has a name, description, parameters, return value, and optional side effects. Invarium uses tool definitions to generate tool misuse and hallucination tests.

Trace Library

A collection of behavioral traces from past test runs, organized by agent and indexed by scenario type. The trace library enables pattern analysis across runs — identifying which tool call sequences correlate with failures and which decision paths are consistently reliable. Access traces from the agent detail page in the dashboard.

Workflow

A multi-step process that an AI agent performs, as described in the blueprint. Each workflow has a name, trigger condition, and ordered list of steps. Used to generate tests that verify the agent follows the correct sequence of actions.

Workspace

Your Invarium account’s organizational context. Contains your agents, scenarios, test runs, and API keys.