Glossary
A reference of terms used throughout the Invarium platform and documentation.
Agent Intelligence Graph
A visual map of an AI agent’s behavioral patterns across test scenarios. Nodes represent behavioral states (tools, decisions, inputs, outputs), and edges represent transitions between states. Used to identify failure clusters, untested paths, and decision hotspots. See Agent Intelligence Graph.
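A minimal sketch of how such a graph could be assembled. It assumes each trace has already been reduced to an ordered list of state labels. The state names, the `build_graph` and `decision_hotspots` helpers, and the "two or more distinct outgoing transitions" definition of a hotspot are all illustrative assumptions, not Invarium's internal algorithm.

```python
from collections import Counter


def build_graph(traces):
    """Collect nodes and count transitions between consecutive states.

    Each trace is assumed to be an ordered list of state labels
    (a hypothetical representation used only for this sketch).
    """
    nodes = set()
    edges = Counter()
    for states in traces:
        nodes.update(states)
        for src, dst in zip(states, states[1:]):
            edges[(src, dst)] += 1
    return nodes, edges


def decision_hotspots(edges, min_branches=2):
    """Return states with at least `min_branches` distinct outgoing transitions.

    This hotspot definition is an assumption made for the sketch.
    """
    branches = Counter(src for (src, _dst) in edges)
    return {state for state, count in branches.items() if count >= min_branches}


traces = [
    ["input", "search_flights", "book_flight", "output"],
    ["input", "search_flights", "output"],
    ["input", "ask_clarification", "output"],
]
nodes, edges = build_graph(traces)
print(edges[("input", "search_flights")])  # 2
print(sorted(decision_hotspots(edges)))     # ['input', 'search_flights']
```

Counting edge frequencies also makes failure clusters easy to spot: building the graph from failing traces only would highlight the transitions those traces share.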
Behavioral Safety Score (BSS)
A 0-100 score that quantifies how reliably an AI agent handles behavioral edge cases. Composed of four weighted components: pass rate (40%), severity weighting (25%), coverage breadth (20%), and consistency (15%). Ranges: 90-100 Excellent, 70-89 Good, 50-69 Fair, 0-49 Poor. See Behavioral Safety Score.
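As a worked illustration of the weighting, the sketch below combines the four components, assuming each one has already been scored on a 0-100 scale, and maps the result to the documented bands. How Invarium computes each component and whether it rounds the final score are assumptions here.

```python
# Component weights as stated in the glossary entry above.
WEIGHTS = {
    "pass_rate": 0.40,
    "severity": 0.25,
    "coverage": 0.20,
    "consistency": 0.15,
}


def bss(components):
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= components[name] <= 100:
            raise ValueError(f"{name} must be in [0, 100]")
    # Rounding to a whole number is an assumption of this sketch.
    return round(sum(WEIGHTS[name] * components[name] for name in WEIGHTS))


def rating(score):
    """Map a 0-100 score to the documented bands."""
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Poor"


score = bss({"pass_rate": 90, "severity": 80, "coverage": 60, "consistency": 100})
print(score, rating(score))  # 83 Good
```

Here 0.40 × 90 + 0.25 × 80 + 0.20 × 60 + 0.15 × 100 = 36 + 20 + 12 + 15 = 83, which falls in the Good band.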
Behavioral Trace
A timestamped record of every action an AI agent takes during a test case. Captures user input, tool calls (with parameters and results), agent reasoning, and final output. Used for debugging and audit trails. See Behavioral Tracing.
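The shape of a trace can be sketched as a list of timestamped events. The class names, the event kind labels, and the payload keys below are hypothetical. They mirror the items this entry lists, not Invarium's actual trace format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List

# Event kinds mirror the glossary entry; the names themselves are hypothetical.
EVENT_KINDS = {"user_input", "tool_call", "reasoning", "final_output"}


@dataclass
class TraceEvent:
    timestamp: datetime
    kind: str
    payload: Dict[str, Any]


@dataclass
class BehavioralTrace:
    test_case_id: str
    events: List[TraceEvent] = field(default_factory=list)

    def record(self, kind: str, **payload: Any) -> None:
        """Append a UTC-timestamped event to the trace."""
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        self.events.append(TraceEvent(datetime.now(timezone.utc), kind, payload))

    def tool_calls(self) -> List[Dict[str, Any]]:
        """Return the payloads of tool-call events, in the order they occurred."""
        return [e.payload for e in self.events if e.kind == "tool_call"]


trace = BehavioralTrace("tc-001")
trace.record("user_input", text="Where is order 42?")
trace.record("tool_call", name="lookup_order", params={"order_id": 42},
             result={"status": "shipped"})
trace.record("final_output", text="Order 42 has shipped.")
print([e.kind for e in trace.events])
```

Keeping tool parameters and results inside each event is what makes a trace useful for debugging: a failed case can be replayed step by step to see which call went wrong.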
Blueprint
A JSON document that describes an AI agent to Invarium: its name, framework, tools, constraints, and workflows. Invarium uses the blueprint to generate targeted behavioral test cases. See Upload a Blueprint and Blueprint Schema.
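To make the parts of a blueprint concrete, the sketch below builds one as a Python dictionary and serializes it to JSON, then runs two simple consistency checks. Every field name, the `framework` value, and the assumption that workflow steps name declared tools are illustrative guesses. The Blueprint Schema page is the authoritative reference.

```python
import json

# Illustrative blueprint. Field names are assumptions, not the official schema.
blueprint = {
    "name": "support-agent",
    "framework": "langgraph",
    "tools": [
        {
            "name": "lookup_order",
            "description": "Fetch an order by ID",
            "parameters": {"order_id": "string"},
            "returns": "Order status and line items",
            "side_effects": [],
        },
        {
            "name": "issue_refund",
            "description": "Refund an order",
            "parameters": {"order_id": "string", "amount": "number"},
            "returns": "Refund confirmation",
            "side_effects": ["Moves money to the customer"],
        },
    ],
    "constraints": ["Never fabricate information", "Always cite sources"],
    "workflows": [
        {
            "name": "refund",
            "trigger": "User asks for a refund",
            "steps": ["lookup_order", "issue_refund"],
        }
    ],
}

REQUIRED_FIELDS = ("name", "framework", "tools", "constraints", "workflows")


def missing_fields(bp):
    """Top-level fields named in the glossary definition that are absent."""
    return [key for key in REQUIRED_FIELDS if key not in bp]


def undeclared_steps(bp):
    """Workflow steps that do not name a declared tool (assumes steps are tool names)."""
    declared = {tool["name"] for tool in bp.get("tools", [])}
    return [step for wf in bp.get("workflows", []) for step in wf["steps"]
            if step not in declared]


payload = json.dumps(blueprint, indent=2)  # the JSON text describing the agent
print(missing_fields(blueprint), undeclared_steps(blueprint))  # [] []
```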
Constraint
A rule that an AI agent must follow, defined in the blueprint. Constraints describe what the agent should not do (e.g., “Never fabricate information”) or what it must always do (e.g., “Always cite sources”). Used to generate guardrail violation tests.
Coverage Breadth
A component of BSS (20% weight) that measures how many of the nine failure categories in the Failure Taxonomy have been tested. An agent tested across all nine categories has broader coverage, and therefore a more meaningful BSS, than one tested on only one or two.
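A minimal sketch of coverage breadth as a simple proportion of the nine taxonomy categories. The linear formula is an assumption, since this glossary does not state exactly how the component is computed.

```python
# The nine categories of the Failure Taxonomy.
FAILURE_CATEGORIES = frozenset({
    "Hallucination",
    "Wrong Tool Called",
    "Missing Tool Call",
    "Incorrect Parameters",
    "Unexpected Tool Call",
    "Tool Execution Error",
    "Constraint Violation",
    "Timeout",
    "Invalid Response",
})


def coverage_breadth(tested_categories):
    """Percentage of taxonomy categories exercised by at least one test case.

    A linear proportion is assumed; the exact formula behind the
    BSS component is not specified in this glossary.
    """
    tested = set(tested_categories)
    unknown = tested - FAILURE_CATEGORIES
    if unknown:
        raise ValueError(f"not in the taxonomy: {sorted(unknown)}")
    return 100 * len(tested) / len(FAILURE_CATEGORIES)


print(coverage_breadth(["Hallucination", "Timeout", "Constraint Violation"]))
```

Under this assumption, testing three of the nine categories gives roughly 33, while covering all nine gives 100.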
Failure Taxonomy
A structured classification system for AI agent failures. Nine categories: Hallucination, Wrong Tool Called, Missing Tool Call, Incorrect Parameters, Unexpected Tool Call, Tool Execution Error, Constraint Violation, Timeout, and Invalid Response. Each category has subtypes and severity levels. See Failure Taxonomy.
Guardrail
A constraint or safety mechanism that prevents an AI agent from producing unwanted behavior. In Invarium, guardrails are defined as constraints in the blueprint and tested through guardrail violation scenarios.
MCP (Model Context Protocol)
An open protocol for connecting AI assistants to external tools and data sources. Invarium uses an MCP server to integrate with AI clients and IDEs such as Claude Desktop, Cursor, and Claude Code. See MCP Reference.
Quality Gate
A set of pass/fail rules applied to test runs to determine whether an agent meets reliability standards for deployment. Rules can check the BSS, failure counts, pass rates, and specific failure categories. Used in CI/CD pipelines to block deploys that do not pass the gate. See Quality Gates.
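A gate can be sketched as a small rule evaluator over a run summary. The `RunSummary` and `QualityGate` classes and all default thresholds are hypothetical illustrations of the rule types listed above, not Invarium's configuration format or defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunSummary:
    bss: float                     # 0-100
    pass_rate: float               # 0.0-1.0
    failures_by_severity: Dict[str, int] = field(default_factory=dict)
    failures_by_category: Dict[str, int] = field(default_factory=dict)


@dataclass
class QualityGate:
    # Default thresholds are illustrative, not Invarium defaults.
    min_bss: float = 70
    min_pass_rate: float = 0.90
    max_critical_failures: int = 0
    blocked_categories: frozenset = frozenset({"Constraint Violation"})

    def evaluate(self, run: RunSummary) -> List[str]:
        """Return the violated rules; an empty list means the gate passes."""
        reasons = []
        if run.bss < self.min_bss:
            reasons.append(f"BSS {run.bss} is below {self.min_bss}")
        if run.pass_rate < self.min_pass_rate:
            reasons.append(f"pass rate {run.pass_rate:.0%} is below {self.min_pass_rate:.0%}")
        critical = run.failures_by_severity.get("Critical", 0)
        if critical > self.max_critical_failures:
            reasons.append(f"{critical} Critical failure(s) exceed the limit of "
                           f"{self.max_critical_failures}")
        for category in sorted(self.blocked_categories):
            if run.failures_by_category.get(category, 0) > 0:
                reasons.append(f"blocked category failed: {category}")
        return reasons


gate = QualityGate()
run = RunSummary(bss=65, pass_rate=0.95,
                 failures_by_severity={"Critical": 1},
                 failures_by_category={"Constraint Violation": 2})
for reason in gate.evaluate(run):
    print("FAIL:", reason)
```

In a CI job, a script like this could exit with a non-zero status (for example via `sys.exit(1)`) whenever `evaluate` returns any reasons, which is what makes most CI systems fail the step and block the deploy.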
Scenario
A specific situation an AI agent might encounter, used as the basis for a test. Each scenario includes a user message, the tools the agent is expected to call, and the expected behavior. Invarium generates scenarios from the agent’s blueprint.
Severity Level
A classification of the potential impact of a failure: Critical (immediate harm, 4x penalty), High (significant incorrect behavior, 3x), Medium (incorrect but limited impact, 2x), Low (cosmetic issues, 1x). Severity contributes to the BSS severity weighting component.
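The multipliers can be illustrated with a short calculation: one Critical failure (4) and two Low failures (1 each) add up to a penalty of 6. The normalization into a 0-100 component below is a hypothetical choice made for the sketch. The glossary gives the multipliers but not how they are turned into the BSS severity weighting.

```python
# Multipliers as documented for each severity level.
SEVERITY_MULTIPLIER = {"Critical": 4, "High": 3, "Medium": 2, "Low": 1}


def severity_penalty(failure_severities):
    """Sum of multipliers over the failed test cases."""
    return sum(SEVERITY_MULTIPLIER[level] for level in failure_severities)


def severity_component(failure_severities, total_cases):
    """Hypothetical 0-100 normalization: 100 with no failures,
    0 if every test case failed at Critical severity."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    worst_case = SEVERITY_MULTIPLIER["Critical"] * total_cases
    return 100 * (1 - severity_penalty(failure_severities) / worst_case)


failures = ["Critical", "Low", "Low"]
print(severity_penalty(failures))        # 6
print(severity_component(failures, 10))  # about 85
```

Because penalties scale with severity, a single Critical failure costs as much as four Low ones, so a run with few but serious failures can score worse than one with many cosmetic issues.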
Test Case
A single unit of behavioral testing. Consists of a scenario (user message + expected behavior), the agent’s actual response, and a pass/fail determination. Multiple test cases make up a test run.
Test Run
A collection of test case results synced to Invarium at one time. Each call to invarium_sync_results creates a test run. Test runs are tracked over time and used to calculate BSS scores, populate the Agent Intelligence Graph, and evaluate quality gates.
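The relationship between test cases and a test run can be sketched locally. The `CaseResult` and `RunResult` classes are hypothetical, and the sketch does not call `invarium_sync_results`, whose parameters are not described in this glossary. It only shows how a run-level pass rate and per-category failure counts could be derived from individual case results.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class CaseResult:
    """One test case: a scenario plus its pass/fail outcome."""
    scenario_id: str
    passed: bool
    failure_category: Optional[str] = None  # set only when passed is False


@dataclass
class RunResult:
    """A batch of case results synced together (hypothetical local model)."""
    cases: List[CaseResult] = field(default_factory=list)

    def pass_rate(self) -> float:
        if not self.cases:
            return 0.0
        return sum(case.passed for case in self.cases) / len(self.cases)

    def failures_by_category(self) -> Counter:
        return Counter(case.failure_category for case in self.cases if not case.passed)


run = RunResult([
    CaseResult("refund-happy-path", True),
    CaseResult("refund-missing-order-id", False, "Missing Tool Call"),
    CaseResult("refund-over-limit", True),
    CaseResult("greeting", True),
])
print(run.pass_rate())  # 0.75
print(run.failures_by_category())
```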
Tool (Blueprint)
A function or API that an AI agent can call, as described in the blueprint. Each tool has a name, description, parameters, return value, and optional side effects. Invarium uses tool definitions to generate tool misuse and hallucination tests.
Workflow
A multi-step process that an AI agent performs, as described in the blueprint. Each workflow has a name, a trigger condition, and an ordered list of steps. Used to generate tests that verify the agent follows the correct sequence of actions.
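One way to check that an agent followed a workflow's sequence is sketched below. Whether Invarium allows extra tool calls between workflow steps is not stated here, so the hypothetical `follows_workflow` helper supports both a lenient in-order check and a strict exact-match check. The tool names are illustrative.

```python
def follows_workflow(expected_steps, actual_calls, strict=False):
    """Check that the agent's tool calls follow a workflow's ordered steps.

    strict=False: the steps must appear in order, but other calls may be
    interleaved between them (an in-order subsequence check).
    strict=True: the calls must match the steps exactly.
    """
    if strict:
        return list(actual_calls) == list(expected_steps)
    remaining = iter(actual_calls)
    # `step in remaining` consumes the iterator up to and including the match,
    # so each later step must be found after the previous one.
    return all(step in remaining for step in expected_steps)


refund_steps = ["lookup_order", "check_refund_policy", "issue_refund"]
print(follows_workflow(
    refund_steps,
    ["lookup_order", "get_customer", "check_refund_policy", "issue_refund"]))  # True
print(follows_workflow(
    refund_steps,
    ["lookup_order", "issue_refund", "check_refund_policy"]))  # False
```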
Workspace
The organizational context for your Invarium account. Contains your agents, test runs, and API keys. Team features and workspace management are coming soon.