Overview — What is Invarium?
Behavioral QA testing platform for AI agents
Invarium generates behavioral test cases for your AI agents, runs them against your agent, and produces a reliability score — so you can find failures before your users do. Think of it as pytest for agentic workflows.
Instead of writing test cases by hand, you describe your agent as a JSON blueprint and Invarium’s Scenario Generator creates targeted test cases that probe for real failure modes: hallucinations, tool misuse, safety violations, instruction drift, and more.
What Invarium does:
- Generate behavioral tests — Automatically create test scenarios targeting known failure categories
- Score reliability with BSS — Get a Behavioral Safety Score (0-100) that quantifies how safe your agent is to deploy
- Classify failures — Every failure is categorized using a structured failure taxonomy so you know exactly what went wrong
- Visualize with the Agent Intelligence Graph (coming soon) — See how your agent’s tools, workflows, and constraints connect
- Gate deployments with CI/CD (coming soon) — Set quality thresholds and block deploys that don’t meet your standards
How it works
Upload a blueprint
Describe your agent as a JSON blueprint — its tools, workflows, constraints, and expected behaviors. This tells Invarium what your agent is supposed to do.
Generate test cases
Invarium’s Scenario Generator analyzes your blueprint and creates behavioral test cases. Each test targets a specific failure type (hallucination, tool misuse, safety violation, etc.) at a specific complexity level.
Run tests against your agent
Execute the generated test cases against your agent — either manually from your IDE or automatically in CI/CD. Send each test’s user message to your agent and collect the response.
Get your BSS score
Sync results back to Invarium. You get a Behavioral Safety Score, a failure breakdown by category, and actionable insights about what to fix.
How to use Invarium
There are two ways to interact with the platform:
MCP Server
For developers. Connect from Claude Desktop, Cursor, or any MCP-compatible IDE. Upload blueprints, generate tests, and sync results — all without leaving your editor.
↗Dashboard
For teams. View test results, BSS scores, failure breakdowns, and the Agent Intelligence Graph. Manage blueprints, configure quality gates, and track reliability over time.
↗Most developers start with the MCP server for day-to-day testing, then use the dashboard to review results and share with the team.
Use cases
AI Safety teams — Validate that agents handle adversarial inputs, PII exposure, and harmful content correctly before deployment.
Agent developers — Catch regressions early by running behavioral tests during development. Know exactly which failure types your agent is vulnerable to.
QA teams — Replace manual testing with automated behavioral test generation. Get structured failure reports instead of vague bug descriptions.
Platform teams — Set up CI/CD quality gates that block deploys when the BSS score drops below a threshold. Enforce reliability standards across all agents in your organization.
Get started
Quickstart
Test your first agent in under 5 minutes. Set up the MCP server, upload a blueprint, and generate your first test cases.
↗Dashboard Guide
Learn how to navigate the Invarium dashboard, interpret your BSS score, and manage your agents.
↗