
Overview — What is Invarium?

Behavioral QA testing platform for AI agents

Invarium generates behavioral test cases for your AI agent, runs them against it, and produces a reliability score — so you can find failures before your users do. Think of it as pytest for agentic workflows.

Instead of writing test cases by hand, you describe your agent as a JSON blueprint and Invarium’s Scenario Generator creates targeted test cases that probe for real failure modes: hallucinations, tool misuse, safety violations, instruction drift, and more.

What Invarium does:

  • Generate behavioral tests — Automatically create test scenarios targeting known failure categories
  • Score reliability with BSS — Get a Behavioral Safety Score (0-100) that quantifies how safe your agent is to deploy
  • Classify failures — Every failure is categorized using a structured failure taxonomy so you know exactly what went wrong
  • Visualize with the Agent Intelligence Graph (coming soon) — See how your agent’s tools, workflows, and constraints connect
  • Gate deployments with CI/CD (coming soon) — Set quality thresholds and block deploys that don’t meet your standards

How it works

Upload Blueprint → Generate Tests → Run Tests → Get BSS Score

1. Upload a blueprint

Describe your agent as a JSON blueprint — its tools, workflows, constraints, and expected behaviors. This tells Invarium what your agent is supposed to do.
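This page doesn't spell out the blueprint schema, so as a purely illustrative sketch (every field name below is hypothetical, not Invarium's actual format), a blueprint might describe an agent like this:

```json
{
  "name": "support-agent",
  "description": "Answers billing questions and files refund tickets",
  "tools": [
    { "name": "lookup_invoice", "description": "Fetch an invoice by ID" },
    { "name": "file_refund", "description": "Open a refund ticket" }
  ],
  "constraints": [
    "Never reveal another customer's data",
    "Refunds over $500 require human approval"
  ],
  "expected_behaviors": [
    "Ask for an invoice ID before discussing a charge"
  ]
}
```

The point is that tools, constraints, and expected behaviors are declared up front, so the Scenario Generator knows what "correct" looks like for your agent.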

2. Generate test cases

Invarium’s Scenario Generator analyzes your blueprint and creates behavioral test cases. Each test targets a specific failure type (hallucination, tool misuse, safety violation, etc.) at a specific complexity level.
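The exact shape of a generated test case isn't shown on this page; illustratively (hypothetical field names), one targeting tool misuse against the blueprint above might look like:

```json
{
  "id": "tc-014",
  "failure_type": "tool_misuse",
  "complexity": "medium",
  "user_message": "I lost my invoice ID, just refund my last three charges.",
  "expected_behavior": "Agent asks for an invoice ID instead of calling file_refund blindly"
}
```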

3. Run tests against your agent

Execute the generated test cases against your agent — either manually from your IDE or automatically in CI/CD. Send each test’s user message to your agent and collect the response.
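In code, step 3 amounts to iterating over the generated cases, sending each user message to your agent, and recording the reply. A minimal sketch, assuming your agent is callable as a function and that test cases carry `id` and `user_message` fields (both are assumptions for illustration, not Invarium's actual API):

```python
from typing import Callable

def run_tests(agent: Callable[[str], str], test_cases: list[dict]) -> list[dict]:
    """Send each test's user message to the agent and collect responses."""
    results = []
    for case in test_cases:
        response = agent(case["user_message"])  # your agent call goes here
        results.append({"test_id": case["id"], "response": response})
    return results

# Usage with a trivial stand-in agent:
echo_agent = lambda msg: f"Agent reply to: {msg}"
results = run_tests(echo_agent, [{"id": "tc-1", "user_message": "Hi"}])
```

In practice the stand-in `echo_agent` would be replaced by a call into your real agent runtime, and `results` is what you sync back in step 4.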

4. Get your BSS score

Sync results back to Invarium. You get a Behavioral Safety Score, a failure breakdown by category, and actionable insights about what to fix.
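Invarium computes the BSS server-side and its formula isn't documented on this page. As a rough intuition only — explicitly not Invarium's actual scoring — a naive stand-in is the pass rate scaled to 0-100:

```python
def naive_score(results: list[dict]) -> float:
    """Pass rate scaled to 0-100. NOT Invarium's BSS formula -- illustration only."""
    if not results:
        return 0.0
    passed = sum(1 for r in results if r["passed"])
    return 100.0 * passed / len(results)

print(naive_score([{"passed": True}, {"passed": True}, {"passed": False}]))
```

The real score also comes with a per-category failure breakdown, which a flat pass rate cannot capture.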


How to use Invarium

There are two ways to interact with the platform:

  • MCP server — Run behavioral tests directly from your IDE or development workflow
  • Web dashboard — Review results and share reports with your team

Most developers start with the MCP server for day-to-day testing, then use the dashboard to review results and share them with the team.


Use cases

AI Safety teams — Validate that agents handle adversarial inputs, PII exposure, and harmful content correctly before deployment.

Agent developers — Catch regressions early by running behavioral tests during development. Know exactly which failure types your agent is vulnerable to.

QA teams — Replace manual testing with automated behavioral test generation. Get structured failure reports instead of vague bug descriptions.

Platform teams — Set up CI/CD quality gates that block deploys when the BSS score drops below a threshold. Enforce reliability standards across all agents in your organization.
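The CI/CD gate in the last use case can be sketched as a small script that reads a results file and fails the build when the score is below a threshold. The results-file layout, the `bss_score` field, and the threshold value here are all hypothetical, not Invarium's actual output format:

```python
import json

THRESHOLD = 85  # minimum acceptable score; tune to your team's standards

def gate(results_path: str) -> int:
    """Return 0 if the score meets the threshold, 1 otherwise (CI exit code)."""
    with open(results_path) as f:
        score = json.load(f)["bss_score"]  # hypothetical field name
    if score < THRESHOLD:
        print(f"FAIL: score {score} is below threshold {THRESHOLD}")
        return 1
    print(f"PASS: score {score} meets threshold {THRESHOLD}")
    return 0

# In a CI step: sys.exit(gate("invarium_results.json"))
```

Returning a nonzero exit code is what makes the CI system treat the step as failed and block the deploy.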

