Roadmap
Track what we’ve shipped and what’s coming next.
- ✓ Invarium ships weekly with continuous improvements to the behavioral QA platform
- ✓ Upcoming features focus on multi-turn testing, user simulation, and fault injection
- ✓ All shipped features are available now at app.invarium.dev
Planned Features (upcoming)
Features currently in development or on the near-term roadmap.
- Multi-Turn Testing: Test agents across full multi-turn conversations with 5+ turns, validating behavioral consistency across an entire session rather than single exchanges.
- User Simulator: LLM-driven engine that generates the user side of multi-turn conversations, producing realistic follow-ups, clarifications, and edge-case inputs automatically.
- Fault Injection Engine: Inject controlled failures into agent execution: tool mocking, latency injection, error simulation, and degraded-response scenarios to test agent resilience.
- Architecture Validation: Automated best-practice checks and anti-pattern detection applied to the Agent Intelligence Graph, surfacing structural reliability risks before tests even run.
- Shareable Agent Health Report: Generate a public HTML link for any agent’s health report, including AQS score, failure breakdown, and audit findings. Share with stakeholders without requiring an Invarium login.
- Blueprint Version History: Track and compare changes to agent blueprints over time, with diffing and rollback support.
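To make the Fault Injection Engine item above more concrete, here is a minimal, generic sketch of latency injection and error simulation wrapped around a single tool function. It is not Invarium's API, which this page does not describe. The names `with_faults`, `InjectedFault`, and `lookup_order` are hypothetical and exist only for illustration.

```python
import random
import time
from typing import Any, Callable, Optional


class InjectedFault(RuntimeError):
    """Raised instead of returning a tool result, to simulate a tool failure."""


def with_faults(
    tool: Callable[..., Any],
    *,
    latency_s: float = 0.0,
    error_rate: float = 0.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[..., Any]:
    """Wrap a tool so each call is delayed and may fail (hypothetical helper).

    latency_s  -- fixed delay added before every call
    error_rate -- probability in [0, 1] that a call raises InjectedFault
    rng, sleep -- injectable so tests can be deterministic and fast
    """
    rng = rng or random.Random()

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        if latency_s > 0:
            sleep(latency_s)  # latency injection
        if rng.random() < error_rate:  # error simulation
            name = getattr(tool, "__name__", "tool")
            raise InjectedFault(f"injected failure in {name}")
        return tool(*args, **kwargs)

    return wrapped


def lookup_order(order_id: str) -> dict:
    """Stand-in for a real agent tool (hypothetical)."""
    return {"order_id": order_id, "status": "shipped"}


# A tool that is slow and fails roughly 30% of the time.
flaky_lookup = with_faults(lookup_order, latency_s=0.01, error_rate=0.3)
```

An agent under test would be handed `flaky_lookup` in place of `lookup_order`, and the test would then check how the agent behaves when the tool is slow or raises: for example, whether it retries, falls back, or reports the failure to the user.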
Released Features (shipped)
March 2026
Behavioral Tracing: Captures every action your agent takes during a test, including tool calls, timing, and decision paths.
Agent Intelligence Graph: Interactive visualization of your agent’s architecture, mapping tools, chains, guards, and services with relationship-aware audit checks.
Automatic Agent Discovery: Invarium extracts your agent’s architecture automatically for popular frameworks. No manual configuration needed.
Agent Readiness Audit: Static analysis of your agent’s blueprint that catches structural reliability issues before any tests run.
Agent Quality Score (AQS): A single 0-100 reliability metric for your agent, computed from test results.
Targeted Test Generation: Test scenarios target specific failure patterns from a structured taxonomy, producing more meaningful coverage than generic edge cases.
MCP Server Integration: Test agents directly from Cursor, Claude Code, or Windsurf without leaving your IDE.
Web Dashboard: View test runs, AQS scores, failure breakdowns, and the Agent Intelligence Graph in a single interface.
Multi-Step Wizards: Guided agent registration and scenario creation for non-developer users. Power users use JSON import or MCP tools.
Google and GitHub OAuth: One-click sign-in alongside email/password authentication.
Persona-Based Testing: Built-in user personas (novice, expert, frustrated, confused, adversarial) with configurable behavioral parameters that shape how test scenarios interact with your agent.
Multi-Framework Support: Works with LangChain, CrewAI, AutoGen, OpenAI Agents SDK, and custom agents.