Roadmap

Track what we’ve shipped and what’s coming next.

Key Takeaways
  • Invarium ships weekly with continuous improvements to the behavioral QA platform
  • Upcoming features focus on multi-turn testing, user simulation, and fault injection
  • All shipped features are available now at app.invarium.dev

Planned Features

Features currently in development or on the near-term roadmap.

  • Multi-Turn Testing — Test agents across full conversations of 5+ turns, validating behavioral consistency over an entire session rather than a single exchange.

  • User Simulator — An LLM-driven engine that generates the user side of multi-turn conversations, automatically producing realistic follow-ups, clarifications, and edge-case inputs.

  • Fault Injection Engine — Inject controlled failures into agent execution: tool mocking, latency injection, error simulation, and degraded-response scenarios to test agent resilience.

  • Architecture Validation — Automated best-practice checks and anti-pattern detection applied to the Agent Intelligence Graph, surfacing structural reliability risks before tests even run.

  • Shareable Agent Health Report — Generate a public HTML link to any agent’s health report, including its AQS score, failure breakdown, and audit findings. Stakeholders can view it without an Invarium login.

  • Blueprint Version History — Track and compare changes to agent blueprints over time, with diffing and rollback support.


Released Features

March 2026

Behavioral Tracing — Captures every action your agent takes during a test, including tool calls, timing, and decision paths.

Agent Intelligence Graph — Interactive visualization of your agent’s architecture, mapping tools, chains, guards, and services with relationship-aware audit checks.

Automatic Agent Discovery — Invarium extracts your agent’s architecture automatically for popular frameworks. No manual configuration needed.

Agent Readiness Audit — Static analysis of your agent’s blueprint that catches structural reliability issues before any tests run.

Agent Quality Score (AQS) — A single 0-100 reliability metric for your agent, computed from test results.

Targeted Test Generation — Test scenarios target specific failure patterns from a structured taxonomy, producing more meaningful coverage than generic edge cases.

MCP Server Integration — Test agents directly from Cursor, Claude Code, or Windsurf without leaving your IDE.

Web Dashboard — View test runs, AQS scores, failure breakdowns, and the Agent Intelligence Graph in a single interface.

Multi-Step Wizards — Guided agent registration and scenario creation for non-developer users. Power users can instead use JSON import or the MCP tools.

Google and GitHub OAuth — One-click sign-in alongside email/password authentication.

Persona-Based Testing — Built-in user personas (novice, expert, frustrated, confused, adversarial) with configurable behavioral parameters that shape how test scenarios interact with your agent.

Multi-Framework Support — Works with LangChain, CrewAI, AutoGen, OpenAI Agents SDK, and custom agents.