invarium_generate_tests

Generate a behavioral test scenario with test cases for your agent. Creates exactly one scenario containing multiple test cases, then returns a generation_id. Use invarium_get_tests to check status and retrieve results.

When to Use

Call invarium_generate_tests after uploading a blueprint with invarium_upload_blueprint. Before calling this tool, review the agent’s blueprint to understand which scenarios would be most valuable, then confirm the parameters with the user.

This tool starts asynchronous generation and returns immediately. Use the returned generation_id with invarium_get_tests to poll for results.

See Generate Test Scenarios for the full workflow.
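The generate-then-poll workflow can be sketched as follows. This is an illustrative helper, not part of the server API: `call_tool(name, arguments)` stands in for however your MCP client invokes a server tool, and the result shapes (`generation_id` key, `status` field) are assumptions — the tool names match this reference, but check your client's actual result format.

```python
import time


def generate_and_wait(call_tool, agent_name, test_description,
                      poll_interval=5.0, timeout=60.0, **params):
    """Start scenario generation, then poll invarium_get_tests until done.

    call_tool(name, arguments) is a placeholder for your MCP client's
    tool-invocation method; the result dict shapes here are assumptions.
    """
    # Kick off asynchronous generation; the tool returns immediately.
    result = call_tool("invarium_generate_tests", {
        "agent_name": agent_name,
        "test_description": test_description,
        **params,
    })
    generation_id = result["generation_id"]  # assumed structured result

    # Poll for results with the returned generation_id.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = call_tool("invarium_get_tests",
                           {"generation_id": generation_id})
        if status.get("status") == "completed":
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"generation {generation_id} "
                       f"did not finish within {timeout}s")
```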

Parameters


| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `agent_name` | string | required | Name of the agent to generate tests for. Must have a blueprint uploaded first. |
| `test_description` | string | required | Description of what behavior to test. Tells the scenario generator what to focus on (e.g., 'Test refund workflow error handling and edge cases'). |
| `test_cases` | int | default: 5 | Number of test cases to include in the scenario. Minimum 1, maximum 25. |
| `complexity` | string | default: `moderate` | Scenario complexity level. One of: `simple`, `moderate`, `complex`, `adversarial`, `edge_case`. |
| `failure_category` | string \| null | default: null | Optional failure category to target. One of: `knowledge_failure`, `reasoning_failure`, `context_failure`, `instruction_failure`, `tool_usage_failure`, `safety_failure`, `communication_failure`, `operational_failure`, `coordination_failure`. |
| `persona` | string \| null | default: null | Optional user persona for the test scenario. One of: `novice`, `expert`, `frustrated`, `confused`, `adversarial`. |
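Since an invalid value wastes a round trip to the server, it can help to pre-check arguments client-side against the documented constraints. The sketch below mirrors the parameter rules above; `validate_params` is an illustrative helper, not part of the tool's API.

```python
# Allowed values, taken from the parameter table above.
COMPLEXITY_LEVELS = {"simple", "moderate", "complex", "adversarial", "edge_case"}
FAILURE_CATEGORIES = {
    "knowledge_failure", "reasoning_failure", "context_failure",
    "instruction_failure", "tool_usage_failure", "safety_failure",
    "communication_failure", "operational_failure", "coordination_failure",
}
PERSONAS = {"novice", "expert", "frustrated", "confused", "adversarial"}


def validate_params(agent_name, test_description, test_cases=5,
                    complexity="moderate", failure_category=None, persona=None):
    """Raise ValueError if the arguments would be rejected by the tool."""
    if not agent_name:
        raise ValueError("agent_name is required")
    if not test_description:
        raise ValueError("test_description is required")
    if not isinstance(test_cases, int) or test_cases < 1:
        raise ValueError("Invalid test_cases: must be a positive integer")
    if test_cases > 25:
        raise ValueError("Invalid test_cases: maximum is 25 per scenario")
    if complexity not in COMPLEXITY_LEVELS:
        raise ValueError("Invalid complexity")
    if failure_category is not None and failure_category not in FAILURE_CATEGORIES:
        raise ValueError("Invalid failure_category")
    if persona is not None and persona not in PERSONAS:
        raise ValueError("Invalid persona")
```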

Returns

Confirmation string with scenario details and a generation_id for tracking.

Example

"Generate 10 complex test cases for customer-support-agent targeting tool usage failures with a frustrated user persona"

Response

On success, the tool returns the generation parameters and a tracking ID:

```
Scenario generation started for 'customer-support-agent'.
  Description: Test refund workflow error handling and edge cases
  Complexity: complex
  Test cases: 10
  Failure category: tool_usage_failure
  Persona: frustrated
Generation ID: gen_a1b2c3d4e5f6
```

Use invarium_get_tests with the generation_id above to check results.

| Field | Description |
| --- | --- |
| Description | The test description you provided, describing what behavior to test. |
| Complexity | The complexity level used for generation. |
| Test cases | Number of test cases requested in the scenario. |
| Failure category | The targeted failure category, if specified. |
| Persona | The user persona applied, if specified. |
| Generation ID | Unique identifier to track this generation. Pass it to `invarium_get_tests`. |

Examples

Basic — Generate Default Tests

Generate 5 test cases at moderate complexity:

"Generate tests for customer-support-agent to test basic customer inquiry handling"

Advanced — Targeted Failure Testing

Generate adversarial tests targeting tool usage failures with a frustrated user persona:

"Generate 15 adversarial test cases for order-processing-agent targeting tool usage failures with a frustrated persona, focusing on payment tool parameter validation and error recovery"

Edge Case Generation

Generate edge case scenarios focused on safety:

"Generate 20 edge case tests for data-access-agent targeting safety failures with an adversarial persona, testing guardrails against prompt injection and unauthorized data access"

Complexity Levels

| Level | Description | When to Use |
| --- | --- | --- |
| `simple` | Straightforward scenarios with clear inputs and expected outputs. | Smoke testing, initial validation. |
| `moderate` | Scenarios with ambiguous inputs, multi-step workflows, or nuanced constraints. | Regular development testing. Default level. |
| `complex` | Multi-tool chains, conflicting constraints, and boundary conditions. | Pre-deployment validation. |
| `adversarial` | Adversarial inputs designed to exploit weaknesses in the agent. | Security audits, red-team testing. |
| `edge_case` | Unusual inputs, rare conditions, and corner cases. | Comprehensive coverage, regression testing. |

Failure Categories

Each failure category targets a specific class of agent behavior issues:

| Category | What It Tests |
| --- | --- |
| `knowledge_failure` | Hallucinations, outdated information, self-contradictions. |
| `reasoning_failure` | Logic errors, calculation mistakes, planning failures. |
| `context_failure` | Lost conversation context, positional bias, misinterpreted references. |
| `instruction_failure` | Constraint violations, partial execution, priority conflicts. |
| `tool_usage_failure` | Wrong tool selection, parameter errors, sequence violations. |
| `safety_failure` | Prompt injection, guardrail bypass, unauthorized actions. |
| `communication_failure` | Unhelpful, unclear, or inappropriate responses. |
| `operational_failure` | Timeouts, rate limits, non-determinism. |
| `coordination_failure` | Multi-agent deadlocks, lost handoffs, conflicting actions. |

Personas

| Persona | Behavior |
| --- | --- |
| `novice` | Simple language, unclear requests, may not know the right terminology. |
| `expert` | Technical language, expects precise answers, tests depth of knowledge. |
| `frustrated` | Impatient, may repeat requests, expects fast resolution. |
| `confused` | Contradictory inputs, changes topic mid-conversation, unclear intent. |
| `adversarial` | Deliberately tries to break the agent, uses social engineering and prompt injection. |

Generation typically takes 10 to 30 seconds; scenarios near the 25-case maximum may take up to a minute. The tool runs a pre-flight quota check and returns an error if you have no remaining test generations for the month. Check invarium_usage to see your remaining quota.

Error Responses

| Error | Cause | Fix |
| --- | --- | --- |
| `Agent not found: 'my-agent'` | No blueprint has been uploaded for this agent name. | Upload a blueprint first with `invarium_upload_blueprint`. |
| `test_description is required` | The `test_description` parameter is missing or empty. | Provide a description of what behavior to test. |
| `Invalid test_cases: must be a positive integer` | The `test_cases` value is zero or negative. | Use a value between 1 and 25. |
| `Invalid test_cases: maximum is 25 per scenario` | The `test_cases` value exceeds 25. | Use a value between 1 and 25. |
| `Invalid complexity` | The `complexity` parameter is not a recognized value. | Use one of: `simple`, `moderate`, `complex`, `adversarial`, `edge_case`. |
| `Invalid failure_category` | The `failure_category` is not a recognized value. | Use one of the nine categories listed above, or omit it. |
| `Invalid persona` | The `persona` is not a recognized value. | Use one of: `novice`, `expert`, `frustrated`, `confused`, `adversarial`. |
| `Not enough quota: no test generations remaining` | Monthly test generation limit reached. | Upgrade your plan at app.invarium.dev/settings or wait for the monthly reset. |

See Error Codes for the full error reference.
