invarium_generate_tests
Generate a behavioral test scenario with test cases for your agent. Creates exactly one scenario containing multiple test cases, then returns a generation_id. Use invarium_get_tests to check status and retrieve results.
When to Use
Call invarium_generate_tests after uploading a blueprint with invarium_upload_blueprint. Before calling this tool, review the agent’s blueprint to understand which scenarios would be most valuable, then confirm the parameters with the user.
This tool starts asynchronous generation and returns immediately. Use the returned generation_id with invarium_get_tests to poll for results.
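The call-then-poll pattern can be sketched as a small helper. This is a minimal sketch, not part of the tool: `get_status` stands in for however your client wraps `invarium_get_tests`, and the `"completed"`/`"failed"` status values are assumptions about that wrapper's return shape.

```python
import time

def poll_until_ready(get_status, generation_id, timeout=60, interval=2):
    """Poll a status callable (assumed to wrap invarium_get_tests) until the
    generation finishes or the timeout elapses."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = get_status(generation_id)
        # Assumed status values; adapt to whatever your client wrapper returns.
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"generation {generation_id} not ready after {timeout}s")
```

A short interval (a few seconds) is usually enough, since generation typically completes well under a minute.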
See Generate Test Scenarios for the full workflow.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| agent_name | string | required | Name of the agent to generate tests for. Must have a blueprint uploaded first. |
| test_description | string | required | Description of the behavior to test. Tells the scenario generator what to focus on (e.g., 'Test refund workflow error handling and edge cases'). |
| test_cases | int | default: 5 | Number of test cases to include in the scenario. Minimum 1, maximum 25. |
| complexity | string | default: moderate | Scenario complexity level. One of: simple, moderate, complex, adversarial, edge_case. |
| failure_category | string \| null | default: null | Optional failure category to target. One of: knowledge_failure, reasoning_failure, context_failure, instruction_failure, tool_usage_failure, safety_failure, communication_failure, operational_failure, coordination_failure. |
| persona | string \| null | default: null | Optional user persona for the test scenario. One of: novice, expert, frustrated, confused, adversarial. |
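Since the tool rejects out-of-range or unrecognized values, it can help to validate arguments client-side before making the call. The sketch below mirrors the constraints in the table above; `build_params` is a hypothetical helper, not part of the Invarium API.

```python
# Allowed values, copied from the parameter table above.
COMPLEXITIES = {"simple", "moderate", "complex", "adversarial", "edge_case"}
FAILURE_CATEGORIES = {
    "knowledge_failure", "reasoning_failure", "context_failure",
    "instruction_failure", "tool_usage_failure", "safety_failure",
    "communication_failure", "operational_failure", "coordination_failure",
}
PERSONAS = {"novice", "expert", "frustrated", "confused", "adversarial"}

def build_params(agent_name, test_description, test_cases=5,
                 complexity="moderate", failure_category=None, persona=None):
    """Hypothetical helper: validate arguments client-side before calling
    invarium_generate_tests, so bad values fail fast."""
    if not agent_name or not test_description:
        raise ValueError("agent_name and test_description are required")
    if not 1 <= test_cases <= 25:
        raise ValueError("test_cases must be between 1 and 25")
    if complexity not in COMPLEXITIES:
        raise ValueError(f"invalid complexity: {complexity}")
    if failure_category is not None and failure_category not in FAILURE_CATEGORIES:
        raise ValueError(f"invalid failure_category: {failure_category}")
    if persona is not None and persona not in PERSONAS:
        raise ValueError(f"invalid persona: {persona}")
    return {"agent_name": agent_name, "test_description": test_description,
            "test_cases": test_cases, "complexity": complexity,
            "failure_category": failure_category, "persona": persona}
```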
Returns
Confirmation string with scenario details and a generation_id for tracking.
Example
"Generate 10 complex test cases for customer-support-agent targeting tool usage failures with a frustrated user persona"
Response
On success, the tool returns the generation parameters and a tracking ID:
Scenario generation started for 'customer-support-agent'.
Description: Test refund workflow error handling and edge cases
Complexity: complex
Test cases: 10
Failure category: tool_usage_failure
Persona: frustrated
Generation ID: gen_a1b2c3d4e5f6
Use invarium_get_tests with the generation_id above to check results.
| Field | Description |
|---|---|
| Description | The test description you provided, describing what behavior to test. |
| Complexity | The complexity level used for generation. |
| Test cases | Number of test cases requested in the scenario. |
| Failure category | The targeted failure category, if specified. |
| Persona | The user persona applied, if specified. |
| Generation ID | Unique identifier to track this generation. Pass it to invarium_get_tests. |
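Because the tool returns a confirmation string rather than structured data, scripted callers need to pull the generation_id out of the text. A minimal sketch, assuming the `Generation ID: <id>` line shown in the response above:

```python
import re

def extract_generation_id(confirmation: str) -> str:
    """Extract the generation_id from the confirmation string so it can be
    passed to invarium_get_tests. Assumes the 'Generation ID: <id>' line
    format shown in the example response."""
    match = re.search(r"Generation ID:\s*(\S+)", confirmation)
    if match is None:
        raise ValueError("no generation ID found in confirmation")
    return match.group(1)
```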
Examples
Basic — Generate Default Tests
Generate 5 test cases at moderate complexity:
"Generate tests for customer-support-agent to test basic customer inquiry handling"
Advanced — Targeted Failure Testing
Generate adversarial tests targeting tool usage failures with a frustrated user persona:
"Generate 15 adversarial test cases for order-processing-agent targeting tool usage failures with a frustrated persona, focusing on payment tool parameter validation and error recovery"
Edge Case Generation
Generate edge case scenarios focused on safety:
"Generate 20 edge case tests for data-access-agent targeting safety failures with an adversarial persona, testing guardrails against prompt injection and unauthorized data access"
Complexity Levels
| Level | Description | When to Use |
|---|---|---|
| simple | Straightforward scenarios with clear inputs and expected outputs. | Smoke testing, initial validation. |
| moderate | Scenarios with ambiguous inputs, multi-step workflows, or nuanced constraints. | Regular development testing. Default level. |
| complex | Multi-tool chains, conflicting constraints, and boundary conditions. | Pre-deployment validation. |
| adversarial | Adversarial inputs designed to exploit weaknesses in the agent. | Security audits, red-team testing. |
| edge_case | Unusual inputs, rare conditions, and corner cases. | Comprehensive coverage, regression testing. |
Failure Categories
Each failure category targets a specific class of agent behavior issues:
| Category | What It Tests |
|---|---|
| knowledge_failure | Hallucinations, outdated information, self-contradictions. |
| reasoning_failure | Logic errors, calculation mistakes, planning failures. |
| context_failure | Lost conversation context, positional bias, misinterpreted references. |
| instruction_failure | Constraint violations, partial execution, priority conflicts. |
| tool_usage_failure | Wrong tool selection, parameter errors, sequence violations. |
| safety_failure | Prompt injection, guardrail bypass, unauthorized actions. |
| communication_failure | Unhelpful, unclear, or inappropriate responses. |
| operational_failure | Timeouts, rate limits, non-determinism. |
| coordination_failure | Multi-agent deadlocks, lost handoffs, conflicting actions. |
Personas
| Persona | Behavior |
|---|---|
| novice | Simple language, unclear requests, may not know the right terminology. |
| expert | Technical language, expects precise answers, tests depth of knowledge. |
| frustrated | Impatient, may repeat requests, expects fast resolution. |
| confused | Contradictory inputs, changes topic mid-conversation, unclear intent. |
| adversarial | Deliberately tries to break the agent, uses social engineering and prompt injection. |
Generation typically takes 10–30 seconds. For larger test counts (20 or more, up to the 25-per-scenario maximum), it may take up to a minute. The tool runs a pre-flight quota check and returns an error if you have no test generations remaining for the month. Check invarium_usage to see your remaining quota.
Error Responses
| Error | Cause | Fix |
|---|---|---|
| Agent not found: 'my-agent' | No blueprint has been uploaded for this agent name. | Upload a blueprint first with invarium_upload_blueprint. |
| test_description is required | The test_description parameter is missing or empty. | Provide a description of the behavior to test. |
| Invalid test_cases: must be a positive integer | The test_cases value is zero or negative. | Use a value between 1 and 25. |
| Invalid test_cases: maximum is 25 per scenario | The test_cases value exceeds 25. | Use a value between 1 and 25. |
| Invalid complexity | The complexity parameter is not a recognized value. | Use one of: simple, moderate, complex, adversarial, edge_case. |
| Invalid failure_category | The failure_category is not a recognized value. | Use one of the nine categories listed above, or omit it. |
| Invalid persona | The persona is not a recognized value. | Use one of: novice, expert, frustrated, confused, adversarial. |
| Not enough quota: no test generations remaining | Monthly test generation limit reached. | Upgrade your plan at app.invarium.dev/settings or wait for the monthly reset. |
See Error Codes for the full error reference.