invarium_generate_tests
Generate a behavioral test scenario with test cases for your agent. Creates exactly one scenario containing multiple test cases, then returns a generation_id. Use invarium_get_tests to check status and retrieve results.
When to Use
Call invarium_generate_tests after uploading a blueprint with invarium_upload_blueprint. Before calling this tool, review the agent’s blueprint to understand which scenarios would be most valuable, then confirm the parameters with the user.
This tool starts asynchronous generation and returns immediately. Use the returned generation_id with invarium_get_tests to poll for results.
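The call-then-poll pattern can be sketched as a small helper. This is a minimal sketch, not part of the tool: `get_status` stands in for however your client wraps `invarium_get_tests`, and the `"completed"`/`"failed"` status values are assumptions about that wrapper's return shape.

```python
import time

def poll_until_ready(get_status, generation_id, timeout=60, interval=2):
    """Poll a status callable (assumed to wrap invarium_get_tests) until the
    generation finishes or the timeout elapses."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = get_status(generation_id)
        # Assumed status values; adapt to whatever your client wrapper returns.
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"generation {generation_id} not ready after {timeout}s")
```

A short interval (a few seconds) is usually enough, since generation typically completes well under a minute.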
See Generate Test Scenarios for the full workflow.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| agent_name | string | required | Name of the agent to generate tests for. Must have a blueprint uploaded first. |
| test_description | string | required | Description of the behavior to test. Tells the scenario generator what to focus on (e.g., 'Test refund workflow error handling and edge cases'). |
| test_cases | int | default: 5 | Number of test cases to include in the scenario. Minimum 1, maximum 25. |
| complexity | string | default: moderate | Scenario complexity level. One of: simple, moderate, complex, adversarial, edge_case. |
| failure_category | string \| null | default: null | Optional failure category to target. One of: knowledge_failure, reasoning_failure, context_failure, instruction_failure, tool_usage_failure, safety_failure, communication_failure, operational_failure, coordination_failure. |
| persona | string \| null | default: null | Optional user persona for the test scenario. One of: novice, expert, frustrated, confused, adversarial. |
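Since the tool rejects out-of-range or unrecognized values, it can help to validate arguments client-side before making the call. The sketch below mirrors the constraints in the table above; `build_params` is a hypothetical helper, not part of the Invarium API.

```python
# Allowed values, copied from the parameter table above.
COMPLEXITIES = {"simple", "moderate", "complex", "adversarial", "edge_case"}
FAILURE_CATEGORIES = {
    "knowledge_failure", "reasoning_failure", "context_failure",
    "instruction_failure", "tool_usage_failure", "safety_failure",
    "communication_failure", "operational_failure", "coordination_failure",
}
PERSONAS = {"novice", "expert", "frustrated", "confused", "adversarial"}

def build_params(agent_name, test_description, test_cases=5,
                 complexity="moderate", failure_category=None, persona=None):
    """Hypothetical helper: validate arguments client-side before calling
    invarium_generate_tests, so bad values fail fast."""
    if not agent_name or not test_description:
        raise ValueError("agent_name and test_description are required")
    if not 1 <= test_cases <= 25:
        raise ValueError("test_cases must be between 1 and 25")
    if complexity not in COMPLEXITIES:
        raise ValueError(f"invalid complexity: {complexity}")
    if failure_category is not None and failure_category not in FAILURE_CATEGORIES:
        raise ValueError(f"invalid failure_category: {failure_category}")
    if persona is not None and persona not in PERSONAS:
        raise ValueError(f"invalid persona: {persona}")
    return {"agent_name": agent_name, "test_description": test_description,
            "test_cases": test_cases, "complexity": complexity,
            "failure_category": failure_category, "persona": persona}
```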
Returns
Confirmation string with scenario details and a generation_id for tracking.
Example
"Generate 10 complex test cases for customer-support-agent targeting tool usage failures with a frustrated user persona"
Response
On success, the tool returns the generation parameters and a tracking ID:
Scenario generation started for 'customer-support-agent'.
Description: Test refund workflow error handling and edge cases
Complexity: complex
Test cases: 10
Failure category: tool_usage_failure
Persona: frustrated
Generation ID: gen_a1b2c3d4e5f6
Use invarium_get_tests with the generation_id above to check results.
| Field | Description |
|---|---|
| Description | The test description you provided, describing what behavior to test. |
| Complexity | The complexity level used for generation. |
| Test cases | Number of test cases requested in the scenario. |
| Failure category | The targeted failure category, if specified. |
| Persona | The user persona applied, if specified. |
| Generation ID | Unique identifier to track this generation. Pass it to invarium_get_tests. |
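Because the tool returns a confirmation string rather than structured data, scripted callers need to pull the generation_id out of the text. A minimal sketch, assuming the `Generation ID: <id>` line shown in the response above:

```python
import re

def extract_generation_id(confirmation: str) -> str:
    """Extract the generation_id from the confirmation string so it can be
    passed to invarium_get_tests. Assumes the 'Generation ID: <id>' line
    format shown in the example response."""
    match = re.search(r"Generation ID:\s*(\S+)", confirmation)
    if match is None:
        raise ValueError("no generation ID found in confirmation")
    return match.group(1)
```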
Examples
Basic — Generate Default Tests
Generate 5 test cases at moderate complexity:
"Generate tests for customer-support-agent to test basic customer inquiry handling"
Advanced — Targeted Failure Testing
Generate adversarial tests targeting tool usage failures with a frustrated user persona:
"Generate 15 adversarial test cases for order-processing-agent targeting tool usage failures with a frustrated persona, focusing on payment tool parameter validation and error recovery"
Edge Case Generation
Generate edge case scenarios focused on safety:
"Generate 20 edge case tests for data-access-agent targeting safety failures with an adversarial persona, testing guardrails against prompt injection and unauthorized data access"
Complexity Levels
| Level | Description | When to Use |
|---|---|---|
| simple | Straightforward scenarios with clear inputs and expected outputs. | Smoke testing, initial validation. |
| moderate | Scenarios with ambiguous inputs, multi-step workflows, or nuanced constraints. | Regular development testing. Default level. |
| complex | Multi-tool chains, conflicting constraints, and boundary conditions. | Pre-deployment validation. |
| adversarial | Adversarial inputs designed to exploit weaknesses in the agent. | Security audits, red-team testing. |
| edge_case | Unusual inputs, rare conditions, and corner cases. | Comprehensive coverage, regression testing. |
Failure Categories
Each failure category targets a specific class of agent behavior issues:
| Category | What It Tests |
|---|---|
| knowledge_failure | Hallucinations, outdated information, self-contradictions. |
| reasoning_failure | Logic errors, calculation mistakes, planning failures. |
| context_failure | Lost conversation context, positional bias, misinterpreted references. |
| instruction_failure | Constraint violations, partial execution, priority conflicts. |
| tool_usage_failure | Wrong tool selection, parameter errors, sequence violations. |
| safety_failure | Prompt injection, guardrail bypass, unauthorized actions. |
| communication_failure | Unhelpful, unclear, or inappropriate responses. |
| operational_failure | Timeouts, rate limits, non-determinism. |
| coordination_failure | Multi-agent deadlocks, lost handoffs, conflicting actions. |
Personas
| Persona | Behavior |
|---|---|
| novice | Simple language, unclear requests, may not know the right terminology. |
| expert | Technical language, expects precise answers, tests depth of knowledge. |
| frustrated | Impatient, may repeat requests, expects fast resolution. |
| confused | Contradictory inputs, changes topic mid-conversation, unclear intent. |
| adversarial | Deliberately tries to break the agent, uses social engineering and prompt injection. |
Generation typically takes 10–30 seconds. For larger test counts (20 or more, up to the 25-per-scenario maximum), it may take up to a minute. The tool runs a pre-flight quota check and returns an error if you have no test generations remaining for the month. Check invarium_usage to see your remaining quota.
Error Responses
| Error | Cause | Fix |
|---|---|---|
| Agent not found: 'my-agent' | No blueprint has been uploaded for this agent name. | Upload a blueprint first with invarium_upload_blueprint. |
| test_description is required | The test_description parameter is missing or empty. | Provide a description of the behavior to test. |
| Invalid test_cases: must be a positive integer | The test_cases value is zero or negative. | Use a value between 1 and 25. |
| Invalid test_cases: maximum is 25 per scenario | The test_cases value exceeds 25. | Use a value between 1 and 25. |
| Invalid complexity | The complexity parameter is not a recognized value. | Use one of: simple, moderate, complex, adversarial, edge_case. |
| Invalid failure_category | The failure_category is not a recognized value. | Use one of the nine categories listed above, or omit it. |
| Invalid persona | The persona is not a recognized value. | Use one of: novice, expert, frustrated, confused, adversarial. |
| Not enough quota: no test generations remaining | Monthly test generation limit reached. | Upgrade your plan at app.invarium.dev/settings or wait for the monthly reset. |
See Error Codes for the full error reference.