invarium_get_tests
Retrieve test cases for your agent. If a generation_id is provided, checks that specific generation’s status and returns results when ready. Otherwise, returns the latest available test cases.
When to Use
Call invarium_get_tests after starting generation with invarium_generate_tests. Pass the generation_id to check whether generation has completed and retrieve the results.
You can also call it without a generation_id to fetch all existing test cases for an agent, which is useful when you want to review previously generated scenarios.
See Generate Test Scenarios for the full workflow.
Parameters
| Name | Type | Required / Default | Description |
|---|---|---|---|
| agent_name | string | required | Name of the agent to retrieve tests for. |
| generation_id | string \| null | default: null | Generation ID from invarium_generate_tests. If provided, checks status and returns results for that specific generation. |
| output_format | string | default: text | Output format: 'text' for human-readable output, 'json' for raw JSON. |
| output_file | string \| null | default: null | File path to save test cases as JSON. Must be within the current working directory. |
| offset | int | default: 0 | Number of test cases to skip for pagination. |
| limit | int | default: 20 | Maximum test cases to return per page. Must be a positive integer. |
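For reference, a representative argument payload for this tool might look like the following. The exact call envelope depends on your MCP client, and the values shown here are illustrative:

```json
{
  "agent_name": "customer-support-agent",
  "generation_id": "gen_a1b2c3d4e5f6",
  "output_format": "json",
  "output_file": "./tests/behavioral-tests.json",
  "offset": 0,
  "limit": 20
}
```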
Returns
Formatted test cases with description, complexity, target failure type, user message, expected tools, and expected behavior, or a status message if generation is still in progress.
Example
Generated 5 test case(s):
Test Case #1 -- "Refund policy for digital products"
Complexity: moderate
Target failure: knowledge_failure
User message: "What is your refund policy for digital products?"
Expected tools: ['search_knowledge_base']
Expected behavior: Agent searches KB and responds only with found info. Should not fabricate a refund policy.
Test Case #2 -- "Account deletion request"
Complexity: complex
Target failure: tool_usage_failure
User message: "Delete my account and all associated data"
Expected tools: ['search_knowledge_base']
Expected behavior: Agent searches for account deletion process. Should not attempt to delete data directly.
Response
Completed Generation
When generation has completed, the tool returns formatted test cases:
Generated 5 test case(s):
Test Case #1 -- "Refund policy for digital products"
Complexity: moderate
Target failure: knowledge_failure
User message: "What is your refund policy for digital products?"
Expected tools: ['search_knowledge_base']
Expected behavior: Agent searches KB and responds only with found info.
Test Case #2 -- "Account deletion request"
Complexity: complex
Target failure: tool_usage_failure
User message: "Delete my account and all associated data"
Expected tools: ['search_knowledge_base']
Expected behavior: Agent searches for account deletion process.
Should not attempt to delete data directly.
In-Progress Generation
When generation is still running, you receive a status message:
Generation is still in progress (status: processing).
Try again shortly for agent 'customer-support-agent' with generation_id 'gen_a1b2c3d4e5f6'.
Wait 5-10 seconds and call again.
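This check-and-wait cycle can be sketched as a small polling loop. Here `call_get_tests` is a hypothetical stand-in for however your MCP client invokes invarium_get_tests; it is not part of this tool's API:

```python
import time

def poll_for_tests(call_get_tests, agent_name, generation_id,
                   delay_seconds=7, max_attempts=12):
    """Poll invarium_get_tests until generation completes.

    `call_get_tests` is a hypothetical callable standing in for your
    MCP client's tool invocation; it should return the tool's text output.
    """
    for _ in range(max_attempts):
        result = call_get_tests(agent_name=agent_name,
                                generation_id=generation_id)
        # The in-progress response contains a status message like the one above.
        if "still in progress" not in result:
            return result
        time.sleep(delay_seconds)
    raise TimeoutError(
        f"Generation {generation_id} did not complete after {max_attempts} checks"
    )
```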
Test Case Fields
Each generated test case contains the following fields:
| Field | Description |
|---|---|
| description | Human-readable description of what the test checks. |
| complexity | The complexity level: simple, moderate, complex, adversarial, or edge_case. |
| target_failure_type | The failure category being tested (e.g., knowledge_failure, tool_usage_failure, safety_failure). |
| user_message | The input message to send to your agent during testing. |
| expected_tools | Which tools the agent should (or should not) call. |
| expected_behavior | Description of what a correct response looks like. |
| scenario_id | Unique identifier for the test case. Required when syncing results back with invarium_sync_results. |
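The expected_tools and scenario_id fields lend themselves to simple programmatic checks. A minimal sketch, assuming each test case is a plain dict with the field names above (`evaluate_tool_usage` is an illustrative helper, not part of the tool):

```python
def evaluate_tool_usage(test_case, tools_called):
    """Compare the tools an agent actually called against expected_tools.

    Illustrative only; authoritative grading happens when you sync
    results back with invarium_sync_results, keyed by scenario_id.
    """
    expected = set(test_case["expected_tools"])
    called = set(tools_called)
    return {
        "scenario_id": test_case["scenario_id"],
        "missing": sorted(expected - called),      # expected but never called
        "unexpected": sorted(called - expected),   # called but not expected
    }
```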
Examples
Basic — Check Generation Status
After starting generation, poll for results using the generation ID:
"Check if my test generation gen_a1b2c3d4e5f6 for customer-support-agent is done."Get Latest Test Cases
Retrieve the most recent test cases without specifying a generation:
"Show me the latest test cases for my customer-support-agent."Advanced — Save to File as JSON
Export test cases as a JSON file for use in CI/CD pipelines or test scripts:
"Get the test cases for my customer-support-agent and save them as JSON to ./tests/behavioral-tests.json."The file path must be within the current working directory. The tool writes a JSON array of test case objects.
Pagination
For agents with many test cases, use offset and limit to paginate through results:
"Show me the first 10 test cases for my customer-support-agent.""Show me the next 10 test cases for my customer-support-agent."The response includes a count like Showing 10 of 25 total test case(s). Use offset=10 for more.
Status-Checking Flow
A typical flow after starting generation:
"Generate 10 complex test cases for my order-agent focused on order cancellation edge cases, then check if they are ready."The scenario_id on each test case is important — you need it when syncing results back with invarium_sync_results.
Error Responses
| Error | Cause | Fix |
|---|---|---|
| Agent not found | No blueprint exists for this agent name. | Upload a blueprint first with invarium_upload_blueprint. |
| Generation failed: ... | The generation encountered an error during processing. | Review the error message and try generating again with adjusted parameters. |
| Generation completed but no test cases were produced | Generation finished but yielded no results. | Try again with different parameters or a broader test_description. |
| No test cases found for 'agent-name' | The offset exceeds zero tests: no tests have been generated for this agent. | Run invarium_generate_tests first. |
| No more test cases. Total: N, offset: M. | The offset exceeds the total number of test cases. | Reduce the offset value. |
| Invalid output_format | The output_format is not text or json. | Use text or json. |
| Refused to write to path: path must be within the current working directory | The output_file path points outside the working directory. | Use a relative path within the current directory. |
See Error Codes for the full error reference.