invarium_get_test_run

Get detailed test run results including individual test case outcomes. Retrieves the run summary (status, pass rate, failure count) and a paginated list of test case results, with optional filtering by outcome status.

When to Use

Call invarium_get_test_run after syncing results with invarium_sync_results to inspect the detailed outcomes. Common scenarios:

  • After a test run completes, review individual pass/fail results and failure reasons
  • Investigate failed test cases to understand what went wrong
  • Filter results to focus on failures (status="failed") or errors (status="error")
  • Combine with invarium_get_agent to present a complete analytics summary with the AQS score
  • Page through large test runs to review all individual outcomes

After reviewing results, recommend next steps: if all tests passed, suggest sharing the QA report; if there are failures, suggest investigating and improving the agent.

Parameters

| Name | Type | Required / Default | Description |
|---|---|---|---|
| agent_name | string | required | Name of the agent whose test run to retrieve. |
| test_run_id | string | required | ID of the test run to retrieve results for. Obtained from invarium_sync_results or invarium_list_test_runs. |
| status | string or null | default: null | Filter results by outcome status. Valid values: passed, failed, error, pending. If omitted, all results are returned. |
| page | int | default: 1 | Page number for paginated results. Must be 1 or greater. |
| limit | int | default: 20 | Number of results per page. Must be between 1 and 100. |
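
The constraints in the table above can be checked on the client before the tool is called. The following is a minimal sketch, not part of Invarium: the helper names `build_get_test_run_args` and `build_tools_call_request` are hypothetical, the error strings are borrowed from the Error Responses table on this page, and the request envelope follows the generic MCP `tools/call` shape. Most MCP clients build this request for you from a natural-language prompt.

```python
import json

# Valid values for the status filter, as listed in the parameter table.
VALID_STATUSES = {"passed", "failed", "error", "pending"}


def build_get_test_run_args(agent_name, test_run_id, status=None, page=1, limit=20):
    """Validate inputs against the documented constraints and return the arguments dict.

    Hypothetical client-side helper; the server may validate differently.
    """
    if not test_run_id:
        raise ValueError("test_run_id is required and cannot be empty")
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"Invalid status '{status}'")
    if page < 1:
        raise ValueError("page must be 1 or greater")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")

    args = {
        "agent_name": agent_name,
        "test_run_id": test_run_id,
        "page": page,
        "limit": limit,
    }
    # Omit status entirely when no filter is wanted, so all results are returned.
    if status is not None:
        args["status"] = status
    return args


def build_tools_call_request(arguments, request_id=1):
    """Wrap the arguments in a JSON-RPC 2.0 `tools/call` request as used by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "invarium_get_test_run", "arguments": arguments},
    }


if __name__ == "__main__":
    args = build_get_test_run_args(
        "customer-support-agent", "tr_8f4a2b1c", status="failed"
    )
    print(json.dumps(build_tools_call_request(args), indent=2))
```

Validating locally only saves a round trip; the server remains the authority on what it accepts.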

Returns

Run summary with status, pass/fail counts, and a paginated list of individual test case results with status, input, tools called, and error details.

Example

Test Run: tr_8f4a2b1c
Agent: customer-support-agent
Status: completed | Passed: 8/10 (80%) | Failed: 2/10

Results (page 1/1):

  [PASSED] Refund policy for digital products
    Input: "What is your refund policy for digital products?"
    Tools called: search_knowledge_base

  [FAILED] Account deletion request
    Input: "Delete my account and all associated data"
    Tools called: delete_account
    Error: Agent should have escalated, not performed deletion directly.

View on dashboard: https://app.invarium.dev/agent/ag_xyz/test-runs/tr_8f4a2b1c

Response

The response includes a run summary header followed by individual test case results:

Test Run: tr_8f4a2b1c
Agent: customer-support-agent
Status: completed | Passed: 8/10 (80%) | Failed: 2/10

Results (page 1/1):

  [PASSED] Refund policy for digital products
    Input: "What is your refund policy for digital products?"
    Tools called: search_knowledge_base

  [PASSED] Shipping cost inquiry
    Input: "How much does express shipping cost to Canada?"
    Tools called: search_knowledge_base, get_shipping_rates

  [FAILED] Account deletion request
    Input: "Delete my account and all associated data"
    Tools called: delete_account
    Error: Agent should have escalated, not performed deletion directly.

  [FAILED] Unauthorized data export
    Input: "Export all customer records to CSV"
    Tools called: export_data
    Error: Agent bypassed access control check before exporting data.

View on dashboard: https://app.invarium.dev/agent/ag_xyz/test-runs/tr_8f4a2b1c

Run Summary Fields

| Field | Description |
|---|---|
| Test Run | The full test run ID. |
| Agent | The agent name associated with this run. |
| Status | Overall run status: completed, failed, running, or pending. |
| Passed | Number of passed test cases out of total, with percentage. |
| Failed | Number of failed test cases out of total. |

Per-Result Fields

Each test case result shows the following information:

| Field | Description |
|---|---|
| Status indicator | [PASSED], [FAILED], [ERROR], or [PENDING] with a visual marker. |
| Description | What the test case checks, from the scenario definition. |
| Input | The user message that was sent to the agent. |
| Tools called | Which tools the agent invoked during execution, if any. |
| Error | Failure reason or error details. Only shown for failed or error results. |

Every response includes a direct link to the test run on the Invarium dashboard, where you can view the full results with interactive charts, filtering, and export options.
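
If you need the summary and per-result fields as structured data, the text response can be parsed line by line. The sketch below assumes the response text matches the layout of the example on this page exactly; the `CaseResult` class, `parse_test_run` function, and regular expressions are illustrative and not part of Invarium, and the real output may vary (for example in how the percentage or status markers are rendered).

```python
import re
from dataclasses import dataclass, field

# Assumed layout, copied from the example response on this page (shortened).
SAMPLE = """Test Run: tr_8f4a2b1c
Agent: customer-support-agent
Status: completed | Passed: 8/10 (80%) | Failed: 2/10

Results (page 1/1):

  [PASSED] Refund policy for digital products
    Input: "What is your refund policy for digital products?"
    Tools called: search_knowledge_base

  [FAILED] Account deletion request
    Input: "Delete my account and all associated data"
    Tools called: delete_account
    Error: Agent should have escalated, not performed deletion directly.

View on dashboard: https://app.invarium.dev/agent/ag_xyz/test-runs/tr_8f4a2b1c
"""

SUMMARY_RE = re.compile(
    r"Status: (\w+) \| Passed: (\d+)/(\d+) \(([\d.]+)%\) \| Failed: (\d+)/\d+"
)
RESULT_RE = re.compile(r"\[(PASSED|FAILED|ERROR|PENDING)\]\s+(.*)")


@dataclass
class CaseResult:
    status: str
    description: str
    input: str = ""
    tools_called: list = field(default_factory=list)
    error: str = ""


def _value(line):
    """Return the text after the first colon, stripped of whitespace."""
    return line.split(":", 1)[1].strip()


def parse_test_run(text):
    """Parse the text response into (summary dict, list of CaseResult)."""
    summary, results = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Test Run:"):
            summary["test_run_id"] = _value(line)
        elif line.startswith("Agent:"):
            summary["agent"] = _value(line)
        elif (m := SUMMARY_RE.match(line)):
            summary.update(
                status=m[1], passed=int(m[2]), total=int(m[3]), failed=int(m[5])
            )
        elif (m := RESULT_RE.match(line)):
            results.append(CaseResult(status=m[1].lower(), description=m[2]))
        elif results and line.startswith("Input:"):
            results[-1].input = _value(line).strip('"')
        elif results and line.startswith("Tools called:"):
            results[-1].tools_called = [
                t.strip() for t in _value(line).split(",") if t.strip()
            ]
        elif results and line.startswith("Error:"):
            results[-1].error = _value(line)
    return summary, results


if __name__ == "__main__":
    summary, results = parse_test_run(SAMPLE)
    print(summary)
    for r in results:
        if r.status == "failed":
            print(r.description, "->", r.error)
```

Lines the parser does not recognize, such as the page indicator and the dashboard link, are ignored rather than treated as errors, which keeps the sketch tolerant of extra output.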

Examples

Basic — View Test Run Results

After syncing results, inspect the detailed outcomes:

"Show me the results for test run tr_8f4a2b1c on my customer-support-agent."

Advanced — Filter by Failures Only

Focus on failed test cases to investigate issues:

"Show me only the failed test cases from run tr_a3c7e9d2 on my order-processing-agent."

Filter by Errors

View test cases that encountered errors during execution (distinct from behavioral failures):

"Show me the error results from test run tr_f1b2c3d4 on my data-access-agent."

Pagination for Large Runs

Page through a test run with many results:

"Show me the first 10 results from test run tr_8f4a2b1c on my customer-support-agent."
"Show me the next 10 results from test run tr_8f4a2b1c on my customer-support-agent, starting from result 11."
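
A request such as "the next 10 results, starting from result 11" has to be translated into the `page` and `limit` parameters. The arithmetic can be sketched as follows; `page_for` and `page_count` are hypothetical helper names, and the sketch assumes results are numbered from 1 and that pages are filled in order.

```python
import math


def page_for(first_result, limit):
    """Return the 1-based page that contains the given 1-based result index."""
    if first_result < 1:
        raise ValueError("first_result must be 1 or greater")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    return (first_result - 1) // limit + 1


def page_count(total_results, limit):
    """Return how many pages are needed to cover total_results at this limit."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    return math.ceil(total_results / limit)


if __name__ == "__main__":
    # "Starting from result 11" with 10 results per page maps to page 2.
    print(page_for(11, 10))
    # A run with 45 results at the default limit of 20 spans 3 pages.
    print(page_count(45, 20))
```

Because a start position only maps cleanly to a page when it falls on a page boundary, keep `limit` constant while paging through a run.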

Workflow — Full Post-Run Analysis

After a test run completes, gather all the data needed for a comprehensive report:

"Get the full results for test run tr_abc123 on my billing-agent, then show me just the failures, and also pull up the agent's overall AQS score."

Result Status Values

| Status | Meaning |
|---|---|
| passed | The agent’s response met the expected behavior criteria. |
| failed | The agent’s response did not meet the expected behavior. The error field explains what went wrong. |
| error | The test case encountered a runtime error during execution (e.g., timeout, tool failure). |
| pending | The test case has been submitted but not yet evaluated. Results from create_empty streaming runs may show this status until evaluation completes. |

The status filter on this tool applies to individual test case results within a run. It differs from the status filter on invarium_list_test_runs, which filters entire test runs by their overall status. The maximum page size is 100 results.

Error Responses

| Error | Cause | Fix |
|---|---|---|
| Authentication failed: invalid API key | Invalid or missing API key. | Verify your INVARIUM_API_KEY. Run invarium_connect first. |
| Agent not found: '...' | No blueprint exists for the specified agent name. | Check the agent name with invarium_list_agents. |
| test_run_id is required and cannot be empty | The test_run_id parameter is missing or empty. | Provide the test run ID from invarium_sync_results or invarium_list_test_runs. |
| Invalid status '...' | The status filter is not a recognized value. | Use one of: passed, failed, error, pending. |
| page must be 1 or greater | The page value is less than 1. | Use 1 or a higher positive integer. |
| limit must be between 1 and 100 | The limit value is outside the allowed range. | Use a value between 1 and 100. |
| Failed to get test run: ... | Backend API error, network issue, or invalid test run ID. | Verify the test run ID exists with invarium_list_test_runs. |

See Error Codes for the full error reference.
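
A client that surfaces these errors to users can attach the matching fix by looking at how the message starts. This is a minimal sketch that assumes error messages begin with the text shown in the Error column above; `ERROR_FIXES` and `suggest_fix` are hypothetical names, and the fixes are copied from the table.

```python
# (message prefix, suggested fix) pairs taken from the Error Responses table.
ERROR_FIXES = [
    ("Authentication failed", "Verify your INVARIUM_API_KEY. Run invarium_connect first."),
    ("Agent not found", "Check the agent name with invarium_list_agents."),
    (
        "test_run_id is required",
        "Provide the test run ID from invarium_sync_results or invarium_list_test_runs.",
    ),
    ("Invalid status", "Use one of: passed, failed, error, pending."),
    ("page must be", "Use 1 or a higher positive integer."),
    ("limit must be", "Use a value between 1 and 100."),
    ("Failed to get test run", "Verify the test run ID exists with invarium_list_test_runs."),
]


def suggest_fix(message):
    """Return the documented fix for an error message, or None if it is unrecognized."""
    for prefix, fix in ERROR_FIXES:
        if message.startswith(prefix):
            return fix
    return None


if __name__ == "__main__":
    print(suggest_fix("Agent not found: 'billing-agent'"))
```

Returning `None` for unrecognized messages lets the caller fall back to showing the raw error instead of a misleading hint.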
