invarium_sync_results
Sync test results from your IDE to the Invarium dashboard. After running test cases locally, use this tool to upload the results. Creates a new test run or appends to an existing one.
When to Use
Call invarium_sync_results after running test cases against your agent. This is typically called automatically during test execution. Common usage patterns:
- Batch mode — collect all results and sync them in a single call to create a new test run
- Incremental mode — sync results in batches, passing the test_run_id from the first sync to append results to the same run
- Streaming mode — call with create_empty=true first to get a test run ID and dashboard link, then stream results as tests complete
After syncing, use invarium_get_test_run to fetch detailed results and invarium_get_agent for the updated AQS score.
See Run Tests and Sync Results for the full workflow.
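The three patterns above can be sketched as parameter shapes. This is an illustrative Python sketch, not SDK code — `invarium_sync_results` here is a hypothetical stand-in for the tool call your MCP client actually issues:

```python
import json

# Hypothetical stand-in for the MCP tool call your client issues;
# it simply echoes the parameters it would send.
def invarium_sync_results(**params):
    return params

# The results parameter is a JSON array *string*, so serialize it first.
results = json.dumps([{
    "scenario_id": "sc_001",
    "user_message": "What is your refund policy?",
    "agent_response": "Refunds are available within 14 days.",
}])

# Batch mode: one call, creates a new test run.
batch = invarium_sync_results(agent_name="customer-support-agent", results=results)

# Incremental mode: reuse the test_run_id returned by the first sync.
incremental = invarium_sync_results(agent_name="customer-support-agent",
                                    results=results, test_run_id="tr_8f4a2b1c")

# Streaming mode: create an empty run first, then stream results into it.
streaming = invarium_sync_results(agent_name="customer-support-agent",
                                  create_empty=True, expected_total=25)
```

The distinction to remember: omitting test_run_id always creates a new run, so incremental and streaming flows must thread the ID from the first response through every later call.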
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| agent_name | string | required | Name of the agent whose tests were run. |
| results | string | required | JSON array of test result objects. Each must contain: scenario_id, user_message, agent_response. See Result Object Schema below for the full structure. |
| test_run_id | string \| null | default: null | Existing test run ID for incremental sync. If provided, results are appended to that run. If omitted, a new test run is created. |
| name | string \| null | default: null | Human-readable name for the test run (e.g., 'Smoke Test -- Mixed Complexity'). Helps identify runs on the dashboard. |
| source | string | default: mcp | Label for where tests were executed. Valid values: mcp, ci, vscode, cli, api. |
| create_empty | boolean | default: false | If true, create an empty test run with no results. Returns a test_run_id and dashboard URL for streaming mode. The results parameter is ignored. |
| expected_total | int \| null | default: null | Expected number of test results for streaming mode. Tells the backend how many results to expect before marking the run as completed. Use with create_empty. |
Returns
Summary string with test run ID, agent name, result count, and dashboard link.
Example
"Sync the test results for customer-support-agent to the dashboard"Response
Batch Sync Response
After syncing results, the tool returns a confirmation with the dashboard link:
Results synced to Invarium
Test run: tr_8f4a2b1c
Agent: customer-support-agent
Results: 10 test case(s) synced
View on dashboard: https://app.invarium.dev/agent/ag_xyz/test-runs/tr_8f4a2b1c
Empty Test Run Response (Streaming Mode)
When create_empty is true, the response provides a test run ID for streaming:
Empty test run created.
Test Run ID: tr_8f4a2b1c
Dashboard: https://app.invarium.dev/agent/ag_xyz/test-runs/tr_8f4a2b1c
Results will stream in as test cases complete.
Examples
Basic — Sync a Batch of Results
Sync all test results at once to create a new test run:
"Sync the test results for customer-support-agent to the dashboard"Advanced — Incremental Sync
Run tests in batches and append results to the same test run:
# First batch -- creates a new test run
"Sync the first batch of test results for order-agent as 'Regression Suite -- v2.1'"
# Returns: Test run: tr_abc123
# Second batch -- appends to the same run
"Sync additional results for order-agent to test run tr_abc123"Streaming Mode
For long-running test suites, create an empty test run first to get the dashboard link, then stream results as each test completes:
# 1. Create empty test run
"Create an empty test run for customer-support-agent expecting 25 results, named 'Full Suite -- Streaming'"
# Returns: Test Run ID: tr_xyz789, Dashboard URL
# 2. Stream results as tests complete (via trace library or incremental sync)
"Sync results for customer-support-agent to test run tr_xyz789"CI/CD Pipeline
Label results from CI with the ci source:
"Sync test results for billing-agent from CI, named 'PR #142 -- Billing Refactor'"Source Labels
Use the source parameter to indicate where the tests were executed. This helps track result origins on the dashboard:
| Source | When to Use |
|---|---|
| mcp | Running from an MCP client (Claude Desktop, Cursor, etc.). Default value. |
| ci | Running in a CI/CD pipeline (GitHub Actions, etc.). |
| vscode | Running from VS Code. |
| cli | Running from a command-line script. |
| api | Running via the REST API directly. |
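If your test harness runs in several environments, the label can be picked automatically. A heuristic sketch (GITHUB_ACTIONS is set by GitHub Actions runners and TERM_PROGRAM=vscode by VS Code terminals; other detection rules are up to you):

```python
import os

def detect_source() -> str:
    """Pick a source label from the execution environment (heuristic)."""
    if os.environ.get("GITHUB_ACTIONS") == "true":
        return "ci"
    if os.environ.get("TERM_PROGRAM") == "vscode":
        return "vscode"
    return "cli"
```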
Result Object Schema
Each object in the results JSON array must follow this structure:
Required Fields
| Field | Type | Description |
|---|---|---|
| scenario_id | string | The ID of the test case (from invarium_get_tests). Must be non-empty. |
| user_message | string | The input message that was sent to the agent. Must be non-empty. |
| agent_response | string | The agent’s actual output/response. Must be non-empty. |
Optional Fields
| Field | Type | Description |
|---|---|---|
| tools_called | array | Array of tool call objects. Each object must have a name field (string) and may include parameters (object). |
| passed | boolean | Whether the test passed. If omitted, the result status is set to “pending” and Invarium evaluates it automatically. |
| notes | string | Free-text notes about the test execution (e.g., observations, context). |
| latency_ms | number | Execution time in milliseconds. Used for performance tracking. |
| error_message | string | Error details for failed test cases. |
Example Result Object
{
"scenario_id": "sc_001",
"user_message": "What is your refund policy for digital products?",
"agent_response": "Based on our knowledge base, digital products can be refunded within 14 days of purchase. Source: KB article #1234.",
"tools_called": [
{
"name": "search_knowledge_base",
"parameters": { "query": "refund policy digital products" }
}
],
"passed": true,
"notes": "Agent correctly searched KB and cited source.",
"latency_ms": 1250
}
If you omit the passed field, Invarium will automatically evaluate the agent response against the expected behavior defined in the test case. Manual passed values take precedence over automatic evaluation. The maximum number of results per sync call is 1,000.
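The schema constraints above can be checked client-side before syncing, which surfaces problems without a round trip. This is an illustrative pre-check, not part of any Invarium SDK; the error strings mirror the documented server errors:

```python
import json

REQUIRED = ("scenario_id", "user_message", "agent_response")
VALID_SOURCES = {"mcp", "ci", "vscode", "cli", "api"}
MAX_RESULTS = 1000

def precheck_results(results_json: str, source: str = "mcp") -> list:
    """Validate a results payload locally; raises ValueError on problems."""
    if source not in VALID_SOURCES:
        raise ValueError(f"Invalid source '{source}'")
    try:
        results = json.loads(results_json)
    except json.JSONDecodeError as e:
        raise ValueError(f"Results are not valid JSON: {e}")
    if not isinstance(results, list):
        raise ValueError("results must be a JSON array")
    if not results:
        raise ValueError("results array is empty")
    if len(results) > MAX_RESULTS:
        raise ValueError(f"Too many results ({len(results)}). "
                         f"Maximum is {MAX_RESULTS} per sync.")
    for i, r in enumerate(results, start=1):
        # All three required fields must be present, strings, and non-empty.
        for field in REQUIRED:
            if not isinstance(r.get(field), str) or not r[field]:
                raise ValueError(
                    f"Result #{i}: missing or empty required field '{field}'")
        if "tools_called" in r and not isinstance(r["tools_called"], list):
            raise ValueError(f"Result #{i}: tools_called must be an array")
        if "passed" in r and not isinstance(r["passed"], bool):
            raise ValueError(f"Result #{i}: 'passed' must be a boolean")
    return results
```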
Error Responses
| Error | Cause | Fix |
|---|---|---|
| Agent not found | No blueprint exists for this agent name. | Upload a blueprint first with invarium_upload_blueprint. |
| Results are not valid JSON: ... | The results string is not valid JSON. | Ensure the results parameter is a valid JSON array string. |
| results must be a JSON array | The results value is not an array. | Wrap result objects in a JSON array [...]. |
| results array is empty | The results array has no items (and create_empty is false). | Provide at least one test result object. |
| Too many results (N). Maximum is 1000 per sync. | The results array exceeds 1,000 items. | Split results into multiple sync calls using test_run_id for incremental sync. |
| Result #N: missing or empty required field 'field' | A result object is missing scenario_id, user_message, or agent_response. | Ensure all three required fields are present and non-empty. |
| Result #N: tools_called must be an array | The tools_called field is not an array. | Provide tools_called as an array of objects with a name field. |
| Result #N: 'passed' must be a boolean | The passed field is not true or false. | Use a boolean value, or omit the field for automatic evaluation. |
| Invalid source 'xyz' | The source value is not recognized. | Use one of: mcp, ci, vscode, cli, api. |
See Error Codes for the full error reference.