invarium_sync_results
Sync test results from your IDE to the Invarium dashboard. After running test cases locally, use this tool to upload the results. Creates a new test run or appends to an existing one.
When to Use
Call invarium_sync_results after running test cases against your agent. This is typically called automatically during test execution. Common usage patterns:
- Batch mode — collect all results and sync them in a single call to create a new test run
- Incremental mode — sync results in batches, passing the test_run_id from the first sync to append results to the same run
- Streaming mode — call with create_empty=true first to get a test run ID and dashboard link, then stream results as tests complete
After syncing, use invarium_get_test_run to fetch detailed results and invarium_get_agent for the updated AQS score.
See Run Tests and Sync Results for the full workflow.
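The three patterns above can be sketched as parameter shapes. This is an illustrative Python sketch, not SDK code — `invarium_sync_results` here is a hypothetical stand-in for the tool call your MCP client actually issues:

```python
import json

# Hypothetical stand-in for the MCP tool call your client issues;
# it simply echoes the parameters it would send.
def invarium_sync_results(**params):
    return params

# The results parameter is a JSON array *string*, so serialize it first.
results = json.dumps([{
    "scenario_id": "sc_001",
    "user_message": "What is your refund policy?",
    "agent_response": "Refunds are available within 14 days.",
}])

# Batch mode: one call, creates a new test run.
batch = invarium_sync_results(agent_name="customer-support-agent", results=results)

# Incremental mode: reuse the test_run_id returned by the first sync.
incremental = invarium_sync_results(agent_name="customer-support-agent",
                                    results=results, test_run_id="tr_8f4a2b1c")

# Streaming mode: create an empty run first, then stream results into it.
streaming = invarium_sync_results(agent_name="customer-support-agent",
                                  create_empty=True, expected_total=25)
```

The distinction to remember: omitting test_run_id always creates a new run, so incremental and streaming flows must thread the ID from the first response through every later call.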
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| agent_name | string | required | Name of the agent whose tests were run. |
| results | string | required | JSON array of test result objects. Each must contain: scenario_id, user_message, agent_response. See Result Object Schema below for the full structure. |
| test_run_id | string \| null | default: null | Existing test run ID for incremental sync. If provided, results are appended to that run. If omitted, a new test run is created. |
| name | string \| null | default: null | Human-readable name for the test run (e.g., 'Smoke Test -- Mixed Complexity'). Helps identify runs on the dashboard. |
| source | string | default: mcp | Label for where tests were executed. Valid values: mcp, ci, vscode, cli, api. |
| create_empty | boolean | default: false | If true, create an empty test run with no results. Returns a test_run_id and dashboard URL for streaming mode. The results parameter is ignored. |
| expected_total | int \| null | default: null | Expected number of test results for streaming mode. Tells the backend how many results to expect before marking the run as completed. Use with create_empty. |
Returns
Summary string with test run ID, agent name, result count, and dashboard link.
Example
"Sync the test results for customer-support-agent to the dashboard"Response
Batch Sync Response
After syncing results, the tool returns a confirmation with the dashboard link:
Results synced to Invarium
Test run: tr_8f4a2b1c
Agent: customer-support-agent
Results: 10 test case(s) synced
View on dashboard: https://app.invarium.dev/agent/ag_xyz/test-runs/tr_8f4a2b1c
Empty Test Run Response (Streaming Mode)
When create_empty is true, the response provides a test run ID for streaming:
Empty test run created.
Test Run ID: tr_8f4a2b1c
Dashboard: https://app.invarium.dev/agent/ag_xyz/test-runs/tr_8f4a2b1c
Results will stream in as test cases complete.
Examples
Basic — Sync a Batch of Results
Sync all test results at once to create a new test run:
"Sync the test results for customer-support-agent to the dashboard"Advanced — Incremental Sync
Run tests in batches and append results to the same test run:
# First batch -- creates a new test run
"Sync the first batch of test results for order-agent as 'Regression Suite -- v2.1'"
# Returns: Test run: tr_abc123
# Second batch -- appends to the same run
"Sync additional results for order-agent to test run tr_abc123"Streaming Mode
For long-running test suites, create an empty test run first to get the dashboard link, then stream results as each test completes:
# 1. Create empty test run
"Create an empty test run for customer-support-agent expecting 25 results, named 'Full Suite -- Streaming'"
# Returns: Test Run ID: tr_xyz789, Dashboard URL
# 2. Stream results as tests complete (via trace library or incremental sync)
"Sync results for customer-support-agent to test run tr_xyz789"CI/CD Pipeline
Label results from CI with the ci source:
"Sync test results for billing-agent from CI, named 'PR #142 -- Billing Refactor'"Source Labels
Use the source parameter to indicate where the tests were executed. This helps track result origins on the dashboard:
| Source | When to Use |
|---|---|
| mcp | Running from an MCP client (Claude Desktop, Cursor, etc.). Default value. |
| ci | Running in a CI/CD pipeline (GitHub Actions, etc.). |
| vscode | Running from VS Code. |
| cli | Running from a command-line script. |
| api | Running via the REST API directly. |
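If your test harness runs in several environments, the label can be picked automatically. A heuristic sketch (GITHUB_ACTIONS is set by GitHub Actions runners and TERM_PROGRAM=vscode by VS Code terminals; other detection rules are up to you):

```python
import os

def detect_source() -> str:
    """Pick a source label from the execution environment (heuristic)."""
    if os.environ.get("GITHUB_ACTIONS") == "true":
        return "ci"
    if os.environ.get("TERM_PROGRAM") == "vscode":
        return "vscode"
    return "cli"
```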
Result Object Schema
Each object in the results JSON array must follow this structure:
Required Fields
| Field | Type | Description |
|---|---|---|
| scenario_id | string | The ID of the test case (from invarium_get_tests). Must be non-empty. |
| user_message | string | The input message that was sent to the agent. Must be non-empty. |
| agent_response | string | The agent’s actual output/response. Must be non-empty. |
Optional Fields
| Field | Type | Description |
|---|---|---|
| tools_called | array | Array of tool call objects. Each object must have a name field (string) and may include parameters (object). |
| passed | boolean | Whether the test passed. If omitted, the result status is set to “pending” and Invarium evaluates it automatically. |
| notes | string | Free-text notes about the test execution (e.g., observations, context). |
| latency_ms | number | Execution time in milliseconds. Used for performance tracking. |
| error_message | string | Error details for failed test cases. |
Example Result Object
{
"scenario_id": "sc_001",
"user_message": "What is your refund policy for digital products?",
"agent_response": "Based on our knowledge base, digital products can be refunded within 14 days of purchase. Source: KB article #1234.",
"tools_called": [
{
"name": "search_knowledge_base",
"parameters": { "query": "refund policy digital products" }
}
],
"passed": true,
"notes": "Agent correctly searched KB and cited source.",
"latency_ms": 1250
}
If you omit the passed field, Invarium will automatically evaluate the agent response against the expected behavior defined in the test case. Manual passed values take precedence over automatic evaluation. The maximum number of results per sync call is 1,000.
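The schema constraints above can be checked client-side before syncing, which surfaces problems without a round trip. This is an illustrative pre-check, not part of any Invarium SDK; the error strings mirror the documented server errors:

```python
import json

REQUIRED = ("scenario_id", "user_message", "agent_response")
VALID_SOURCES = {"mcp", "ci", "vscode", "cli", "api"}
MAX_RESULTS = 1000

def precheck_results(results_json: str, source: str = "mcp") -> list:
    """Validate a results payload locally; raises ValueError on problems."""
    if source not in VALID_SOURCES:
        raise ValueError(f"Invalid source '{source}'")
    try:
        results = json.loads(results_json)
    except json.JSONDecodeError as e:
        raise ValueError(f"Results are not valid JSON: {e}")
    if not isinstance(results, list):
        raise ValueError("results must be a JSON array")
    if not results:
        raise ValueError("results array is empty")
    if len(results) > MAX_RESULTS:
        raise ValueError(f"Too many results ({len(results)}). "
                         f"Maximum is {MAX_RESULTS} per sync.")
    for i, r in enumerate(results, start=1):
        # All three required fields must be present, strings, and non-empty.
        for field in REQUIRED:
            if not isinstance(r.get(field), str) or not r[field]:
                raise ValueError(
                    f"Result #{i}: missing or empty required field '{field}'")
        if "tools_called" in r and not isinstance(r["tools_called"], list):
            raise ValueError(f"Result #{i}: tools_called must be an array")
        if "passed" in r and not isinstance(r["passed"], bool):
            raise ValueError(f"Result #{i}: 'passed' must be a boolean")
    return results
```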
Error Responses
| Error | Cause | Fix |
|---|---|---|
| Agent not found | No blueprint exists for this agent name. | Upload a blueprint first with invarium_upload_blueprint. |
| Results are not valid JSON: ... | The results string is not valid JSON. | Ensure the results parameter is a valid JSON array string. |
| results must be a JSON array | The results value is not an array. | Wrap result objects in a JSON array [...]. |
| results array is empty | The results array has no items (and create_empty is false). | Provide at least one test result object. |
| Too many results (N). Maximum is 1000 per sync. | The results array exceeds 1,000 items. | Split results into multiple sync calls using test_run_id for incremental sync. |
| Result #N: missing or empty required field 'field' | A result object is missing scenario_id, user_message, or agent_response. | Ensure all three required fields are present and non-empty. |
| Result #N: tools_called must be an array | The tools_called field is not an array. | Provide tools_called as an array of objects with a name field. |
| Result #N: 'passed' must be a boolean | The passed field is not true or false. | Use a boolean value, or omit the field for automatic evaluation. |
| Invalid source 'xyz' | The source value is not recognized. | Use one of: mcp, ci, vscode, cli, api. |
See Error Codes for the full error reference.