MCP Reference

invarium_sync_results

Sync test results from your IDE to the Invarium dashboard. After running test cases locally, use this tool to upload the results. Creates a new test run or appends to an existing one.

When to Use

Call invarium_sync_results after running test cases against your agent. This is typically called automatically during test execution. Common usage patterns:

  • Batch mode — collect all results and sync them in a single call to create a new test run
  • Incremental mode — sync results in batches by passing the test_run_id from the first sync to append results to the same run
  • Streaming mode — call with create_empty=true first to get a test run ID and dashboard link, then stream results as tests complete

After syncing, use invarium_get_test_run to fetch detailed results and invarium_get_agent for the updated AQS score.

See Run Tests and Sync Results for the full workflow.

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| agent_name | string | required | Name of the agent whose tests were run. |
| results | string | required | JSON array of test result objects. Each must contain: scenario_id, user_message, agent_response. See Result Object Schema below for the full structure. |
| test_run_id | string \| null | default: null | Existing test run ID for incremental sync. If provided, results are appended to that run. If omitted, a new test run is created. |
| name | string \| null | default: null | Human-readable name for the test run (e.g., 'Smoke Test -- Mixed Complexity'). Helps identify runs on the dashboard. |
| source | string | default: mcp | Label for where tests were executed. Valid values: mcp, ci, vscode, cli, api. |
| create_empty | boolean | default: false | If true, create an empty test run with no results. Returns a test_run_id and dashboard URL for streaming mode. The results parameter is ignored. |
| expected_total | int \| null | default: null | Expected number of test results for streaming mode. Tells the backend how many results to expect before marking the run as completed. Use with create_empty. |
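Note that results is a JSON string, not a native array. A minimal sketch of a valid argument payload (call_tool here is a hypothetical stand-in for however your MCP client dispatches tool calls):

```python
import json

results = [
    {
        "scenario_id": "sc_001",
        "user_message": "What is your refund policy?",
        "agent_response": "Digital products can be refunded within 14 days.",
    }
]

arguments = {
    "agent_name": "customer-support-agent",
    # Serialize the list: the tool expects a JSON array *string*.
    "results": json.dumps(results),
    "name": "Smoke Test -- Mixed Complexity",
    "source": "cli",
}

def call_tool(name, arguments):
    # Hypothetical dispatcher; a real MCP client would perform the call.
    return {"tool": name, "arguments": arguments}

call = call_tool("invarium_sync_results", arguments)
```

Passing the list directly instead of json.dumps-ing it is the most common cause of the "Results are not valid JSON" error.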

Returns

Summary string with test run ID, agent name, result count, and dashboard link.

Example

"Sync the test results for customer-support-agent to the dashboard"

Response

Batch Sync Response

After syncing results, the tool returns a confirmation with the dashboard link:

Results synced to Invarium
Test run: tr_8f4a2b1c
Agent: customer-support-agent
Results: 10 test case(s) synced
View on dashboard: https://app.invarium.dev/agent/ag_xyz/test-runs/tr_8f4a2b1c
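Because the return value is a summary string rather than structured JSON, a script that needs the run ID for incremental sync can extract it with a regex. A sketch against the response format shown above:

```python
import re

response = """Results synced to Invarium
Test run: tr_8f4a2b1c
Agent: customer-support-agent
Results: 10 test case(s) synced
View on dashboard: https://app.invarium.dev/agent/ag_xyz/test-runs/tr_8f4a2b1c"""

# Pull the run ID and dashboard URL out of the summary text.
run_id = re.search(r"Test run:\s*(\S+)", response).group(1)
dashboard = re.search(r"dashboard:\s*(\S+)", response).group(1)
```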

Empty Test Run Response (Streaming Mode)

When create_empty is true, the response provides a test run ID for streaming:

Empty test run created.

  Test Run ID: tr_8f4a2b1c
  Dashboard:   https://app.invarium.dev/agent/ag_xyz/test-runs/tr_8f4a2b1c

Results will stream in as test cases complete.

Examples

Basic — Sync a Batch of Results

Sync all test results at once to create a new test run:

"Sync the test results for customer-support-agent to the dashboard"

Advanced — Incremental Sync

Run tests in batches and append results to the same test run:

# First batch -- creates a new test run
"Sync the first batch of test results for order-agent as 'Regression Suite -- v2.1'"
# Returns: Test run: tr_abc123

# Second batch -- appends to the same run
"Sync additional results for order-agent to test run tr_abc123"
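In script form, incremental sync amounts to capturing the run ID from the first call and threading it through the rest. A sketch, where sync_batch is a hypothetical wrapper that would dispatch to invarium_sync_results and parse the run ID out of the returned summary:

```python
import json

def sync_batch(agent_name, batch, test_run_id=None, name=None):
    """Hypothetical wrapper around the invarium_sync_results tool call."""
    args = {"agent_name": agent_name, "results": json.dumps(batch)}
    if test_run_id:
        args["test_run_id"] = test_run_id   # append to the existing run
    if name:
        args["name"] = name                 # only meaningful on the first call
    # A real client would dispatch `args` and parse the summary string;
    # here we pretend the server minted this ID on the first call.
    return test_run_id or "tr_abc123"

batches = [
    [{"scenario_id": "sc_001", "user_message": "hi", "agent_response": "hello"}],
    [{"scenario_id": "sc_002", "user_message": "bye", "agent_response": "goodbye"}],
]

run_id = None
for batch in batches:
    # First iteration creates and names the run; later ones append to it.
    run_id = sync_batch("order-agent", batch,
                        test_run_id=run_id,
                        name=None if run_id else "Regression Suite -- v2.1")
```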

Streaming Mode

For long-running test suites, create an empty test run first to get the dashboard link, then stream results as each test completes:

# 1. Create empty test run
"Create an empty test run for customer-support-agent expecting 25 results, named 'Full Suite -- Streaming'"
# Returns: Test Run ID: tr_xyz789, Dashboard URL

# 2. Stream results as tests complete (via trace library or incremental sync)
"Sync results for customer-support-agent to test run tr_xyz789"
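The same flow as a sketch: create the empty run up front so the dashboard link is available before any test finishes, then append one result at a time. call_tool is a hypothetical dispatcher, and tr_xyz789 stands in for the ID returned by step 1:

```python
import json

def call_tool(name, arguments):
    # Hypothetical dispatcher; a real MCP client would perform the call.
    return arguments

# 1. Create the empty run, declaring how many results to expect.
created = call_tool("invarium_sync_results", {
    "agent_name": "customer-support-agent",
    "results": "[]",            # ignored when create_empty is true
    "create_empty": True,
    "expected_total": 25,
    "name": "Full Suite -- Streaming",
})

# 2. As each test finishes, append its result to the run.
def stream_result(result, test_run_id="tr_xyz789"):  # ID from step 1's response
    return call_tool("invarium_sync_results", {
        "agent_name": "customer-support-agent",
        "results": json.dumps([result]),
        "test_run_id": test_run_id,
    })
```

With expected_total set, the backend can mark the run completed once all 25 results have arrived.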

CI/CD Pipeline

Label results from CI with the ci source:

"Sync test results for billing-agent from CI, named 'PR #142 -- Billing Refactor'"

Source Labels

Use the source parameter to indicate where the tests were executed. This helps track result origins on the dashboard:

| Source | When to Use |
| --- | --- |
| mcp | Running from an MCP client (Claude Desktop, Cursor, etc.). Default value. |
| ci | Running in a CI/CD pipeline (GitHub Actions, etc.). |
| vscode | Running from VS Code. |
| cli | Running from a command-line script. |
| api | Running via the REST API directly. |

Result Object Schema

Each object in the results JSON array must follow this structure:

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| scenario_id | string | The ID of the test case (from invarium_get_tests). Must be non-empty. |
| user_message | string | The input message that was sent to the agent. Must be non-empty. |
| agent_response | string | The agent's actual output/response. Must be non-empty. |

Optional Fields

| Field | Type | Description |
| --- | --- | --- |
| tools_called | array | Array of tool call objects. Each object must have a name field (string) and may include parameters (object). |
| passed | boolean | Whether the test passed. If omitted, the result status is set to "pending" and Invarium evaluates it automatically. |
| notes | string | Free-text notes about the test execution (e.g., observations, context). |
| latency_ms | number | Execution time in milliseconds. Used for performance tracking. |
| error_message | string | Error details for failed test cases. |

Example Result Object

```json
{
  "scenario_id": "sc_001",
  "user_message": "What is your refund policy for digital products?",
  "agent_response": "Based on our knowledge base, digital products can be refunded within 14 days of purchase. Source: KB article #1234.",
  "tools_called": [
    {
      "name": "search_knowledge_base",
      "parameters": { "query": "refund policy digital products" }
    }
  ],
  "passed": true,
  "notes": "Agent correctly searched KB and cited source.",
  "latency_ms": 1250
}
```

If you omit the passed field, Invarium will automatically evaluate the agent response against the expected behavior defined in the test case. Manual passed values take precedence over automatic evaluation. The maximum number of results per sync call is 1,000.
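Since a single call accepts at most 1,000 results, larger suites can be split across calls and appended incrementally. A sketch of the chunking:

```python
def chunk_results(results, max_per_call=1000):
    """Split a result list into sync-sized chunks.

    The first chunk creates the run; subsequent chunks are appended
    by passing the test_run_id returned from the first sync.
    """
    return [results[i:i + max_per_call]
            for i in range(0, len(results), max_per_call)]

# 2,500 results split into chunks of 1000, 1000, and 500.
chunks = chunk_results([{"scenario_id": f"sc_{i:04d}"} for i in range(2500)])
```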

Error Responses

| Error | Cause | Fix |
| --- | --- | --- |
| Agent not found | No blueprint exists for this agent name. | Upload a blueprint first with invarium_upload_blueprint. |
| Results are not valid JSON: ... | The results string is not valid JSON. | Ensure the results parameter is a valid JSON array string. |
| results must be a JSON array | The results value is not an array. | Wrap result objects in a JSON array [...]. |
| results array is empty | The results array has no items (and create_empty is false). | Provide at least one test result object. |
| Too many results (N). Maximum is 1000 per sync. | The results array exceeds 1,000 items. | Split results into multiple sync calls using test_run_id for incremental sync. |
| Result #N: missing or empty required field 'field' | A result object is missing scenario_id, user_message, or agent_response. | Ensure all three required fields are present and non-empty. |
| Result #N: tools_called must be an array | The tools_called field is not an array. | Provide tools_called as an array of objects with a name field. |
| Result #N: 'passed' must be a boolean | The passed field is not true or false. | Use a boolean value, or omit the field for automatic evaluation. |
| Invalid source 'xyz' | The source value is not recognized. | Use one of: mcp, ci, vscode, cli, api. |
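Most of these errors can be caught client-side before syncing. A sketch of a pre-flight validator mirroring the rules above (the exact server-side messages are assumptions based on this table):

```python
import json

VALID_SOURCES = {"mcp", "ci", "vscode", "cli", "api"}
REQUIRED = ("scenario_id", "user_message", "agent_response")

def validate_results(results_json, source="mcp"):
    """Return a list of error strings mirroring the server-side checks."""
    errors = []
    if source not in VALID_SOURCES:
        errors.append(f"Invalid source '{source}'")
    try:
        results = json.loads(results_json)
    except ValueError:
        return errors + ["Results are not valid JSON"]
    if not isinstance(results, list):
        return errors + ["results must be a JSON array"]
    if not results:
        return errors + ["results array is empty"]
    if len(results) > 1000:
        errors.append(f"Too many results ({len(results)}). Maximum is 1000 per sync.")
    for i, r in enumerate(results, start=1):
        for field in REQUIRED:
            if not r.get(field):
                errors.append(f"Result #{i}: missing or empty required field '{field}'")
        if "tools_called" in r and not isinstance(r["tools_called"], list):
            errors.append(f"Result #{i}: tools_called must be an array")
        if "passed" in r and not isinstance(r["passed"], bool):
            errors.append(f"Result #{i}: 'passed' must be a boolean")
    return errors
```

Running this before the sync call turns a failed round trip into an immediate local error message.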

See Error Codes for the full error reference.
