# skills/netra-mcp-usage/SKILL.md

Use this skill when you need to inspect traces through Netra MCP tools and want to:
- Query traces in a time range with filtering, sorting, and cursor pagination.
- Retrieve full span trees for a selected trace id.
- Guide incident/debug workflows from trace search to root-cause analysis.
- Run MCP-driven evaluation workflows for single-turn and multi-turn datasets.

## Primary MCP Tools
- `netra_query_traces`
- `netra_get_trace_by_id`
- `netra_list_provider_configs`
- `netra_create_dataset`
- `netra_add_dataset_test_case`
- `netra_list_evaluators`
- `netra_add_evaluator`
- `netra_get_test_run_details`

## Workflow
1. Start with a narrow time window and low limit.
2. Add the minimum filters needed to isolate relevant traces.
3. Sort for your objective (recent, slowest, most expensive, errors).
4. Page through results using returned cursor values.
5. Fetch full spans for one trace id.
6. Inspect hierarchy, status, latency, and attributes.
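
A minimal first call to `netra_query_traces` covering steps 1-4 might look like this (the timestamps and window are illustrative):

```json
{
  "startTime": "2024-05-01T10:00:00Z",
  "endTime": "2024-05-01T10:30:00Z",
  "limit": 20,
  "sortField": "start_time",
  "sortOrder": "desc"
}
```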

## Use-Case Specific References

- Querying traces, filters, sort options, pagination, and incident triage: `references/traces.md`
- Single-turn evaluation flow (providers -> datasets -> test cases -> evaluators -> test run): `references/evaluations-single-turn.md`
- Multi-turn simulation flow (scenario-driven test cases and evaluator config handling): `references/simulation-multi-turn.md`

## query_traces Input Schema

Required:
- `startTime` (string, ISO 8601)
- `endTime` (string, ISO 8601)

Optional:
- `limit` (number, 1-100, default 20)
- `cursor` (string)
- `direction` (`up` | `down`, default `down`)
- `sortField`
- `sortOrder` (`asc` | `desc`, default `desc`)
- `filters` (array of filter objects)

### sortField Values
- `latency_ms`
- `name`
- `total_cost`
- `has_pii`
- `has_violation`
- `start_time`
- `environment`
- `service`
- `has_error`
- `total_tokens`

### Filter Object Schema
Each filter object must include:
- `field`
- `value`
- `type`
- `operator`

Optional in filter object:
- `key` (for nested/object-style filtering)

#### field Values
- `name`
- `tenant_id`
- `user_id`
- `session_id`
- `environment`
- `service`
- `metadata`
- `projectIds`
- `project_id`
- `parent_span_id`
- `has_pii`
- `has_violation`
- `has_error`
- `models`
- `total_cost`
- `latency`

#### type Values
- `string`
- `number`
- `boolean`
- `arrayOptions`
- `attributeKey`
- `object`

#### operator Values
- `equals`
- `greater_than`
- `less_than`
- `greater_equal_to`
- `less_equal_to`
- `contains`
- `not_equals`
- `any_of`
- `none_of`
- `not_contains`
- `starts_with`
- `ends_with`
- `is_null`
- `is_not_null`

## Filter Patterns
- Error traces only:
- `field: has_error`, `type: boolean`, `operator: equals`, `value: true`
- Specific session:
- `field: session_id`, `type: string`, `operator: equals`, `value: <session-id>`
- High latency:
- `field: latency`, `type: number`, `operator: greater_than`, `value: 3000`
- Service scoped:
- `field: service`, `type: string`, `operator: equals`, `value: <service-name>`
- Metadata key/value:
- `field: metadata`, `type: object`, `key: <metadata-key>`, `operator: equals`, `value: <value>`
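
As one worked example, the metadata pattern above can be combined with a time window into a full `netra_query_traces` input (all values are placeholders):

```json
{
  "startTime": "2024-05-01T10:00:00Z",
  "endTime": "2024-05-01T11:00:00Z",
  "limit": 20,
  "filters": [
    {
      "field": "metadata",
      "type": "object",
      "key": "<metadata-key>",
      "operator": "equals",
      "value": "<expected-value>"
    }
  ]
}
```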

## Pagination Pattern
1. Run `query_traces` without `cursor`.
2. Capture a `cursor` from returned trace items.
3. Re-run `query_traces` with the cursor and `direction: down`.
4. Continue while `pageInfo.hasNextPage` is true.
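
The follow-up page request in step 3 might look like the following (timestamps are illustrative; reuse the same window as the first call):

```json
{
  "startTime": "2024-05-01T10:00:00Z",
  "endTime": "2024-05-01T10:30:00Z",
  "limit": 20,
  "cursor": "<cursor-from-previous-page>",
  "direction": "down"
}
```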

## get_trace_by_id Input Schema
Required:
- `traceId` (string)

Behavior:
- Returns complete span array for the trace id.
- Use this after `query_traces` to inspect one trace deeply.
- Invalid ids return a not-found style error.
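
For example:

```json
{
  "traceId": "<trace-id-from-query-results>"
}
```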

## Incident Triage Recipe
1. Query for failing traces (`has_error=true`) in the incident window.
2. Sort by `latency_ms` desc to identify worst requests.
3. Pull one trace via `get_trace_by_id`.
4. Validate root span presence and parent-child span flow.
5. Check slow spans and tool/model metadata.
6. Compare with a nearby successful trace if needed.
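
Steps 1-2 can be expressed as a single `netra_query_traces` call (the incident window shown is illustrative):

```json
{
  "startTime": "2024-05-01T09:45:00Z",
  "endTime": "2024-05-01T10:15:00Z",
  "limit": 20,
  "sortField": "latency_ms",
  "sortOrder": "desc",
  "filters": [
    {
      "field": "has_error",
      "type": "boolean",
      "operator": "equals",
      "value": true
    }
  ]
}
```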

## Practical Tips
- Keep initial windows short (5-30 minutes) for faster narrowing.
- Use one or two filters first, then add more only if needed.
- Prefer exact-match IDs (`session_id`, `user_id`, `tenant_id`) when available.
- Use `sortField=total_cost` to find expensive traces quickly.
- If no results: widen time range first, then relax filters.
## Feedback

If the user is unhappy with the results, ask them to open an issue at https://github.com/KeyValueSoftwareSystems/netra-skills/issues/new.
# skills/netra-mcp-usage/references/evaluations-single-turn.md
---
name: netra-mcp-evaluations-single-turn
description: End-to-end single-turn evaluation workflow in Netra MCP from provider selection to test run details.
---

# Netra MCP Evaluations (Single-Turn)

Use this reference for a schema-correct single-turn evaluation flow using Netra MCP tools.

## End-To-End Flow

1. List provider configurations.
2. Create a single-turn dataset.
3. Add single-turn test cases.
4. List evaluators.
5. If project evaluators are missing (or you only see library evaluators), create evaluators in the Netra dashboard first.
6. Attach evaluators to dataset or test cases.
7. Execute a test run.
8. Fetch run results using test run id.

## Step 1: List Provider Configurations

Tool: `netra_list_provider_configs`

Purpose:
- Find valid `provider_id` and `model` values for dataset items.
- Confirm the provider/model is available for your use case.

Example:

```json
{}
```

## Step 2: Create A Single-Turn Dataset

Tool: `netra_create_dataset`

Required choices:
- `turnType`: `single`
- `datasetType`: usually `text`

Example:

```json
{
  "name": "support-quality-single-turn",
  "turnType": "single",
  "datasetType": "text",
  "tags": ["support", "regression"]
}
```

## Step 3: Add Single-Turn Test Cases

Tool: `netra_add_dataset_test_case`

Important:
- For single-turn datasets, `input` is required.
- `providerConfig` is required in practice. Always pass `provider_id` and `model` from Step 1.

Example:

```json
{
  "datasetId": "<dataset-id>",
  "input": "User asks for a refund after 45 days",
  "expectedOutput": "Assistant explains policy and offers next best options",
  "contextData": {
    "policy": "30-day refund window",
    "region": "US"
  },
  "providerConfig": {
    "provider_id": "<provider-id>",
    "model": "<model-name>"
  },
  "tags": ["refund"]
}
```

## Step 4: List Evaluators

Tool: `netra_list_evaluators`

Purpose:
- Discover project evaluators available for attachment.
- Inspect available library evaluators in `libraryData`.

Example:

```json
{
  "turnType": "single",
  "page": 1,
  "limit": 20
}
```

Decision rule:
- If project evaluator results are empty and only `libraryData` has entries, stop and instruct the user to create evaluators in the Netra dashboard before continuing.

Suggested instruction to user:
- "No project evaluators are available yet. Please create/select evaluators in the Netra dashboard for this project, then rerun `netra_list_evaluators`."

## Step 5: Attach Evaluators

Tool: `netra_add_evaluator`

Options:
- Attach at dataset level (`targetType: dataset`).
- Attach at test-case level (`targetType: test_case`, requires `datasetItemId`).

Example (dataset-level):

```json
{
  "targetType": "dataset",
  "datasetId": "<dataset-id>",
  "evaluatorId": "<evaluator-id>",
  "isActive": true
}
```

Example (test-case-level):

```json
{
  "targetType": "test_case",
  "datasetId": "<dataset-id>",
  "datasetItemId": "<dataset-item-id>",
  "evaluatorId": "<evaluator-id>"
}
```

## Step 6: Execute Test Run

Use your workspace test-run execution tool (commonly named `netra_execute_test_run`) to run the dataset against the target system.

Expected output:
- A `testRunId` used for retrieval and analysis.
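
If the tool in your workspace is indeed named `netra_execute_test_run`, its input is likely just the dataset to run. The field name below is an assumption, so verify it against the tool's published schema:

```json
{
  "datasetId": "<dataset-id>"
}
```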

## Step 7: Get Test Run Details

Tool: `netra_get_test_run_details`

Required:
- `testRunId`

Optional:
- `page`, `limit`, `filters`

Example:

```json
{
  "testRunId": "<test-run-id>",
  "page": 1,
  "limit": 20
}
```

## Practical Checks

1. Always resolve `provider_id` and `model` before adding test cases.
2. For single-turn cases, verify `input` is present for every item.
3. Treat missing project evaluators as a setup blocker, not a runtime failure.
4. Attach evaluators before running test executions to avoid incomplete scoring.
5. Store and reuse `testRunId` for iterative detail queries.