[OPIK-6599] [SDK] fix: opik migrate fails loud on skipped items by JetoPistola · Pull Request #6782 · comet-ml/opik

JetoPistola · 2026-05-20T06:00:15Z

Details

Fix silent data loss in opik migrate dataset. Previously the experiment cascade tallied per-item skip counters but recorded them only on a cascade_experiments_summary audit entry with status="ok", and the CLI exited 0 with no stderr signal. Operators relying on the audit log or the exit code had no machine-readable indication that data was lost.

cascade_one_experiment now emits one skip audit record per (experiment, reason) pair with count and sample_source_ids (capped at 20 ids during collection — counts continue to come from the always-incremented result.items_skipped_* counters, so the audit row carries the true total even when the sample is truncated).
cascade_experiments_summary flips to status="failed" when any skip counter is non-zero.
main.py inspects skip records after execute_plan, prints a SKIP_SUMMARY: experiments_skipped=… items_skipped_missing_trace=… items_skipped_missing_item=… line to stderr, finalizes the audit to failed, and exits 1.
Stderr also prints a high-level rollback hint naming the destination dataset, experiments, optimizations, traces, and spans the operator must delete in --to-project, plus the rename-back step on the source (<name>_v1 → <name>). Destination state is intentionally not auto-rolled-back — the audit log is the source of truth for what made it across.

Scope: only the data-loss-class skip counters tallied at the cascade summary (experiments_skipped, items_skipped_missing_trace, items_skipped_missing_item). Cosmetic-degradation paths in _resolver.py are out of scope (no data lost). No new flag — fail loud by default per AC. Auto-rollback (a --rollback-on-skip style UX) is tracked as a follow-up under OPIK-6635.

What the user sees

Success (unchanged from before this PR):

$ opik migrate dataset MyDataset --to-project=DestProject
Workspace: my-workspace
<plan table + per-experiment progress bars>
Migrated 'MyDataset' into project 'DestProject' as 'MyDataset'. Took 2m 14s. Audit log: ./opik-migrate-20260520T060114Z.json
$ echo $?
0

Audit log: "status": "ok". No skip records under actions[].

Failure (new):

$ opik migrate dataset MyDataset --to-project=DestProject
Workspace: my-workspace
<plan table + per-experiment progress bars>
<stdout from the cascade, including the existing yellow `Skipped N items due to missing data:` line from imports/experiment.py>
[stderr] opik migrate: 2500 items skipped — destination state was NOT rolled back; see audit log: ./opik-migrate-20260520T060114Z.json
[stderr] SKIP_SUMMARY: experiments_skipped=0 items_skipped_missing_trace=0 items_skipped_missing_item=2500
[stderr] To roll back manually: in project 'DestProject', delete the destination dataset 'MyDataset' along with any experiments, optimizations, traces, and spans that were cascaded into it (the audit log lists each created entity id); then rename the source 'MyDataset_v1' back to 'MyDataset'.
[stderr] (after 2m 14s)
$ echo $?
1

Audit log: "status": "failed", plus one {"type": "skip", "status": "skipped", "reason": "items_missing_dataset_item_remap", ...} row per affected experiment. The pre-existing cascade_experiments_summary action also flips to "status": "failed". CI pipelines that capture stderr (2>&1 / 2>) can grep '^SKIP_SUMMARY:' to gate on specific skip reasons without parsing JSON.

Diagram: diagrams/opik-6599-diagram.html (locally generated; before/after flow, files changed, design decisions).

Change checklist

User facing
Documentation update

Issues

OPIK-6599
Follow-up: OPIK-6635 (auto-rollback UX on partial migrate)

AI-WATERMARK

AI-WATERMARK: yes

Tools: Claude Code
Model(s): Claude Opus 4.7
Scope: full implementation
Human verification: code review + targeted unit tests

Testing

pytest sdks/python/tests/unit/cli/test_migrate_dataset_experiments_cascade.py — 27 tests, including 4 new TestSkipAuditRecords cases (skip-record shape, sample-id cap, missing-trace negative path, no-skip happy path)
pytest sdks/python/tests/unit/cli/test_migrate_dataset_planner.py — 16 tests, including 3 new TestFinalizeWithSkipsOrOk cases (exit code 1, stderr SKIP_SUMMARY: line, rollback hint includes target project + _v1 source name, totals aggregated by reason across multiple skip records)
Full migrate suite: pytest sdks/python/tests/unit/cli/test_migrate_* — 100 tests pass
pre-commit (sdks/python config): ruff + ruff-format + mypy all green on changed files
E2E (tests/e2e/cli/test_migrate_dataset_*) not run — requires real backend; covered separately

Documentation

N/A — internal CLI behavior change; the AC explicitly says no new flag. The existing opik migrate dataset --help output is unchanged. Audit-log readers gain new skip record entries (additive, cascade_experiments_summary retains its existing fields for back-compat). The new stderr rollback hint is self-documenting for operators who hit the failure path.

Previously the experiment cascade tallied skip counters but recorded them only on the cascade_experiments_summary audit entry (status="ok"), and the CLI exited 0 with no stderr signal. Operators relying on the audit log or the exit code had no machine-readable indication that data was lost. Now: - cascade_one_experiment emits one "skip" audit record per (experiment, reason) with count + sample source ids (capped at 20) - cascade_experiments_summary flips to status="failed" when any skip counter is non-zero - main.py inspects skip records after execute_plan, prints a SKIP_SUMMARY line to stderr, finalizes the audit to "failed", and exits 1 Destination state is intentionally NOT rolled back -- operators read the audit log to know what made it across. The failure message calls this out explicitly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

baz-reviewer · 2026-05-20T06:05:11Z

+    #
+    # Per-(experiment, reason) audit records are emitted at the end with
+    # the affected source ids (capped at ``_SKIP_SAMPLE_LIMIT``) so the
+    # CLI can fail loud with a machine-readable breakdown -- see OPIK-6599.
+    missing_trace_ids: List[str] = []
+    missing_item_ids: List[str] = []
    for item in items:
        if item.trace_id and item.trace_id not in result.trace_id_remap:
            result.items_skipped_missing_trace += 1
+            missing_trace_ids.append(item.trace_id)
        if item.dataset_item_id and item.dataset_item_id not in item_id_remap:
            result.items_skipped_missing_item += 1


missing_trace_ids/missing_item_ids grow unbounded before being trimmed in _record_skip — should we cap them at _SKIP_SAMPLE_LIMIT during collection? Also, should we add a test for the missing-trace negative path asserting a skip audit with reason == "items_missing_trace_remap" and the expected count/sample?

_{Finding types: Reliable test patterns Batch and stream | Severity: 🟢 Low}

Want Baz to fix this for you? Activate Fixer

Other fix methods

Prompt for AI Agents

Two related improvements in `cascade_one_experiment` (experiments.py ~lines 472-503): 1. Cap sample lists during collection: instead of appending every missing id and trimming later in `_record_skip`, append only while `len(missing_trace_ids) < _SKIP_SAMPLE_LIMIT` (same for `missing_item_ids`). Track total miss counts independently (or via already-incremented counters like `result.items_skipped_missing_trace`) and pass those as `count=` to `_record_skip`. 2. In `test_migrate_dataset_experiments_cascade.py`, add a test for the missing-trace negative path: simulate an experiment cascade where `dataset_item_id` remap is present but `trace_id` is NOT in `result.trace_id_remap`. Assert the audit log receives exactly one `skip` record with `details.reason == "items_missing_trace_remap"`, `details.count` equals the expected number of missing trace ids, and `details.sample_source_ids` contains the expected offending trace ids (respecting `_SKIP_SAMPLE_LIMIT`).

Commit e624694 addressed this comment by capping the missing-trace/item sample lists as they are collected and passing the total counts to the new _record_skip helper so the audit record keeps accurate counts with a bounded sample. It also emits explicit skip audit records for the missing-trace and missing-item reasons, including the newly added _SKIP_SAMPLE_LIMIT. The accompanying tests now cover the missing-trace negative path, asserting the audit reason, count, and sampled source IDs are as expected.

Good catch on the unbounded growth. Fixed in e624694 — missing_trace_sample / missing_item_sample are now capped at _SKIP_SAMPLE_LIMIT during collection (the appends are guarded by len(...) < _SKIP_SAMPLE_LIMIT). count continues to come from the always-fully-incremented result.items_skipped_* counters, so the audit record still carries the true total even when the sample is capped.

Also added the missing-trace negative-path test you asked for: TestSkipAuditRecords.test_missing_trace_remap__emits_skip_record. It monkey-patches _copy_traces_and_spans so the trace_id_remap is deliberately incomplete after the trace-copy phase (mirroring a production BE quirk where a trace gets dropped mid-cascade), then asserts the cascade emits exactly one skip record with reason == "items_missing_trace_remap", the right count, and the orphan trace_id in sample_source_ids.

baz-reviewer · 2026-05-20T06:05:11Z

+    skip_records = [
+        action for action in audit.actions if action.get("status") == "skipped"
+    ]
+    if not skip_records:
+        audit.finalize("ok")
+        audit.write(audit_path)
+        elapsed = _format_elapsed(elapsed_seconds)
+        console.print(
+            f"[green]Migrated '{name}' into project '{target_project}' as "
+            f"'{target_label}'.[/green] Took {elapsed}. Audit log: {audit_path}"
+        )
+        return


SKIP_SUMMARY reads record.get("reason", ""), but _record_skip puts reason under details, so reason_to_total_key.get(...) ends up None and totals/total_skipped stay at 0; should we pull reason and count from record.get("details", {}).get("reason", "") and record.get("details", {}).get("count", 0)?

_{Finding type: Validate nullable inputs | Severity: 🟢 Low}

Want Baz to fix this for you? Activate Fixer

Other fix methods

Prompt for AI Agents

Before applying, verify this suggestion against the current code. In sdks/python/src/opik/cli/migrate/main.py around lines 211-242 inside `_finalize_with_skips_or_ok`, fix the SKIP_SUMMARY aggregation logic so it reads skip classification from the actual audit record shape. Update the loop to extract `reason` and `count` from `record.get('details', {})` (e.g., `details.get('reason','')` and `details.get('count',0)`) instead of `record.get('reason', '')`, and then use those values to increment `totals` so `total_skipped` reflects real skips. Ensure the existing mapping in `reason_to_total_key` and the final stderr messages stay the same, but now compute correct counts when skip records are present.

Verified this against the audit-record shape and the bot is mistaken here. AuditLog.record(details=...) in audit.py:51-52 flattens details into the entry via entry.update(details), so record["reason"] and record["count"] are on the top level, not nested under details. Quick repro:

>>> from opik.cli.migrate.audit import AuditLog >>> a = AuditLog(command='test', args={}) >>> a.record(type='skip', status='skipped', details={'reason': 'r', 'count': 42}) >>> a.actions[0] {'type': 'skip', 'status': 'skipped', 'recorded_at': '...', 'reason': 'r', 'count': 42}

The existing test_multiple_skip_records__totals_aggregated_by_reason test asserts the aggregation produces items_skipped_missing_trace=14 and items_skipped_missing_item=6 from skip records carrying count=7 and count=3 respectively — it would fail with =0 =0 if the read sites were broken. So the current code is correct as written.

Thanks for the correction — you're right, I misread the audit record shape here. I'll save this to memory once the PR is merged.

baz-reviewer · 2026-05-20T06:05:11Z

+        "experiment_recreate_returned_false": "experiments_skipped",
+        "items_missing_trace_remap": "items_skipped_missing_trace",
+        "items_missing_dataset_item_remap": "items_skipped_missing_item",
+    }


The skip-aggregation loop reads record.get("reason")/record.get("count"), but _record_skip stores them under details, so total_key ends up None and the SKIP_SUMMARY/total_skipped stays 0 even when skips exist—should we read record.get("details", {})["reason"|"count"] or move those fields to the top level?

_{Finding type: Centralize error handling | Severity: 🟢 Low}

Want Baz to fix this for you? Activate Fixer

Other fix methods

Prompt for AI Agents

Before applying, verify this suggestion against the current code. In sdks/python/src/opik/cli/migrate/main.py around lines 193-260, inside the _finalize_with_skips_or_ok function where it builds totals from skip_records, fix the aggregation to read reason and count from the correct location. The current code uses record.get("reason") and record.get("count"), but skip records created elsewhere store those fields under record["details"] (e.g., details={"reason": ..., "count": ...}). Refactor the loop so it pulls details = record.get("details", {}) and then uses details.get("reason", "") and int(details.get("count", 0)) when incrementing totals, so SKIP_SUMMARY and total_skipped accurately reflect real skips. If there are tests asserting the SKIP_SUMMARY fields, update them or add a targeted test to cover this mismatch.

Same as my reply on the sibling thread above — false positive. AuditLog.record(details=...) flattens details into the entry (audit.py:51-52), so top-level record.get("reason") / record.get("count") is correct. Verified with the test_multiple_skip_records__totals_aggregated_by_reason test which asserts non-zero aggregated totals.

Thanks for the correction — that makes sense. I’ll save this to memory once the PR is merged.

baz-reviewer · 2026-05-20T06:05:11Z

+        with pytest.raises(SystemExit) as exc:
+            _finalize_with_skips_or_ok(


These tests call _finalize_with_skips_or_ok directly instead of the public opik migrate dataset CLI, so they bypass the command wiring guidance in .agents/skills/python-sdk/testing.md — should we move them to runner.invoke(cli, ["migrate", "dataset", ...]) and patch planner/executor inputs rather than the internal helper?

_{Finding type: AI Coding Guidelines | Severity: 🟢 Low}

Want Baz to fix this for you? Activate Fixer

Other fix methods

Prompt for AI Agents

Before applying, verify this suggestion against the current code. In sdks/python/tests/unit/cli/test_migrate_dataset_planner.py around lines 101-114 (and similarly 134-149 and 187-195), update the tests in `TestFinalizeWithSkipsOrOk` to stop calling the private `_finalize_with_skips_or_ok` helper directly. Instead, use `CliRunner` to invoke the public command `cli` with the `migrate dataset` arguments, and patch the underlying migrate/planner/executor inputs so that the command emits at least one `skip` audit record (and in the no-skips case emits none). Keep the same behavioral assertions (non-zero exit code, SKIP_SUMMARY on stderr, and audit.json finalized status/content) but derive them from the CLI invocation results and on-disk audit file produced by the command.

Intentional. The unit tests target _finalize_with_skips_or_ok directly because that helper is the only logic this PR adds in main.py — the surrounding Click wiring (migrate_group, migrate_dataset_command) is unchanged from origin/main and already covered by the existing TestMigrateHelp CliRunner tests and the planner-level tests in this module.

Going through runner.invoke(cli, ["migrate", "dataset", ...]) to reach this helper would require stubbing the planner, the version-replay phase, the cascade-experiments phase, and the workspace banner — a lot of scaffolding to verify branching logic that's already isolated in the helper. The cascade-level audit-record emission (which produces the skip records the helper reads) IS covered end-to-end in test_migrate_dataset_experiments_cascade.py::TestSkipAuditRecords, so the integration boundary is exercised — just not at the CLI surface.

If we add a follow-up test for a regression that only manifests with full CLI wiring, I'll switch then.

Thanks for the context — that makes sense given the unchanged Click wiring and existing integration coverage. I'll save this to memory once the PR is merged.

Addresses PR review feedback on OPIK-6599: - Stderr now prints a high-level rollback hint naming the destination dataset / experiments / optimizations / traces / spans the operator must delete, plus the rename-back step on the source. Audit log remains the source of truth for the per-entity ids. Auto-rollback is tracked as a follow-up under OPIK-6635. - Cap missing_trace_sample / missing_item_sample during collection rather than slicing inside _record_skip. Counts come from the always-fully-incremented result.items_skipped_* counters, so the audit record still carries the true total even when the sample is capped. Bounds peak memory on pathological inputs. - New negative-path test: cascade emits a missing_trace_remap skip record when the trace_id_remap is incomplete after _copy_traces_and_spans. Monkey-patches the helper to leave the remap deliberately partial, matching the production case where a BE quirk drops a trace mid-cascade. - Existing planner test now asserts the rollback hint mentions the target project and the source's _v1 name. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

baz-reviewer · 2026-05-20T18:25:52Z

+    # High-level rollback hint. We deliberately don't ship a step-by-step
+    # CLI playbook here -- the audit log is the source of truth for what
+    # was actually created. Each ``ok`` action in ``audit.actions`` carries
+    # the destination entity id; an operator can grep the audit JSON to
+    # see exactly what landed in the destination project before deciding
+    # what to delete. Auto-rollback (a one-flag clean reverse) is tracked
+    # as a follow-up; this PR is the loud-fail mechanic only.
+    _stderr_console.print(
+        f"[yellow]To roll back manually: in project "
+        f"'{target_project}', delete the destination dataset "
+        f"'{target_label}' along with any experiments, optimizations, "
+        f"traces, and spans that were cascaded into it (the audit log "


This rollback hint is printed via _stderr_console.print (stderr), but CI guidance that captures only stdout through tee … won’t include it—should we emit it on stdout instead (e.g., print or a stdout console)?

_{Finding type: Prefer standard logging | Severity: 🟢 Low}

Want Baz to fix this for you? Activate Fixer

Other fix methods

Prompt for AI Agents

Before applying, verify this suggestion against the current code. In sdks/python/src/opik/cli/migrate/main.py around lines 259-273 where the “High-level rollback hint” is emitted via `_stderr_console.print`, change it to write to stdout instead. Replace `_stderr_console` with the stdout console (or use `print`) so CI pipelines that only capture stdout via `tee` will include the rollback instructions in summaries/PR bodies. Keep the message content and formatting the same; only the output stream should change.

Skipping — the rollback hint deliberately lives on stderr with the rest of the failure block:

opik migrate: N items skipped — destination state was NOT rolled back; see audit log: <path> (stderr, red)

SKIP_SUMMARY: ... (stderr, machine-parseable)

The rollback hint (stderr, yellow)

(after Xs) (stderr, dim)

These four lines are one cohesive failure summary. Splitting just line 3 off to stdout would scatter the failure narrative across two streams.

CI that captures both streams via 2>&1 (or just redirects stderr via 2>) keeps the hint. CI that captures only stdout via tee was already missing the loud-fail message, the SKIP_SUMMARY line, and the exit-code context — its operators wouldn't even know the migrate failed, so missing the rollback hint isn't the load-bearing gap. Fixing that CI's capture is the right escalation; moving one line to stdout would only paper over it.

🤖 Reply posted via /address-github-pr-comments

Thanks, that makes sense — keeping the rollback hint on stderr preserves the single failure-summary block here. I’ll save this to memory once the PR is merged.

github-actions · 2026-05-20T18:44:51Z

🔄 Test environment deployment process has started

Phase 1: Deploying base version 2.0.42 (from main branch) if environment doesn't exist
Phase 2: Building new images from PR branch danield/OPIK-6599-opik-migrate-silently-skips-items-and-exits-0
Phase 3: Will deploy newly built version after build completes

You can monitor the progress here.

CometActions · 2026-05-20T18:53:50Z

✅ Test environment is now available!

To configure additional Environment variables for your environment, run [Deploy Opik AdHoc Environment workflow] (https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml)

Access Information

URL: https://pr-6782.dev.comet.com
Cluster: comet-ml-development
Namespace: pr-6782
Version: 2.0.42-6782-merge-2133
Application logs: View in Grafana

The deployment has completed successfully and the version has been verified.

CometActions · 2026-05-21T05:34:45Z

🌙 Nightly cleanup: The test environment for this PR (pr-6782) has been cleaned up to free cluster resources. PVCs are preserved — re-deploy to restore the environment.

CometActions · 2026-05-22T05:29:27Z

🌙 Nightly cleanup: The test environment for this PR (pr-6782) has been cleaned up to free cluster resources. PVCs are preserved — re-deploy to restore the environment.

CometActions · 2026-05-23T05:20:27Z

🌙 Nightly cleanup: The test environment for this PR (pr-6782) has been cleaned up to free cluster resources. PVCs are preserved — re-deploy to restore the environment.

github-actions Bot assigned JetoPistola May 20, 2026

github-actions Bot added python Pull requests that update Python code tests Including test files, or tests related like configuration. Python SDK labels May 20, 2026

baz-reviewer Bot reviewed May 20, 2026

View reviewed changes

JetoPistola added the test-environment Deploy Opik adhoc environment label May 20, 2026

JetoPistola requested a review from alexkuzmik May 20, 2026 18:46

JetoPistola marked this pull request as ready for review May 20, 2026 18:46

JetoPistola requested a review from a team as a code owner May 20, 2026 18:46

JetoPistola requested a review from AndreiCautisanu May 20, 2026 19:30

CometActions removed the test-environment Deploy Opik adhoc environment label May 21, 2026

		with pytest.raises(SystemExit) as exc:
		_finalize_with_skips_or_ok(

Conversation

JetoPistola commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Details

What the user sees

Change checklist

Issues

AI-WATERMARK

Testing

Documentation

Uh oh!

baz-reviewer Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

JetoPistola May 20, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

JetoPistola May 20, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

JetoPistola May 20, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

JetoPistola May 20, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

JetoPistola May 20, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

CometActions commented May 20, 2026

Access Information

Uh oh!

CometActions commented May 21, 2026

Uh oh!

CometActions commented May 22, 2026

Uh oh!

CometActions commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JetoPistola commented May 20, 2026 •

edited

Loading