[OPIK-6599] [SDK] fix: opik migrate fails loud on skipped items#6782
[OPIK-6599] [SDK] fix: opik migrate fails loud on skipped items#6782JetoPistola wants to merge 2 commits into
Conversation
Previously the experiment cascade tallied skip counters but recorded them only on the cascade_experiments_summary audit entry (status="ok"), and the CLI exited 0 with no stderr signal. Operators relying on the audit log or the exit code had no machine-readable indication that data was lost. Now: - cascade_one_experiment emits one "skip" audit record per (experiment, reason) with count + sample source ids (capped at 20) - cascade_experiments_summary flips to status="failed" when any skip counter is non-zero - main.py inspects skip records after execute_plan, prints a SKIP_SUMMARY line to stderr, finalizes the audit to "failed", and exits 1 Destination state is intentionally NOT rolled back -- operators read the audit log to know what made it across. The failure message calls this out explicitly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| # | ||
| # Per-(experiment, reason) audit records are emitted at the end with | ||
| # the affected source ids (capped at ``_SKIP_SAMPLE_LIMIT``) so the | ||
| # CLI can fail loud with a machine-readable breakdown -- see OPIK-6599. | ||
| missing_trace_ids: List[str] = [] | ||
| missing_item_ids: List[str] = [] | ||
| for item in items: | ||
| if item.trace_id and item.trace_id not in result.trace_id_remap: | ||
| result.items_skipped_missing_trace += 1 | ||
| missing_trace_ids.append(item.trace_id) | ||
| if item.dataset_item_id and item.dataset_item_id not in item_id_remap: | ||
| result.items_skipped_missing_item += 1 |
There was a problem hiding this comment.
missing_trace_ids/missing_item_ids grow unbounded before being trimmed in _record_skip — should we cap them at _SKIP_SAMPLE_LIMIT during collection? Also, should we add a test for the missing-trace negative path asserting a skip audit with reason == "items_missing_trace_remap" and the expected count/sample?
Finding types: Reliable test patterns Batch and stream | Severity: 🟢 Low
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents
Two related improvements in `cascade_one_experiment` (experiments.py ~lines 472-503): 1.
Cap sample lists during collection: instead of appending every missing id and trimming
later in `_record_skip`, append only while `len(missing_trace_ids) < _SKIP_SAMPLE_LIMIT`
(same for `missing_item_ids`). Track total miss counts independently (or via
already-incremented counters like `result.items_skipped_missing_trace`) and pass those
as `count=` to `_record_skip`. 2. In `test_migrate_dataset_experiments_cascade.py`, add
a test for the missing-trace negative path: simulate an experiment cascade where
`dataset_item_id` remap is present but `trace_id` is NOT in `result.trace_id_remap`.
Assert the audit log receives exactly one `skip` record with `details.reason ==
"items_missing_trace_remap"`, `details.count` equals the expected number of missing
trace ids, and `details.sample_source_ids` contains the expected offending trace ids
(respecting `_SKIP_SAMPLE_LIMIT`).
There was a problem hiding this comment.
Commit e624694 addressed this comment by capping the missing-trace/item sample lists as they are collected and passing the total counts to the new _record_skip helper so the audit record keeps accurate counts with a bounded sample. It also emits explicit skip audit records for the missing-trace and missing-item reasons, including the newly added _SKIP_SAMPLE_LIMIT. The accompanying tests now cover the missing-trace negative path, asserting the audit reason, count, and sampled source IDs are as expected.
There was a problem hiding this comment.
Good catch on the unbounded growth. Fixed in e624694 — missing_trace_sample / missing_item_sample are now capped at _SKIP_SAMPLE_LIMIT during collection (the appends are guarded by len(...) < _SKIP_SAMPLE_LIMIT). count continues to come from the always-fully-incremented result.items_skipped_* counters, so the audit record still carries the true total even when the sample is capped.
Also added the missing-trace negative-path test you asked for: TestSkipAuditRecords.test_missing_trace_remap__emits_skip_record. It monkey-patches _copy_traces_and_spans so the trace_id_remap is deliberately incomplete after the trace-copy phase (mirroring a production BE quirk where a trace gets dropped mid-cascade), then asserts the cascade emits exactly one skip record with reason == "items_missing_trace_remap", the right count, and the orphan trace_id in sample_source_ids.
| skip_records = [ | ||
| action for action in audit.actions if action.get("status") == "skipped" | ||
| ] | ||
| if not skip_records: | ||
| audit.finalize("ok") | ||
| audit.write(audit_path) | ||
| elapsed = _format_elapsed(elapsed_seconds) | ||
| console.print( | ||
| f"[green]Migrated '{name}' into project '{target_project}' as " | ||
| f"'{target_label}'.[/green] Took {elapsed}. Audit log: {audit_path}" | ||
| ) | ||
| return |
There was a problem hiding this comment.
SKIP_SUMMARY reads record.get("reason", ""), but _record_skip puts reason under details, so reason_to_total_key.get(...) ends up None and totals/total_skipped stay at 0; should we pull reason and count from record.get("details", {}).get("reason", "") and record.get("details", {}).get("count", 0)?
Finding type: Validate nullable inputs | Severity: 🟢 Low
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
sdks/python/src/opik/cli/migrate/main.py around lines 211-242 inside
`_finalize_with_skips_or_ok`, fix the SKIP_SUMMARY aggregation logic so it reads skip
classification from the actual audit record shape. Update the loop to extract `reason`
and `count` from `record.get('details', {})` (e.g., `details.get('reason','')` and
`details.get('count',0)`) instead of `record.get('reason', '')`, and then use those
values to increment `totals` so `total_skipped` reflects real skips. Ensure the existing
mapping in `reason_to_total_key` and the final stderr messages stay the same, but now
compute correct counts when skip records are present.
There was a problem hiding this comment.
Verified this against the audit-record shape and the bot is mistaken here. AuditLog.record(details=...) in audit.py:51-52 flattens details into the entry via entry.update(details), so record["reason"] and record["count"] are on the top level, not nested under details. Quick repro:
>>> from opik.cli.migrate.audit import AuditLog
>>> a = AuditLog(command='test', args={})
>>> a.record(type='skip', status='skipped', details={'reason': 'r', 'count': 42})
>>> a.actions[0]
{'type': 'skip', 'status': 'skipped', 'recorded_at': '...', 'reason': 'r', 'count': 42}
The existing test_multiple_skip_records__totals_aggregated_by_reason test asserts the aggregation produces items_skipped_missing_trace=14 and items_skipped_missing_item=6 from skip records carrying count=7 and count=3 respectively — it would fail with =0 =0 if the read sites were broken. So the current code is correct as written.
There was a problem hiding this comment.
Thanks for the correction — you're right, I misread the audit record shape here. I'll save this to memory once the PR is merged.
| "experiment_recreate_returned_false": "experiments_skipped", | ||
| "items_missing_trace_remap": "items_skipped_missing_trace", | ||
| "items_missing_dataset_item_remap": "items_skipped_missing_item", | ||
| } |
There was a problem hiding this comment.
The skip-aggregation loop reads record.get("reason")/record.get("count"), but _record_skip stores them under details, so total_key ends up None and the SKIP_SUMMARY/total_skipped stays 0 even when skips exist—should we read record.get("details", {})["reason"|"count"] or move those fields to the top level?
Finding type: Centralize error handling | Severity: 🟢 Low
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
sdks/python/src/opik/cli/migrate/main.py around lines 193-260, inside the
_finalize_with_skips_or_ok function where it builds totals from skip_records, fix the
aggregation to read reason and count from the correct location. The current code uses
record.get("reason") and record.get("count"), but skip records created elsewhere store
those fields under record["details"] (e.g., details={"reason": ..., "count": ...}).
Refactor the loop so it pulls details = record.get("details", {}) and then uses
details.get("reason", "") and int(details.get("count", 0)) when incrementing totals, so
SKIP_SUMMARY and total_skipped accurately reflect real skips. If there are tests
asserting the SKIP_SUMMARY fields, update them or add a targeted test to cover this
mismatch.
There was a problem hiding this comment.
Same as my reply on the sibling thread above — false positive. AuditLog.record(details=...) flattens details into the entry (audit.py:51-52), so top-level record.get("reason") / record.get("count") is correct. Verified with the test_multiple_skip_records__totals_aggregated_by_reason test which asserts non-zero aggregated totals.
There was a problem hiding this comment.
Thanks for the correction — that makes sense. I’ll save this to memory once the PR is merged.
| with pytest.raises(SystemExit) as exc: | ||
| _finalize_with_skips_or_ok( |
There was a problem hiding this comment.
These tests call _finalize_with_skips_or_ok directly instead of the public opik migrate dataset CLI, so they bypass the command wiring guidance in .agents/skills/python-sdk/testing.md — should we move them to runner.invoke(cli, ["migrate", "dataset", ...]) and patch planner/executor inputs rather than the internal helper?
Finding type: AI Coding Guidelines | Severity: 🟢 Low
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
sdks/python/tests/unit/cli/test_migrate_dataset_planner.py around lines 101-114 (and
similarly 134-149 and 187-195), update the tests in `TestFinalizeWithSkipsOrOk` to stop
calling the private `_finalize_with_skips_or_ok` helper directly. Instead, use
`CliRunner` to invoke the public command `cli` with the `migrate dataset` arguments, and
patch the underlying migrate/planner/executor inputs so that the command emits at least
one `skip` audit record (and in the no-skips case emits none). Keep the same behavioral
assertions (non-zero exit code, SKIP_SUMMARY on stderr, and audit.json finalized
status/content) but derive them from the CLI invocation results and on-disk audit file
produced by the command.
There was a problem hiding this comment.
Intentional. The unit tests target _finalize_with_skips_or_ok directly because that helper is the only logic this PR adds in main.py — the surrounding Click wiring (migrate_group, migrate_dataset_command) is unchanged from origin/main and already covered by the existing TestMigrateHelp CliRunner tests and the planner-level tests in this module.
Going through runner.invoke(cli, ["migrate", "dataset", ...]) to reach this helper would require stubbing the planner, the version-replay phase, the cascade-experiments phase, and the workspace banner — a lot of scaffolding to verify branching logic that's already isolated in the helper. The cascade-level audit-record emission (which produces the skip records the helper reads) IS covered end-to-end in test_migrate_dataset_experiments_cascade.py::TestSkipAuditRecords, so the integration boundary is exercised — just not at the CLI surface.
If we add a follow-up test for a regression that only manifests with full CLI wiring, I'll switch then.
There was a problem hiding this comment.
Thanks for the context — that makes sense given the unchanged Click wiring and existing integration coverage. I'll save this to memory once the PR is merged.
Addresses PR review feedback on OPIK-6599: - Stderr now prints a high-level rollback hint naming the destination dataset / experiments / optimizations / traces / spans the operator must delete, plus the rename-back step on the source. Audit log remains the source of truth for the per-entity ids. Auto-rollback is tracked as a follow-up under OPIK-6635. - Cap missing_trace_sample / missing_item_sample during collection rather than slicing inside _record_skip. Counts come from the always-fully-incremented result.items_skipped_* counters, so the audit record still carries the true total even when the sample is capped. Bounds peak memory on pathological inputs. - New negative-path test: cascade emits a missing_trace_remap skip record when the trace_id_remap is incomplete after _copy_traces_and_spans. Monkey-patches the helper to leave the remap deliberately partial, matching the production case where a BE quirk drops a trace mid-cascade. - Existing planner test now asserts the rollback hint mentions the target project and the source's _v1 name. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| # High-level rollback hint. We deliberately don't ship a step-by-step | ||
| # CLI playbook here -- the audit log is the source of truth for what | ||
| # was actually created. Each ``ok`` action in ``audit.actions`` carries | ||
| # the destination entity id; an operator can grep the audit JSON to | ||
| # see exactly what landed in the destination project before deciding | ||
| # what to delete. Auto-rollback (a one-flag clean reverse) is tracked | ||
| # as a follow-up; this PR is the loud-fail mechanic only. | ||
| _stderr_console.print( | ||
| f"[yellow]To roll back manually: in project " | ||
| f"'{target_project}', delete the destination dataset " | ||
| f"'{target_label}' along with any experiments, optimizations, " | ||
| f"traces, and spans that were cascaded into it (the audit log " |
There was a problem hiding this comment.
This rollback hint is printed via _stderr_console.print (stderr), but CI guidance that captures only stdout through tee … won’t include it—should we emit it on stdout instead (e.g., print or a stdout console)?
Finding type: Prefer standard logging | Severity: 🟢 Low
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
sdks/python/src/opik/cli/migrate/main.py around lines 259-273 where the “High-level
rollback hint” is emitted via `_stderr_console.print`, change it to write to stdout
instead. Replace `_stderr_console` with the stdout console (or use `print`) so CI
pipelines that only capture stdout via `tee` will include the rollback instructions in
summaries/PR bodies. Keep the message content and formatting the same; only the output
stream should change.
There was a problem hiding this comment.
Skipping — the rollback hint deliberately lives on stderr with the rest of the failure block:
opik migrate: N items skipped — destination state was NOT rolled back; see audit log: <path>(stderr, red)SKIP_SUMMARY: ...(stderr, machine-parseable)- The rollback hint (stderr, yellow)
(after Xs)(stderr, dim)
These four lines are one cohesive failure summary. Splitting just line 3 off to stdout would scatter the failure narrative across two streams.
CI that captures both streams via 2>&1 (or just redirects stderr via 2>) keeps the hint. CI that captures only stdout via tee was already missing the loud-fail message, the SKIP_SUMMARY line, and the exit-code context — its operators wouldn't even know the migrate failed, so missing the rollback hint isn't the load-bearing gap. Fixing that CI's capture is the right escalation; moving one line to stdout would only paper over it.
🤖 Reply posted via /address-github-pr-comments
There was a problem hiding this comment.
Thanks, that makes sense — keeping the rollback hint on stderr preserves the single failure-summary block here. I’ll save this to memory once the PR is merged.
|
🔄 Test environment deployment process has started Phase 1: Deploying base version You can monitor the progress here. |
|
✅ Test environment is now available! To configure additional Environment variables for your environment, run [Deploy Opik AdHoc Environment workflow] (https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml) Access Information
The deployment has completed successfully and the version has been verified. |
|
🌙 Nightly cleanup: The test environment for this PR ( |
|
🌙 Nightly cleanup: The test environment for this PR ( |
1 similar comment
|
🌙 Nightly cleanup: The test environment for this PR ( |
Details
Fix silent data loss in
opik migrate dataset. Previously the experiment cascade tallied per-item skip counters but recorded them only on acascade_experiments_summaryaudit entry withstatus="ok", and the CLI exited 0 with no stderr signal. Operators relying on the audit log or the exit code had no machine-readable indication that data was lost.cascade_one_experimentnow emits oneskipaudit record per(experiment, reason)pair withcountandsample_source_ids(capped at 20 ids during collection — counts continue to come from the always-incrementedresult.items_skipped_*counters, so the audit row carries the true total even when the sample is truncated).cascade_experiments_summaryflips tostatus="failed"when any skip counter is non-zero.main.pyinspects skip records afterexecute_plan, prints aSKIP_SUMMARY: experiments_skipped=… items_skipped_missing_trace=… items_skipped_missing_item=…line to stderr, finalizes the audit tofailed, and exits 1.--to-project, plus the rename-back step on the source (<name>_v1→<name>). Destination state is intentionally not auto-rolled-back — the audit log is the source of truth for what made it across.Scope: only the data-loss-class skip counters tallied at the cascade summary (
experiments_skipped,items_skipped_missing_trace,items_skipped_missing_item). Cosmetic-degradation paths in_resolver.pyare out of scope (no data lost). No new flag — fail loud by default per AC. Auto-rollback (a--rollback-on-skipstyle UX) is tracked as a follow-up under OPIK-6635.What the user sees
Success (unchanged from before this PR):
Audit log:
"status": "ok". Noskiprecords underactions[].Failure (new):
Audit log:
"status": "failed", plus one{"type": "skip", "status": "skipped", "reason": "items_missing_dataset_item_remap", ...}row per affected experiment. The pre-existingcascade_experiments_summaryaction also flips to"status": "failed". CI pipelines that capture stderr (2>&1/2>) cangrep '^SKIP_SUMMARY:'to gate on specific skip reasons without parsing JSON.Diagram:
diagrams/opik-6599-diagram.html(locally generated; before/after flow, files changed, design decisions).Change checklist
Issues
AI-WATERMARK
AI-WATERMARK: yes
Testing
pytest sdks/python/tests/unit/cli/test_migrate_dataset_experiments_cascade.py— 27 tests, including 4 newTestSkipAuditRecordscases (skip-record shape, sample-id cap, missing-trace negative path, no-skip happy path)pytest sdks/python/tests/unit/cli/test_migrate_dataset_planner.py— 16 tests, including 3 newTestFinalizeWithSkipsOrOkcases (exit code 1, stderrSKIP_SUMMARY:line, rollback hint includes target project +_v1source name, totals aggregated by reason across multiple skip records)pytest sdks/python/tests/unit/cli/test_migrate_*— 100 tests passpre-commit(sdks/python config): ruff + ruff-format + mypy all green on changed filestests/e2e/cli/test_migrate_dataset_*) not run — requires real backend; covered separatelyDocumentation
N/A — internal CLI behavior change; the AC explicitly says no new flag. The existing
opik migrate dataset --helpoutput is unchanged. Audit-log readers gain newskiprecord entries (additive,cascade_experiments_summaryretains its existing fields for back-compat). The new stderr rollback hint is self-documenting for operators who hit the failure path.