Skip to content

[OPIK-6599] [SDK] fix: opik migrate fails loud on skipped items#6782

Open
JetoPistola wants to merge 2 commits into
mainfrom
danield/OPIK-6599-opik-migrate-silently-skips-items-and-exits-0
Open

[OPIK-6599] [SDK] fix: opik migrate fails loud on skipped items#6782
JetoPistola wants to merge 2 commits into
mainfrom
danield/OPIK-6599-opik-migrate-silently-skips-items-and-exits-0

Conversation

@JetoPistola
Copy link
Copy Markdown
Contributor

@JetoPistola JetoPistola commented May 20, 2026

Details

image

Fix silent data loss in opik migrate dataset. Previously the experiment cascade tallied per-item skip counters but recorded them only on a cascade_experiments_summary audit entry with status="ok", and the CLI exited 0 with no stderr signal. Operators relying on the audit log or the exit code had no machine-readable indication that data was lost.

  • cascade_one_experiment now emits one skip audit record per (experiment, reason) pair with count and sample_source_ids (capped at 20 ids during collection — counts continue to come from the always-incremented result.items_skipped_* counters, so the audit row carries the true total even when the sample is truncated).
  • cascade_experiments_summary flips to status="failed" when any skip counter is non-zero.
  • main.py inspects skip records after execute_plan, prints a SKIP_SUMMARY: experiments_skipped=… items_skipped_missing_trace=… items_skipped_missing_item=… line to stderr, finalizes the audit to failed, and exits 1.
  • Stderr also prints a high-level rollback hint naming the destination dataset, experiments, optimizations, traces, and spans the operator must delete in --to-project, plus the rename-back step on the source (<name>_v1<name>). Destination state is intentionally not auto-rolled-back — the audit log is the source of truth for what made it across.

Scope: only the data-loss-class skip counters tallied at the cascade summary (experiments_skipped, items_skipped_missing_trace, items_skipped_missing_item). Cosmetic-degradation paths in _resolver.py are out of scope (no data lost). No new flag — fail loud by default per AC. Auto-rollback (a --rollback-on-skip style UX) is tracked as a follow-up under OPIK-6635.

What the user sees

Success (unchanged from before this PR):

$ opik migrate dataset MyDataset --to-project=DestProject
Workspace: my-workspace
<plan table + per-experiment progress bars>
Migrated 'MyDataset' into project 'DestProject' as 'MyDataset'. Took 2m 14s. Audit log: ./opik-migrate-20260520T060114Z.json
$ echo $?
0

Audit log: "status": "ok". No skip records under actions[].

Failure (new):

$ opik migrate dataset MyDataset --to-project=DestProject
Workspace: my-workspace
<plan table + per-experiment progress bars>
<stdout from the cascade, including the existing yellow `Skipped N items due to missing data:` line from imports/experiment.py>
[stderr] opik migrate: 2500 items skipped — destination state was NOT rolled back; see audit log: ./opik-migrate-20260520T060114Z.json
[stderr] SKIP_SUMMARY: experiments_skipped=0 items_skipped_missing_trace=0 items_skipped_missing_item=2500
[stderr] To roll back manually: in project 'DestProject', delete the destination dataset 'MyDataset' along with any experiments, optimizations, traces, and spans that were cascaded into it (the audit log lists each created entity id); then rename the source 'MyDataset_v1' back to 'MyDataset'.
[stderr] (after 2m 14s)
$ echo $?
1

Audit log: "status": "failed", plus one {"type": "skip", "status": "skipped", "reason": "items_missing_dataset_item_remap", ...} row per affected experiment. The pre-existing cascade_experiments_summary action also flips to "status": "failed". CI pipelines that capture stderr (2>&1 / 2>) can grep '^SKIP_SUMMARY:' to gate on specific skip reasons without parsing JSON.

Diagram: diagrams/opik-6599-diagram.html (locally generated; before/after flow, files changed, design decisions).

Change checklist

  • User facing
  • Documentation update

Issues

  • OPIK-6599
  • Follow-up: OPIK-6635 (auto-rollback UX on partial migrate)

AI-WATERMARK

AI-WATERMARK: yes

  • Tools: Claude Code
  • Model(s): Claude Opus 4.7
  • Scope: full implementation
  • Human verification: code review + targeted unit tests

Testing

  • pytest sdks/python/tests/unit/cli/test_migrate_dataset_experiments_cascade.py — 27 tests, including 4 new TestSkipAuditRecords cases (skip-record shape, sample-id cap, missing-trace negative path, no-skip happy path)
  • pytest sdks/python/tests/unit/cli/test_migrate_dataset_planner.py — 16 tests, including 3 new TestFinalizeWithSkipsOrOk cases (exit code 1, stderr SKIP_SUMMARY: line, rollback hint includes target project + _v1 source name, totals aggregated by reason across multiple skip records)
  • Full migrate suite: pytest sdks/python/tests/unit/cli/test_migrate_*100 tests pass
  • pre-commit (sdks/python config): ruff + ruff-format + mypy all green on changed files
  • E2E (tests/e2e/cli/test_migrate_dataset_*) not run — requires real backend; covered separately

Documentation

N/A — internal CLI behavior change; the AC explicitly says no new flag. The existing opik migrate dataset --help output is unchanged. Audit-log readers gain new skip record entries (additive, cascade_experiments_summary retains its existing fields for back-compat). The new stderr rollback hint is self-documenting for operators who hit the failure path.

Previously the experiment cascade tallied skip counters but recorded
them only on the cascade_experiments_summary audit entry (status="ok"),
and the CLI exited 0 with no stderr signal. Operators relying on the
audit log or the exit code had no machine-readable indication that
data was lost.

Now:
- cascade_one_experiment emits one "skip" audit record per
  (experiment, reason) with count + sample source ids (capped at 20)
- cascade_experiments_summary flips to status="failed" when any
  skip counter is non-zero
- main.py inspects skip records after execute_plan, prints a
  SKIP_SUMMARY line to stderr, finalizes the audit to "failed",
  and exits 1

Destination state is intentionally NOT rolled back -- operators
read the audit log to know what made it across. The failure message
calls this out explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added python Pull requests that update Python code tests Including test files, or tests related like configuration. Python SDK labels May 20, 2026
Comment on lines +472 to 483
#
# Per-(experiment, reason) audit records are emitted at the end with
# the affected source ids (capped at ``_SKIP_SAMPLE_LIMIT``) so the
# CLI can fail loud with a machine-readable breakdown -- see OPIK-6599.
missing_trace_ids: List[str] = []
missing_item_ids: List[str] = []
for item in items:
if item.trace_id and item.trace_id not in result.trace_id_remap:
result.items_skipped_missing_trace += 1
missing_trace_ids.append(item.trace_id)
if item.dataset_item_id and item.dataset_item_id not in item_id_remap:
result.items_skipped_missing_item += 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing_trace_ids/missing_item_ids grow unbounded before being trimmed in _record_skip — should we cap them at _SKIP_SAMPLE_LIMIT during collection? Also, should we add a test for the missing-trace negative path asserting a skip audit with reason == "items_missing_trace_remap" and the expected count/sample?

Finding types: Reliable test patterns Batch and stream | Severity: 🟢 Low


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents
Two related improvements in `cascade_one_experiment` (experiments.py ~lines 472-503): 1.
Cap sample lists during collection: instead of appending every missing id and trimming
later in `_record_skip`, append only while `len(missing_trace_ids) < _SKIP_SAMPLE_LIMIT`
(same for `missing_item_ids`). Track total miss counts independently (or via
already-incremented counters like `result.items_skipped_missing_trace`) and pass those
as `count=` to `_record_skip`. 2. In `test_migrate_dataset_experiments_cascade.py`, add
a test for the missing-trace negative path: simulate an experiment cascade where
`dataset_item_id` remap is present but `trace_id` is NOT in `result.trace_id_remap`.
Assert the audit log receives exactly one `skip` record with `details.reason ==
"items_missing_trace_remap"`, `details.count` equals the expected number of missing
trace ids, and `details.sample_source_ids` contains the expected offending trace ids
(respecting `_SKIP_SAMPLE_LIMIT`).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit e624694 addressed this comment by capping the missing-trace/item sample lists as they are collected and passing the total counts to the new _record_skip helper so the audit record keeps accurate counts with a bounded sample. It also emits explicit skip audit records for the missing-trace and missing-item reasons, including the newly added _SKIP_SAMPLE_LIMIT. The accompanying tests now cover the missing-trace negative path, asserting the audit reason, count, and sampled source IDs are as expected.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the unbounded growth. Fixed in e624694missing_trace_sample / missing_item_sample are now capped at _SKIP_SAMPLE_LIMIT during collection (the appends are guarded by len(...) < _SKIP_SAMPLE_LIMIT). count continues to come from the always-fully-incremented result.items_skipped_* counters, so the audit record still carries the true total even when the sample is capped.

Also added the missing-trace negative-path test you asked for: TestSkipAuditRecords.test_missing_trace_remap__emits_skip_record. It monkey-patches _copy_traces_and_spans so the trace_id_remap is deliberately incomplete after the trace-copy phase (mirroring a production BE quirk where a trace gets dropped mid-cascade), then asserts the cascade emits exactly one skip record with reason == "items_missing_trace_remap", the right count, and the orphan trace_id in sample_source_ids.

Comment on lines +211 to +222
skip_records = [
action for action in audit.actions if action.get("status") == "skipped"
]
if not skip_records:
audit.finalize("ok")
audit.write(audit_path)
elapsed = _format_elapsed(elapsed_seconds)
console.print(
f"[green]Migrated '{name}' into project '{target_project}' as "
f"'{target_label}'.[/green] Took {elapsed}. Audit log: {audit_path}"
)
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SKIP_SUMMARY reads record.get("reason", ""), but _record_skip puts reason under details, so reason_to_total_key.get(...) ends up None and totals/total_skipped stay at 0; should we pull reason and count from record.get("details", {}).get("reason", "") and record.get("details", {}).get("count", 0)?

Finding type: Validate nullable inputs | Severity: 🟢 Low


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
sdks/python/src/opik/cli/migrate/main.py around lines 211-242 inside
`_finalize_with_skips_or_ok`, fix the SKIP_SUMMARY aggregation logic so it reads skip
classification from the actual audit record shape. Update the loop to extract `reason`
and `count` from `record.get('details', {})` (e.g., `details.get('reason','')` and
`details.get('count',0)`) instead of `record.get('reason', '')`, and then use those
values to increment `totals` so `total_skipped` reflects real skips. Ensure the existing
mapping in `reason_to_total_key` and the final stderr messages stay the same, but now
compute correct counts when skip records are present.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified this against the audit-record shape and the bot is mistaken here. AuditLog.record(details=...) in audit.py:51-52 flattens details into the entry via entry.update(details), so record["reason"] and record["count"] are on the top level, not nested under details. Quick repro:

>>> from opik.cli.migrate.audit import AuditLog
>>> a = AuditLog(command='test', args={})
>>> a.record(type='skip', status='skipped', details={'reason': 'r', 'count': 42})
>>> a.actions[0]
{'type': 'skip', 'status': 'skipped', 'recorded_at': '...', 'reason': 'r', 'count': 42}

The existing test_multiple_skip_records__totals_aggregated_by_reason test asserts the aggregation produces items_skipped_missing_trace=14 and items_skipped_missing_item=6 from skip records carrying count=7 and count=3 respectively — it would fail with =0 =0 if the read sites were broken. So the current code is correct as written.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the correction — you're right, I misread the audit record shape here. I'll save this to memory once the PR is merged.

Comment on lines +233 to +236
"experiment_recreate_returned_false": "experiments_skipped",
"items_missing_trace_remap": "items_skipped_missing_trace",
"items_missing_dataset_item_remap": "items_skipped_missing_item",
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The skip-aggregation loop reads record.get("reason")/record.get("count"), but _record_skip stores them under details, so total_key ends up None and the SKIP_SUMMARY/total_skipped stays 0 even when skips exist—should we read record.get("details", {})["reason"|"count"] or move those fields to the top level?

Finding type: Centralize error handling | Severity: 🟢 Low


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
sdks/python/src/opik/cli/migrate/main.py around lines 193-260, inside the
_finalize_with_skips_or_ok function where it builds totals from skip_records, fix the
aggregation to read reason and count from the correct location. The current code uses
record.get("reason") and record.get("count"), but skip records created elsewhere store
those fields under record["details"] (e.g., details={"reason": ..., "count": ...}).
Refactor the loop so it pulls details = record.get("details", {}) and then uses
details.get("reason", "") and int(details.get("count", 0)) when incrementing totals, so
SKIP_SUMMARY and total_skipped accurately reflect real skips. If there are tests
asserting the SKIP_SUMMARY fields, update them or add a targeted test to cover this
mismatch.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as my reply on the sibling thread above — false positive. AuditLog.record(details=...) flattens details into the entry (audit.py:51-52), so top-level record.get("reason") / record.get("count") is correct. Verified with the test_multiple_skip_records__totals_aggregated_by_reason test which asserts non-zero aggregated totals.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the correction — that makes sense. I’ll save this to memory once the PR is merged.

Comment on lines +106 to +107
with pytest.raises(SystemExit) as exc:
_finalize_with_skips_or_ok(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests call _finalize_with_skips_or_ok directly instead of the public opik migrate dataset CLI, so they bypass the command wiring guidance in .agents/skills/python-sdk/testing.md — should we move them to runner.invoke(cli, ["migrate", "dataset", ...]) and patch planner/executor inputs rather than the internal helper?

Finding type: AI Coding Guidelines | Severity: 🟢 Low


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
sdks/python/tests/unit/cli/test_migrate_dataset_planner.py around lines 101-114 (and
similarly 134-149 and 187-195), update the tests in `TestFinalizeWithSkipsOrOk` to stop
calling the private `_finalize_with_skips_or_ok` helper directly. Instead, use
`CliRunner` to invoke the public command `cli` with the `migrate dataset` arguments, and
patch the underlying migrate/planner/executor inputs so that the command emits at least
one `skip` audit record (and in the no-skips case emits none). Keep the same behavioral
assertions (non-zero exit code, SKIP_SUMMARY on stderr, and audit.json finalized
status/content) but derive them from the CLI invocation results and on-disk audit file
produced by the command.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional. The unit tests target _finalize_with_skips_or_ok directly because that helper is the only logic this PR adds in main.py — the surrounding Click wiring (migrate_group, migrate_dataset_command) is unchanged from origin/main and already covered by the existing TestMigrateHelp CliRunner tests and the planner-level tests in this module.

Going through runner.invoke(cli, ["migrate", "dataset", ...]) to reach this helper would require stubbing the planner, the version-replay phase, the cascade-experiments phase, and the workspace banner — a lot of scaffolding to verify branching logic that's already isolated in the helper. The cascade-level audit-record emission (which produces the skip records the helper reads) IS covered end-to-end in test_migrate_dataset_experiments_cascade.py::TestSkipAuditRecords, so the integration boundary is exercised — just not at the CLI surface.

If we add a follow-up test for a regression that only manifests with full CLI wiring, I'll switch then.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context — that makes sense given the unchanged Click wiring and existing integration coverage. I'll save this to memory once the PR is merged.

Addresses PR review feedback on OPIK-6599:

- Stderr now prints a high-level rollback hint naming the destination
  dataset / experiments / optimizations / traces / spans the operator
  must delete, plus the rename-back step on the source. Audit log
  remains the source of truth for the per-entity ids. Auto-rollback
  is tracked as a follow-up under OPIK-6635.
- Cap missing_trace_sample / missing_item_sample during collection
  rather than slicing inside _record_skip. Counts come from the
  always-fully-incremented result.items_skipped_* counters, so the
  audit record still carries the true total even when the sample is
  capped. Bounds peak memory on pathological inputs.
- New negative-path test: cascade emits a missing_trace_remap skip
  record when the trace_id_remap is incomplete after _copy_traces_and_spans.
  Monkey-patches the helper to leave the remap deliberately partial,
  matching the production case where a BE quirk drops a trace mid-cascade.
- Existing planner test now asserts the rollback hint mentions the
  target project and the source's _v1 name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +259 to +270
# High-level rollback hint. We deliberately don't ship a step-by-step
# CLI playbook here -- the audit log is the source of truth for what
# was actually created. Each ``ok`` action in ``audit.actions`` carries
# the destination entity id; an operator can grep the audit JSON to
# see exactly what landed in the destination project before deciding
# what to delete. Auto-rollback (a one-flag clean reverse) is tracked
# as a follow-up; this PR is the loud-fail mechanic only.
_stderr_console.print(
f"[yellow]To roll back manually: in project "
f"'{target_project}', delete the destination dataset "
f"'{target_label}' along with any experiments, optimizations, "
f"traces, and spans that were cascaded into it (the audit log "
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rollback hint is printed via _stderr_console.print (stderr), but CI guidance that captures only stdout through tee … won’t include it—should we emit it on stdout instead (e.g., print or a stdout console)?

Finding type: Prefer standard logging | Severity: 🟢 Low


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
sdks/python/src/opik/cli/migrate/main.py around lines 259-273 where the “High-level
rollback hint” is emitted via `_stderr_console.print`, change it to write to stdout
instead. Replace `_stderr_console` with the stdout console (or use `print`) so CI
pipelines that only capture stdout via `tee` will include the rollback instructions in
summaries/PR bodies. Keep the message content and formatting the same; only the output
stream should change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping — the rollback hint deliberately lives on stderr with the rest of the failure block:

  1. opik migrate: N items skipped — destination state was NOT rolled back; see audit log: <path> (stderr, red)
  2. SKIP_SUMMARY: ... (stderr, machine-parseable)
  3. The rollback hint (stderr, yellow)
  4. (after Xs) (stderr, dim)

These four lines are one cohesive failure summary. Splitting just line 3 off to stdout would scatter the failure narrative across two streams.

CI that captures both streams via 2>&1 (or just redirects stderr via 2>) keeps the hint. CI that captures only stdout via tee was already missing the loud-fail message, the SKIP_SUMMARY line, and the exit-code context — its operators wouldn't even know the migrate failed, so missing the rollback hint isn't the load-bearing gap. Fixing that CI's capture is the right escalation; moving one line to stdout would only paper over it.

🤖 Reply posted via /address-github-pr-comments

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense — keeping the rollback hint on stderr preserves the single failure-summary block here. I’ll save this to memory once the PR is merged.

@JetoPistola JetoPistola added the test-environment Deploy Opik adhoc environment label May 20, 2026
@github-actions
Copy link
Copy Markdown
Contributor

🔄 Test environment deployment process has started

Phase 1: Deploying base version 2.0.42 (from main branch) if environment doesn't exist
Phase 2: Building new images from PR branch danield/OPIK-6599-opik-migrate-silently-skips-items-and-exits-0
Phase 3: Will deploy newly built version after build completes

You can monitor the progress here.

@JetoPistola JetoPistola requested a review from alexkuzmik May 20, 2026 18:46
@JetoPistola JetoPistola marked this pull request as ready for review May 20, 2026 18:46
@JetoPistola JetoPistola requested a review from a team as a code owner May 20, 2026 18:46
@CometActions
Copy link
Copy Markdown
Collaborator

Test environment is now available!

To configure additional Environment variables for your environment, run [Deploy Opik AdHoc Environment workflow] (https://github.com/comet-ml/comet-deployment/actions/workflows/deploy_opik_adhoc_env.yaml)

Access Information

The deployment has completed successfully and the version has been verified.

@CometActions
Copy link
Copy Markdown
Collaborator

🌙 Nightly cleanup: The test environment for this PR (pr-6782) has been cleaned up to free cluster resources. PVCs are preserved — re-deploy to restore the environment.

@CometActions CometActions removed the test-environment Deploy Opik adhoc environment label May 21, 2026
@CometActions
Copy link
Copy Markdown
Collaborator

🌙 Nightly cleanup: The test environment for this PR (pr-6782) has been cleaned up to free cluster resources. PVCs are preserved — re-deploy to restore the environment.

1 similar comment
@CometActions
Copy link
Copy Markdown
Collaborator

🌙 Nightly cleanup: The test environment for this PR (pr-6782) has been cleaned up to free cluster resources. PVCs are preserved — re-deploy to restore the environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Python SDK python Pull requests that update Python code tests Including test files, or tests related like configuration.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants