[OPIK-6576] [SDK] feat: add load-test suite for traces, spans, and attachments by alexkuzmik · Pull Request #6829 · comet-ml/opik

alexkuzmik · 2026-05-22T13:30:13Z

Details

Adds a pytest-driven load-test suite under tests_load/suite/python_sdk/ covering the four ingestion shapes from OPIK-6576. Tests use the SDK via the public surface only (@opik.track decorators and start_as_current_trace / start_as_current_span context managers) so they mirror real user code. Each scenario captures submitted trace ids, calls opik.flush_tracker(), then polls search_traces / attachments.attachment_list until every submitted id lands with required fields set — any dropped message fails the test with a sample of missing ids.

test_ingestion_rate.py — high trace count (100k traces × 1 nested span) and high spans-per-trace (5k traces × 50 spans).
test_heavy_payload.py — 500 traces × (1 MB in + 1 MB out); 200 traces × 5 spans × (500 KB in + 500 KB out).
test_attachments.py — explicit Attachment(...) uploads and implicit base64 extraction (250 KB+ blobs auto-extracted by the SDK).
test_bursts.py — tight-loop burst (50k traces), steady spread over 10 min, 30 threads sharing one Opik client.
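The capture, flush, and poll verification described above can be sketched roughly as follows. This is a minimal illustration, not the suite's helper: `wait_for_ids` and `fetch_visible_ids` are hypothetical names, and in the real suite the fetch step would wrap `search_traces` or the attachments list call.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_ids(
    submitted: Iterable[str],
    fetch_visible_ids: Callable[[], Set[str]],  # hypothetical: wraps the backend query
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sample_size: int = 5,
) -> None:
    """Poll until every submitted id is visible; fail with a sample of missing ids."""
    expected = set(submitted)
    deadline = time.monotonic() + timeout_s
    while True:
        missing = expected - fetch_visible_ids()
        if not missing:
            return  # everything landed
        if time.monotonic() >= deadline:
            break  # give up and report what is still missing
        time.sleep(interval_s)
    sample = sorted(missing)[:sample_size]
    raise AssertionError(
        f"{len(missing)} of {len(expected)} ids never landed; sample: {sample}"
    )
```

Reporting only a bounded sample keeps the failure message readable when a large batch is dropped.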

Bundles a small SDK fix discovered while building the suite: start_as_current_span was leaking the context_storage project-name owner on exit, breaking any subsequent @track call in the same thread. Two regression unit tests cover the single-call and looped patterns.
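The leak can be illustrated with a simplified model of owner-tracked context state. The sketch below is not the SDK's code: `try_acquire`, `release_if_owner`, and `span` are hypothetical stand-ins, and the only point is that the `finally` block must release the value when the exiting span is its owner.

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator, Optional

# Holds (project_name, owner_id), or None when no project name is set.
_project = contextvars.ContextVar("project", default=None)


def try_acquire(project_name: str, owner_id: str) -> None:
    """Set the project name only if no outer owner already holds it."""
    if _project.get() is None:
        _project.set((project_name, owner_id))


def release_if_owner(owner_id: str) -> None:
    """Clear the project name only when the caller is the recorded owner."""
    current = _project.get()
    if current is not None and current[1] == owner_id:
        _project.set(None)


def current_project() -> Optional[str]:
    current = _project.get()
    return current[0] if current else None


@contextmanager
def span(project_name: str, span_id: str) -> Iterator[None]:
    try_acquire(project_name, span_id)
    try:
        yield
    finally:
        # Omitting this release is the kind of leak described above:
        # the name would stay set for later calls in the same thread.
        release_if_owner(span_id)
```

Releasing by owner id rather than unconditionally keeps a nested span from clearing a project name set by an enclosing one.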

New .github/workflows/load_tests.yml runs the suite weekly via cron and on workflow_dispatch, with -n auto --dist=worksteal so independent scenarios run in parallel. A summary step aggregates per-test JSON into a Markdown table appended to $GITHUB_STEP_SUMMARY so metrics render directly on the workflow run page (artifact upload kept too).
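A summary step of this kind might look like the sketch below. It is an assumption-laden illustration, not the workflow's actual script: the JSON field names `traces` and `duration_s` and both function names are hypothetical. Only the `GITHUB_STEP_SUMMARY` environment variable is the standard GitHub Actions mechanism.

```python
import json
import os
from pathlib import Path
from typing import List


def build_summary(results_dir: Path) -> str:
    """Turn one JSON file per test into a Markdown table, one row per test."""
    rows: List[str] = ["| Test | Traces | Duration (s) |", "| --- | --- | --- |"]
    for path in sorted(results_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # "traces" and "duration_s" are assumed field names for illustration.
        traces = data.get("traces", "n/a")
        duration = data.get("duration_s", "n/a")
        rows.append(f"| {path.stem} | {traces} | {duration} |")
    return "\n".join(rows) + "\n"


def append_to_step_summary(markdown: str) -> None:
    """Append to the job summary file, or print when run outside GitHub Actions."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path is None:
        print(markdown)
        return
    with open(summary_path, "a", encoding="utf-8") as fh:
        fh.write(markdown)
```

Opening the summary file in append mode matters because other steps in the same job may write to it as well.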

Change checklist

User facing
Documentation update

Issues

OPIK-6576

AI-WATERMARK

AI-WATERMARK: yes

Tools: Claude Code
Model(s): Claude Opus 4.7
Scope: full implementation — suite scenarios, helpers, conftest, workflow, README, SDK context-leak fix and its regression tests
Human verification: iteratively reviewed and redirected by the author across multiple passes (scope, structure, volumes, env handling). Local smoke runs against a ./opik.sh install at scales 0.01 and 0.1 (xdist); SDK unit tests run after the fix; YAML and pytest collection validated.

Testing

pytest suite/python_sdk --load-scale 0.01 against local Opik: 9 passed in 16s.
pytest suite/python_sdk --load-scale 0.1 -n auto --dist=worksteal against local Opik: 9 passed in 66s (spread test is window-locked at 60 s, everything else fits inside).
pytest sdks/python/tests/unit/decorator/context_manager/test_span_context_manager.py sdks/python/tests/unit/decorator/test_project_name_context.py: 24 passed, including 2 new regressions for the context-leak fix.
Workflow YAML parses cleanly (python -c "import yaml; yaml.safe_load(open(...))").
Extrapolated full-scale runtime under xdist: ~10–15 min (longest test test_spread_over_time is window-locked at exactly 10 min; test_many_traces_one_span_each and test_traces_with_one_megabyte_payload ~6 and ~8 min respectively). Workflow timeout set to 60 min for headroom.
The implicit-attachment scenario also exercises an attachment-search workaround for OPIK-6651 (filed separately): backend trace search streams 0 results when traces contain attachment references because AttachmentService.list can't read workspaceName from the reactor context during enrichment. The suite uses exclude=["input","output","metadata"] in its verify helper to skip that path; can be dropped once OPIK-6651 is fixed.

Documentation

tests_load/README.md rewritten for the new structure (suite/<target>/ convention), scenario table with default volumes, install/run commands, scaling via --load-scale, and the scheduled CI run.
tests_load/suite/python_sdk/_helpers.py::verify_traces docstring references OPIK-6651 for the attachment-search workaround so the comment stays useful until that bug is fixed.
Per-test docstrings describe the scenario, the SDK API path used (decorator vs context manager), the volume at default scale, and what's verified.

…tachments

Adds a pytest-driven load-test suite under tests_load/suite/python_sdk/ covering the four ingestion shapes from OPIK-6576: high trace/span counts, heavy payloads, explicit and implicit attachments, and burst/spread/concurrent patterns. Tests exercise the SDK via the public surface only (@opik.track decorators and start_as_current_trace / start_as_current_span context managers) so they mirror real user code. Each scenario captures the submitted trace ids, calls opik.flush_tracker, then polls search_traces (with attachment-search workarounds for OPIK-6651) and the attachments list endpoint until every submitted id lands with required fields set. A regression-style assertion fails fast if any submitted id is missing post-flush, catching dropped messages (same shape as the OPIK-6444 unit regression, one level up). Per-phase timings and counts are written to tests_load/.last_run/<test_name>.json. The new .github/workflows/load_tests.yml runs the suite weekly via cron and on workflow_dispatch, with -n auto --dist=worksteal so independent scenarios run in parallel. A summary step aggregates the per-test JSON into a Markdown table appended to $GITHUB_STEP_SUMMARY so the metrics render directly on the workflow run page. Implements OPIK-6576: Load testing for spans & traces against open-source installation.

…exit

start_as_current_span calls _try_acquire_project_name via add_start_candidates on enter, which sets the context_storage project name with the span/trace id as owner. The finally block popped span/trace data but never called release_context_project_name_if_owner, so the owner leaked across context boundaries. After one start_as_current_span call, any later @opik.track invocation in the same thread silently inherited the leaked project name regardless of its own project_name argument. The decorator path in base_track_decorator.pop_end_candidates already releases by span/trace id on exit; this change makes the context-manager path symmetric. Two regression tests in test_span_context_manager cover the single-call and looped patterns. Discovered while building the OPIK-6576 load-test suite (test ordering across scenarios was non-deterministic because of this leak).

…-load-testing-spans-traces

baz-reviewer · 2026-05-22T13:34:24Z

+    encoded_base_url: str = base64.urlsafe_b64encode(
+        base_url.encode("utf-8")
+    ).decode("ascii")


verify_attachments is the only caller encoding path with urlsafe_b64encode; should we switch it back to base64.b64encode to match the existing attachments.attachment_list contract?

Finding type: Type Inconsistency | Severity: 🟢 Low

Prompt for AI Agents

Before applying, verify this suggestion against the current code. In tests_load/suite/python_sdk/_helpers.py around lines 242-247 within the `verify_attachments` function, the code builds `encoded_base_url` using `base64.urlsafe_b64encode`, which changes the alphabet to `-`/`_`. Refactor this to use standard `base64.b64encode(base_url.encode('utf-8')).decode('ascii')` so the `path` query parameter matches the established contract used in `sdks/python/src/opik/api_objects/attachment/client.py` and `sdks/python/tests/e2e/verifiers.py`. Keep the rest of the polling logic unchanged and ensure the generated `path` string no longer contains the url-safe alphabet characters.
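The difference between the two encoders is easy to check with the standard library alone. In the sketch below, `encode_path` is a hypothetical helper written for illustration, not a function from the SDK; the outputs differ only when the standard encoding contains `+` or `/`.

```python
import base64


def encode_path(base_url: str, urlsafe: bool = False) -> str:
    """Base64-encode a URL string with either the standard or URL-safe alphabet."""
    raw = base_url.encode("utf-8")
    encoder = base64.urlsafe_b64encode if urlsafe else base64.b64encode
    return encoder(raw).decode("ascii")


# Input chosen so that the last 6-bit group maps to index 62 of the alphabet.
print(encode_path("??>"))                # Pz8+
print(encode_path("??>", urlsafe=True))  # Pz8-
```

For inputs whose standard encoding happens to contain neither `+` nor `/`, both calls return the same string, which is why a mismatch like this can go unnoticed in tests.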

baz-reviewer · 2026-05-22T13:34:24Z

+def pytest_addoption(parser: pytest.Parser) -> None:
+    parser.addoption(
+        "--load-scale",
+        type=float,
+        default=float(os.getenv("OPIK_LOAD_SCALE", "1.0")),
+        help="Multiplier applied to default trace/span counts in load tests.",
+    )


--load-scale should probably reject values < 1 (or validate in load_scale), otherwise sub-unit inputs can zero out derived counts and trigger ZeroDivisionError/IndexError instead of a clear failure.

Finding type: Logical Bugs | Severity: 🟢 Low

Prompt for AI Agents

Before applying, verify this suggestion against the current code. In tests_load/suite/python_sdk/conftest.py around lines 19-25 (function pytest_addoption) and lines 49-50 (fixture load_scale), add validation for the --load-scale option so only strictly positive values are allowed. If the parsed value is <= 0, fail fast with a clear pytest error message indicating that --load-scale must be > 0. Ensure the validation happens before any load-test computations so tests don’t crash with ZeroDivisionError/IndexError later.
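One possible shape for such a check, which still allows the fractional scales used in the PR's smoke runs (0.01 and 0.1), is sketched below. Both `positive_float` and `scaled_count` are hypothetical helpers, and clamping derived counts to at least 1 is an assumption about the desired behavior rather than what the suite does. The demo uses plain `argparse`; whether the same callable can be passed as `type=` to `parser.addoption` should be verified against the pytest version in use.

```python
import argparse


def positive_float(raw: str) -> float:
    """argparse-style type callable: accept only strictly positive floats."""
    value = float(raw)  # a non-numeric string raises ValueError, which argparse reports
    if value <= 0:
        raise argparse.ArgumentTypeError(f"--load-scale must be > 0, got {raw}")
    return value


def scaled_count(base: int, scale: float) -> int:
    """Scale a default volume, never dropping below one item (assumed policy)."""
    return max(1, int(base * scale))


parser = argparse.ArgumentParser()
parser.add_argument("--load-scale", type=positive_float, default=1.0)
args = parser.parse_args(["--load-scale", "0.01"])
print(scaled_count(30, args.load_scale))  # 30 * 0.01 truncates to 0, clamped to 1
```

Clamping addresses the zeroed-count failure mode directly, while the type check turns a non-positive scale into a usage error at option-parsing time.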

alexkuzmik added 3 commits May 22, 2026 15:26

Merge remote-tracking branch 'origin/main' into aliaksandrk/OPIK-6576…

7ec310d

…-load-testing-spans-traces

github-actions Bot assigned alexkuzmik May 22, 2026

github-actions Bot added the documentation, dependencies, python, Infrastructure, tests, and Python SDK labels May 22, 2026

baz-reviewer Bot reviewed May 22, 2026


This was referenced May 22, 2026

[NA] [SDK] fix(harbor): support both _setup_environment and _setup_agent_environment #6834

Merged

[NA] [SDK] test(crewai): tolerate v1 agent reasoning-loop spans in cy… #6833

Merged

alexkuzmik wants to merge 3 commits into main from aliaksandrk/OPIK-6576-load-testing-spans-traces
