[OPIK-6576] [SDK] feat: add load-test suite for traces, spans, and attachments #6829
Draft
alexkuzmik wants to merge 3 commits into
…tachments

Adds a pytest-driven load-test suite under tests_load/suite/python_sdk/ covering the four ingestion shapes from OPIK-6576: high trace/span counts, heavy payloads, explicit and implicit attachments, and burst/spread/concurrent patterns. Tests exercise the SDK via the public surface only (@opik.track decorators and start_as_current_trace / start_as_current_span context managers), so they mirror real user code.

Each scenario captures the submitted trace ids, calls opik.flush_tracker, then polls search_traces (with attachment-search workarounds for OPIK-6651) and the attachments list endpoint until every submitted id lands with required fields set. A regression-style assertion fails fast if any submitted id is missing post-flush, catching dropped messages (same shape as the OPIK-6444 unit regression, one level up). Per-phase timings and counts are written to tests_load/.last_run/<test_name>.json.

The new .github/workflows/load_tests.yml runs the suite weekly via cron and on workflow_dispatch, with -n auto --dist=worksteal so independent scenarios run in parallel. A summary step aggregates the per-test JSON into a Markdown table appended to $GITHUB_STEP_SUMMARY so the metrics render directly on the workflow run page.

Implements OPIK-6576: Load testing for spans & traces against open-source installation.
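The verify step described in this commit (capture ids, flush, poll until every id lands, fail with the ids that never arrived) can be sketched as a generic polling helper. This is a minimal sketch, not the suite's `_helpers.py`: `wait_for_all_ids` and `fetch_landed_ids` are hypothetical names, and the real suite queries `search_traces` and the attachments list endpoint rather than calling a plain function.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_all_ids(
    submitted: Iterable[str],
    fetch_landed_ids: Callable[[], Set[str]],
    timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
    sample_size: int = 10,
) -> float:
    """Poll until every submitted id is visible and return the elapsed seconds.

    ``fetch_landed_ids`` is a hypothetical stand-in for a server query that
    returns the ids of traces that landed with their required fields set.
    Raises AssertionError with a sample of missing ids if the deadline passes.
    """
    expected = set(submitted)
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        # Set difference tells us exactly which submitted ids are still absent.
        missing = expected - fetch_landed_ids()
        if not missing:
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            sample = sorted(missing)[:sample_size]
            raise AssertionError(
                f"{len(missing)}/{len(expected)} submitted ids never landed; "
                f"sample: {sample}"
            )
        time.sleep(poll_interval_s)
```

In a scenario this would run right after `opik.flush_tracker()`. Reporting a bounded sample of missing ids keeps the failure message readable even when thousands of messages are dropped at the 100k-trace scale.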
…exit

start_as_current_span calls _try_acquire_project_name via add_start_candidates on enter, which sets the context_storage project name with the span/trace id as owner. The finally block popped span/trace data but never called release_context_project_name_if_owner, so the owner leaked across context boundaries. After one start_as_current_span call, any later @opik.track invocation in the same thread silently inherited the leaked project name regardless of its own project_name argument.

The decorator path in base_track_decorator.pop_end_candidates already releases by span/trace id on exit; this change makes the context-manager path symmetric. Two regression tests in test_span_context_manager cover the single-call and looped patterns.

Discovered while building the OPIK-6576 load-test suite (test ordering across scenarios was non-deterministic because of this leak).
…-load-testing-spans-traces
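The summary step described in the first commit message above (per-test JSON aggregated into a Markdown table for `$GITHUB_STEP_SUMMARY`) could look roughly like this. The JSON shape (`count` plus a `phases` mapping of phase name to seconds) is an assumption for illustration; the real files under `tests_load/.last_run/` may use different keys, and `build_summary` / `append_to_step_summary` are hypothetical names.

```python
import json
import os
from pathlib import Path
from typing import List


def build_summary(results_dir: Path) -> str:
    """Render one Markdown row per ``<test_name>.json`` file in ``results_dir``.

    Assumed (hypothetical) file shape: {"count": int, "phases": {"submit": s, ...}}.
    """
    lines: List[str] = [
        "| Test | Count | Phase timings (s) |",
        "| --- | ---: | --- |",
    ]
    for path in sorted(results_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        phases = ", ".join(
            f"{name}={seconds:.1f}" for name, seconds in data.get("phases", {}).items()
        )
        lines.append(f"| {path.stem} | {data.get('count', '')} | {phases} |")
    return "\n".join(lines) + "\n"


def append_to_step_summary(markdown: str) -> None:
    """Append to the file GitHub Actions exposes via $GITHUB_STEP_SUMMARY."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:  # unset when running locally, so this becomes a no-op
        with open(summary_path, "a", encoding="utf-8") as fh:
            fh.write(markdown)
```

Appending (rather than overwriting) matters because other steps in the same job may also write to the summary file.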
Comment on lines +244 to +246

```python
encoded_base_url: str = base64.urlsafe_b64encode(
    base_url.encode("utf-8")
).decode("ascii")
```
Contributor
`verify_attachments` is the only caller that encodes the path with `urlsafe_b64encode`; should we switch it back to `base64.b64encode` to match the existing `attachments.attachment_list` contract?
Finding type: Type Inconsistency | Severity: 🟢 Low
Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
tests_load/suite/python_sdk/_helpers.py around lines 242-247 within the
`verify_attachments` function, the code builds `encoded_base_url` using
`base64.urlsafe_b64encode`, which changes the alphabet to `-`/`_`. Refactor this to use
standard `base64.b64encode(base_url.encode('utf-8')).decode('ascii')` so the `path`
query parameter matches the established contract used in
`sdks/python/src/opik/api_objects/attachment/client.py` and
`sdks/python/tests/e2e/verifiers.py`. Keep the rest of the polling logic unchanged and
ensure the generated `path` string no longer contains the url-safe alphabet characters.
Comment on lines +19 to +25

```python
def pytest_addoption(parser: pytest.Parser) -> None:
    parser.addoption(
        "--load-scale",
        type=float,
        default=float(os.getenv("OPIK_LOAD_SCALE", "1.0")),
        help="Multiplier applied to default trace/span counts in load tests.",
    )
```
Contributor
`--load-scale` should probably reject values below 1 (or be validated in the `load_scale` fixture); otherwise sub-unit inputs can zero out derived counts and trigger `ZeroDivisionError`/`IndexError` instead of failing with a clear error.
Finding type: Logical Bugs | Severity: 🟢 Low
Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
tests_load/suite/python_sdk/conftest.py around lines 19-25 (function pytest_addoption)
and lines 49-50 (fixture load_scale), add validation for the --load-scale option so only
strictly positive values are allowed. If the parsed value is <= 0 (including 0 or
negative/sub-unit values the suite doesn’t support), fail fast with a clear pytest
error message indicating that --load-scale must be > 0. Ensure the validation happens
before any load-test computations so tests don’t crash with
ZeroDivisionError/IndexError later.
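A minimal sketch of that validation, following the prompt's `> 0` rule and additionally clamping derived counts to at least 1 so small scales such as 0.01 stay usable. `positive_float` and `scaled_count` are hypothetical names. Note that argparse applies `type=` only to string defaults, so the float default computed from `OPIK_LOAD_SCALE` would still need an explicit check, for instance by calling `positive_float` inside the `load_scale` fixture.

```python
import argparse
import math
from typing import Union


def positive_float(raw: Union[str, float]) -> float:
    """argparse-style ``type=`` callable that only accepts finite values > 0."""
    try:
        value = float(raw)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"--load-scale must be a number, got {raw!r}"
        ) from None
    # Rejects 0, negatives, NaN and infinity with a readable message.
    if not math.isfinite(value) or value <= 0:
        raise argparse.ArgumentTypeError(f"--load-scale must be > 0, got {value}")
    return value


def scaled_count(default: int, scale: float) -> int:
    """Scale a default volume but never return 0, so loops and divisions stay valid."""
    return max(1, int(default * scale))
```

Clamping in one helper keeps every scenario from having to guard its own divisions and indexing against a zero count.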
This was referenced May 22, 2026
Details
Adds a pytest-driven load-test suite under `tests_load/suite/python_sdk/` covering the four ingestion shapes from OPIK-6576. Tests use the SDK via the public surface only (`@opik.track` decorators and `start_as_current_trace` / `start_as_current_span` context managers), so they mirror real user code. Each scenario captures submitted trace ids, calls `opik.flush_tracker()`, then polls `search_traces` / `attachments.attachment_list` until every submitted id lands with required fields set; any dropped message fails the test with a sample of missing ids.

- `test_ingestion_rate.py`: high trace count (100k traces × 1 nested span) and high spans-per-trace (5k traces × 50 spans).
- `test_heavy_payload.py`: 500 traces × (1 MB in + 1 MB out); 200 traces × 5 spans × (500 KB in + 500 KB out).
- `test_attachments.py`: explicit `Attachment(...)` uploads and implicit base64 extraction (250 KB+ blobs auto-extracted by the SDK).
- `test_bursts.py`: tight-loop burst (50k traces), steady spread over 10 min, 30 threads sharing one `Opik` client.

Bundles a small SDK fix discovered while building the suite: `start_as_current_span` was leaking the `context_storage` project-name owner on exit, breaking any subsequent `@track` call in the same thread. Two regression unit tests cover the single-call and looped patterns.

New `.github/workflows/load_tests.yml` runs the suite weekly via cron and on `workflow_dispatch`, with `-n auto --dist=worksteal` so independent scenarios run in parallel. A summary step aggregates per-test JSON into a Markdown table appended to `$GITHUB_STEP_SUMMARY` so metrics render directly on the workflow run page (artifact upload kept too).

Change checklist
Issues
AI-WATERMARK
AI-WATERMARK: yes
`./opik.sh` install at scales 0.01 and 0.1 (xdist); SDK unit tests run after the fix; YAML and pytest collection validated.

Testing
- `pytest suite/python_sdk --load-scale 0.01` against local Opik: 9 passed in 16s.
- `pytest suite/python_sdk --load-scale 0.1 -n auto --dist=worksteal` against local Opik: 9 passed in 66s (spread test is window-locked at 60 s, everything else fits inside).
- `pytest sdks/python/tests/unit/decorator/context_manager/test_span_context_manager.py sdks/python/tests/unit/decorator/test_project_name_context.py`: 24 passed, including 2 new regressions for the context-leak fix.
- `python -c "import yaml; yaml.safe_load(open(...))"`).
- `test_spread_over_time` is window-locked at exactly 10 min; `test_many_traces_one_span_each` and `test_traces_with_one_megabyte_payload` ~6 and ~8 min respectively). Workflow timeout set to 60 min for headroom.
- `AttachmentService.list` can't read `workspaceName` from the reactor context during enrichment. The suite uses `exclude=["input","output","metadata"]` in its verify helper to skip that path; can be dropped once OPIK-6651 is fixed.

Documentation
- `tests_load/README.md` rewritten for the new structure (`suite/<target>/` convention), scenario table with default volumes, install/run commands, scaling via `--load-scale`, and the scheduled CI run.
- `tests_load/suite/python_sdk/_helpers.py::verify_traces` docstring references OPIK-6651 for the attachment-search workaround so the comment stays useful until that bug is fixed.