[OPIK-6576] [SDK] feat: add load-test suite for traces, spans, and attachments #6829

Draft
alexkuzmik wants to merge 3 commits into main from aliaksandrk/OPIK-6576-load-testing-spans-traces

@alexkuzmik (Collaborator)

Details

Adds a pytest-driven load-test suite under tests_load/suite/python_sdk/ covering the four ingestion shapes from OPIK-6576. Tests use the SDK via the public surface only (@opik.track decorators and start_as_current_trace / start_as_current_span context managers) so they mirror real user code. Each scenario captures submitted trace ids, calls opik.flush_tracker(), then polls search_traces / attachments.attachment_list until every submitted id lands with required fields set — any dropped message fails the test with a sample of missing ids.

  • test_ingestion_rate.py — high trace count (100k traces × 1 nested span) and high spans-per-trace (5k traces × 50 spans).
  • test_heavy_payload.py — 500 traces × (1 MB in + 1 MB out); 200 traces × 5 spans × (500 KB in + 500 KB out).
  • test_attachments.py — explicit Attachment(...) uploads and implicit base64 extraction (250 KB+ blobs auto-extracted by the SDK).
  • test_bursts.py — tight-loop burst (50k traces), steady spread over 10 min, 30 threads sharing one Opik client.
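
The submit, flush, and poll pattern described above could be sketched roughly as follows. This is a minimal illustration, not the suite's actual helper: `wait_for_ids` and `fetch_landed` are hypothetical names, and `fetch_landed` stands in for whatever wraps `search_traces` or the attachments list call and returns the set of ids the backend currently reports.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_ids(
    submitted: Iterable[str],
    fetch_landed: Callable[[], Set[str]],  # hypothetical wrapper around the search call
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sample_size: int = 5,
) -> None:
    """Poll until every submitted id is reported by fetch_landed, else raise."""
    expected = set(submitted)
    deadline = time.monotonic() + timeout_s
    while True:
        missing = expected - fetch_landed()
        if not missing:
            return  # every submitted id has landed
        if time.monotonic() >= deadline:
            break  # give up and report what is still missing
        time.sleep(interval_s)
    # Report only a small sample so a 100k-trace failure stays readable.
    sample = sorted(missing)[:sample_size]
    raise AssertionError(
        f"{len(missing)} of {len(expected)} ids never landed; sample: {sample}"
    )
```

Comparing sets rather than counts means a dropped message is reported by id, which is what makes the failure actionable.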

Bundles a small SDK fix discovered while building the suite: start_as_current_span was leaking the context_storage project-name owner on exit, breaking any subsequent @track call in the same thread. Two regression unit tests cover the single-call and looped patterns.

New .github/workflows/load_tests.yml runs the suite weekly via cron and on workflow_dispatch, with -n auto --dist=worksteal so independent scenarios run in parallel. A summary step aggregates per-test JSON into a Markdown table appended to $GITHUB_STEP_SUMMARY so metrics render directly on the workflow run page (artifact upload kept too).
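
For illustration, the summary step might do something along these lines. This is a sketch under assumptions: the JSON field names (`traces`, `submit_s`, `verify_s`) and the function names are hypothetical and are not taken from the suite. The only fixed point is that `GITHUB_STEP_SUMMARY` holds the path of a file that GitHub Actions renders as Markdown.

```python
import json
import os
from pathlib import Path
from typing import Iterable, List


def render_summary(result_files: Iterable[Path]) -> str:
    """Build a Markdown table with one row per per-test JSON file."""
    rows: List[str] = [
        "| test | traces | submit_s | verify_s |",
        "| --- | ---: | ---: | ---: |",
    ]
    for path in sorted(result_files):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Field names below are assumed; missing keys render as "?".
        rows.append(
            f"| {path.stem} | {data.get('traces', '?')} "
            f"| {data.get('submit_s', '?')} | {data.get('verify_s', '?')} |"
        )
    return "\n".join(rows) + "\n"


def append_to_step_summary(markdown: str) -> None:
    """Append to the GitHub step summary file, or print when run locally."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        print(markdown)
        return
    with open(summary_path, "a", encoding="utf-8") as fh:
        fh.write(markdown)
```

Falling back to `print` when the variable is unset keeps the same script usable on a developer machine.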

Change checklist

  • User facing
  • Documentation update

Issues

  • OPIK-6576

AI-WATERMARK

AI-WATERMARK: yes

  • Tools: Claude Code
  • Model(s): Claude Opus 4.7
  • Scope: full implementation — suite scenarios, helpers, conftest, workflow, README, SDK context-leak fix and its regression tests
  • Human verification: iteratively reviewed and redirected by the author across multiple passes (scope, structure, volumes, env handling). Local smoke runs against a ./opik.sh install at scales 0.01 and 0.1 (xdist); SDK unit tests run after the fix; YAML and pytest collection validated.

Testing

  • pytest suite/python_sdk --load-scale 0.01 against local Opik: 9 passed in 16s.
  • pytest suite/python_sdk --load-scale 0.1 -n auto --dist=worksteal against local Opik: 9 passed in 66s (spread test is window-locked at 60 s, everything else fits inside).
  • pytest sdks/python/tests/unit/decorator/context_manager/test_span_context_manager.py sdks/python/tests/unit/decorator/test_project_name_context.py: 24 passed, including 2 new regressions for the context-leak fix.
  • Workflow YAML parses cleanly (python -c "import yaml; yaml.safe_load(open(...))").
  • Extrapolated full-scale runtime under xdist: ~10–15 min (longest test test_spread_over_time is window-locked at exactly 10 min; test_many_traces_one_span_each and test_traces_with_one_megabyte_payload ~6 and ~8 min respectively). Workflow timeout set to 60 min for headroom.
  • The implicit-attachment scenario also exercises an attachment-search workaround for OPIK-6651 (filed separately): backend trace search streams 0 results when traces contain attachment references because AttachmentService.list can't read workspaceName from the reactor context during enrichment. The suite uses exclude=["input","output","metadata"] in its verify helper to skip that path; can be dropped once OPIK-6651 is fixed.

Documentation

  • tests_load/README.md rewritten for the new structure (suite/<target>/ convention), scenario table with default volumes, install/run commands, scaling via --load-scale, and the scheduled CI run.
  • tests_load/suite/python_sdk/_helpers.py::verify_traces docstring references OPIK-6651 for the attachment-search workaround so the comment stays useful until that bug is fixed.
  • Per-test docstrings describe the scenario, the SDK API path used (decorator vs context manager), the volume at default scale, and what's verified.

…tachments

Adds a pytest-driven load-test suite under tests_load/suite/python_sdk/
covering the four ingestion shapes from OPIK-6576: high trace/span
counts, heavy payloads, explicit and implicit attachments, and
burst/spread/concurrent patterns. Tests exercise the SDK via the public
surface only — @opik.track decorators and start_as_current_trace /
start_as_current_span context managers — so they mirror real user code.

Each scenario captures the submitted trace ids, calls opik.flush_tracker,
then polls search_traces (with attachment-search workarounds for
OPIK-6651) and the attachments list endpoint until every submitted id
lands with required fields set. A regression-style assertion fails fast
if any submitted id is missing post-flush, catching dropped messages
(same shape as the OPIK-6444 unit regression, one level up).

Per-phase timings and counts are written to tests_load/.last_run/
<test_name>.json. The new .github/workflows/load_tests.yml runs the
suite weekly via cron and on workflow_dispatch, with -n auto
--dist=worksteal so independent scenarios run in parallel. A summary
step aggregates the per-test JSON into a Markdown table appended to
$GITHUB_STEP_SUMMARY so the metrics render directly on the workflow
run page.

Implements OPIK-6576: Load testing for spans & traces against
open-source installation.
…exit

start_as_current_span calls _try_acquire_project_name via
add_start_candidates on enter, which sets the context_storage project
name with the span/trace id as owner. The finally block popped span/
trace data but never called release_context_project_name_if_owner, so
the owner leaked across context boundaries. After one
start_as_current_span call, any later @opik.track invocation in the
same thread silently inherited the leaked project name regardless of
its own project_name argument.

The decorator path in base_track_decorator.pop_end_candidates already
releases by span/trace id on exit; this change makes the context-
manager path symmetric. Two regression tests in
test_span_context_manager cover the single-call and looped patterns.

Discovered while building the OPIK-6576 load-test suite (test ordering
across scenarios was non-deterministic because of this leak).
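
The ownership rule described in this commit can be illustrated with a toy model. This is not Opik's implementation: every name below (`try_acquire`, `release_if_owner`, `span`) is hypothetical and only mirrors the acquire-on-enter, release-if-owner-on-exit shape.

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator, Optional

# Holds (project_name, owner_id), or None when no project name is set.
_project = contextvars.ContextVar("project", default=None)


def try_acquire(project_name: str, owner_id: str) -> bool:
    """Set the project name only if nobody owns it yet."""
    if _project.get() is not None:
        return False
    _project.set((project_name, owner_id))
    return True


def release_if_owner(owner_id: str) -> None:
    """Clear the project name, but only for the id that set it."""
    current = _project.get()
    if current is not None and current[1] == owner_id:
        _project.set(None)


def current_project() -> Optional[str]:
    current = _project.get()
    return current[0] if current else None


@contextmanager
def span(project_name: str, span_id: str) -> Iterator[None]:
    try_acquire(project_name, span_id)
    try:
        yield
    finally:
        # Omitting this release is the kind of leak the commit describes:
        # the next caller in the same thread would inherit project_name.
        release_if_owner(span_id)
```

In this model a nested span never steals or clears the outer owner's project name, and after the outermost span exits the next caller starts from a clean state.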
github-actions bot added the documentation, dependencies, python, Infrastructure, tests, and Python SDK labels on May 22, 2026
Comment on lines +244 to +246
encoded_base_url: str = base64.urlsafe_b64encode(
    base_url.encode("utf-8")
).decode("ascii")

verify_attachments is the only caller that encodes the path with urlsafe_b64encode; should we switch it back to base64.b64encode to match the existing attachments.attachment_list contract?

Finding type: Type Inconsistency | Severity: 🟢 Low


Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
tests_load/suite/python_sdk/_helpers.py around lines 242-247 within the
`verify_attachments` function, the code builds `encoded_base_url` using
`base64.urlsafe_b64encode`, which changes the alphabet to `-`/`_`. Refactor this to use
standard `base64.b64encode(base_url.encode('utf-8')).decode('ascii')` so the `path`
query parameter matches the established contract used in
`sdks/python/src/opik/api_objects/attachment/client.py` and
`sdks/python/tests/e2e/verifiers.py`. Keep the rest of the polling logic unchanged and
ensure the generated `path` string no longer contains the url-safe alphabet characters.
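
To make the alphabet difference concrete, here is a small standard-library check. The sample bytes are chosen only because they encode to the two characters that differ; they are not taken from the suite.

```python
import base64


def encodings_differ(raw: bytes) -> bool:
    """True when standard and URL-safe Base64 disagree for this input."""
    return base64.b64encode(raw) != base64.urlsafe_b64encode(raw)


# Bytes whose 6-bit groups hit indices 62 and 63 of the alphabet.
sample = b"\xfb\xff"
standard = base64.b64encode(sample).decode("ascii")          # "+/8="
url_safe = base64.urlsafe_b64encode(sample).decode("ascii")  # "-_8="
```

The two encoders agree for any input whose output contains neither `+` nor `/`, so a mismatch like this can go unnoticed until a base URL happens to hit one of those positions.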

Comment on lines +19 to +25
def pytest_addoption(parser: pytest.Parser) -> None:
    parser.addoption(
        "--load-scale",
        type=float,
        default=float(os.getenv("OPIK_LOAD_SCALE", "1.0")),
        help="Multiplier applied to default trace/span counts in load tests.",
    )

--load-scale should probably reject values < 1 (or be validated in load_scale); otherwise sub-unit inputs can zero out derived counts and trigger a ZeroDivisionError/IndexError instead of a clear failure.

Finding type: Logical Bugs | Severity: 🟢 Low


Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
tests_load/suite/python_sdk/conftest.py around lines 19-25 (function pytest_addoption)
and lines 49-50 (fixture load_scale), add validation for the --load-scale option so only
strictly positive values are allowed. If the parsed value is <= 0 (including 0 or
negative/sub-unit values the suite doesn’t support), fail fast with a clear pytest
error message indicating that --load-scale must be > 0. Ensure the validation happens
before any load-test computations so tests don’t crash with
ZeroDivisionError/IndexError later.
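
One possible shape for that validation, as a sketch only: `positive_float` and `scaled_count` are hypothetical helpers, not code from this PR. The validator is written as an argparse-style `type=` callable, on the assumption that it could be passed as `type=positive_float` in place of `type=float`.

```python
import argparse
import math


def positive_float(raw: str) -> float:
    """Parse a scale factor, rejecting zero, negatives, NaN, and infinity."""
    try:
        value = float(raw)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"--load-scale must be a number, got {raw!r}"
        ) from None
    if not math.isfinite(value) or value <= 0:
        raise argparse.ArgumentTypeError(
            f"--load-scale must be a finite value > 0, got {raw!r}"
        )
    return value


def scaled_count(default: int, scale: float) -> int:
    """Scale a default volume, clamping to at least 1 so counts never hit zero."""
    return max(1, int(default * scale))
```

Clamping in `scaled_count` keeps fractional scales such as 0.01 usable for smoke runs while still ruling out zero-sized scenarios. The environment-derived default in the snippet above is already a float, so it would likely need the same check applied explicitly.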
