Skip to content

chore(llmobs): support running experiment tasks #13970

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 15, 2025
Merged

Conversation

Yun-Kim
Copy link
Contributor

@Yun-Kim Yun-Kim commented Jul 11, 2025

MLOB-3272

Adds support for running experiment task functions concurrently, and tracing said experiment tasks.
Also does the following:

  • Make DatasetRecord.input_data a more strict Dict[str, NonNoneJSONType]. There was a discrepancy between previous iterations of input_data from the DatasetRecord class and the required signature for task functions.
  • Makes config a required input param (can be optional though) for task functions. Making it possibly optional is too clunky of an experience and users should be required to set it
  • Private helper to create experiment spans to trace the experiment task run. Note that this is still submitted to the regular LLMObs span track, future PR will handle custom scoping and experiment custom I/O annotation on said span.

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@Yun-Kim Yun-Kim changed the base branch from main to yunkim/llmobs-create-experiment July 11, 2025 22:30
Copy link
Contributor

github-actions bot commented Jul 11, 2025

CODEOWNERS have been resolved as:

tests/llmobs/llmobs_cassettes/datadog/datadog_api_unstable_llm-obs_v1_datasets_3bf4897d-e6aa-43a3-8d9c-5097b1f85177_batch_update_post_2d58a82a.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/datadog/datadog_api_unstable_llm-obs_v1_datasets_67c7b6cc-ce98-481e-ab9b-e4925564826c_batch_update_post_a1f44751.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/datadog/datadog_api_unstable_llm-obs_v1_datasets_delete_post_264c9f32.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/datadog/datadog_api_unstable_llm-obs_v1_datasets_delete_post_d28fa230.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/datadog/datadog_api_unstable_llm-obs_v1_datasets_post_35e295b1.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/datadog/datadog_api_unstable_llm-obs_v1_datasets_post_3f098e52.yaml  @DataDog/ml-observability
ddtrace/llmobs/_constants.py                                            @DataDog/ml-observability
ddtrace/llmobs/_experiment.py                                           @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_writer.py                                               @DataDog/ml-observability
tests/llmobs/test_experiments.py                                        @DataDog/ml-observability

Copy link
Contributor

github-actions bot commented Jul 11, 2025

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 280 ± 4 ms.

The average import time from base is: 282 ± 4 ms.

The import time difference between this PR and base is: -2.0 ± 0.2 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 1.984 ms (0.71%)
ddtrace.bootstrap.sitecustomize 1.305 ms (0.47%)
ddtrace.bootstrap.preload 1.305 ms (0.47%)
ddtrace.internal.remoteconfig.client 0.658 ms (0.24%)
ddtrace 0.679 ms (0.24%)
ddtrace.internal._unpatched 0.032 ms (0.01%)
json 0.032 ms (0.01%)
json.decoder 0.032 ms (0.01%)
re 0.032 ms (0.01%)
enum 0.032 ms (0.01%)
types 0.032 ms (0.01%)

@pr-commenter
Copy link

pr-commenter bot commented Jul 11, 2025

Benchmarks

Benchmark execution time: 2025-07-15 01:44:55

Comparing candidate commit c70bb0d in PR branch yunkim/dne-run-tasks with baseline commit c8d1431 in branch main.

Found 0 performance improvements and 2 performance regressions! Performance is the same for 546 metrics, 2 unstable metrics.

scenario:iastaspects-format_map_aspect

  • 🟥 execution_time [+470.593ns; +565.544ns] or [+14.604%; +17.551%]

scenario:iastaspectsospath-ospathjoin_aspect

  • 🟥 execution_time [+809.319ns; +996.221ns] or [+13.167%; +16.208%]

Base automatically changed from yunkim/llmobs-create-experiment to main July 14, 2025 22:53
@Yun-Kim Yun-Kim force-pushed the yunkim/dne-run-tasks branch 2 times, most recently from cdfb6fa to 9ef1d9e Compare July 14, 2025 23:34
@Yun-Kim Yun-Kim marked this pull request as ready for review July 14, 2025 23:34
@Yun-Kim Yun-Kim requested a review from a team as a code owner July 14, 2025 23:34
@Yun-Kim Yun-Kim force-pushed the yunkim/dne-run-tasks branch from 9ef1d9e to 50f602e Compare July 14, 2025 23:35
@Yun-Kim Yun-Kim added the changelog/no-changelog A changelog entry is not required for this PR. label Jul 14, 2025
@Yun-Kim Yun-Kim force-pushed the yunkim/dne-run-tasks branch from 50f602e to 6937e9c Compare July 14, 2025 23:46
Copy link
Member

@Kyle-Verhoog Kyle-Verhoog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some thoughts, i think i'd change the log/raise condition check for sure but trust you to address what you feel is necessary

@Kyle-Verhoog Kyle-Verhoog merged commit 345cacc into main Jul 15, 2025
461 checks passed
@Kyle-Verhoog Kyle-Verhoog deleted the yunkim/dne-run-tasks branch July 15, 2025 04:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog/no-changelog A changelog entry is not required for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants