-
Notifications
You must be signed in to change notification settings - Fork 440
feat(llmobs): add datasets and experiments features #13314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 235 ± 3 ms. The average import time from base is: 235 ± 3 ms. The import time difference between this PR and base is: -0.5 ± 0.1 ms. Import time breakdownThe following import paths have shrunk:
|
BenchmarksBenchmark execution time: 2025-05-19 20:06:37 Comparing candidate commit a648fe5 in PR branch Found 1 performance improvements and 0 performance regressions! Performance is the same for 507 metrics, 4 unstable metrics. scenario:telemetryaddmetric-flush-1000-metrics
|
if site is None: | ||
site = os.getenv("DD_SITE") | ||
if site is None: | ||
raise ConfigurationError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we say in the docstring that this defaults to "datadoghq.com"
but raise here. I think we should default it and not raise here.
from .utils._exceptions import DatasetFileError | ||
|
||
if TYPE_CHECKING: | ||
import pandas as pd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could interfere with a user's typechecker if they don't have pandas installed when type checking their own app. We should still guard this with a try
self.add(record) | ||
return self | ||
|
||
def add(self, record: Dict[str, Union[str, Dict[str, Any]]]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defining a record type could drastically improve usability here.
class Record(TypedDict):
input: str
expected_output: Dict[str, JsonType]
for example
|
||
# Experiments related | ||
EXPECTED_OUTPUT = "_ml_obs.meta.input.expected_output" | ||
EXPERIMENT_INPUT = "_ml_obs.meta.input" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confused as to why we need new fields for input and output here but maybe it will become obvious from further reading
@@ -241,6 +253,12 @@ def _llmobs_tags(span: Span, ml_app: str, session_id: Optional[str] = None) -> L | |||
"language": "python", | |||
"error": span.error, | |||
} | |||
|
|||
# Add experiment_id from baggage if present | |||
experiment_id = span.context.get_baggage_item(EXPERIMENT_ID_BAGGAGE_KEY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should just look generally in the context for the experiment ID to follow the same paradigm we do for parent id and mlobs trace id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's separate out this ui code and printing logic to an additional PR. There's too much going on here
docker/.python-version
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be reverted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should inline these file contents to the tests
def reset_config_state(): | ||
"""Resets global configuration state before each test.""" | ||
# Reset global state variables to defaults | ||
config._IS_INITIALIZED = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these values aren't kept in sync with the defaults, definitely want to move away from the globals
@pytest.fixture(scope="module", autouse=True) | ||
def init_llmobs(experiments_vcr): | ||
# Use the provided keys directly. VCR filtering handles redaction. | ||
api_key = DD_API_KEY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely don't want to use globals in the tests
[WIP] Add proper description
Checklist
Reviewer Checklist