-
Notifications
You must be signed in to change notification settings - Fork 6
Add optional monorepo information into schema and post_live_metrics #144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #144 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 4 4
Lines 312 326 +14
Branches 18 18
=========================================
+ Hits 312 326 +14 ☔ View full report in Codecov by Sentry. |
b9df0cd to
42d939a
Compare
42d939a to
4f772d1
Compare
| if subdir: | ||
| body["subdir"] = subdir | ||
| if dvc_experiment_parent_data: | ||
| body["dvc_experiment_parent_data"] = dvc_experiment_parent_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor / non-blocker:
I know that this name comes from Studio, but we might reconsider it for the API. E.g. we don't prefix other fields wit h dvc_experiment_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Studio it's a legacy naming because of different types of experiments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider this.
I decided that it was more important to have a consistent (and slightly annoying) name throughout the spider's web ecosystem.
It has been my experience that cross-system renames (i.e. in APIs or directly before the DB) make it far more difficult to navigate the big picture.
We could pause and go back and rename the original field but I decided that was not worth the time commitment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not critical / not a blocker. Yep, my concern is these names were not visible before (they are some internal impl details of Studio), and now we are kinda expanding the surface. My 2cs on this - I would not pay attention to any internal Studio-specific conventions in out public tools. And I think it's fine to migrate Studio later, even it's way later. It's more important to my mind for public tools / APIs to be clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be where our opinion differs. I do not see dvc-studio-client as a public tool or API. I don't even think it should be a package that is external to DVC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's public since it's part of the DVC project (DVC itself, or a separate package - in this case it doesn't really matter). Let's say if tomorrow someone (e.g. DagsHub, or GitLab) would want to implement a DVC experiments-compatible service they would be using this API. Or if some other tools decide to send metrics to Studio.
Or another scenario / angle. It is similar to let's say DVC sending metrics to MLFlow - it's a very reasonable scenario, and we'll be using their API to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the way Studio operates internally (database schema, GraphQL schema, etc) - those are completely hidden and private and people don't care and won't care about those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say if tomorrow someone (e.g. DagsHub, or GitLab) would want to implement a DVC experiments-compatible service they would be using this API.
The chances of this hypothetical situation are much lower than some poor dev having to go back into this code to make changes.
| studio_repo_url: Optional[str] = None, | ||
| studio_url: Optional[str] = None, | ||
| dvc_experiment_parent_data: Optional[Dict[str, Any]] = None, | ||
| subdir: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split subdir into its own PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was hoping to solve the detached experiments problem yesterday and not have to go around the loop twice.
I will repurpose/redo all of the PRs.
Part of https://github.com/iterative/studio/issues/8848
Required for https://github.com/iterative/studio/pull/9019 / treeverse/dvc#10291 / treeverse/dvclive#779
This PR adds additional data into the schema to support live experiments from monorepos being displayed correctly in Studio.
We are adding the following to the start event:
The rest of the
startevent remains unchanged. The names of the new fields match the fields already present in Studio for pushed experiments/commits. They are optional so that we can continue to accept "legacy" events from older versions of the client in Studio.