Skip to content

Conversation

@mattseddon
Copy link
Contributor

@mattseddon mattseddon commented Feb 5, 2024

Part of https://github.com/iterative/studio/issues/8848

Required for https://github.com/iterative/studio/pull/9019 / treeverse/dvc#10291 / treeverse/dvclive#779

This PR adds additional data into the schema to support live experiments from monorepos being displayed correctly in Studio.

We are adding the following to the start event:

    dvc_experiment_parent_data: Optional[Dict[str, Any]] = None,
    subdir: Optional[str] = None,

The rest of the start event remains unchanged. The names of the new fields match the fields already present in Studio for pushed experiments/commits. They are optional so that we can continue to accept "legacy" events from older versions of the client in Studio.

@codecov-commenter
Copy link

codecov-commenter commented Feb 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8235015) 100.00% compared to head (4f772d1) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #144   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         4           
  Lines          312       326   +14     
  Branches        18        18           
=========================================
+ Hits           312       326   +14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mattseddon mattseddon force-pushed the add-monorepo-info branch 4 times, most recently from b9df0cd to 42d939a Compare February 11, 2024 21:41
if subdir:
body["subdir"] = subdir
if dvc_experiment_parent_data:
body["dvc_experiment_parent_data"] = dvc_experiment_parent_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor / non-blocker:

I know that this name comes from Studio, but we might reconsider it for the API. E.g. we don't prefix other fields wit h dvc_experiment_

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Studio it's a legacy naming because of different types of experiments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider this.

I decided that it was more important to have a consistent (and slightly annoying) name throughout the spider's web ecosystem.

It has been my experience that cross-system renames (i.e. in APIs or directly before the DB) make it far more difficult to navigate the big picture.

We could pause and go back and rename the original field but I decided that was not worth the time commitment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not critical / not a blocker. Yep, my concern is these names were not visible before (they are some internal impl details of Studio), and now we are kinda expanding the surface. My 2cs on this - I would not pay attention to any internal Studio-specific conventions in out public tools. And I think it's fine to migrate Studio later, even it's way later. It's more important to my mind for public tools / APIs to be clean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be where our opinion differs. I do not see dvc-studio-client as a public tool or API. I don't even think it should be a package that is external to DVC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's public since it's part of the DVC project (DVC itself, or a separate package - in this case it doesn't really matter). Let's say if tomorrow someone (e.g. DagsHub, or GitLab) would want to implement a DVC experiments-compatible service they would be using this API. Or if some other tools decide to send metrics to Studio.

Or another scenario / angle. It is similar to let's say DVC sending metrics to MLFlow - it's a very reasonable scenario, and we'll be using their API to do this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the way Studio operates internally (database schema, GraphQL schema, etc) - those are completely hidden and private and people don't care and won't care about those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say if tomorrow someone (e.g. DagsHub, or GitLab) would want to implement a DVC experiments-compatible service they would be using this API.

The chances of this hypothetical situation are much lower than some poor dev having to go back into this code to make changes.

studio_repo_url: Optional[str] = None,
studio_url: Optional[str] = None,
dvc_experiment_parent_data: Optional[Dict[str, Any]] = None,
subdir: Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we split subdir into its own PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was hoping to solve the detached experiments problem yesterday and not have to go around the loop twice.

I will repurpose/redo all of the PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants