-
Notifications
You must be signed in to change notification settings - Fork 1.2k
studio: add additional data to live metrics post messages to support live experiments in monorepos #10291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4475a64
to
9f1d799
Compare
@@ -351,16 +350,6 @@ def fs(self, fs: "FileSystem"): | |||
# fs. | |||
self._reset() | |||
|
|||
@property | |||
def subrepo_relpath(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[F] Moved into dvc.utils.studio
. Tests moved too.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #10291 +/- ##
=======================================
Coverage 90.42% 90.43%
=======================================
Files 494 494
Lines 37821 37866 +45
Branches 5469 5484 +15
=======================================
+ Hits 34201 34244 +43
- Misses 2986 2988 +2
Partials 634 634 ☔ View full report in Codecov by Sentry. |
9f1d799
to
53a03e4
Compare
or isinstance(scm, NoSCM) | ||
or not (commit := scm.resolve_commit(baseline_rev)) | ||
or not ( | ||
first_100_parent_shas := get_n_commits(scm, [baseline_rev], 101)[1:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: shy do we need this? (looks complicated, and not very obvious)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://github.com/iterative/studio/issues/8848#issuecomment-1935151710
It is complicated and it is needed because of commits missing from Studio.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. A few thoughts / questions:
- should it be fine for us at least initially disregard it (and improve later?). Rely on the existence of the baseline commit in Studio? (if I understood the issue correctly, apologies if not)
- should it be optional (remove it from
if .. or not
) - so that's it's not a blocker for sending at least basic data? - probably a place where a comment would not hurt :) (absolutely your call on this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- should it be fine for us at least initially disregard it (and improve later?). Rely on the existence of the baseline commit in Studio? (if I understood the issue correctly, apologies if not)
- should it be optional (remove it from
if .. or not
) - so that's it's not a blocker for sending at least basic data?
This data is additional to the baseline_rev
not instead of. The extra data will only be used if the baseline cannot be found in Studio. Under those circumstances we will try to create a "fake" parent. Without the parent commits we will not know where to insert the "fake" parent and so will still end up with a detached commit. IMO there is therefore no point in sending the extra data without the parent commits.
This solutions is the best (and maybe only) available given the current constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify and double check.get_dvc_experiment_parent_data
would return None
(no even baseline sent) if we can't get first_100_parent_shas
for some reason ... is it correct?
My question was - can there a be situation when getting just a single baseline works, but 100 fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The baseline is duplicated in the payload sent for the start
event. It is sent both inside and outside of the dvc_experiment_parent_data
(smaller change). An example payload is shown in this test. I.e.
assert start_call.kwargs["json"] == {
"type": "start",
"repo_url": "STUDIO_REPO_URL",
"baseline_sha": baseline_sha,
"name": name,
"params": {"params.yaml": {"foo": 1}},
"client": "dvc",
"dvc_experiment_parent_data": {
"author": {
"email": "dvctester@example.com",
"name": "DVC Tester",
},
"date": M.any,
"message": "init",
"parent_shas": M.any,
"title": "init",
"sha": baseline_sha,
},
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was - can there a be situation when getting just a single baseline works, but 100 fails.
Shouldn't happen but get_dvc_experiment_parent_data
is meant to be designed to not stop the start
event being sent OR brick the experiment if the required data is not found.
@@ -135,51 +135,3 @@ def test_dynamic_cache_initialization(tmp_dir, scm): | |||
dvc.close() | |||
|
|||
Repo(str(tmp_dir)).close() | |||
|
|||
|
|||
def test_monorepo_relpath(tmp_dir, scm): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see https://github.com/iterative/dvc/pull/10291/files#r1482381232 (moved these along with the method)
…live experiments in monorepos
53a03e4
to
b3c210c
Compare
@@ -38,7 +38,7 @@ dependencies = [ | |||
"dvc-data>=3.10,<3.12", | |||
"dvc-http>=2.29.0", | |||
"dvc-render>=1.0.1,<2", | |||
"dvc-studio-client>=0.19,<1", | |||
"dvc-studio-client@git+https://github.com/iterative/dvc-studio-client.git@refs/pull/144/head", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[I] Do not merge before releasing dvc-studio-client & studio and updating this.
Thanks @mattseddon! Initial questions:
|
Sorry for the confusion. I had reviewed this and felt that it was ready so marked it ready for review. I wanted to record a demo for the Studio PR and got blocked by a merge stomped migration/docker issues. The dependency is that this cannot be released without the Studio PR being merged and released first. Other than that they are decoupled. I will have the other PR ready for review today.
Fake commits should only appear when the Please LMK if you have any more questions/concerns. |
Closing in favour of shipping only |
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
Part of https://github.com/iterative/studio/issues/8848
Depends on iterative/dvc-studio-client#144 / https://github.com/iterative/studio/pull/9019
Required by iterative/dvclive#779
The PR moves a util added in #10283 and adds a further util which will be used by DVC/DVCLive to send additional data to Studio to support live experiments being run in monorepos.