Skip to content

studio: add additional data to live metrics post messages to support live experiments in monorepos #10291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

mattseddon
Copy link
Contributor

@mattseddon mattseddon commented Feb 8, 2024

Thank you for the contribution - we'll try to review it as soon as possible. 🙏


Part of https://github.com/iterative/studio/issues/8848

Depends on iterative/dvc-studio-client#144 / https://github.com/iterative/studio/pull/9019

Required by iterative/dvclive#779

The PR moves a util added in #10283 and adds a further util which will be used by DVC/DVCLive to send additional data to Studio to support live experiments being run in monorepos.

@mattseddon mattseddon force-pushed the add-live-monorepo-support branch from 4475a64 to 9f1d799 Compare February 8, 2024 03:52
@@ -351,16 +350,6 @@ def fs(self, fs: "FileSystem"):
# fs.
self._reset()

@property
def subrepo_relpath(self) -> str:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F] Moved into dvc.utils.studio. Tests moved too.

Copy link

codecov bot commented Feb 8, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (6efd699) 90.42% compared to head (b3c210c) 90.43%.

Files Patch % Lines
dvc/utils/studio.py 88.23% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #10291   +/-   ##
=======================================
  Coverage   90.42%   90.43%           
=======================================
  Files         494      494           
  Lines       37821    37866   +45     
  Branches     5469     5484   +15     
=======================================
+ Hits        34201    34244   +43     
- Misses       2986     2988    +2     
  Partials      634      634           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mattseddon mattseddon force-pushed the add-live-monorepo-support branch from 9f1d799 to 53a03e4 Compare February 9, 2024 08:27
or isinstance(scm, NoSCM)
or not (commit := scm.resolve_commit(baseline_rev))
or not (
first_100_parent_shas := get_n_commits(scm, [baseline_rev], 101)[1:]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: shy do we need this? (looks complicated, and not very obvious)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/iterative/studio/issues/8848#issuecomment-1935151710

It is complicated and it is needed because of commits missing from Studio.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. A few thoughts / questions:

  • should it be fine for us at least initially disregard it (and improve later?). Rely on the existence of the baseline commit in Studio? (if I understood the issue correctly, apologies if not)
  • should it be optional (remove it from if .. or not) - so that's it's not a blocker for sending at least basic data?
  • probably a place where a comment would not hurt :) (absolutely your call on this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • should it be fine for us at least initially disregard it (and improve later?). Rely on the existence of the baseline commit in Studio? (if I understood the issue correctly, apologies if not)
  • should it be optional (remove it from if .. or not) - so that's it's not a blocker for sending at least basic data?

This data is additional to the baseline_rev not instead of. The extra data will only be used if the baseline cannot be found in Studio. Under those circumstances we will try to create a "fake" parent. Without the parent commits we will not know where to insert the "fake" parent and so will still end up with a detached commit. IMO there is therefore no point in sending the extra data without the parent commits.

This solutions is the best (and maybe only) available given the current constraints.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify and double check.get_dvc_experiment_parent_data would return None (no even baseline sent) if we can't get first_100_parent_shas for some reason ... is it correct?

My question was - can there a be situation when getting just a single baseline works, but 100 fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The baseline is duplicated in the payload sent for the start event. It is sent both inside and outside of the dvc_experiment_parent_data (smaller change). An example payload is shown in this test. I.e.

        assert start_call.kwargs["json"] == {
            "type": "start",
            "repo_url": "STUDIO_REPO_URL",
            "baseline_sha": baseline_sha,
            "name": name,
            "params": {"params.yaml": {"foo": 1}},
            "client": "dvc",
            "dvc_experiment_parent_data": {
                "author": {
                    "email": "dvctester@example.com",
                    "name": "DVC Tester",
                },
                "date": M.any,
                "message": "init",
                "parent_shas": M.any,
                "title": "init",
                "sha": baseline_sha,
            },
        }

Copy link
Contributor Author

@mattseddon mattseddon Feb 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was - can there a be situation when getting just a single baseline works, but 100 fails.

Shouldn't happen but get_dvc_experiment_parent_data is meant to be designed to not stop the start event being sent OR brick the experiment if the required data is not found.

@@ -135,51 +135,3 @@ def test_dynamic_cache_initialization(tmp_dir, scm):
dvc.close()

Repo(str(tmp_dir)).close()


def test_monorepo_relpath(tmp_dir, scm):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see https://github.com/iterative/dvc/pull/10291/files#r1482381232 (moved these along with the method)

@mattseddon mattseddon force-pushed the add-live-monorepo-support branch from 53a03e4 to b3c210c Compare February 11, 2024 21:46
@mattseddon mattseddon changed the title Add additional data to Studio live metrics post messages to support live experiments in monorepos studio: add additional data to live metrics post messages to support live experiments in monorepos Feb 11, 2024
@@ -38,7 +38,7 @@ dependencies = [
"dvc-data>=3.10,<3.12",
"dvc-http>=2.29.0",
"dvc-render>=1.0.1,<2",
"dvc-studio-client>=0.19,<1",
"dvc-studio-client@git+https://github.com/iterative/dvc-studio-client.git@refs/pull/144/head",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[I] Do not merge before releasing dvc-studio-client & studio and updating this.

@mattseddon mattseddon self-assigned this Feb 12, 2024
@mattseddon mattseddon marked this pull request as ready for review February 12, 2024 07:30
@mattseddon mattseddon requested a review from a team February 12, 2024 07:30
@dberenbaum
Copy link
Contributor

Thanks @mattseddon!

Initial questions:

  1. What is the status of this and Replaces plots dvc_data_version_info with flat fields to be used by VS Code and Studio #9019? Why is that in draft and this one ready? I thought this one depends on Replaces plots dvc_data_version_info with flat fields to be used by VS Code and Studio #9019?
  2. Do you foresee this changing behavior of non-monorepo projects? Will "fake" commits start to be appear for regular repos? For example, if my local repo is ahead of my remote, new experiments have always been shown as detached, but does this mean they will now show up in a branch with the non-pushed commits?

@mattseddon
Copy link
Contributor Author

mattseddon commented Feb 12, 2024

Thanks @mattseddon!

Initial questions:

  1. What is the status of this and https://github.com/iterative/studio/pull/9019? Why is that in draft and this one ready? I thought this one depends on https://github.com/iterative/studio/pull/9019?

Sorry for the confusion. I had reviewed this and felt that it was ready so marked it ready for review. I wanted to record a demo for the Studio PR and got blocked by a merge stomped migration/docker issues. The dependency is that this cannot be released without the Studio PR being merged and released first. Other than that they are decoupled. I will have the other PR ready for review today.

  1. Do you foresee this changing behavior of non-monorepo projects? Will "fake" commits start to be appear for regular repos? For example, if my local repo is ahead of my remote, new experiments have always been shown as detached, but does this mean they will now show up in a branch with the non-pushed commits?

Fake commits should only appear when the baseline_rev does not exist in the Studio DB AND when the commit is not ahead of the last commit present in Studio (test for that behaviour is around here - I say around because I will be working on that PR today). edit: There is also an e2e that confirms this behaviour.

Please LMK if you have any more questions/concerns.

@mattseddon mattseddon added A: experiments Related to dvc exp blocked labels Feb 12, 2024
@mattseddon
Copy link
Contributor Author

Closing in favour of shipping only subdir updates first

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: experiments Related to dvc exp blocked
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants