Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

erepo: cache all read only external repos by hexsha #3286

Merged
merged 2 commits into from
Feb 11, 2020

Conversation

Suor
Copy link
Contributor

@Suor Suor commented Feb 7, 2020

So we have 3 things cached now separately:

  • clean clones, to not ask for creds repatedly
  • cache dirs, also shared between erepos with same origin
  • checked out clones if they are read only, addressed by (url, hexsha)

Several additions to Git along the way:

  • Git.is_sha() static method
  • .pull() and .push() work with multiple returned records correctly
  • .get_rev() and .resolve_rev() work faster
  • .resolve_rev() looks for remote branches
  • .has_rev()

Fixes #3280.

dvc/external_repo.py Outdated Show resolved Hide resolved
dvc/external_repo.py Outdated Show resolved Hide resolved
try:
# Try python implementation of rev-parse first, it's faster
return self.repo.rev_parse(name).hexsha
except NotImplementedError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, when does it happen though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For text search, see test_diff.test_directories. There are also dates in @{...} format.

So we have 3 things cached now separately:
- clean clones, to not ask for creds repatedly
- cache dirs, also shared between erepos with same origin
- checked out clones if they are read only, addressed by (url, hexsha)

Several additions to `Git` along the way:
- Git.is_sha() static method
- .pull() and .push() work with multiple returned records correctly
- .get_rev() and .resolve_rev() work faster
- .resolve_rev() looks for remote branches
- .has_rev()

Fixes iterative#3280.
@Suor
Copy link
Contributor Author

Suor commented Feb 8, 2020

Everything should be addressed now.

dvc/scm/git/__init__.py Outdated Show resolved Hide resolved
It follows `git checkout` logic now - if name can be unambiguously
resolved across known remotes then it's done.
@Suor Suor requested review from efiop and skshetry February 11, 2020 07:17
Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just one question. 🙂

@skshetry
Copy link
Member

skshetry commented Feb 11, 2020

Oops, forgot to mention. Can we have at least a test that no cloning is done for the same rev? Maybe, just an assert that the cloned directory is same when rev is same?

with external_repo("/tmp/repo", "master") as repo1:
    pass
with external_repo("/tmp/repo", "master") as repo2: # or, even better, check here with hash?
    pass
assert repo1.root_dir == repo2.root_dir

@Suor
Copy link
Contributor Author

Suor commented Feb 11, 2020

Oops, forgot to mention. Can we have at least a test that no cloning is done for the same rev? Maybe, just an assert that the cloned directory is same when rev is same?

Not sure we should test that, this is an implementation detail.

EDIT. Also, we were not cloning twice before this PR, clones were cached. This PR caches copies from a cached clone

for remote in self.repo.remotes
} - {None}
if len(shas) > 1:
raise RevError("ambiguous Git revision '{}'".format(rev))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC git choses the ref-like rev if it can, even in ambigous case. It does print a warning though. Not sure if it matters that much here.

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Let's merge and move on. Not happy about the lack of tests, but that is on you 🙂

@efiop efiop merged commit 47081cb into iterative:master Feb 11, 2020
@skshetry
Copy link
Member

skshetry commented Feb 11, 2020

Not sure we should test that, this is an implementation detail.

I'd argue, it's not an implementation detail but a behavior that is reasonable and expected, a subtle but important distinction because if I refactor this, I'd like to keep this behavior intact (without a need to change test code).

How'd you prevent this from regressing?

@efiop
Copy link
Contributor

efiop commented Feb 13, 2020

@Suor #3280 (comment) So we should've added tests, right? 😉

@casperdcl
Copy link
Contributor

:D just posted #3280 (comment) before reading this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Speed up dvc status for large projects
4 participants