Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compare revisions for remote git branches/tags #4386

Closed
wants to merge 5 commits into from

Conversation

linar-jether
Copy link

Adds a simple check if the local revision is the same as the remote revision, even if the ref is a tag or branch name.

Until now branches or tags would always try to update the repository.

Copy link
Member

@xavfernandez xavfernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me but a test would be needed to merge this.
Otherwise, it will likely go away on the next (needed) refactor...

pip/vcs/git.py Outdated
@@ -171,6 +179,12 @@ def get_revision(self, location):
['rev-parse', 'HEAD'], show_stdout=False, cwd=location)
return current_rev.strip()

def get_remote_revision(self, url, ref):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is not part of VersionControl "API", I'd rather it be called _get_remote_revision

@pradyunsg
Copy link
Member

Hi @linar-jether!

Could you add a test for this? :)

@pradyunsg pradyunsg added the S: awaiting response Waiting for a response/more information label Aug 21, 2017
@linar-jether
Copy link
Author

@pradyunsg Sure sorry for the delay, added tests now...

@xavfernandez anything else in order to merge this?

@cjerdonek
Copy link
Member

cjerdonek commented Aug 23, 2017

I would caution against this change because it adds yet another place where the ref / rev_options are getting converted to a sha, and it does so in a way that isn't necessarily consistent with how it's done elsewhere in the code.

Currently, similar conversion logic occurs in check_rev_options(). You can see here in the call to obtain() how check_rev_options() is called right before check_version() to attempt to get a sha:

if rev:
    rev_options = self.check_rev_options(rev, dest, rev_options)
    # Only do a checkout if rev_options differs from HEAD
    if not self.check_version(dest, rev_options):
        self.run_command(
            ['fetch', '-q', url] + rev_options,
            cwd=dest,
        )
        self.run_command(
            ['checkout', '-q', 'FETCH_HEAD'],
            cwd=dest,
        )

With this PR, similar but slightly different conversion logic (using ls-remote) would get added to check_version(). What introduces even more complexity though is that this new logic will be exercised only in the update() code path but not in the obtain() code path. This is because the url argument is being used as a flag (adding cyclotomic complexity), and the argument isn't passed in the obtain() code path.

So the PR introduces the following asymmetry: obtain() does its extra checking before the call to check_version() and passes None to check_version() to bypass extra checking in check_version(), whereas in the update() code path, no extra checking takes place before the call to check_version() and a non-None argument is passed to check_version() to trigger extra checking in check_version().

A couple other comments:

The news item has extension "bugfix", but this looks like a performance enhancement to me (and there is no issue / bug report).

Also, the tests don't do anything to check that the obtain() code path passes None for the url while the update() code path passes non-None, which is a key aspect of this change.

@cjerdonek
Copy link
Member

It seems like a way to address this that would be more consistent with other parts of the code (namely the initial clone case) would be to change Git.update() in the following code to fetch only the ref you need:

def update(self, dest, rev_options):
    # First fetch changes from the default remote
    if self.get_git_version() >= parse_version('1.9.0'):
        # fetch tags in addition to everything else
        self.run_command(['fetch', '-q', '--tags'], cwd=dest)  # <-- here
    else:
        self.run_command(['fetch', '-q'], cwd=dest)
    # Then reset to wanted revision (maybe even origin/master)  # <-- here
    if rev_options:
        rev_options = self.check_rev_options(
            rev_options[0], dest, rev_options,
        )
    self.run_command(['reset', '--hard', '-q'] + rev_options, cwd=dest)
    #: update submodules
    self.update_submodules(dest)

This is what the clone case does here:

# Only do a checkout if rev_options differs from HEAD
if not self.check_version(dest, rev_options):
    self.run_command(
        ['fetch', '-q', url] + rev_options,  # <-- here
        cwd=dest,
    )
    self.run_command(
        ['checkout', '-q', 'FETCH_HEAD'],
        cwd=dest,
    )

This approach seems equivalent and possibly more efficient in some cases because you avoid an extra ls-remote call. It would also make the clone and update cases more parallel, as well as avoiding the added extra complexity I mentioned in my previous comment.

@linar-jether
Copy link
Author

linar-jether commented Aug 24, 2017

@cjerdonek
Thank for the detailed response! I'm not very familiar with the code base so any alternative approaches are appreciated...

I labeled it a bugfix because currently pip re-installs packages even when not necessary, if the remote is a tag name and it hasn't changed there is no reason to reinstall, not only is this slower, it's also a broken functionality.

Also, the tests don't do anything to check that the obtain() code path passes None for the url while the update() code path passes non-None, which is a key aspect of this change.

I'm not sure what this means, it seems the update() path only happens if check_version returned false...

Currently, similar conversion logic occurs in check_rev_options(). You can see here in the call to obtain() how check_rev_options() is called right before check_version() to attempt to get a sha:

This only happens if check_destination returned true (and now it won't if the remote is the same), and it (show-ref) is not the same as calling ls-remote, unless you call fetch before, so either way, we must add the additional command.

@cjerdonek
Copy link
Member

and it (show-ref) is not the same as calling ls-remote, unless you call fetch before, so either way, we must add the additional command.

My point in mentioning show-ref isn't that it's the same as ls-remote, but rather that it's in some sense a parallel operation with similar requirements. The call to show-ref currently needs to deal with deciding between remotes, branches, tags, refs, etc. And if you add a call to ls-remote, you're going to need to add similar logic to that call with behavior matching the show-ref logic. That logic would also need to be tested separately. Basically, it's going to add more complexity.

For example, if you do this, you get three results:

$ git ls-remote https://github.com/di/git_ref_example blah
59354f1a353783073ad80dfe619ccbe4ac38fd2b	refs/heads/aaa/blah
f04115ccf5fb9e5ee5c89e15c0e58928a61719d9	refs/heads/blah
316b5d2bb9cd4e25486a3b22dcd006cf4f9b7836	refs/heads/zzz/blah

But your code only assumes one result. You would also have to consider things like whether there is a matching tag.

So what I was cautioning against is having to duplicate this type of logic in a second code path in a slightly different way. It would be better if we can be moving in the direction of making the fresh-install and update cases more similar to each other rather than more different.

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Sep 1, 2017
@pradyunsg pradyunsg removed the S: awaiting response Waiting for a response/more information label Nov 15, 2017
@pradyunsg
Copy link
Member

@linar-jether @cjerdonek @pypa/pip-committers What do we want to do about this PR?

I haven't really looked at this closely enough to have any strong opinion. I do think though that this can be closed or at least needs to adopt a different strategy.

@cjerdonek
Copy link
Member

While I think the intent here is good, honestly I think we should address issues like #2037 first for a couple reasons. First, the PR is suggesting a certain optimization. While it's certainly doable, I would consider the code needed somewhat complex relative to what will be gained. IIRC, it would require code analogous to what I wrote for PR #4690 (specifically the get_revision_sha() function).

Second, I'm not sure the optimization will work in the same way depending on how #2037 is resolved, so I would rather we not bake in the optimization because it will make later issues harder and we might need to remove it (or at least rework it) later anyways.

@pradyunsg pradyunsg added the S: needs triage Issues/PRs that need to be triaged label May 11, 2018
@pradyunsg
Copy link
Member

I'll go ahead and close this, due to lack of activity here.

@pradyunsg pradyunsg closed this Dec 17, 2018
@lock
Copy link

lock bot commented May 31, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label May 31, 2019
@lock lock bot locked as resolved and limited conversation to collaborators May 31, 2019
@pradyunsg pradyunsg removed the S: needs triage Issues/PRs that need to be triaged label Feb 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation needs rebase or merge PR has conflicts with current master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants