fetch specific commits even if the github fast path fails #13946

Merged
merged 1 commit into from
May 22, 2024

Conversation

iliana
Contributor

@iliana iliana commented May 22, 2024

What does this PR try to resolve?

This PR fixes #13555, which describes a regression from 1.64.0 to 1.65.0 where the inability to fetch commit information from api.github.com (the "GitHub fast path") silently changes Cargo's behavior.

Cargo can fetch a specific Git commit from a remote without having to fetch all refs. Prior to #10807, this functionality required a repository hosted on github.com and providing the full commit hash (usually available from the Cargo.lock); after that change, any revision (including abbreviated revisions) that could be resolved by GitHub's API could be fetched directly. However, this logic requires the "GitHub fast path", which was not intended to be robust, to successfully return the resolved commit hash; if a client is currently rate-limited by api.github.com (very common in CI and shared cloud / corporate environments) this fails and Cargo falls back to fetching all refs.

Usually this is not noticeable. However, GitHub allows fetching commits that are related to the repository but not actually part of any of its refs, including commits pushed to a fork. This results in the same command working fine in some environments where api.github.com is accessible, and not working in other environments that are rate-limited, which is very confusing and difficult to debug.

This change adds another branch to cover the regression case: if we are going through the GitHub fast path with a full commit hash, return early indicating that we need to fetch it. (Previously, the branch applied when the GitHub fast path was unsuccessful, the user was not using the unstable shallow clone options, and we had a full commit hash that we expected to be able to fetch directly because we knew it was a github.com repository.)
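
The branch ordering described above can be sketched as a small standalone model. This is not Cargo's actual code: `FastPathRev` here is a simplified stand-in that carries a `String` rather than a `git2::Oid`, and `classify_rev`, `is_full_sha1_hex`, and the `local_object` parameter are hypothetical names chosen for illustration.

```rust
/// Simplified stand-in for the outcome of the GitHub fast path.
/// (Hypothetical model: Cargo's real type carries a `git2::Oid`.)
#[derive(Debug, PartialEq)]
enum FastPathRev {
    /// The local repository already has the requested commit.
    UpToDate,
    /// The commit must be fetched; no API request is needed to identify it.
    NeedsFetch(String),
    /// The revision still has to be resolved some other way.
    Indeterminate,
}

/// Returns true if `rev` looks like a full SHA-1 commit hash (40 hex digits).
fn is_full_sha1_hex(rev: &str) -> bool {
    rev.len() == 40 && rev.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Decide what to do with `rev` before any network request is made.
/// `local_object` is the hash the local checkout currently has, if any.
fn classify_rev(rev: &str, local_object: Option<&str>) -> FastPathRev {
    // A full or abbreviated hash matching what we already have: nothing to do.
    if let Some(local) = local_object {
        if !rev.is_empty() && local.starts_with(rev) {
            return FastPathRev::UpToDate;
        }
    }
    // A full hash names exactly one commit, so it can be fetched directly
    // even when the API is unreachable or rate-limited.
    if is_full_sha1_hex(rev) {
        return FastPathRev::NeedsFetch(rev.to_ascii_lowercase());
    }
    // Anything else (for example an abbreviated hash) must be resolved first.
    FastPathRev::Indeterminate
}
```

The point of the ordering is that the full-hash check happens before anything that depends on api.github.com, so a rate-limited client takes the same path as one with API access.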

How should we test and review this PR?

I have been testing this PR by temporarily adding a 0.0.0.0 api.github.com entry to my /etc/hosts, which causes the GitHub fast path to always fail, then running:

target/release/cargo install --git https://github.com/haha-business/unstable-test-repo.git --rev c9040898c9183ddbb9402dcbf749ed06d6ea90ad

This refers to a particular commit on a fork of the repo which won't be found by the fallback path or current Cargo.

Note that you will need to delete ~/.cargo/git/checkouts/unstable-test-repo-* and ~/.cargo/git/db/unstable-test-repo-* after a successful run with this change in order to reproduce the broken behavior of the current release.

I am having trouble getting the test suite to run at all on my system so I haven't experimented with writing a specific test for this case, but I probably should.

Additional information

This uses the same logic as the unstable shallow clone support to detect if the revision is a full commit hash. This is not compatible with SHA-256 commit hashes; git2::Oid specifically expects a 40-character hexadecimal string. Given that the change introducing this bug was meant to future-proof SHA-256 support (despite only doing so for GitHub repositories), it might be good to make the logic more explicit within Cargo and allow either 40- or 64-character hex strings.
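
A more explicit check of the kind suggested here might look like the following sketch. It uses only the standard library; `HashKind` and `full_commit_hash_kind` are hypothetical names, not part of Cargo or git2, and the sketch only classifies the string without parsing it into an object ID.

```rust
/// Which object format a full commit hash appears to use.
#[derive(Debug, PartialEq, Clone, Copy)]
enum HashKind {
    /// 40 hexadecimal digits.
    Sha1,
    /// 64 hexadecimal digits.
    Sha256,
}

/// Classify `rev` as a full SHA-1 or SHA-256 hash, or `None` if it is
/// abbreviated, empty, or contains non-hexadecimal characters.
fn full_commit_hash_kind(rev: &str) -> Option<HashKind> {
    if !rev.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    match rev.len() {
        40 => Some(HashKind::Sha1),
        64 => Some(HashKind::Sha256),
        _ => None,
    }
}
```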

I wanted to keep this change focused on the regression fix, but in testing, pretty much every Git repository I could think of (including non-forges, like git.kernel.org and some repositories I host on my own infrastructure with cgit) supports fetching directly from a commit, so it would be ideal to eventually relax the GitHub requirement for this functionality. However, it would need some sort of fallback logic because I suspect the HTTP dumb protocol doesn't support commit references, and I haven't researched when this functionality was added to the smart protocol.

@rustbot
Collaborator

rustbot commented May 22, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @weihanglo (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added A-git Area: anything dealing with git S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 22, 2024
@@ -977,6 +977,13 @@ pub fn fetch(
// The reason we write to `refs/remotes/origin/HEAD` is that it's of special significance
// when during `GitReference::resolve()`, but otherwise it shouldn't matter.
refspecs.push(format!("+{0}:refs/remotes/origin/HEAD", rev));
} else if Url::parse(remote_url).map_or(false, |url| is_github(&url))
Member

Thank you for the nice finding and write-up!

I understand supporting other Git hosting would be awesome, though this is still a GitHub specific fast path. I wonder if we could move this patch to under github_fast_path. Maybe something like this?

                if let Some(local_object) = local_object {
                    if is_short_hash_of(rev, local_object) {
                        debug!("github fast path already has {local_object}");
                        return Ok(FastPathRev::UpToDate);
                    }
                }
+               if let Ok(oid) = rev.parse::<Oid>() {
+                   debug!("github fast path is already a full commit hash {rev}");
+                   return Ok(FastPathRev::NeedsFetch(oid));
+               }
                rev
            } else {
                debug!("can't use github fast path with `rev = \"{}\"`", rev);
                return Ok(FastPathRev::Indeterminate);
            }

Contributor Author

Oh that's clever! Let me give that a shot.

Co-authored-by: Weihang Lo <weihanglo@users.noreply.github.com>
@iliana iliana force-pushed the 13555-object-not-found branch from 745fc3b to 5d7a06b Compare May 22, 2024 04:33
@iliana
Contributor Author

iliana commented May 22, 2024

I added a Co-authored-by header to my commit since you wrote the actual code! Let me know if that should be adjusted.

It looks like that approach also has the benefit of making the ref name used for the fetch more consistent: previously, if the GitHub API request failed, the ref would be fetched to refs/remotes/origin/HEAD instead of refs/commit/{commit_hash}. Given that was never supposed to happen, and the name of the ref we fetch to is pretty meaningless in the context of these Cargo-managed repos, I think that's fine.
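
To make the difference concrete, the two destination refs can be written as fetch refspecs. This is a sketch, not Cargo's code: `FetchTarget` and `refspec_for` are hypothetical names, the `refs/remotes/origin/HEAD` form is taken from the diff context earlier in this thread, and the `refs/commit/...` form follows the description in this comment.

```rust
/// What the fetch is aimed at (hypothetical type for illustration).
enum FetchTarget<'a> {
    /// A full commit hash, stored under a per-commit ref.
    Commit(&'a str),
    /// A revision fetched as-is and stored as the remote HEAD.
    RevAsHead(&'a str),
}

/// Build the refspec string for a fetch target.
fn refspec_for(target: &FetchTarget<'_>) -> String {
    match target {
        // The leading '+' permits non-fast-forward updates of the destination.
        FetchTarget::Commit(hash) => format!("+{0}:refs/commit/{0}", hash),
        FetchTarget::RevAsHead(rev) => format!("+{0}:refs/remotes/origin/HEAD", rev),
    }
}
```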

Member

@weihanglo weihanglo left a comment

Looks great! Thank you for the contribution.

Let's move on and set aside the issue of handling 64-hex-digit commit hashes. I believe this won't be the only place that needs to change when that time comes 😬.

@weihanglo
Member

@bors r+

@bors
Contributor

bors commented May 22, 2024

📌 Commit 5d7a06b has been approved by weihanglo

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 22, 2024
@bors
Contributor

bors commented May 22, 2024

⌛ Testing commit 5d7a06b with merge 99075f2...

@bors
Contributor

bors commented May 22, 2024

☀️ Test successful - checks-actions
Approved by: weihanglo
Pushing 99075f2 to master...

@bors bors merged commit 99075f2 into rust-lang:master May 22, 2024
21 checks passed
bors added a commit to rust-lang-ci/rust that referenced this pull request May 24, 2024
Update cargo

7 commits in 84dc5dc11a9007a08f27170454da6097265e510e..a8d72c675ee52dd57f0d8f2bae6655913c15b2fb
2024-05-20 18:57:08 +0000 to 2024-05-24 03:34:17 +0000
- Improve error description when deserializing partial field struct (rust-lang/cargo#13956)
- fix: remove symlink dir on Windows (rust-lang/cargo#13910)
- Fix wrong type of rustc-flags in documentation (rust-lang/cargo#13957)
- Add more high level traces (rust-lang/cargo#13951)
- upgrade gix from 0.62 to 0.63 (rust-lang/cargo#13948)
- Use `i32` rather than `usize` as "default integer" in library template (rust-lang/cargo#13939)
- fetch specific commits even if the github fast path fails (rust-lang/cargo#13946)

r? ghost
bors added a commit to rust-lang-ci/rust that referenced this pull request May 25, 2024
Update cargo

7 commits in 84dc5dc11a9007a08f27170454da6097265e510e..a8d72c675ee52dd57f0d8f2bae6655913c15b2fb
2024-05-20 18:57:08 +0000 to 2024-05-24 03:34:17 +0000
- Improve error description when deserializing partial field struct (rust-lang/cargo#13956)
- fix: remove symlink dir on Windows (rust-lang/cargo#13910)
- Fix wrong type of rustc-flags in documentation (rust-lang/cargo#13957)
- Add more high level traces (rust-lang/cargo#13951)
- upgrade gix from 0.62 to 0.63 (rust-lang/cargo#13948)
- Use `i32` rather than `usize` as "default integer" in library template (rust-lang/cargo#13939)
- fetch specific commits even if the github fast path fails (rust-lang/cargo#13946)

r? ghost
@rustbot rustbot added this to the 1.80.0 milestone May 25, 2024
RalfJung pushed a commit to RalfJung/miri that referenced this pull request May 27, 2024
Update cargo

7 commits in 84dc5dc11a9007a08f27170454da6097265e510e..a8d72c675ee52dd57f0d8f2bae6655913c15b2fb
2024-05-20 18:57:08 +0000 to 2024-05-24 03:34:17 +0000
- Improve error description when deserializing partial field struct (rust-lang/cargo#13956)
- fix: remove symlink dir on Windows (rust-lang/cargo#13910)
- Fix wrong type of rustc-flags in documentation (rust-lang/cargo#13957)
- Add more high level traces (rust-lang/cargo#13951)
- upgrade gix from 0.62 to 0.63 (rust-lang/cargo#13948)
- Use `i32` rather than `usize` as "default integer" in library template (rust-lang/cargo#13939)
- fetch specific commits even if the github fast path fails (rust-lang/cargo#13946)

r? ghost
flip1995 pushed a commit to flip1995/rust-clippy that referenced this pull request Jun 28, 2024
Update cargo

7 commits in 84dc5dc11a9007a08f27170454da6097265e510e..a8d72c675ee52dd57f0d8f2bae6655913c15b2fb
2024-05-20 18:57:08 +0000 to 2024-05-24 03:34:17 +0000
- Improve error description when deserializing partial field struct (rust-lang/cargo#13956)
- fix: remove symlink dir on Windows (rust-lang/cargo#13910)
- Fix wrong type of rustc-flags in documentation (rust-lang/cargo#13957)
- Add more high level traces (rust-lang/cargo#13951)
- upgrade gix from 0.62 to 0.63 (rust-lang/cargo#13948)
- Use `i32` rather than `usize` as "default integer" in library template (rust-lang/cargo#13939)
- fetch specific commits even if the github fast path fails (rust-lang/cargo#13946)

r? ghost
Labels
A-git Area: anything dealing with git S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Development

Successfully merging this pull request may close these issues.

object not found when fetching git dependency
4 participants