Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch even SHA revisions directly #475

Conversation

mbolivar-nordic
Copy link
Contributor

@mbolivar-nordic mbolivar-nordic commented Feb 8, 2021

Since we implemented the feature, GitHub has started to allow fetching
SHAs directly. Let's take advantage of that where we can by avoiding a
full sync of all the available refs in a remote when it's not needed.

Fall back on a "full sync" if that fails. This is a bit wasteful if
the SHA is truly gone, but it preserves compatibility and will let the
update succeed if GitHub ever changes their minds, or if using another
git server with similar restrictions.

Nobody is calling _fetch() with a rev kwarg; delete this stale code.

Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
Since we implemented the feature, GitHub has started to allow fetching
SHAs directly. Let's take advantage of that where we can by avoiding a
full sync of all the available refs in a remote when it's not needed.

Fall back on a "full sync" if that fails. This is a bit wasteful if
the SHA is truly gone, but it preserves compatibility and will let the
update succeed if GitHub ever changes their minds, or if using another
git server with similar restrictions.

Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
@mbolivar-nordic
Copy link
Contributor Author

No changes in last force push, just trying to get past what looks like a temporary CI failure.

# been extensively tested.
# newly-fetched SHA corresponding to project.revision.
#
# "Hopefully" because there are many ways to spell a revision, and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Hopefully" because there are many ways to spell a revision (git help revisions)

log.small_banner(msg)
try:
# We can't in general fetch a SHA from a remote, as some hosts
# forbid it for security reasons.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forbid it for security reasons (see git help fetch-pack)

# --tags is required to get tags, since the remote is
# specified as a URL. We hold on to tags to avoid losing track
# of revisions that got left behind by branches that rebase.
project.git(['fetch', '-f', '--tags'] +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--tags rings a few alarm bells.
1a. Fetching a SHA directly had the promise of a huge fetch optimization for repos with many tags (and branches). Gone?
1b. git tags are somewhat flawed because unlike branches they are all global, no namespace. That's why --tags is NOT the default, avoids large pollution. Gone? No QUAL_REFS_WEST or anything?
2. Does this solve a real problem? A SHA checked out (HEAD) will never be garbage-collected. So the only problem is when the manifest moves away from this SHA and returns to it (why?) after gc.reflogExpire which defaults to 90 days. Then it will merely be fetched again, not a big deal.
3a If network usage issue 2. is really a problem for someone (for instance because they have a crazy short gc.reflogExpire), they can "solve" that issue in 2 seconds by merely slapping a local branch or local tag onto that commit
3b. Something like revision-in / #344 would also avoid this issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--tags rings a few alarm bells.

Yep. This is the crux of this PR, thanks for finding it right away 😄 .

Context:

  • Nordic's SDK rebases zephyr whenever we cut a new release, to keep a linear history of downstream patches, but allowing git merge of upstream zephyr during development.

  • We've historically relied on --tags being here to make sure that jumping around between versions during development never creates the possibility for garbage collection of old code, because we tag our history right before we rebase. Running west update gets the tag too, so nobody can lose anything.

However, I would be open to dropping --tags here if we can figure out a way to avoid issues.

My main concern is that if I do this:

git init
git fetch https://github.com/zephyrproject-rtos/zephyr v2.4.0
git update-ref refs/heads/manifest-rev FETCH_HEAD^{commit}
git checkout --detach manifest-rev

Then I have no tags at all, which isn't really what I think we want. I would like to find a solution where fetching a tag gives you a local tag, so you can't lose it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why --tags is NOT the default, avoids large pollution. Gone? No QUAL_REFS_WEST or anything?

       By default, any tag that points into the histories being fetched is
       also fetched; the effect is to fetch tags that point at branches that
       you are interested in. 

...
       -n, --no-tags
           By default, tags that point at objects that are downloaded from the
           remote repository are fetched and stored locally.

I'm curious what you mean by "NOT the default".

Copy link
Contributor Author

@mbolivar-nordic mbolivar-nordic Feb 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a comment indicating that the default behavior changes depending on whether you're fetching from a URL or a remote name.

Edit: I read this backwards the first time

That appears inconsistent with what I'm getting from git 2.30.0:

$ git remote add origin https://github.com/zephyrproject-rtos/zephyr
$ git fetch origin v2.4.0
From https://github.com/zephyrproject-rtos/zephyr
 * tag                     v2.4.0     -> FETCH_HEAD

I still don't get a v2.4.0 tag locally.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never creates the possibility for garbage collection of old code,

Can you elaborate why this is a real issue?
From a performance point of view west update would just fetch again and considering the rebase it would have very little to fetch again because most objects are unchanged by a rebase.
Is the (former) lack of uploadpack.allow*SHA1InWant the concern? Either the fallback here or the revision-in / #344 would solve this. Something has to anyway.

I'm curious what you mean by [--tags is] "NOT the default".

I'm merely reading git help fetch.

what I'm getting from git 2.30.0. I still don't get a v2.4.0 tag locally.

I hope there is a solution that does not depend on obscure and changing git implementation details. At least not functionally - performance is of course a different story.

I would not mind some kind of --tags option in the west user interface to support this use case, maybe even turned on by default. What bothers me is hardcoding it at the lowest/deepest level in the code which precludes any "high performance" mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm merely reading git help fetch.

What I pasted above is from the same page, and seems to contradict your statement in some ways.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think --no-tags is the opposite of --tags. The default seems to be somewhere in the middle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a performance point of view west update would just fetch again

I'm thinking of things like long airplane rides without network connectivity, where an ill-timed gc is a showstopper until you land and have wifi again.

I hope there is a solution that does not depend on obscure and changing git implementation details

I'm not sure git has changed. Just wanted to share the exact version I was using in case you repro and get a difference.

I hope so too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think --no-tags is the opposite of --tags. The default seems to be somewhere in the middle.

OK, I get you now, and it seems like you're right.

Lol, that is just classic git.

"We have --foo and --no-foo options."
"What's the default?"
"Neither"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find useful to always keep in mind a "decoder ring" for git's user interface. For instance every time I see "git rebase" I mentally translate to "git replay" - cause I'm changing base less often than not. I just added --tags -> --all-tags to my internal decoder ring.

It's far from a silver bullet though, "checkout" and "reset" are lost causes https://git-scm.com/book/en/v2/Git-Tools-Reset-Demystified

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 10, 2021

I'm thinking of things like long airplane rides without network connectivity, where an ill-timed gc is a showstopper until you land and have wifi again.

Fetching all tags certainly helps when reverting to an old manifest while on a plane, however it's not even enough in the general case: you could still miss a branch.

I think the only way to resolve the tension between "fetch as little as fast as possible" (think CI) versus "west update backwards on a plane" is to have two different and explicit modes of operations, say fast-fetch versus full-fetch and give the user the choice. Not sure which one should be the default.

When the user asks for "fast-fetching" a SHA1 and it fails because of some missing uploadpack.allow*SHA1InWant setting on the server side, simply fallback on full-fetch for that repo only. Add a logging statement that the optimization failed.

@mbolivar-nordic
Copy link
Contributor Author

When the user asks for "fast-fetching" a SHA1 and it fails because of some missing uploadpack.allow*SHA1InWant setting on the server side, simply fallback on full-fetch for that repo only. Add a logging statement that the optimization failed.

I think that's a reasonable conclusion from this discussion. So I'll close this PR for now and we can keep it in mind when we try to do shallow updates 'for real': that will be something like a west update --shallow which does not use --tags and does something like this instead.

However, I think since the user asked for it explicitly, if it doesn't work, we should just fail the entire west update --shallow process. They can decide what to do (retry, fall back on plain west update, go eat lunch, etc.)

Sound good to you?

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 25, 2021

I think giving #475 and #344 a name should help clarify concepts, I suggest "narrow", see #319 (comment)

I'm not sure why you closed this because it was about "narrow", not "shallow".

Rephrasing what I wrote earlier with my new fancy words: defaulting on the "LARGE" / airplaine mode class for historical reasons seems perfectly fine but it should be possible to opt out. West should never hardcode --tags at the lowest level and especially not use --tags when attempting a DEPTH1 or NARROW. The users must make up their mind: either they want fast CI or they want airplane mode but they can't ask for both at the same time!

You could want to restrict west to only 3 classes and drop "DEFAULT" and fall back on --tags/LARGE when hitting a uploadpack.allow*SHA1InWant blockage but even in that case --tags would not be hardcoded.

@mbolivar-nordic
Copy link
Contributor Author

OK, so you're saying this PR covers 'narrow', and it should be reopened. I think these are good points and I'm in favor of your ideas.

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 25, 2021

BTW git-repo had a "--narrow" option since 2012, it's called sync-c
https://android.googlesource.com/tools/repo/+/HEAD/docs/manifest-format.md#Element-default

It's not clear what 'c' stands for, I found current_branch_only in the code, maybe that's it? Clearly a name not as good as "narrow" ;-)

git-repo's equivalent of revision-in #344 is called upstream, not a great name either.

sync-c "narrowness" was a critical optimization as Google automation tends to create many branches (which also motivated git's protocol-v2). IMHO automation and meta-git tools like west should always use some refs/other/ namespace different from refs/heads/ and refs/tags/ and thay may be happening right now but not everyone does this all the time. In the short term it's more convenient to just use git branch in a script - and regret it later.

I also spotted by chance a later git-repo workaround for uploadpack.allow*SHA1InWant. I think it's auto-generating the revision-in / upstream #344 field. I didn't spend enough time to really understand it but here it is:

https://gerrit-review.googlesource.com/c/git-repo/+/38150

Currently when doing a sync against a revision locked [with "pinned" SHA1] manifest,
sync has no option but to fall back to sync'ing the entire refs space; [...]
This sucks if we're in -c mode; ...

@mbolivar-nordic
Copy link
Contributor Author

Historical note on west's side: the presence of sync-c in repo was actually a motivation for west's current behavior. I figured if they got away with just grabbing everything by default, we could too, and fix it later when the time came.

@mbolivar-nordic
Copy link
Contributor Author

Obsoleted by the --narrow option now in master

@marc-hb marc-hb added the performance How long things take label May 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance How long things take
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants