-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch even SHA revisions directly #475
Fetch even SHA revisions directly #475
Conversation
Nobody is calling _fetch() with a rev kwarg; delete this stale code. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
d8b9f7f
to
fdddb30
Compare
fdddb30
to
af9f721
Compare
Since we implemented the feature, GitHub has started to allow fetching SHAs directly. Let's take advantage of that where we can by avoiding a full sync of all the available refs in a remote when it's not needed. Fall back on a "full sync" if that fails. This is a bit wasteful if the SHA is truly gone, but it preserves compatibility and will let the update succeed if GitHub ever changes their minds, or if using another git server with similar restrictions. Signed-off-by: Martí Bolívar <marti.bolivar@nordicsemi.no>
af9f721
to
b8e7a1c
Compare
No changes in last force push, just trying to get past what looks like a temporary CI failure. |
# been extensively tested. | ||
# newly-fetched SHA corresponding to project.revision. | ||
# | ||
# "Hopefully" because there are many ways to spell a revision, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Hopefully" because there are many ways to spell a revision (
git help revisions
)
log.small_banner(msg) | ||
try: | ||
# We can't in general fetch a SHA from a remote, as some hosts | ||
# forbid it for security reasons. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forbid it for security reasons (see
git help fetch-pack
)
# --tags is required to get tags, since the remote is | ||
# specified as a URL. We hold on to tags to avoid losing track | ||
# of revisions that got left behind by branches that rebase. | ||
project.git(['fetch', '-f', '--tags'] + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--tags
rings a few alarm bells.
1a. Fetching a SHA directly had the promise of a huge fetch optimization for repos with many tags (and branches). Gone?
1b. git tags are somewhat flawed because unlike branches they are all global, no namespace. That's why --tags
is NOT the default, avoids large pollution. Gone? No QUAL_REFS_WEST or anything?
2. Does this solve a real problem? A SHA checked out (HEAD) will never be garbage-collected. So the only problem is when the manifest moves away from this SHA and returns to it (why?) after gc.reflogExpire
which defaults to 90 days. Then it will merely be fetched again, not a big deal.
3a If network usage issue 2. is really a problem for someone (for instance because they have a crazy short gc.reflogExpire
), they can "solve" that issue in 2 seconds by merely slapping a local branch or local tag onto that commit
3b. Something like revision-in
/ #344 would also avoid this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--tags
rings a few alarm bells.
Yep. This is the crux of this PR, thanks for finding it right away 😄 .
Context:
-
Nordic's SDK rebases zephyr whenever we cut a new release, to keep a linear history of downstream patches, but allowing
git merge
of upstream zephyr during development. -
We've historically relied on --tags being here to make sure that jumping around between versions during development never creates the possibility for garbage collection of old code, because we tag our history right before we rebase. Running
west update
gets the tag too, so nobody can lose anything.
However, I would be open to dropping --tags here if we can figure out a way to avoid issues.
My main concern is that if I do this:
git init
git fetch https://github.com/zephyrproject-rtos/zephyr v2.4.0
git update-ref refs/heads/manifest-rev FETCH_HEAD^{commit}
git checkout --detach manifest-rev
Then I have no tags at all, which isn't really what I think we want. I would like to find a solution where fetching a tag gives you a local tag, so you can't lose it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why
--tags
is NOT the default, avoids large pollution. Gone? No QUAL_REFS_WEST or anything?
By default, any tag that points into the histories being fetched is
also fetched; the effect is to fetch tags that point at branches that
you are interested in.
...
-n, --no-tags
By default, tags that point at objects that are downloaded from the
remote repository are fetched and stored locally.
I'm curious what you mean by "NOT the default".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a comment indicating that the default behavior changes depending on whether you're fetching from a URL or a remote name.
Edit: I read this backwards the first time
That appears inconsistent with what I'm getting from git 2.30.0:
$ git remote add origin https://github.com/zephyrproject-rtos/zephyr
$ git fetch origin v2.4.0
From https://github.com/zephyrproject-rtos/zephyr
* tag v2.4.0 -> FETCH_HEAD
I still don't get a v2.4.0 tag locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never creates the possibility for garbage collection of old code,
Can you elaborate why this is a real issue?
From a performance point of view west update
would just fetch again and considering the rebase it would have very little to fetch again because most objects are unchanged by a rebase.
Is the (former) lack of uploadpack.allow*SHA1InWant
the concern? Either the fallback here or the revision-in
/ #344 would solve this. Something has to anyway.
I'm curious what you mean by [--tags is] "NOT the default".
I'm merely reading git help fetch
.
what I'm getting from git 2.30.0. I still don't get a v2.4.0 tag locally.
I hope there is a solution that does not depend on obscure and changing git implementation details. At least not functionally - performance is of course a different story.
I would not mind some kind of --tags
option in the west user interface to support this use case, maybe even turned on by default. What bothers me is hardcoding it at the lowest/deepest level in the code which precludes any "high performance" mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm merely reading
git help fetch
.
What I pasted above is from the same page, and seems to contradict your statement in some ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think --no-tags
is the opposite of --tags
. The default seems to be somewhere in the middle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a performance point of view
west update
would just fetch again
I'm thinking of things like long airplane rides without network connectivity, where an ill-timed gc is a showstopper until you land and have wifi again.
I hope there is a solution that does not depend on obscure and changing git implementation details
I'm not sure git has changed. Just wanted to share the exact version I was using in case you repro and get a difference.
I hope so too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think
--no-tags
is the opposite of--tags
. The default seems to be somewhere in the middle.
OK, I get you now, and it seems like you're right.
Lol, that is just classic git.
"We have --foo and --no-foo options."
"What's the default?"
"Neither"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find useful to always keep in mind a "decoder ring" for git's user interface. For instance every time I see "git rebase" I mentally translate to "git replay" - cause I'm changing base less often than not. I just added --tags
-> --all-tags
to my internal decoder ring.
It's far from a silver bullet though, "checkout" and "reset" are lost causes https://git-scm.com/book/en/v2/Git-Tools-Reset-Demystified
Fetching all tags certainly helps when reverting to an old manifest while on a plane, however it's not even enough in the general case: you could still miss a branch. I think the only way to resolve the tension between "fetch as little as fast as possible" (think CI) versus "west update backwards on a plane" is to have two different and explicit modes of operations, say When the user asks for "fast-fetching" a SHA1 and it fails because of some missing |
I think that's a reasonable conclusion from this discussion. So I'll close this PR for now and we can keep it in mind when we try to do shallow updates 'for real': that will be something like a However, I think since the user asked for it explicitly, if it doesn't work, we should just fail the entire Sound good to you? |
I think giving #475 and #344 a name should help clarify concepts, I suggest "narrow", see #319 (comment) I'm not sure why you closed this because it was about "narrow", not "shallow". Rephrasing what I wrote earlier with my new fancy words: defaulting on the "LARGE" / airplaine mode class for historical reasons seems perfectly fine but it should be possible to opt out. West should never hardcode You could want to restrict west to only 3 classes and drop "DEFAULT" and fall back on |
OK, so you're saying this PR covers 'narrow', and it should be reopened. I think these are good points and I'm in favor of your ideas. |
BTW git-repo had a "--narrow" option since 2012, it's called It's not clear what 'c' stands for, I found git-repo's equivalent of
I also spotted by chance a later git-repo workaround for https://gerrit-review.googlesource.com/c/git-repo/+/38150
|
Historical note on west's side: the presence of sync-c in repo was actually a motivation for west's current behavior. I figured if they got away with just grabbing everything by default, we could too, and fix it later when the time came. |
Obsoleted by the --narrow option now in master |
Since we implemented the feature, GitHub has started to allow fetching
SHAs directly. Let's take advantage of that where we can by avoiding a
full sync of all the available refs in a remote when it's not needed.
Fall back on a "full sync" if that fails. This is a bit wasteful if
the SHA is truly gone, but it preserves compatibility and will let the
update succeed if GitHub ever changes their minds, or if using another
git server with similar restrictions.