cabal.project: source-package-repository with submodule #5536

Closed
ghost opened this issue Aug 21, 2018 · 15 comments · Fixed by #7625

Comments

@ghost

ghost commented Aug 21, 2018

If you have a source-repository-package of type git in cabal.project, and that project has a submodule, then the current arguments passed in VCS.vcsGit are insufficient. I think vcsGit's cloneArgs ought to, by default, use --recursive. Otherwise, files needed for building might be missing.
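As a rough illustration of the request, the difference amounts to one extra flag on the clone. This is a shell sketch, not cabal's Haskell code: the function name `clone_with_submodules` and the `GIT` override variable are made up here for illustration.

```shell
# GIT may be overridden (for example GIT="echo git") to print the
# commands instead of running them. This variable is an assumption of
# the sketch, not something cabal provides.
GIT="${GIT:-git}"

# clone_with_submodules URL DEST
# Clone URL into DEST and also initialise and fetch every submodule the
# repository declares, including nested ones.
clone_with_submodules() {
  local url="$1" dest="$2"
  # git also accepts the longer spelling --recurse-submodules.
  $GIT clone --recursive "$url" "$dest"
}
```

For example, `clone_with_submodules https://github.com/jwiegley/gitlib gitlib` would fetch the repository together with its submodules.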

@ghost
Author

ghost commented Aug 21, 2018

Found while referencing hlibgit2 from https://github.com/jwiegley/gitlib/. /cc @23Skidoo.

@ghost
Author

ghost commented Aug 21, 2018

From cabal.project:

source-repository-package
  type:     git
  location: https://github.com/jwiegley/gitlib
  subdir:   gitlib
  tag:      7d0edc372e839c2716bea1311b61b09ce783f801

-- work around Setup.hs issue in hlibgit2
-- https://github.com/jwiegley/gitlib/issues/82
allow-newer: hlibgit2:Cabal
source-repository-package
  type:     git
  location: https://github.com/jwiegley/gitlib
  subdir:   hlibgit2
  tag:      7d0edc372e839c2716bea1311b61b09ce783f801

source-repository-package
  type:     git
  location: https://github.com/jwiegley/gitlib
  subdir:   gitlib-libgit2
  tag:      7d0edc372e839c2716bea1311b61b09ce783f801

@hvr
Member

hvr commented Aug 21, 2018

@Tuncer ...would you be willing to try creating a PR? The tricky part is probably making sure that submodules are properly updated/setup/removed/added when you change the tag value and the git repo needs to be updated

@ghost
Author

ghost commented Aug 25, 2018

Don't think I can get to it in time for 2.5, so feel free to take over.

Concerning the implementation:

  • I don't know how submodule removals would be handled. Is there a reliable method for that?
  • git submodule init && git submodule update after a fetch --all and merge sounds reasonable to me.
  • Do we have to consider modified submodules, and is that a common multi-repo development workflow? If not, a revert before submodule update makes sense.
  • Since we'll have a code path doing submodule update anyway, we can skip --recursive if needed.

@hvr
Member

hvr commented Aug 26, 2018

@Tuncer maybe the following would be a reasonable 80/20 solution to sync submodules after having checked out a new commit:

git submodule sync --recursive
git submodule update --init --force --recursive
git submodule foreach --recursive 'git clean -ffxdq'
git clean -ffxdq

This should work in most cases and only choke on corner cases.

...and there's of course also the brutal but robust approach: whenever a different commit needs to be checked out, start from scratch; i.e. remove the git checkout completely and git clone afresh...

or we could remove everything but the top-level .git folder ourselves, if we want to retain the local git object store cache (which includes also all submodule's git object caches)...
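The "remove everything but the top-level .git folder" variant can be sketched in a few lines of shell. The function name `wipe_worktree` is hypothetical, and the sketch assumes a `find` that supports `-mindepth` and `-maxdepth` (GNU and BSD versions do).

```shell
# wipe_worktree REPO_DIR
# Delete every top-level entry of REPO_DIR except .git, so the object
# store (including .git/modules for submodules) survives.
wipe_worktree() {
  local repo="$1"
  # Refuse to run on a directory that is not a non-bare checkout.
  [ -d "$repo/.git" ] || { echo "not a git checkout: $repo" >&2; return 1; }
  # -maxdepth 1 keeps find from descending into entries it is deleting.
  find "$repo" -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf -- {} +
}
```

After the wipe, a `git reset --hard <commit>` followed by `git submodule update --init --force --recursive` should repopulate the tree from the retained object store.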

@ghost
Author

ghost commented Aug 27, 2018

The four git invocations look good to me. Since you suggest to additionally remove clones explicitly, I wonder why you wrote clean -ff. Doesn't -ff (compared to single -f) remove .git, too?

What do you think about generally using reset --hard <TAG> (instead of checkout), since that's what we're asked to do, anyway?
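Put together, the reset suggested here and the four submodule commands from the previous comment could be wrapped roughly as follows. This is only a sketch: `sync_checkout_to` and the `GIT` override are names invented for illustration, and it assumes the requested commit has already been fetched into the repository.

```shell
# GIT may be overridden (for example GIT="echo git") to print the
# commands instead of running them; this is an assumption of the sketch.
GIT="${GIT:-git}"

# sync_checkout_to REPO_DIR COMMIT
# Move the checkout in REPO_DIR to COMMIT, bring submodules in line,
# and remove untracked leftovers. Stops at the first failing step.
sync_checkout_to() {
  local repo="$1" commit="$2"
  $GIT -C "$repo" reset --hard "$commit" &&
  $GIT -C "$repo" submodule sync --recursive &&
  $GIT -C "$repo" submodule update --init --force --recursive &&
  $GIT -C "$repo" submodule foreach --recursive 'git clean -ffxdq' &&
  $GIT -C "$repo" clean -ffxdq
}
```

Running it with `GIT="echo git"` prints the five commands without touching any repository, which is a cheap way to review what would happen.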

As an always-works/brute-force method, I can see the advantage of nuking everything but .git and checking out fresh. This might be easier than potentially dealing with some project's .gitignore conflicting with what we want to delete. I wouldn't remove .git, though, since the re-download can be expensive.

@ghost
Author

ghost commented Aug 27, 2018

Extra idea that's not required for the initial implementation:

Rust's cargo manages git dependencies by storing them in ~/.cargo/git/, where ~/.cargo/git/db/ has bare repos and ~/.cargo/git/checkouts/ references the bare repo as its remote. That would require garbage-collection to be practical, but the idea of downloading a repo from a unique remote only once per user sounds good to me.
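A very rough sketch of such a db/checkouts split is below. It is not cargo's actual implementation or naming scheme: `CACHE_ROOT`, `cache_key`, `fetch_into_cache`, `checkout_from_cache`, and the `GIT` override are all invented for illustration, and the CRC-based key is a toy that a real cache would replace with a stronger hash. It uses `--mirror` (a bare clone whose refs are kept identical to the remote's) so that later fetches update the cached refs.

```shell
# Both variables are assumptions of this sketch.
GIT="${GIT:-git}"
CACHE_ROOT="${CACHE_ROOT:-$HOME/.cache/example-vcs}"

# cache_key URL
# Derive a deterministic directory name from the URL (toy key: CRC).
cache_key() {
  printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# fetch_into_cache URL
# Download URL once into db/; later calls only fetch new objects.
fetch_into_cache() {
  local url="$1" bare
  bare="$CACHE_ROOT/db/$(cache_key "$url")"
  if [ -d "$bare" ]; then
    $GIT -C "$bare" fetch --prune
  else
    mkdir -p "$CACHE_ROOT/db" && $GIT clone --mirror "$url" "$bare"
  fi
}

# checkout_from_cache URL COMMIT DEST
# Create a working copy in DEST whose remote is the cached bare repo.
checkout_from_cache() {
  local url="$1" commit="$2" dest="$3" bare
  bare="$CACHE_ROOT/db/$(cache_key "$url")"
  $GIT clone "$bare" "$dest" &&
  $GIT -C "$dest" reset --hard "$commit"
}
```

Garbage collection of unused entries under `db/` is left out entirely, which is the part that would make this practical.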

@hvr
Member

hvr commented Aug 27, 2018

I suggested git clean -ff because it tries to remove dead folders more aggressively even if they contain .git entries; but afaik it only ever deletes .gits of submodules, never the real top-level .git folder.

git reset --hard sounds good

What cargo does sounds interesting, but it also sounds more complicated to implement. Do you happen to know what steps would be involved in maintaining a global git cache? How well does this work w/ cargo? Can we steal the logic from cargo?

@ghost
Author

ghost commented Aug 27, 2018

In that case, we can just nuke everything but the outer .git and have simpler code, can't we? And, if we do that, we're pretty much at a point where a backing bare repo makes more sense.

I just noticed cargo's git repo sharing and it reminded me of how Darcs shares patches, but I don't know the details of their implementation.

Since I suspect users would demand a cache in different levels/places, perhaps starting with a bare+checkout model inside dist-newstyle would be sufficient and easy enough? Later we could look into a shared cache and the complexity involved with that.

cd dist-newstyle/vcs/git/<dep>
git fetch --all
cd -
cd src
rm -fr <dep>
git clone ../vcs/git/<dep>
cd <dep>
git reset --hard <TAG>
git submodule.....

@hvr
Member

hvr commented Aug 28, 2018

Hrm, but git clone ../vcs/git/<dep> wouldn't take care of caching the submodules' git object stores, would it? Those are stored in .git/modules/* but are only populated if you actually clone a repo in non-bare mode; i.e. if you git clone --bare --recursive ..., git won't take care of them.

I'm starting to think we should just start w/ the basic idea of

we can just nuke everything but the outer .git and have simpler code

and see how far we get with that approach; cabal can easily keep track of which commit its internal hidden git clone of a repo is expected to be at; so it can easily detect whenever the cabal.project was changed to request a different commit -- and then start performing the "wipe everything but the top-level .git folder and reset/checkout anew"-steps
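The detection step could look roughly like this in shell. How cabal actually records the expected commit is not stated in this thread, so the sketch simply asks git for the current HEAD; `needs_resync` and the `GIT` override are hypothetical names, and the requested commit is assumed to be a full hash as in the cabal.project example above.

```shell
# GIT may be overridden for dry runs or testing; an assumption of the sketch.
GIT="${GIT:-git}"

# needs_resync REPO_DIR WANTED_COMMIT
# Succeeds (exit status 0) when REPO_DIR is not at WANTED_COMMIT and the
# wipe-and-reset steps should run; fails (non-zero) when it is up to date.
needs_resync() {
  local repo="$1" wanted="$2" current
  # If HEAD cannot be read at all, treat the checkout as stale.
  current="$($GIT -C "$repo" rev-parse HEAD 2>/dev/null)" || return 0
  [ "$current" != "$wanted" ]
}
```

A caller would then guard the expensive path with something like `if needs_resync "$dir" "$tag"; then ...; fi`.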

Does this sound reasonable? Does it address your requirements?

PS: Btw, another kind of tooling we could look at is how CI buildbots manage to clone new states of a Git repo while keeping a cache of the git objects.

@ghost
Author

ghost commented Aug 28, 2018

Yes, the simple method we discussed ought to work. Since smart caching is a secondary concern and unrelated to recursive clones, let's tackle submodules first. In fact, smart caching is an optimization and won't block 2.5. So, maybe we should open a separate ticket.

I don't think all CI systems reuse git objects, but it's interesting to investigate, sure.

@mpickering
Collaborator

Still an issue with a recent version of cabal.

@patrickt
Contributor

patrickt commented Jun 3, 2019

This is currently blocking semantic from using source-repository-package for all its vendored dependencies. We’d love to see a fix!

@patrickt
Contributor

patrickt commented Jun 3, 2019

Upon reflection, would it be possible to simulate this with a custom Setup.hs?

@23Skidoo
Member

23Skidoo commented Jun 3, 2019

/cc @dcoutts
