Replace GitPython with pygit2 #2120
Replace the use of the GitPython package with pygit2. The latter seems to have better git support; in particular, it supports the newer index versions 3 and 4. Since it is backed by the libgit2 library that is also used by Cargo, it seems to have the best chance of being updated for compatibility with new git versions.

Admittedly, the API feels very low-level. In particular, it is necessary to explicitly request writing changes back to the index, and to explicitly reread it when it's modified externally (e.g. via another `pygit2.Repository` instance, as in tests). On the plus side, it does not invoke `git` at all -- everything is done by the library.

Fixes conda-forge#2116
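The explicit write and reread described above can be sketched roughly as follows. This is an illustration only, not code from the PR: the function name `index_roundtrip_demo` and the file name `recipe.txt` are made up, and the sketch assumes pygit2 is installed.

```python
import os
import tempfile


def index_roundtrip_demo():
    """Stage a file via one Repository handle, then observe it via another."""
    # Imported lazily so this module still loads where pygit2 is missing.
    import pygit2

    with tempfile.TemporaryDirectory() as workdir:
        writer = pygit2.init_repository(workdir)
        reader = pygit2.Repository(workdir)
        reader_index = reader.index  # second handle loads the (empty) index

        # Hypothetical file, created only for this demonstration.
        with open(os.path.join(workdir, "recipe.txt"), "w") as fh:
            fh.write("hello\n")

        writer_index = writer.index
        writer_index.add("recipe.txt")  # staged in memory only
        writer_index.write()            # explicitly flush to .git/index

        reader_index.read()             # explicitly reload from disk
        return "recipe.txt" in reader_index
```

Without the `write()` call the staged entry never reaches disk, and without the `read()` call the second handle may keep serving its earlier in-memory copy.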
Remove the `search_parent_directories` kwarg that's never been used, and instead always enable searching parent directories for better cross-version pygit2 compatibility.
This is an API change and is not allowed under semantic versioning. Please put back this functionality.
This reverts commit a21d135.
Restored, and instead added backwards compatibility for
@beckermr, another use is in
In [10]: list(conda_smithy.feedstocks.feedstocks_repos(None, "/home/mgorny/git/conda"))
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[10], line 1
----> 1 list(conda_smithy.feedstocks.feedstocks_repos(None, "/home/mgorny/git/conda"))
File ~/git/conda-smithy/conda_smithy/feedstocks.py:197, in feedstocks_repos(organization, feedstocks_directory, pull_up_to_date, randomise, regexp)
195 for feedstock in feedstocks:
196 repo = git.Repo(feedstock.directory)
--> 197 upstream = repo.remotes.upstream
199 if pull_up_to_date:
200 print("Fetching ", feedstock.package)
File ~/miniforge3/envs/conda-smithy/lib/python3.12/site-packages/git/util.py:1198, in IterableList.__getattr__(self, attr)
1196 return item
1197 # END for each item
-> 1198 return list.__getattribute__(self, attr)
AttributeError: 'IterableList' object has no attribute 'upstream'
FWICS
We need to migrate and fit.
Ah, sorry, I was wrong; it wasn't broken. I've just realized it expected a remote called
Another question: do we need SSH support for
Of course, another option is to call
Anything that is currently supported in the code needs to be supported in this PR. We cannot have API changes like this.
Is it okay to call
Subprocesses are fine, but we should be mindful of how complex this might get. The point of this PR is to support new git index versions. IDK anything about those. Are these in use? How have we not hit bugs for this before now?
Subprocesses will probably be less complex than doing everything via libgit2. Another option is to stick to GitPython for some of the code, at least until it actually breaks for someone. From what I understand, to hit this issue you need to use a new enough git to clone the repository. It is also possible that the repository itself must have some characteristics that actually trigger the use of the new index format. Maybe I was just unlucky that the first feedstock that I cloned triggered this, or it is possible that more people will hit this as they upgrade their git to new versions and clone new feedstocks.
I'd rather have one tool used and defer to subprocesses. Let's use a subprocess.
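Deferring to the external `git` binary could look roughly like the sketch below. The helper names `git_clone_command` and `clone_with_git` are hypothetical and are not taken from the PR; the point is only that the argument list is passed directly to `subprocess.run`, so SSH URLs and credential handling are left entirely to the user's own git setup.

```python
import subprocess


def git_clone_command(url, target_dir):
    """Build the argv list for cloning `url` into `target_dir`."""
    # A list (not a shell string) avoids quoting issues; "--" ends option
    # parsing so a URL can never be mistaken for a flag.
    return ["git", "clone", "--", str(url), str(target_dir)]


def clone_with_git(url, target_dir):
    """Clone by running the external git binary; raises on a non-zero exit."""
    subprocess.run(git_clone_command(url, target_dir), check=True)
```

Keeping the command construction separate from the call makes it possible to unit-test the arguments without touching the network.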
I've updated the remaining code. However, I still need to update the tests. I'll do that tomorrow. However, feel free to point out if you don't like some of the changes and would prefer a different approach.
)
if search_parent_directories:
    path = pygit2.discover_repository(path)
if path is not None:
What happens if `path` is `None`? Should we raise?
The original behavior was to return `repo`, i.e. `None` in that case, and I've preserved that. In other words, if anything fails (`pygit2` is not installed, there is no repo, opening the repo fails for some reason), the function returns `None`, as it used to.
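The "return `None` on any failure" behavior described here might be sketched as follows. This is a reconstruction for illustration, not the PR's actual function: the name `get_repo` and the exact exception handling are assumptions, and only the `pygit2.discover_repository` call and the `search_parent_directories` flag appear in the diff excerpt above.

```python
def get_repo(path, search_parent_directories=True):
    """Return a pygit2.Repository for `path`, or None if anything fails."""
    try:
        import pygit2
    except ImportError:
        # pygit2 is not installed: behave as if there were no repository.
        return None
    try:
        if search_parent_directories:
            # Walks upwards from `path` looking for a repository.
            path = pygit2.discover_repository(path)
        if path is None:
            return None
        return pygit2.Repository(path)
    except Exception:
        # Deliberately broad (an assumption of this sketch): any failure
        # to locate or open the repository maps to None.
        return None
```

Callers then only need a single `if repo is None` check, at the cost of not being able to tell a missing library from a missing repository.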
clone = pygit2.clone_repository(repo.clone_url, clone_directory)
clone.remotes.delete("origin")
So wait, now we can clone? I thought we were using `git` subprocesses for this.
Yes, libgit2 can clone directly. However, as noted in the PR discussion, this (as implemented here) works for public HTTPS repos only. Using libgit2 to clone via SSH would require a fixed libgit2 on conda and explicit credential handling, which would be a lot of code and not 100% equivalent to external git anyway.
I think we just need a news item now!
Ok, I'm done updating tests. I think the code is ready now.
Looks like some tests are still using
My bad. Really should've uninstalled GitPython when testing locally.
I'm really sorry about that. Fixed now. Tests pass for me now with
Needs a news item.
Added.
Co-authored-by: Matthew R. Becker <beckermr@users.noreply.github.com>
So this PR changes some pretty low-level functionality. The test suite uses mocks, which is great, but we don't have tests of functioning code for all of the changes. We need to test these changes live for both staged-recipes and feedstock token handling before we can merge them.
I am happy to do the testing of these changes live, but it will be a bit before I can get to it.
Not sure if you're asking me to do anything here. If you're asking whether I've tested them locally, I did test calling every function with some args suitable for local/semi-remote testing.
Yep, nothing for you to do here. Sorry for blocking this one. Testing much of the code in smithy is hard since it runs against external services. :/
Yeah, I know. If I only had more time, I'd have added some more tests.
So far focused on `feedstock_io.py` and its tests. I need to figure out how to test the changes to other files properly, given that the tests mock the entire `git.Repo.clone_from` call.