Skip to content

Win http fetch #213

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Win http fetch #213

wants to merge 2 commits into from

Conversation

jamill
Copy link
Member

@jamill jamill commented Sep 12, 2012

An updated Fetch implementation based off of @ben's repository, from suggestions in Pull Request #206. This includes a basic test for fetching from a remote into an empty repository. I made some choices on how / where to expose this functionality, so please let me know if you have thoughts / comments / questions.

Missing functionality:

  • It does not appear that libgit2 fetches tags
    • Tests do no cover fetching tags
  • Completion callback in git_remote_callbacks does not appear to be called by libgit2
    • Completion callback not plumbed through LibGit2Sharp

Implementation:

  • I have exposed the git_remote_callbacks as a property on the FetchProgress object that is passed into Fetch(). This is slightly different than how it is organized in libgit2 (where the callbacks are added to the remote itself). I went this way because the lifetime of these callbacks seems tied to the actual fetch operation itself, at least in how I see callers of the LibGit2Sharp API. If I am missing something and this is not the appropriate place, please let me know.
  • I slighty changed the signature on the clone interface to so that callers of the LibGit2Sharp do not have to include the LibGitSharp.Core namespace. Instead, I added a wrapper around GitIndexerStats for LibGit2Sharp callers, and exposed the uint fields as long properties.
  • I expose the libgit2 C callbacks as Events on the LibGit2Sharp interface.

Issues:

I have seen intermittent failures from libgit2 during both the clone and fetch tests that I have not tracked down at this point (as part of call to git_remote_update_tips in fetch, and git_clone in Clone).

LibGit2Sharp.LibGit2SharpException : An error was raised by libgit2. Category = Reference (Error).
Target OID for the reference doesn't exist on the repository

// the data in the git_remote_callbacks structure. If, in the future, libgit2 changes its implementation
// to store a reference to the git_remote_callbacks structure this would introduce a subtle bug
// where the managed layer could move the git_remote_callbacks to a different location in memory,
// but libgit2 would still reference the old address.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💕 comments like this.

@nulltoken
Copy link
Member

@jamill Wow. That's a very detailed PR. Thanks a lot for this!

It does not appear that libgit2 fetches tags

It doesn't do this by default. Yet. (@carlosmn ;-) )
However, maybe would it make sense to "emulate" this in C# meanwhile. Once the native implementation is done, we would get rid of the managed code (provided the tests still pass).

A naive implementation could be, once the heads have been fetched, to create a temporary remote targeting the same url and bearing a different refspec (refs/tags/*:refs/tags/* should work), fetch from this remote, then drop the remote. Provided we make this work, this might do this trick at the cost of an additional fetch.

I expose the libgit2 C callbacks as Events on the LibGit2Sharp interface.

I'm not sure about Events.. Please give me a couple of days to contemplate your proposal...

LibGit2Sharp.LibGit2SharpException : An error was raised by libgit2. Category = Reference (Error).
Target OID for the reference doesn't exist on the repository

@ben @carlosmn Have you encountered this as well?

@ben
Copy link
Member

ben commented Sep 13, 2012

I've seen it, and I can reproduce it fairly regularly using the libgit2sharp unit tests. I'll do some digging today.

@ben
Copy link
Member

ben commented Sep 13, 2012

Found the root cause of the "OID doesn't exist" problem; a fix is pending on libgit2/libgit2#932. This will most likely only happen with REALLY small repos.

@nulltoken
Copy link
Member

@ben ❤️

@jamill
Copy link
Member Author

jamill commented Sep 13, 2012

@ben Thanks!

@nulltoken

I'm not sure about Events.. Please give me a couple of days to contemplate your proposal...

sure!

I will look at "emulating" the tag fetching in C#.

The other thing I wanted to ask your thoughts on was the fact that I have encapsulated the entire fetch operation in a basically a single method. The other approach would have been to expose loading the remote, downloading the packfile, updating the reference tips as seperate operations (including managing the lifetime of the remote handle and remote connection between - as they would have to be kept alive for longer than the single call on the public method)

@nulltoken
Copy link
Member

I will look at "emulating" the tag fetching in C#.

Great!

The other thing I wanted to ask your thoughts on was the fact that I have encapsulated the entire fetch operation in a basically a single method.

I completely agree with this. Let's make the standard operations easy to deal with.

@dahlbyk
Copy link
Member

dahlbyk commented Sep 14, 2012

Even if it's not the primary API, a useful overload might look something like this...

public delegate void FetchProgressHandler(string message);
public delegate void UpdateTipsHandler(string referenceName, ObjectId oldId, ObjectId newId);

public void Fetch(Remote remote,
                  FetchProgressHandler onProgress = null,
                  UpdateTipsHandler onUpdateTips = null,
                  Action onCompleted = null);

@uluhonolulu
Copy link
Contributor

+1 on handlers vs events -- makes consuming easier.

@jamill
Copy link
Member Author

jamill commented Sep 15, 2012

@dahlbyk , @uluhonolulu Thanks for the feedback! I don't mind exposing these callbacks as delegates - I would prefer to only expose one way or the other, but I don't mind if that way is as delegates.

}
finally
{
if (remoteHandle != null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it even valid for remoteHandle to be null here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the null check here.

@dahlbyk
Copy link
Member

dahlbyk commented Sep 16, 2012

I'll cast my vote for delegates:

FetchProgress progress = new FetchProgress();
progress.RemoteCallbacks.UpdateTipsChanged += new System.EventHandler<UpdateTipsChangedEventArgs>(helper.RemoteUpdateTipsHandler);
repo.Fetch(remote, progress);

vs

repo.Fetch(remote, onUpdateTips = helper.RemoteUpdateTipsHandler);

Can/should Fetch() support cancellation?

@jamill
Copy link
Member Author

jamill commented Sep 17, 2012

Can/should Fetch() support cancellation?

I think it seems reasonable that Fetch() would support cancellation - however, I do not see this supported in libgit2's fetch implementation. I think this is something that would be handled at a layer below LibGit2Sharp...

@dahlbyk
Copy link
Member

dahlbyk commented Sep 17, 2012

Is calling git_remote_disconnect mid-request not supported?

@jamill
Copy link
Member Author

jamill commented Sep 18, 2012

Is calling git_remote_disconnect mid-request not supported?

I see, thanks! Looks like that would cancel the network transfer portion of the fetch (where I would expect the majority of the time is). I like the idea of fetch being cancelable, but I would like think through some of the details a bit more before I would go ahead and implement it. I wouldn't expect the Fetch interface to change much to support cancelation.

I will expose the callbacks as delegates instead of events. I had a question about the other part of your API suggestion. The signature of the Fetch method overload that you proposed takes some extra parameters (one for each of the callbacks) and removes the progress reporting parameter (containing the bytes received and the GitIndexerStats). Is the scenario that you have in mind that a caller would only care about the callbacks and not the other progress information? Or, were you suggesting that there should be a flavor of the Fetch method that can take each of the arguments individually as optional parameters.

That is, is:

public void Fetch(Remote remote,
              FetchProgressHandler onProgress = null,
              UpdateTipsHandler onUpdateTips = null,
              Action onCompleted = null);

the full signature of the overload, or were you suggesting something more along the lines of:

public void Fetch(Remote remote,
              ref long bytes,
              ref GitIndexerStats stats,
              FetchProgressHandler onProgress = null,
              UpdateTipsHandler onUpdateTips = null,
              Action onCompleted = null);

Thank you again for you feedback.

@dahlbyk
Copy link
Member

dahlbyk commented Sep 18, 2012

My first reaction was to just return the FetchProgress:

public FetchProgress Fetch(Remote remote,
                           FetchProgressHandler onProgress = null,
                           UpdateTipsHandler onUpdateTips = null,
                           Action onCompleted = null);

However, it seems reasonable to assume onProgress would want access to the progress details, so maybe this makes more sense?

public delegate void FetchProgressHandler(FetchProgress progress, string message);
public delegate void UpdateTipsHandler(string referenceName, ObjectId oldId, ObjectId newId);

public void Fetch(Remote remote,
                  FetchProgressHandler onProgress = null,
                  UpdateTipsHandler onUpdateTips = null,
                  Action onCompleted = null);

Or you could return the FetchProgress but also expose it to the callback.

@uluhonolulu
Copy link
Contributor

+1 for public delegate void FetchProgressHandler(FetchProgress progress, string message); ref parameters should be avoided IMO -- make the code much more verbose than it has to be.

Returning FetchProgress seems reasonable if we want to have the Cancel method hanging off it.

var progress = repo.Fetch(..);
Button1.Click += () => progress.Cancel();

In that case, C# grammar requires we both return it and use it in the delegate, since we cannot write a lambda for it.

Anyway, let's try and write a couple of tests, and see which way is less painful.

@@ -83,7 +83,10 @@ public void Dispose()
{
foreach (string directory in directories)
{
DirectoryHelper.DeleteDirectory(directory);
if (Directory.Exists(directory))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please give me some more context about this?
When would this check be required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had hit an issue running the CanCloneARepository(...) tests where the test would create the SelfCleaningDirectory object and register the temporary directory path for deletion, but the test would fail in Repository.Clone(...) before the temporary directory actually got created. On cleanup, the test would throw an exception indicating that it could not find the path of the temporary directory (as it does not exist). The end result is that the test reports that it failed to find the path of the temporary directory (which hides the original exception).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your answer. Indeed, that makes sense.

However, I'm afraid this check could hide that something may have went wrong earlier (a wrongly written test for instance). Could you think of a way to let the user know about this not expected state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now (especially as the clone is no longer part of the test suite) I will remove this change so we can concentrate on the core Fetch changes.

@jamill
Copy link
Member Author

jamill commented Sep 18, 2012

@dahlbyk, @uluhonolulu Regarding the Fetch method:

  1. As currently implemented, the Fetch method is synchronous, which prevents it from returning a useful progress datastructure. This is also why there are ref parameters. I could look at making method this asynchronous, but it did not appear that were any other asynchronous methods at the moment, which is one reason I did not go down this route. Has there been other discussion(s) on how to handle long running operations in LibGit2Sharp (push, pull, fetch, clone, etc)?

  2. I could see the progress datastructure could be useful to the OnProgress callback. Currently the OnProgress matches the libgit2 progress callback. How do we feel about breaking this mapping here - I know the high level design goals indicate that LibGit2Sharp is not a 1:1 mapping of libgit2, so maybe this would not be a problem.

@nulltoken
Copy link
Member

@ben regarding the Clone implementation, could you please get rid of the GitCheckoutOptions?

@nulltoken
Copy link
Member

@jamill @ben I'm not in favor of exposing the IndexerStats as it is. Unless I'm wrong, consuming it would require a separate thread.

@carlosmn How complex would that be (or is it even feasible) to make the libgit2 indexing process invoke a progress callback?

@nulltoken
Copy link
Member

@jamill

Regarding the API:

I'm not sure about Events..

I also agree with @dahlbyk and @uluhonolulu. Let's go down the delegates road :)

Is calling git_remote_disconnect mid-request not supported?

When callbacks are involved, usually returning something different from 0 is the "conventional" way to interrupt the process. Hopefully, the remote would be disconnected.

Can/should Fetch() support cancellation?

I'm not even sure this scenario has been tested at the libgit2 level. /cc @carlosmn

@nulltoken
Copy link
Member

@jamill

  1. As currently implemented, the Fetch method is synchronous, which prevents it from returning a useful progress datastructure.

FWIW libgit2/libgit2#890 is trying to tackle this issue (/cc @ben)

@nulltoken
Copy link
Member

@jamill @ben Provided I do not encounter any regression while running the LibGit2Sharp test suite, I'm going to update vNext very soon with a newer version of libgit2 as soon as libgit2/libgit2#932 is merged.

The dedicated libgit2 build and LibGit2Sharp branch are no longer needed as libgit2/libgit2#901 is now merged.

/cc @arrbee

@jamill
Copy link
Member Author

jamill commented Sep 18, 2012

@nulltoken @ben @dahlbyk

Thanks, so a couple of questions:

1. How to report progress

My understanding of the current state of things is that currently, the best way to get some of the progress information for git_indexer_stats and the bytes is by passing them in as ref parameters to the fetch call.

  • Including these parameters as part of the OnProgress callback does not look like it would work, as it appears that the git_indexer_stat datastructure and the OnProgress callbacks are updated at different points of the fetch process.
    This has several downsides, in particular that it can be akward to work with.
  • This has the downside that this can be akward for callers (need to be queried on a different thread, method signature has ref parameters)

However, there is some work going on to make update progress reporting:

  • With A real struct for indicating progress libgit2#890, the progress callback should report more interperatable information.
  • If it is possible to for the indexing process to invoke a callback, then we could at least get indexing progress on the same thread. I will wait for @carlosmn for more on this

With these items, it seems we would not have to query the progress datastructures on a seperate thread for progress.

2. Whether there should be an asynchronous fetch method

From the API usage examples above, it seems that an asynchronous version of this method would be nice. In the clone / fetch examples for libgit2 the caller is responsible for running clone / fetch on a background thread. Should LibGit2Sharp follow this pattern? Alternatively, should LibGit2Sharp provide an asynchronous method that will go ahead and run the long running operation on the background thread and return immediately.

Even if there was an async version of Fetch, it seems that a synchronous version is needed. Given the above, I propose the following to move forward (in addition to addressing other feedback):

  • Leave the Progress structure (with GitIndexerStats and bytes) as a ref parameter until there is a better progress reporting mechanism.
  • Leave an async version of fetch for future work.

Thoughts?

@nulltoken
Copy link
Member

Leave the Progress structure (with GitIndexerStats and bytes) as a ref parameter until there is a better progress reporting mechanism.

👍 Even if I tend to avoid delivering API that will be broken by next version, Fetch has been delayed for too long.

Leave an async version of fetch for future work.

👍 as well.

The dedicated libgit2 build and LibGit2Sharp branch are no longer needed as libgit2/libgit2#901 is now merged.

Unfortunately, libgit2/libgit2#932 cannot be merged right now. However, in order to not keep you holding, vNext has been updated with the tip of libgit2 development branch (libgit2/libgit2@411cb01).

Please note that a Proxy class has been added since the benstraub/clone branch was created. It's now required to go through this layer when invoking NativeMethods.

@dahlbyk
Copy link
Member

dahlbyk commented Sep 18, 2012

👍 from me as well.

Thanks for clearing up the difference between the values exposed through FetchProgress and the onProgress callback. A callback for stats seems like a better option there.

@uluhonolulu
Copy link
Contributor

I'm all for release now, enhance later.

@ben
Copy link
Member

ben commented Sep 19, 2012

I'm actually thinking that passing a structure that receives progress info is a Bad Idea. What if the GC moves the structure around on the managed heap during the fetch? The C code wouldn't have a clue, and Bad Things could happen.

Given this, we should probably double-down on delegates for reporting progress, which means I'll have to think about how to make that happen with clone. Unfortunately I'm jet-lagged today and speaking at a conference tomorrow morning, so I can't read through @jamill's code to figure out how it should be done. Hopefully I'll have a chance in the next couple of days.

@jamill
Copy link
Member Author

jamill commented Sep 19, 2012

@ben From a correctness point of view, I think passing structures that receive progress info is OK - The progress structures are pinned by the interop layer during the call to git_remote_download and should not be moved by GC (git_remote_download is synchronous and will not return until fetch is complete).

@carlosmn
Copy link
Member

carlosmn commented Oct 2, 2012

The code to auto-follow tags and create them has been merged to libgit2, you'll get that whenever you update the library.

As for the indexer calling a callback, there isn't really a unit of measurement where it makes more sense than another to call it. How often should we call it? Per received object? Per received x bytes?

@nulltoken
Copy link
Member

@carlosmn ❤️

@jamill I'll update vNext with a more recent version of libgit2 in the following days.

@jamill
Copy link
Member Author

jamill commented Oct 3, 2012

@carlosmn , @nulltoken Thanks! I will move back on to the main development branch. I will get back to this shortly and update the change based on the feedback.

@dahlbyk
Copy link
Member

dahlbyk commented Oct 3, 2012

As for the indexer calling a callback, there isn't really a unit of measurement where it makes more sense than another to call it. How often should we call it? Per received object? Per received x bytes?

I'd go with what's easiest. If they're equally easy, go with the more granular. If it's called more often than a consumer cares for, they can throttle it.

@nulltoken
Copy link
Member

@jamill vNext has been updated with libgit2/libgit2@9adfa7d

@jamill jamill mentioned this pull request Oct 9, 2012
@jamill
Copy link
Member Author

jamill commented Oct 9, 2012

@nulltoken @dahlbyk @carlosmn @uluhonolulu : I have created a new PR with the changes based off of vNext (I did not see a way to update the branch the PR is based off of - if I missed something, please let me know).

Again, thank you everyone for the feedback so far. I have addressed much of it in the new PR.

@nulltoken
Copy link
Member

Closing this one, as all the love is now redirected to #221

@nulltoken nulltoken closed this Oct 10, 2012
@jamill jamill mentioned this pull request Oct 12, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants