Win http fetch #213

jamill · 2012-09-12T15:29:27Z

An updated Fetch implementation based off of @ben's repository, from suggestions in Pull Request #206. This includes a basic test for fetching from a remote into an empty repository. I made some choices on how / where to expose this functionality, so please let me know if you have thoughts / comments / questions.

Missing functionality:

It does not appear that libgit2 fetches tags
- Tests do no cover fetching tags
Completion callback in git_remote_callbacks does not appear to be called by libgit2
- Completion callback not plumbed through LibGit2Sharp

Implementation:

I have exposed the git_remote_callbacks as a property on the FetchProgress object that is passed into Fetch(). This is slightly different than how it is organized in libgit2 (where the callbacks are added to the remote itself). I went this way because the lifetime of these callbacks seems tied to the actual fetch operation itself, at least in how I see callers of the LibGit2Sharp API. If I am missing something and this is not the appropriate place, please let me know.
I slighty changed the signature on the clone interface to so that callers of the LibGit2Sharp do not have to include the LibGitSharp.Core namespace. Instead, I added a wrapper around GitIndexerStats for LibGit2Sharp callers, and exposed the uint fields as long properties.
I expose the libgit2 C callbacks as Events on the LibGit2Sharp interface.

Issues:

I have seen intermittent failures from libgit2 during both the clone and fetch tests that I have not tracked down at this point (as part of call to git_remote_update_tips in fetch, and git_clone in Clone).

LibGit2Sharp.LibGit2SharpException : An error was raised by libgit2. Category = Reference (Error).
Target OID for the reference doesn't exist on the repository

ben · 2012-09-12T19:47:19Z

LibGit2Sharp/Repository.cs

+                // the data in the git_remote_callbacks structure. If, in the future, libgit2 changes its implementation
+                // to store a reference to the git_remote_callbacks structure this would introduce a subtle bug
+                // where the managed layer could move the git_remote_callbacks to a different location in memory,
+                // but libgit2 would still reference the old address.


💕 comments like this.

nulltoken · 2012-09-13T09:46:24Z

@jamill Wow. That's a very detailed PR. Thanks a lot for this!

It does not appear that libgit2 fetches tags

It doesn't do this by default. Yet. (@carlosmn ;-) )
However, maybe would it make sense to "emulate" this in C# meanwhile. Once the native implementation is done, we would get rid of the managed code (provided the tests still pass).

A naive implementation could be, once the heads have been fetched, to create a temporary remote targeting the same url and bearing a different refspec (refs/tags/*:refs/tags/* should work), fetch from this remote, then drop the remote. Provided we make this work, this might do this trick at the cost of an additional fetch.

I expose the libgit2 C callbacks as Events on the LibGit2Sharp interface.

I'm not sure about Events.. Please give me a couple of days to contemplate your proposal...

LibGit2Sharp.LibGit2SharpException : An error was raised by libgit2. Category = Reference (Error).
Target OID for the reference doesn't exist on the repository

@ben @carlosmn Have you encountered this as well?

ben · 2012-09-13T17:22:27Z

I've seen it, and I can reproduce it fairly regularly using the libgit2sharp unit tests. I'll do some digging today.

ben · 2012-09-13T21:39:41Z

Found the root cause of the "OID doesn't exist" problem; a fix is pending on libgit2/libgit2#932. This will most likely only happen with REALLY small repos.

nulltoken · 2012-09-13T21:43:11Z

@ben ❤️

jamill · 2012-09-13T22:23:00Z

@ben Thanks!

@nulltoken

I'm not sure about Events.. Please give me a couple of days to contemplate your proposal...

sure!

I will look at "emulating" the tag fetching in C#.

The other thing I wanted to ask your thoughts on was the fact that I have encapsulated the entire fetch operation in a basically a single method. The other approach would have been to expose loading the remote, downloading the packfile, updating the reference tips as seperate operations (including managing the lifetime of the remote handle and remote connection between - as they would have to be kept alive for longer than the single call on the public method)

nulltoken · 2012-09-13T22:29:38Z

I will look at "emulating" the tag fetching in C#.

Great!

The other thing I wanted to ask your thoughts on was the fact that I have encapsulated the entire fetch operation in a basically a single method.

I completely agree with this. Let's make the standard operations easy to deal with.

dahlbyk · 2012-09-14T02:10:59Z

Even if it's not the primary API, a useful overload might look something like this...

public delegate void FetchProgressHandler(string message);
public delegate void UpdateTipsHandler(string referenceName, ObjectId oldId, ObjectId newId);

public void Fetch(Remote remote,
                  FetchProgressHandler onProgress = null,
                  UpdateTipsHandler onUpdateTips = null,
                  Action onCompleted = null);

uluhonolulu · 2012-09-14T07:21:20Z

+1 on handlers vs events -- makes consuming easier.

jamill · 2012-09-15T21:12:51Z

@dahlbyk , @uluhonolulu Thanks for the feedback! I don't mind exposing these callbacks as delegates - I would prefer to only expose one way or the other, but I don't mind if that way is as delegates.

dahlbyk · 2012-09-16T05:08:32Z

LibGit2Sharp/Repository.cs

+            }
+            finally
+            {
+                if (remoteHandle != null)


Is it even valid for remoteHandle to be null here?

I will remove the null check here.

dahlbyk · 2012-09-16T05:12:26Z

I'll cast my vote for delegates:

FetchProgress progress = new FetchProgress();
progress.RemoteCallbacks.UpdateTipsChanged += new System.EventHandler<UpdateTipsChangedEventArgs>(helper.RemoteUpdateTipsHandler);
repo.Fetch(remote, progress);

vs

repo.Fetch(remote, onUpdateTips = helper.RemoteUpdateTipsHandler);

Can/should Fetch() support cancellation?

jamill · 2012-09-17T15:03:12Z

Can/should Fetch() support cancellation?

I think it seems reasonable that Fetch() would support cancellation - however, I do not see this supported in libgit2's fetch implementation. I think this is something that would be handled at a layer below LibGit2Sharp...

dahlbyk · 2012-09-17T16:33:15Z

Is calling git_remote_disconnect mid-request not supported?

jamill · 2012-09-18T02:11:39Z

Is calling git_remote_disconnect mid-request not supported?

I see, thanks! Looks like that would cancel the network transfer portion of the fetch (where I would expect the majority of the time is). I like the idea of fetch being cancelable, but I would like think through some of the details a bit more before I would go ahead and implement it. I wouldn't expect the Fetch interface to change much to support cancelation.

I will expose the callbacks as delegates instead of events. I had a question about the other part of your API suggestion. The signature of the Fetch method overload that you proposed takes some extra parameters (one for each of the callbacks) and removes the progress reporting parameter (containing the bytes received and the GitIndexerStats). Is the scenario that you have in mind that a caller would only care about the callbacks and not the other progress information? Or, were you suggesting that there should be a flavor of the Fetch method that can take each of the arguments individually as optional parameters.

That is, is:

public void Fetch(Remote remote,
              FetchProgressHandler onProgress = null,
              UpdateTipsHandler onUpdateTips = null,
              Action onCompleted = null);

the full signature of the overload, or were you suggesting something more along the lines of:

public void Fetch(Remote remote,
              ref long bytes,
              ref GitIndexerStats stats,
              FetchProgressHandler onProgress = null,
              UpdateTipsHandler onUpdateTips = null,
              Action onCompleted = null);

Thank you again for you feedback.

dahlbyk · 2012-09-18T02:37:56Z

My first reaction was to just return the FetchProgress:

public FetchProgress Fetch(Remote remote,
                           FetchProgressHandler onProgress = null,
                           UpdateTipsHandler onUpdateTips = null,
                           Action onCompleted = null);

However, it seems reasonable to assume onProgress would want access to the progress details, so maybe this makes more sense?

public delegate void FetchProgressHandler(FetchProgress progress, string message);
public delegate void UpdateTipsHandler(string referenceName, ObjectId oldId, ObjectId newId);

public void Fetch(Remote remote,
                  FetchProgressHandler onProgress = null,
                  UpdateTipsHandler onUpdateTips = null,
                  Action onCompleted = null);

Or you could return the FetchProgress but also expose it to the callback.

uluhonolulu · 2012-09-18T07:24:13Z

+1 for public delegate void FetchProgressHandler(FetchProgress progress, string message); ref parameters should be avoided IMO -- make the code much more verbose than it has to be.

Returning FetchProgress seems reasonable if we want to have the Cancel method hanging off it.

var progress = repo.Fetch(..);
Button1.Click += () => progress.Cancel();

In that case, C# grammar requires we both return it and use it in the delegate, since we cannot write a lambda for it.

Anyway, let's try and write a couple of tests, and see which way is less painful.

nulltoken · 2012-09-18T14:52:14Z

LibGit2Sharp.Tests/TestHelpers/BaseFixture.cs

@@ -83,7 +83,10 @@ public void Dispose()
        {
            foreach (string directory in directories)
            {
-                DirectoryHelper.DeleteDirectory(directory);
+                if (Directory.Exists(directory))


Could you please give me some more context about this?
When would this check be required?

I had hit an issue running the CanCloneARepository(...) tests where the test would create the SelfCleaningDirectory object and register the temporary directory path for deletion, but the test would fail in Repository.Clone(...) before the temporary directory actually got created. On cleanup, the test would throw an exception indicating that it could not find the path of the temporary directory (as it does not exist). The end result is that the test reports that it failed to find the path of the temporary directory (which hides the original exception).

Thanks a lot for your answer. Indeed, that makes sense.

However, I'm afraid this check could hide that something may have went wrong earlier (a wrongly written test for instance). Could you think of a way to let the user know about this not expected state?

For now (especially as the clone is no longer part of the test suite) I will remove this change so we can concentrate on the core Fetch changes.

jamill · 2012-09-18T15:46:17Z

@dahlbyk, @uluhonolulu Regarding the Fetch method:

As currently implemented, the Fetch method is synchronous, which prevents it from returning a useful progress datastructure. This is also why there are ref parameters. I could look at making method this asynchronous, but it did not appear that were any other asynchronous methods at the moment, which is one reason I did not go down this route. Has there been other discussion(s) on how to handle long running operations in LibGit2Sharp (push, pull, fetch, clone, etc)?
I could see the progress datastructure could be useful to the OnProgress callback. Currently the OnProgress matches the libgit2 progress callback. How do we feel about breaking this mapping here - I know the high level design goals indicate that LibGit2Sharp is not a 1:1 mapping of libgit2, so maybe this would not be a problem.

nulltoken · 2012-09-18T15:46:35Z

@ben regarding the Clone implementation, could you please get rid of the GitCheckoutOptions?

nulltoken · 2012-09-18T15:53:28Z

@jamill @ben I'm not in favor of exposing the IndexerStats as it is. Unless I'm wrong, consuming it would require a separate thread.

@carlosmn How complex would that be (or is it even feasible) to make the libgit2 indexing process invoke a progress callback?

nulltoken · 2012-09-18T16:06:12Z

@jamill

Regarding the API:

I'm not sure about Events..

I also agree with @dahlbyk and @uluhonolulu. Let's go down the delegates road :)

Is calling git_remote_disconnect mid-request not supported?

When callbacks are involved, usually returning something different from 0 is the "conventional" way to interrupt the process. Hopefully, the remote would be disconnected.

Can/should Fetch() support cancellation?

I'm not even sure this scenario has been tested at the libgit2 level. /cc @carlosmn

nulltoken · 2012-09-18T16:13:12Z

@jamill

As currently implemented, the Fetch method is synchronous, which prevents it from returning a useful progress datastructure.

FWIW libgit2/libgit2#890 is trying to tackle this issue (/cc @ben)

nulltoken · 2012-09-18T17:37:08Z

@jamill @ben Provided I do not encounter any regression while running the LibGit2Sharp test suite, I'm going to update vNext very soon with a newer version of libgit2 as soon as libgit2/libgit2#932 is merged.

The dedicated libgit2 build and LibGit2Sharp branch are no longer needed as libgit2/libgit2#901 is now merged.

/cc @arrbee

jamill · 2012-09-18T18:58:09Z

@nulltoken @ben @dahlbyk

Thanks, so a couple of questions:

1. How to report progress

My understanding of the current state of things is that currently, the best way to get some of the progress information for git_indexer_stats and the bytes is by passing them in as ref parameters to the fetch call.

Including these parameters as part of the OnProgress callback does not look like it would work, as it appears that the git_indexer_stat datastructure and the OnProgress callbacks are updated at different points of the fetch process.
This has several downsides, in particular that it can be akward to work with.
This has the downside that this can be akward for callers (need to be queried on a different thread, method signature has ref parameters)

However, there is some work going on to make update progress reporting:

With A real struct for indicating progress libgit2#890, the progress callback should report more interperatable information.
If it is possible to for the indexing process to invoke a callback, then we could at least get indexing progress on the same thread. I will wait for @carlosmn for more on this

With these items, it seems we would not have to query the progress datastructures on a seperate thread for progress.

2. Whether there should be an asynchronous fetch method

From the API usage examples above, it seems that an asynchronous version of this method would be nice. In the clone / fetch examples for libgit2 the caller is responsible for running clone / fetch on a background thread. Should LibGit2Sharp follow this pattern? Alternatively, should LibGit2Sharp provide an asynchronous method that will go ahead and run the long running operation on the background thread and return immediately.

Even if there was an async version of Fetch, it seems that a synchronous version is needed. Given the above, I propose the following to move forward (in addition to addressing other feedback):

Leave the Progress structure (with GitIndexerStats and bytes) as a ref parameter until there is a better progress reporting mechanism.
Leave an async version of fetch for future work.

Thoughts?

nulltoken · 2012-09-18T21:01:26Z

Leave the Progress structure (with GitIndexerStats and bytes) as a ref parameter until there is a better progress reporting mechanism.

👍 Even if I tend to avoid delivering API that will be broken by next version, Fetch has been delayed for too long.

Leave an async version of fetch for future work.

👍 as well.

The dedicated libgit2 build and LibGit2Sharp branch are no longer needed as libgit2/libgit2#901 is now merged.

Unfortunately, libgit2/libgit2#932 cannot be merged right now. However, in order to not keep you holding, vNext has been updated with the tip of libgit2 development branch (libgit2/libgit2@411cb01).

Please note that a Proxy class has been added since the benstraub/clone branch was created. It's now required to go through this layer when invoking NativeMethods.

dahlbyk · 2012-09-18T21:11:30Z

👍 from me as well.

Thanks for clearing up the difference between the values exposed through FetchProgress and the onProgress callback. A callback for stats seems like a better option there.

uluhonolulu · 2012-09-19T06:52:25Z

I'm all for release now, enhance later.

ben · 2012-09-19T12:19:49Z

I'm actually thinking that passing a structure that receives progress info is a Bad Idea. What if the GC moves the structure around on the managed heap during the fetch? The C code wouldn't have a clue, and Bad Things could happen.

Given this, we should probably double-down on delegates for reporting progress, which means I'll have to think about how to make that happen with clone. Unfortunately I'm jet-lagged today and speaking at a conference tomorrow morning, so I can't read through @jamill's code to figure out how it should be done. Hopefully I'll have a chance in the next couple of days.

jamill · 2012-09-19T13:24:24Z

@ben From a correctness point of view, I think passing structures that receive progress info is OK - The progress structures are pinned by the interop layer during the call to git_remote_download and should not be moved by GC (git_remote_download is synchronous and will not return until fetch is complete).

carlosmn · 2012-10-02T08:55:52Z

The code to auto-follow tags and create them has been merged to libgit2, you'll get that whenever you update the library.

As for the indexer calling a callback, there isn't really a unit of measurement where it makes more sense than another to call it. How often should we call it? Per received object? Per received x bytes?

nulltoken · 2012-10-02T09:25:21Z

@carlosmn ❤️

@jamill I'll update vNext with a more recent version of libgit2 in the following days.

jamill · 2012-10-03T01:42:27Z

@carlosmn , @nulltoken Thanks! I will move back on to the main development branch. I will get back to this shortly and update the change based on the feedback.

dahlbyk · 2012-10-03T02:25:32Z

As for the indexer calling a callback, there isn't really a unit of measurement where it makes more sense than another to call it. How often should we call it? Per received object? Per received x bytes?

I'd go with what's easiest. If they're equally easy, go with the more granular. If it's called more often than a consumer cares for, they can throttle it.

nulltoken · 2012-10-09T17:43:57Z

@jamill vNext has been updated with libgit2/libgit2@9adfa7d

jamill · 2012-10-09T21:57:41Z

@nulltoken @dahlbyk @carlosmn @uluhonolulu : I have created a new PR with the changes based off of vNext (I did not see a way to update the branch the PR is based off of - if I missed something, please let me know).

Again, thank you everyone for the feedback so far. I have addressed much of it in the new PR.

nulltoken · 2012-10-10T12:54:46Z

Closing this one, as all the love is now redirected to #221

jamill added 2 commits September 11, 2012 11:01

BaseFixture should not attempt to delete empty directories.

e88e1a4

Expose libgit2 Fetch functionality through LibGit2Sharp.

defd8a0

jamill mentioned this pull request Sep 12, 2012

Expose libgit2 fetch functionality. #206

Closed

ben reviewed Sep 12, 2012
View reviewed changes

dahlbyk reviewed Sep 16, 2012
View reviewed changes

nulltoken reviewed Sep 18, 2012
View reviewed changes

jamill mentioned this pull request Oct 9, 2012

Fetch #221

Closed

nulltoken closed this Oct 10, 2012

jamill mentioned this pull request Oct 12, 2012

Clone #223

Merged

Win http fetch #213

Win http fetch #213

Uh oh!

Conversation

jamill commented Sep 12, 2012

Missing functionality:

Implementation:

Issues:

Uh oh!

ben Sep 12, 2012

Choose a reason for hiding this comment

Uh oh!

nulltoken commented Sep 13, 2012

Uh oh!

ben commented Sep 13, 2012

Uh oh!

ben commented Sep 13, 2012

Uh oh!

nulltoken commented Sep 13, 2012

Uh oh!

jamill commented Sep 13, 2012

Uh oh!

nulltoken commented Sep 13, 2012

Uh oh!

dahlbyk commented Sep 14, 2012

Uh oh!

uluhonolulu commented Sep 14, 2012

Uh oh!

jamill commented Sep 15, 2012

Uh oh!

dahlbyk Sep 16, 2012

Choose a reason for hiding this comment

Uh oh!

jamill Oct 9, 2012

Choose a reason for hiding this comment

Uh oh!

dahlbyk commented Sep 16, 2012

Uh oh!

jamill commented Sep 17, 2012

Uh oh!

dahlbyk commented Sep 17, 2012

Uh oh!

jamill commented Sep 18, 2012

Uh oh!

dahlbyk commented Sep 18, 2012

Uh oh!

uluhonolulu commented Sep 18, 2012

Uh oh!

nulltoken Sep 18, 2012

Choose a reason for hiding this comment

Uh oh!

jamill Sep 18, 2012

Choose a reason for hiding this comment

Uh oh!

nulltoken Sep 18, 2012

Choose a reason for hiding this comment

Uh oh!

jamill Oct 9, 2012

Choose a reason for hiding this comment

Uh oh!

jamill commented Sep 18, 2012

Uh oh!

nulltoken commented Sep 18, 2012

Uh oh!

nulltoken commented Sep 18, 2012

Uh oh!

nulltoken commented Sep 18, 2012

Uh oh!

nulltoken commented Sep 18, 2012

Uh oh!

nulltoken commented Sep 18, 2012

Uh oh!

jamill commented Sep 18, 2012

1. How to report progress

2. Whether there should be an asynchronous fetch method

Uh oh!

nulltoken commented Sep 18, 2012

Uh oh!

dahlbyk commented Sep 18, 2012

Uh oh!