Skip to content

Faster graph rendering for repositories with huge amount of branches. #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 5, 2015
Merged

Conversation

mandrigin
Copy link
Contributor

We have a huge repo at work (~22000 branches at the moment) and GitUp is really slow on it.

While profiling it turns to be that GitUp makes a lot of comparisons while building an object graph.
instruments2 2015-08-19 14-05-05

I have made a special optimised class which is much faster (also it checks SHA strings, not objects). According to Instruments, it is not the heaviest stack trace anymore (and the app seems to be far more usable on our repo right now).

@swisspol
Copy link
Contributor

GitUp is not designed for very large repos - see http://forums.gitup.co/t/gitup-limitations-known-issues/29.

This change will make things a bit faster, but the reason it's an NSMutableArray right now and not hashed-based container is because ordering of tips must be preserved.

@mandrigin
Copy link
Contributor Author

I'm still profiling the code, but this change reduced the startup time on my computer from ~1 minute to ~10 seconds which is already a good thing, IMO. I would love to be able to use it at work, it is a really cool client :)

Ok, so the hash table approach per se doesn't work here.
What if we will use a hash table or set only to check if commit is already there?
E.g.:

if (![tipsDict objectForKey:commit.SHA1]) {
    [tipsArray addObject:commit];
    tipsDict[commit.SHA1] = @(YES);
}

Can there be any case when the same commit will be in this array twice?

@mandrigin
Copy link
Contributor Author

Of course I will encapsulate the code in a separate class that will both preserve commit order and have a faster way to check if commit has already been added.

@swisspol
Copy link
Contributor

I unfortunately don't have bandwidth to help understand the code. It's been written to be pretty clear regarding how it works and you can always trace what is happening using the debugger 😄

@mandrigin
Copy link
Contributor Author

Sure, no problem :)

@bryankeller
Copy link

Perhaps a separate data structure could be used to keep track of ordering, while still getting the speed benefits from using the NSMutableDictionary?

@mandrigin
Copy link
Contributor Author

Yes, I'm currently working on it.

@mandrigin
Copy link
Contributor Author

Ok, tests are passing now, but I have no idea why they were failing in commit ed9b145. I did nothing except forcing Travis to re-test branch. So I think it is pretty much ready IMO. I'll just rename the last commit.

@swisspol
Copy link
Contributor

FYI I updated the contribution requirements at https://github.com/git-up/GitUp#contributing.

@mandrigin
Copy link
Contributor Author

👍 Will fix my pull request then.

@mandrigin
Copy link
Contributor Author

Done

@swisspol
Copy link
Contributor

So what's the performance like with this change? GitUp will log exactly how long graph generation and graph rendering take.

Note that like I was saying GitUp is not designed to handle very large repos, so a number of things are going to be inefficient, not just this. Improving the speed from say 1mn to 10s is interesting but doesn't solve the actual problem: for it to be usable it needs to be below 1s.

@swisspol
Copy link
Contributor

Also there are still a number of areas that don't match the GitUp style e.g. foo *bar instead of foo* bar, missing braces for if() statements, ...

@mandrigin
Copy link
Contributor Author

Test set: the repository of ~237000 commits. I run it on my MacBook Pro (2015).

The performance before this change:
00:01:26.214 [VERBOSE ]> Graph regenerated for "<redacted>" in 76.468 seconds

The performance after this change:
00:00:23.048 [VERBOSE ]> Graph regenerated for "<redacted>" in 3.623 seconds

It is still not that comfortable, but looks like a good step forward.

Yes, there are a few places that can be improved, but I wanted to take an iterative approach and try to optimise one thing at a time, until it gets usable on the repo of our size.
Maybe it is very ambitious, but at least I have a very good test repo :)

If you want me to measure all the bottlenecks first and make a bigger PR, I'm ok with this :)

I will fix codestyle issues and also I noticed one warning I have introduced, will fix that as well.

@mandrigin
Copy link
Contributor Author

Done. I tried to imitate the code style you have in an app, but please, tell me if anything left...

@@ -15,6 +15,8 @@

#import <Foundation/Foundation.h>

@class GCRepository;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a separate commit.

}

- (void)addCommit:(GCCommit *)commit {
if (![self hasCommit:commit]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to call -hasCommit, access _actualCommitHashes directly.

@swisspol
Copy link
Contributor

I had a first pass look and pointed out a few things. You need to keep crafting this whole thing until it is as concise and elegant as possible - keep iterating! 😄

For instance, I would use CFMutableSets with no retain / release callbacks (since the commits are already in the array anyway and therefore retained there) and low-overhead hash/equal callbacks accessing directly the underlying git_object. Retain/release becomes expensive when dealing with lots of objects. Same for using ARC in this file. If it's that performance sensitive, then MRC will be much better (critical code paths in GitUp are all MRC).

In any case what makes you think the number of commits in your repo is the problem here? This code only cares about tips i.e. references. What's the output of git show-ref --head | wc -l in your repo?

@mandrigin
Copy link
Contributor Author

In any case what makes you think the number of commits in your repo is the problem here?

Yes, we have a lot of tips in our repo as well. Anyway it looks like a good test set for this product :)

What's the output of git show-ref --head | wc -l in your repo?

~22000

@mandrigin
Copy link
Contributor Author

I've fixed most of the things you pointed.
This thread might be also a good reference to the code style you have...

For instance, I would use CFMutableSets with no retain / release callbacks (since the commits are already in the array anyway and therefore retained there) and low-overhead hash/equal callbacks accessing directly the underlying git_object. Retain/release becomes expensive when dealing with lots of objects. Same for using ARC in this file. If it's that performance sensitive, then MRC will be much better (critical code paths in GitUp are all MRC).

Currently I see only a small overhead from this class in profiler.
These changes are really reasonable but is it worth to implement them right now?

#endif

#import "GCOrderedSet.h"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace this by #import GCPrivate.h like other implementation files.

@swisspol
Copy link
Contributor

These changes are really reasonable but is it worth to implement them right now?

If this is a critical code path, which it is, then I would say yes, might as well go 100%, instead of stopping at 90% 😄

In any case this PR is certainly a step in the right direction, but still need some polish before it can land. Keep reviewing that diff over and over again until you're 100% confident this is it.

Also note that the problem here is not the number of commits but the number of branches in your repo (22,000), which is plain huge. In my experience however, repos that have such an abnormally high number of branches are also going to have hundreds of thousands of commits if not millions, so other parts of GitUp become really slow.

BTW the commit title should be more precise since this change optimizes only drawing graphs with lots of refs, not huge repos in general.

@mandrigin mandrigin changed the title Faster graph rendering for huge repos. Faster graph rendering for repositories with huge amount of branches. Aug 28, 2015
@mandrigin
Copy link
Contributor Author

I did some measurements, and found out that I can get a somewhat visible performance improvements only by using CFMutableSet class. Neither moving to MRC, nor accessing object.private didn't help much. (Maybe I did something wrong there though, I can't say I'm an expert in MRC).

In my experience however, repos that have such an abnormally high number of branches are also going to have hundreds of thousands of commits if not millions, so other parts of GitUp become really slow.

I did some more profiling and the next important bottlenecks are:

  • History scanning is not very slow (~5 seconds). I didn't look at it precisely yet, though I think there are some places where GCOrderedSet might help as well.
  • Checking changes in the work directory. This case is more tricky, and it is not that bad (3~4 seconds), but still quite annoying. I don't have a good solution yet, but I'll try to understand what can be done here as well.

@swisspol
Copy link
Contributor

swisspol commented Sep 2, 2015

The problem with ARC when you have code dealing with lots of objects in tight loops or similar is that since it has to be extremely conservative to avoid crashes, it retains / releases everything. So you end up with hundreds of thousands or retain/release calls which take locks and slow down everything.

A repo can be huge along 3 axis (and any combination of course):

  1. Number of commits (> 100,000)
  2. Number of refs (> 1,000)
  3. Number of files in the trees (> 10,000)

Each of them will affect GitUp differently, requires appropriate performance analysis, and dedicated fixes. My understanding is that you're hitting (2).

The important thing is that GitUp has been deliberately architectured to handle very efficiently repos below these limits. In the real world, the vast vast majority of repos are an order of magnitude smaller than these limits.

The direct consequences is that properly addressing these performance issues (and going from say 10s to 2s is still not enough as a reactive app needs to be <1s on representative hardware) is most likely going to require rewriting entire parts of GitUpKit, which may be a lot of work and is certainly very high-risk in terms of potential regressions.

It's interesting work no matter what, so by all means, feel free to continue, but I thought that it's important you understand what you're getting into 😄

@mandrigin
Copy link
Contributor Author

I do agree, making app fast & responsive on huge amount of data is hard.

There are 3 types of performance fixes that can be introduced for an app:

  1. Real performance improvement w/o re-designing the app (by introducing faster data types, by getting rid of unnecessary re-checks, etc).
  2. Real performance improvement by redesigning the app architecture.
  3. Perceived performance improvements (progress-bars, animations, making the UI responsive even though the app is busy, etc). There is a book written on this topic: Designing and Engineering Time

I will try to focus on (1) first.

A repo can be huge along 3 axis (and any combination of course):

I guess our repo hit all 3 of this... Just refs had the worst performance penalty.

@mandrigin
Copy link
Contributor Author

Ok, I did change the commit message, rebased it on the latest version of master. Is there anything left to be done?

@swisspol
Copy link
Contributor

swisspol commented Sep 5, 2015

It's looking pretty good now. Just some minor things left.

Did you make sure to add the new .m file to the iOS GitUpKit target as well?

Please agree to GitUp CLA at http://forums.gitup.co/t/gitup-contributor-license-agreement/323.

@mandrigin
Copy link
Contributor Author

Did you make sure to add the new .m file to the iOS GitUpKit target as well?

Yes, I did (OSX+iOS Targets) ;)
screenshot

Please agree to GitUp CLA at http://forums.gitup.co/t/gitup-contributor-license-agreement/323

Done

NSMutableArray usage in [GIGraph _generateGraph] caused a bottleneck due to a lot of calls
to the [NSMutableArray containsObject:] and [NSMutableArray removeObject:] methods.
These methods are really slow on a huge amount of tips (~24000 in my case).

To solve this issue CGOrderedSet class was created:
* optimized for speed of adding/removal objects;
* preserving the objects order.

NSMutableArray was replaced to GCOrderedSet which solved this particular bottleneck.
@swisspol
Copy link
Contributor

swisspol commented Sep 5, 2015

Your screenshot is for the OSX target though :)

@mandrigin
Copy link
Contributor Author

Yes, but I just double-checked an iOS version as well :)

@swisspol
Copy link
Contributor

swisspol commented Sep 5, 2015

All right, great work, thanks!

swisspol added a commit that referenced this pull request Sep 5, 2015
Faster graph rendering for repositories with lots of branches
@swisspol swisspol merged commit 4cfc9f9 into git-up:master Sep 5, 2015
@mandrigin
Copy link
Contributor Author

Awesome!

@swisspol
Copy link
Contributor

swisspol commented Sep 5, 2015

The header was missing but I fixed it and added a test in Travis CI to catch this going forward: 9d06064.

@mandrigin
Copy link
Contributor Author

oops...

gr00nd added a commit to Gr00nd-INC/GitUp that referenced this pull request Sep 21, 2023
(https://travis-ci.org/git-up/GitUp.svg?branch=master)](https://travis-ci.org/git-up/GitUp)  GitUp =====  **Work quickly, safely, and without headaches. The Git interface you've been missing all your life has finally arrived.**  <p align="center"> <img src="https://i.imgur.com/JuQIxJu.png" width="50%" height="50%"><img src="https://i.imgur.com/9rgXktz.png" width="50%" height="50%"> </p>  Git recently celebrated its 10 years anniversary, but most engineers are still confused by its intricacy (3 of the [top 5 questions of all time](http://stackoverflow.com/questions?sort=votes) on Stack Overflow are Git related). Since Git turns even simple actions into mystifying commands (“git add” to stage versus “git reset HEAD” to unstage anyone?), it’s no surprise users waste time, get frustrated, distract the rest of their team for help, or worse, screw up their repo!  GitUp is a bet to invent a new Git interaction model that lets engineers of all levels work quickly, safely, and without headaches. It's unlike any other Git client out there from the way it’s built (it interacts directly with the Git database on disk), to the way it works (you manipulate the repository graph instead of manipulating commits).  With GitUp, you get a truly efficient Git client for Mac: - A **live and interactive repo graph** (edit, reorder, fixup, merge commits…), - **Unlimited undo / redo** of almost all operations (even rebases and merges), - Time Machine like **snapshots for 1-click rollbacks** to previous repo states, - Features that don’t even exist natively in Git like a **visual commit splitter** or a **unified reflog browser**, - **Instant search across the entire repo** including diff contents,  - A **ridiculously fast UI**, often faster than the command line.  *GitUp was created by [@swisspol](https://github.com/swisspol) in late 2014 as a bet to reinvent the way developers interact with Git. After several months of work, it was made available in pre-release early 2015 and reached the [top of Hacker News](https://news.ycombinator.com/item?id=9653978) along with being [featured by Product Hunt](http://www.producthunt.com/tech/gitup-1) and [Daring Fireball](http://daringfireball.net/linked/2015/06/04/gitup). 30,000 lines of code later, GitUp reached 1.0 mid-August 2015 and was released open source as a gift to the developer community.*  Getting Started ===============  - Official website: https://gitup.co  ## Download:  - Latest release on GitHub: https://github.com/git-up/GitUp/releases - Homebrew (Not maintained by GitUp developers): `brew install homebrew/cask/gitup` (Note: There is already a formula called gitup, so the full name must be specified!)  **Read the [docs](https://github.com/git-up/GitUp/wiki) and use [GitHub Issues](https://github.com/git-up/GitUp/issues) for support & feedback.**  Releases notes are available at https://github.com/git-up/GitUp/releases. Builds tagged with a `v` (e.g. `v1.2.3`) are released on the "Stable" channel, while builds tagged with a `b` (e.g. `b1234`) are only released on the "Continuous" channel. You can change the update channel used by GitUp in the app preferences.  ## Build  To build GitUp yourself, simply run the command `git clone --recursive https://github.com/git-up/GitUp.git` in Terminal, then open the `GitUp/GitUp.xcodeproj` Xcode project and hit Run.  **IMPORTANT:** If you do not have an Apple ID with a developer account for code signing Mac apps, the build  will fail with a code signing error. Simply delete the "Code Signing Identity" build setting of the "Application" target to work around the issue:  <p align="center"> <img src="http://i.imgur.com/dWpJExk.png"> </p>  **Alternatively**, if you do have a developer account, you can create the file "Xcode-Configurations/DEVELOPMENT_TEAM.xcconfig" with the following build setting as its content: > DEVELOPMENT_TEAM = [Your TeamID]  For a more detailed description of this, you can have a look at the comments at the end of the file "Xcode-Configurations/Base.xcconfig".   GitUpKit ========  **GitUp is built as a thin layer on top of a reusable generic Git toolkit called "GitUpKit". This means that you can use that same GitUpKit framework to build your very own Git UI!**  *GitUpKit has a very different goal than [ObjectiveGit](https://github.com/libgit2/objective-git). Instead of offering extensive raw bindings to [libgit2](https://github.com/libgit2/libgit2), GitUpKit only uses a minimal subset of libgit2 and reimplements everything else on top of it (it has its own "rebase engine" for instance). This allows it to expose a very tight and consistent API, that completely follows Obj-C conventions and hides away the libgit2 complexity and sometimes inconsistencies. GitUpKit adds on top of that a number of exclusive and powerful features, from undo/redo and Time Machine like snapshots, to entire drop-in UI components.*  Architecture ------------  The GitUpKit source code is organized as 2 independent layers communicating only through the use of public APIs:  **Base Layer (depends on Foundation only and is compatible with OS X and iOS)** - `Core/`: wrapper around the required minimal functionality of [libgit2](https://github.com/libgit2/libgit2), on top of which is then implemented all the Git functionality required by GitUp (note that GitUp uses a [slightly customized fork](https://github.com/git-up/libgit2/tree/gitup) of libgit2) - `Extensions/`: categories on the `Core` classes to add convenience features implemented only using the public APIs  **UI Layer (depends on AppKit and is compatible with OS X only)** - `Interface/`: low-level view classes e.g. `GIGraphView` to render the GitUp Map view - `Utilities/`: interface utility classes e.g. the base view controller class `GIViewController` - `Components/`: reusable single-view view controllers e.g. `GIDiffContentsViewController` to render a diff - `Views/`: high-level reusable multi-views view controllers e.g. `GIAdvancedCommitViewController` to implement the entire GitUp Advanced Commit view  **IMPORTANT**: If the preprocessor constant `DEBUG` is defined to a non-zero value when building GitUpKit (this is the default when building in "Debug" configuration), a number of extra consistency checks are enabled at run time as well as extra logging. Be aware that this overhead can significantly affect performance.  GitUpKit API ------------  Using the GitUpKit API should be pretty straightforward since it is organized by functionality (e.g. repository, branches, commits, interface components, etc...) and a best effort has been made to name functions clearly.  Regarding the "Core" APIs, the best way to learn them is to peruse the associated unit tests - for instance see [the branch tests](GitUpKit/Core/GCBranch-Tests.m) for the branch API.  Here is some sample code to get you started (error handling is left as an exercise to the reader):  **Opening and browsing a repository:** ```objc // Open repo GCRepository* repo = [[GCRepository alloc] initWithExistingLocalRepository:<PATH> error:NULL];  // Make sure repo is clean assert([repo checkClean:kGCCleanCheckOption_IgnoreUntrackedFiles error:NULL]);  // List all branches NSArray* branches = [repo listAllBranches:NULL]; NSLog(@"%@", branches);  // Lookup HEAD GCLocalBranch* headBranch;  // This would be nil if the HEAD is detached GCCommit* headCommit; [repo lookupHEADCurrentCommit:&headCommit branch:&headBranch error:NULL]; NSLog(@"%@ = %@", headBranch, headCommit);  // Load the *entire* repo history in memory for fast access, including all commits, branches and tags GCHistory* history = [repo loadHistoryUsingSorting:kGCHistorySorting_ReverseChronological error:NULL]; assert(history); NSLog(@"%lu commits total", history.allCommits.count); NSLog(@"%@\n%@", history.rootCommits, history.leafCommits); ```  **Modifying a repository:** ```objc // Take a snapshot of the repo GCSnapshot* snapshot = [repo takeSnapshot:NULL];  // Create a new branch and check it out GCLocalBranch* newBranch = [repo createLocalBranchFromCommit:headCommit withName:@"temp" force:NO error:NULL]; NSLog(@"%@", newBranch); assert([repo checkoutLocalBranch:newBranch options:0 error:NULL]);  // Add a file to the index [[NSData data] writeToFile:[repo.workingDirectoryPath stringByAppendingPathComponent:@"empty.data"] atomically:YES]; assert([repo addFileToIndex:@"empty.data" error:NULL]);  // Check index status GCDiff* diff = [repo diffRepositoryIndexWithHEAD:nil options:0 maxInterHunkLines:0 maxContextLines:0 error:NULL]; assert(diff.deltas.count == 1); NSLog(@"%@", diff);  // Create a commit GCCommit* newCommit = [repo createCommitFromHEADWithMessage:@"Added file" error:NULL]; assert(newCommit); NSLog(@"%@", newCommit);  // Restore repo to saved snapshot before topic branch and commit were created BOOL success = [repo restoreSnapshot:snapshot withOptions:kGCSnapshotOption_IncludeAll reflogMessage:@"Rolled back" didUpdateReferences:NULL error:NULL]; assert(success);    // Make sure topic branch is gone assert([repo findLocalBranchWithName:@"temp" error:NULL] == nil);    // Update workdir and index to match HEAD assert([repo resetToHEAD:kGCResetMode_Hard error:NULL]); ```  Complete Example #1: GitDown ----------------------------  [GitDown](Examples/GitDown) is a very basic app that prompts the user for a repo and displays an interactive and live-updating list of its stashes (all with ~20 lines of code in `-[AppDelegate applicationDidFinishLaunching:]`):  <p align="center"> <img src="http://i.imgur.com/ZfxM7su.png"> </p>  Through GitUpKit, this basic app also gets for free unlimited undo/redo, unified and side-by-side diffs, text selection and copy, keyboard shortcuts, etc...  This source code also demonstrates how to use some other GitUpKit view controllers as well as building a customized one.  Complete Example git-up#2: GitDiff ----------------------------  [GitDiff](Examples/GitDiff) demonstrates how to create a view controller that displays a live updating diff between `HEAD` and the workdir à la `git diff HEAD`:  <p align="center"> <img src="http://i.imgur.com/29hxDcJ.png"> </p>  Complete Example git-up#3: GitY -------------------------  [GitY](Examples/GitY) is a [GitX](http://gitx.frim.nl/) clone built using GitUpKit and less than 200 lines of code:  <p align="center"> <img src="http://i.imgur.com/6cuPcT4.png"> </p>  Complete Example git-up#4: iGit -------------------------  [iGit](Examples/iGit) is a test iOS app that simply uses GitUpKit to clone a GitHub repo and perform a commit.  Contributing ============  See [CONTRIBUTING.md](CONTRIBUTING.md).  Credits =======  - [@swisspol](https://github.com/swisspol): concept and code - [@wwayneee](https://github.com/wwayneee): UI design - [@jayeb](https://github.com/jayeb): website  *Also a big thanks to the fine [libgit2](https://libgit2.github.com/) contributors without whom GitUp would have never existed!*  License =======  GitUp is copyright 2015-2018 Pierre-Olivier Latour and available under [GPL v3 license](http://www.gnu.org/licenses/gpl-3.0.txt). See the [LICENSE](LICENSE) file in the project for more information.  **IMPORTANT:** GitUp includes some other open-source projects and such projects remain under their own license.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants