Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance with large numbers of untracked or modified files #3919

Merged
merged 1 commit into from
Sep 18, 2024

Conversation

parthokunda
Copy link

@parthokunda parthokunda commented Sep 16, 2024

  • PR Description
    BuildTreeFromFiles used a linear complexity lookup to find if the children has already been added. Use maps to get constant time lookup for children.

For the test scenario in #3798 (90000 untracked files) this reduces the time of BuildTreeFromFiles from something like 10s down to about 30ms on my machine.

Fixes #3798.

  • Please check if the PR fulfills these requirements
  • Cheatsheets are up-to-date (run go generate ./...)
  • Code has been formatted (see here)
  • Tests have been added/updated (see here for the integration test guide)
  • Text is internationalised (see here)
  • If a new UserConfig entry was added, make sure it can be hot-reloaded (see here)
  • Docs have been updated if necessary
  • You've read through your own file changes for silly mistakes etc

@parthokunda
Copy link
Author

parthokunda commented Sep 16, 2024

BuildTreeFromFiles CPU Profiling result for a initial git repo with data directory having 40000 unstaged files

  • without PR: 620ms(1.6%)
  • with PR: 20ms(0.09%)

When I profile memory for BuildTreeFromFiles, with PR take 4.46MB (8.4%) whereas without PR one takes 6.50MB (10%). My assumption was that my fix would take more memory. Not sure why it is happening. Would love to know the reason.

@parthokunda parthokunda marked this pull request as ready for review September 16, 2024 20:23
@parthokunda
Copy link
Author

I have to update the tests. Made a mistake with git when testing.

@parthokunda parthokunda marked this pull request as draft September 16, 2024 20:58
@stefanhaller
Copy link
Collaborator

Thanks for this, it's a great improvement, and the change looks very straight-forward (more than I expected, to be honest).

However, I wonder if we should keep the ChildrenPathMap out of the Node struct. It is really only needed inside the BuildTreeFromFiles function, no need to hold on to it afterwards. This will also solve the test failures automatically.

I'm thinking of a diff like this, on top of yours:

diff --git a/pkg/gui/filetree/build_tree.go b/pkg/gui/filetree/build_tree.go
index 3d0eb713c..926c5ea04 100644
--- a/pkg/gui/filetree/build_tree.go
+++ b/pkg/gui/filetree/build_tree.go
@@ -10,6 +10,8 @@ import (
 func BuildTreeFromFiles(files []*models.File) *Node[models.File] {
 	root := &Node[models.File]{}
 
+	childrenMapsByNode := make(map[*Node[models.File]]map[string]*Node[models.File])
+
 	var curr *Node[models.File]
 	for _, file := range files {
 		splitPath := split(file.Name)
@@ -24,7 +26,13 @@ func BuildTreeFromFiles(files []*models.File) *Node[models.File] {
 
 			path := join(splitPath[:i+1])
 
-			child, isFound := curr.ChildrenPathMap[path]
+			var childrenMap map[string]*Node[models.File]
+			var ok bool
+			if childrenMap, ok = childrenMapsByNode[curr]; !ok {
+				childrenMap = make(map[string]*Node[models.File])
+				childrenMapsByNode[curr] = childrenMap
+			}
+			child, isFound := childrenMap[path]
 			if isFound {
 				curr = child
 				continue outer
@@ -37,10 +45,7 @@ func BuildTreeFromFiles(files []*models.File) *Node[models.File] {
 
 			curr.Children = append(curr.Children, newChild)
 
-			if curr.ChildrenPathMap == nil {
-				curr.ChildrenPathMap = make(map[string]*Node[models.File])
-			}
-			curr.ChildrenPathMap[path] = newChild
+			childrenMap[path] = newChild
 
 			curr = newChild
 		}
diff --git a/pkg/gui/filetree/node.go b/pkg/gui/filetree/node.go
index 9ec5b0ada..e38a1c5de 100644
--- a/pkg/gui/filetree/node.go
+++ b/pkg/gui/filetree/node.go
@@ -18,10 +18,6 @@ type Node[T any] struct {
 	// otherwise it's nil.
 	Children []*Node[T]
 
-	// every child node is accessibe using their path string. Giving O(1) lookup
-	// for their given their path
-	ChildrenPathMap map[string]*Node[T]
-
 	// path of the file/directory
 	Path string
 

I only whipped this up quickly without testing it much, I hope it even works 😄. I also didn't measure if the performance is as good as yours, but I would expect it to be close.

@parthokunda
Copy link
Author

parthokunda commented Sep 17, 2024

You are right, we should surely separate out this map; otherwise it would create confusion. Also, I would not need to rewrite all the unit tests. But in future we should try to make the children field a map itself, maybe? We could use it instead of sorting the list everytime.

Can you make a commit to my branch? Or should I include your edits myself.

@stefanhaller
Copy link
Collaborator

But in future we should try to make the children field a map itself, maybe? We could use it instead of sorting the list everytime.

I don't see how this would help with sorting, go's maps are unsorted.

Can you make a commit to my branch? Or should I include your edits myself.

I pushed a fixup to your branch, please squash it in and reword the commit message. And test it a bit more than I did! Thanks. 😄

@stefanhaller stefanhaller added the enhancement New feature or request label Sep 17, 2024
@stefanhaller
Copy link
Collaborator

Oh, and there's probably some more potential for cleaning up the code, I really didn't pay much attention when I wrote it. For example, we now have two bool variables ok and isFound. Also, I wonder if we can avoid doing the childrenMapsByNode[curr] lookup every time through the loop; we really only need to do it again when the value of curr changes. Not sure how much this complicates the code.

@parthokunda parthokunda force-pushed the dev/addChildrenMap branch 2 times, most recently from 1e7e0a1 to c53fdd7 Compare September 17, 2024 21:12
@parthokunda
Copy link
Author

Hey, I have updated the PR. I changed the variable names to make them a bit more easy to understand.

Also, I wonder if we can avoid doing the childrenMapsByNode[curr] lookup every time through the loop; we really only need to do it again when the value of curr changes. Not sure how much this complicates the code.

I think we should not overcomplicate this code. The lookup should be fast enough anyway.

@parthokunda parthokunda marked this pull request as ready for review September 18, 2024 03:06
@stefanhaller stefanhaller added bug Something isn't working and removed enhancement New feature or request labels Sep 18, 2024
@stefanhaller stefanhaller changed the title Create ChildrenPathMap in Node for quick lookup in BuildTreeFromFiles Improve performance with large numbers of untracked or modified files Sep 18, 2024
@stefanhaller
Copy link
Collaborator

Awesome, thanks. Great improvement!

@stefanhaller stefanhaller merged commit 4dadcd2 into jesseduffield:master Sep 18, 2024
16 checks passed
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Sep 23, 2024
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [jesseduffield/lazygit](https://github.com/jesseduffield/lazygit) | minor | `v0.43.1` -> `v0.44.1` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>jesseduffield/lazygit (jesseduffield/lazygit)</summary>

### [`v0.44.1`](https://github.com/jesseduffield/lazygit/releases/tag/v0.44.1)

[Compare Source](jesseduffield/lazygit@v0.44.0...v0.44.1)

#### What's Changed

##### Enhancements 🔥

-   With stacked branches, create fixup commit at the end of the branch it belongs to by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3892
-   Add options for disabling switching to the Files panel after popping or applying a stash by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3913

##### Fixes 🔧

-   Fix crash when viewing the divergence of a branch which is up to date with its upstream by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3918

##### Performance improvements 📊

-   Improve performance with large numbers of untracked or modified files by [@&#8203;parthokunda](https://github.com/parthokunda) in jesseduffield/lazygit#3919

##### I18n 🌎

-   Update language files from Crowdin by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3898

#### New Contributors

-   [@&#8203;parthokunda](https://github.com/parthokunda) made their first contribution in jesseduffield/lazygit#3919

**Full Changelog**: jesseduffield/lazygit@v0.44.0...v0.44.1

### [`v0.44.0`](https://github.com/jesseduffield/lazygit/releases/tag/v0.44.0)

[Compare Source](jesseduffield/lazygit@v0.43.1...v0.44.0)

#### What's Changed

Lots of great changes in this release. Thanks to everybody who contributed!

##### Enhancements 🔥

-   Per-repo config files (and reloading of edited config files) by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3787
    -   In addition to the global config file you can now create repo-specific config files in `<repo>/.git/lazygit.yml`. Settings in these files override settings in the global config file. In addition, files called `.lazygit.yml` in any of the parent directories of a repo will also be loaded; this can be useful if you have settings that you want to apply to a group of repositories.
    -   We now also automatically apply (most) config changes without the need to restart lazygit
-   Easily view diff across range of commits by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3869
    -   If you select a range of commits, we now show the diff across the range (inclusive). This makes it easy to see the total changes across a number of commits. Likewise, if you press enter when a range of commits are selected, we will show the changed files for the range.

https://github.com/user-attachments/assets/6646c78b-5770-41c1-93b9-5442d32404de

-   Support hyperlinks from pagers by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3825
    -   If you're using delta as a pager (which I highly recommend trying), you can now click on line numbers to go to that line in your editor by using the following config:
    ```yaml
    git:
      paging:
        colorArg: always
        pager: delta --paging=never --line-numbers --hyperlinks --hyperlinks-file-link-format="lazygit-edit://{path}:{line}"
    ```

https://github.com/user-attachments/assets/75fef6c4-d437-4595-ab00-b8990215cfed

-   Switch to Files panel after popping/applying a stash by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3888
    -   This is a nice quality of life improvement. You generally want to go straight to the files panel after you pop or apply from the stash
-   Ask to auto-stage unstaged files when continuing a rebase after resolving conflicts by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3879
    -   Another quality of life improvement: often you resolve some conflicts, then make another couple changes, then in lazygit you say to continue and you get an error saying there are unstaged changes. Now instead of showing an error, lazygit asks if you want to stage those changes and continue.
-   Offer autostash option when creating new branch by [@&#8203;brandondong](https://github.com/brandondong) in jesseduffield/lazygit#3871
    -   Another great quality of life improvement
-   Allow using shell aliases in interactive custom commands by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3793
-   Switch tabs with panel jump keys by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3794
    -   If you've already been using the number keys (1-5) for jumping to specific side panels, you'll be pleased to know that you can now also use those keys for jumping to tabs within a side panel. E.g. to go to the reflog panel, you can now press 4 to jump to the commits panel, then 4 again to go to the reflog tab.
-   Rename "Custom Command" to "Shell Command" by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3800
    -   There are two ways of invoking a 'custom' command in Lazygit: first by pre-defining a command in your config, and second by pressing ':' and typing in the command directly. We referred to both of these as 'custom commands' which was confusing. We now refer to the second approach as invoking a 'shell command'.
-   Improve template placeholders for custom commands by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3809
    -   Now you can use `SelectedCommit` to refer to the selected commit regardless of which commits panel you're in (local commits, reflog, etc)
    -   Likewise, you can use `SelectedPath` whether you're in the files panel or the commit-files panel.
-   feat(custom command): support multiple contexts within one command by [@&#8203;yam-liu](https://github.com/yam-liu) in jesseduffield/lazygit#3784
    -   You can now use a comma-separated list of contexts for which a custom command can be invoked. For example:
    ```yaml
    customCommands:
      - description: 'Add empty commit'
        key: 'E'
        context: 'commits,files'
    ```
-   Improve mouse support for commit message panel by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3836
-   Make auto-staging resolved conflicts optional by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3870
    -   If you set `git.autoStageResolvedConflicts` to false in your config, lazygit will no longer auto-stage files in which you've resolved merge conflicts.
-   Allow using `<`/`>` and `,`/`.` in sticky range select mode in patch explorer by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3837
-   Add Zed editor support to editorPreset by [@&#8203;susl](https://github.com/susl) in jesseduffield/lazygit#3886

##### Fixes 🔧

-   Allow GPG reword for last commit by [@&#8203;Neko-Box-Coder](https://github.com/Neko-Box-Coder) in jesseduffield/lazygit#3815
-   Don't exit app when GetRepoPaths call fails during startup by [@&#8203;ppoum](https://github.com/ppoum) in jesseduffield/lazygit#3779
-   Fix lack of icon when extension isn't lowercase by [@&#8203;hasecilu](https://github.com/hasecilu) in jesseduffield/lazygit#3810
-   Fix redraw bug (stale content) in commits view by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3783
-   Fix line coloring when using the delta pager by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3820
-   Fix pressing escape after clicking in diff view by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3828
-   Fix fast-forward issue caused by a conflicting tag name [@&#8203;Neko-Box-Coder](https://github.com/Neko-Box-Coder) in jesseduffield/lazygit#3807
-   Fix range select -> stage failure when deleted file is already staged by [@&#8203;brandondong](https://github.com/brandondong) in jesseduffield/lazygit#3631
-   Scroll views up if needed to show all their content by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3839
-   Fix crash when filtering commits by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3838
-   Fix cancelled autostash resulting in stuck inline status by [@&#8203;brandondong](https://github.com/brandondong) in jesseduffield/lazygit#3860
-   Don't allow opening a menu while the search or filter prompt is open by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3878

##### Maintenance ⚙️

-   Reapply "Check for fixup commits on CI" by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3745
-   Some cleanups for APIs related to contexts by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3808
-   Improve fixup commits script by [@&#8203;jesseduffield](https://github.com/jesseduffield) in jesseduffield/lazygit#3853
-   Fix linter warnings by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3854
-   Add codespell support (config, workflow to detect/not fix) and make it fix few typos by [@&#8203;yarikoptic](https://github.com/yarikoptic) in jesseduffield/lazygit#3751
-   Get rid of a lot of error return values by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3890
-   Add a readme file for the JSON files in pkg/i18n/translations by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#3781

#### New Contributors

-   [@&#8203;yam-liu](https://github.com/yam-liu) made their first contribution in jesseduffield/lazygit#3784
-   [@&#8203;ppoum](https://github.com/ppoum) made their first contribution in jesseduffield/lazygit#3779
-   [@&#8203;Neko-Box-Coder](https://github.com/Neko-Box-Coder) made their first contribution in jesseduffield/lazygit#3815
-   [@&#8203;yarikoptic](https://github.com/yarikoptic) made their first contribution in jesseduffield/lazygit#3751
-   [@&#8203;susl](https://github.com/susl) made their first contribution in jesseduffield/lazygit#3886

**Full Changelog**: jesseduffield/lazygit@v0.43.1...v0.44.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40NDAuNyIsInVwZGF0ZWRJblZlciI6IjM3LjQ0MC43IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Files panel is empty when (many) new/changed files exist
2 participants