Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor migrations to only marshall yaml twice #4318

Merged
merged 2 commits into from
Feb 28, 2025

Conversation

ChrisMcD1
Copy link
Contributor

  • PR Description

In the solving of #4194 we had some discussion about the repeated marshalling and unmarshalling of the content in our migrations. #4194 (comment).

I was feeling the itch to see if there was a meaningful performance impact to that, and this is the results! I added a benchmark first that uses the entirety of Config.md. This ensures a worst-case scenario for the repeated parsing, and probably at least 1 user has done this for their config. Here were the benchmark results prior and after this PR:

Before

$ go test ./pkg/config/ -bench=. -benchmem -count=10
goos: linux
goarch: amd64
pkg: github.com/jesseduffield/lazygit/pkg/config
cpu: Intel(R) Core(TM) i7-8705G CPU @ 3.10GHz
BenchmarkMigrationOnLargeConfiguration-8              85          18165916 ns/op         1501442 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8              50          21024442 ns/op         1501473 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8              75          18814769 ns/op         1501509 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8              60          17886812 ns/op         1501434 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8              97          17767498 ns/op         1501448 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8              62          18749923 ns/op         1501478 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8              62          19248265 ns/op         1501467 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8              92          14771199 ns/op         1501423 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8              67          16345152 ns/op         1501440 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8              98          16785737 ns/op         1501472 B/op      19316 allocs/op
PASS
ok      github.com/jesseduffield/lazygit/pkg/config     14.474s

so somewhere in the range of 14 to 21 ms added to startup time.

After

$ go test ./pkg/config/ -bench=. -benchmem -count=10
goos: linux
goarch: amd64
pkg: github.com/jesseduffield/lazygit/pkg/config
cpu: Intel(R) Core(TM) i7-8705G CPU @ 3.10GHz
BenchmarkMigrationOnLargeConfiguration-8             440           2936145 ns/op          265800 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             428           3099183 ns/op          265787 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             355           3074069 ns/op          265785 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             478           3031144 ns/op          265792 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             375           2795470 ns/op          265785 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             406           2688491 ns/op          265791 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             375           3092990 ns/op          265791 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             379           2789022 ns/op          265785 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             417           3108530 ns/op          265785 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             487           2848502 ns/op          265788 B/op       3928 allocs/op
PASS
ok      github.com/jesseduffield/lazygit/pkg/config     15.352s

So somewhere between 2.6 and 3.1 ms.

Is that a meaningful speedup? I'm not sure, but it's something! I had to add the bit of complexity mentioned in the comment to track whether we had made any changes, but that doesn't feel fundamentally much more complex to me, since we were already implicitly tracking that fact with whether the string file changed.

  • Please check if the PR fulfills these requirements
  • Cheatsheets are up-to-date (run go generate ./...)
  • Code has been formatted (see here)
  • Tests have been added/updated (see here for the integration test guide)
  • Text is internationalised (see here)
  • If a new UserConfig entry was added, make sure it can be hot-reloaded (see here)
  • Docs have been updated if necessary
  • You've read through your own file changes for silly mistakes etc

@ChrisMcD1
Copy link
Contributor Author

Created as draft because I have some inline docs changes I want to make, but this is the meat and potatoes.

@ChrisMcD1 ChrisMcD1 force-pushed the refactor-migrations branch 3 times, most recently from 26434f9 to 61756b7 Compare February 24, 2025 04:51
@ChrisMcD1
Copy link
Contributor Author

Okay, cleaned those things up, ready for review!

@ChrisMcD1 ChrisMcD1 marked this pull request as ready for review February 24, 2025 04:51
@jesseduffield
Copy link
Owner

21ms on each startup is indeed something, albeit a small improvement. But the complexity cost is high and there's a lot more chance of errors now that various places need to report on whether a change occurred, and any one of those might report incorrectly.

What's it like if we just do a deep comparison at the end before marshalling back? I suspect that would give us roughly the same speed up, without the extra complexity.

@ChrisMcD1 ChrisMcD1 force-pushed the refactor-migrations branch 2 times, most recently from 10ae48f to 8e93a68 Compare February 25, 2025 22:57
@ChrisMcD1
Copy link
Contributor Author

Pushed a new version that does the unmarshalling twice. That's the only way I know to do a deep copy! Alternatively happily welcome.

My computer was seemingly feeling snappier today, so the test results were as follows:

Existing code

goos: linux
goarch: amd64
pkg: github.com/jesseduffield/lazygit/pkg/config
cpu: Intel(R) Core(TM) i7-8705G CPU @ 3.10GHz
BenchmarkMigrationOnLargeConfiguration-8             246           4828826 ns/op         1501479 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             244           4767374 ns/op         1501489 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             247           4888204 ns/op         1501483 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             241           4746350 ns/op         1501507 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             247           4737518 ns/op         1501484 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             249           4813931 ns/op         1501491 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             244           4749415 ns/op         1501504 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             246           4883387 ns/op         1501523 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             255           4785092 ns/op         1501493 B/op      19316 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             242           4764061 ns/op         1501510 B/op      19317 allocs/op

Previous Implementation

goos: linux
goarch: amd64
pkg: github.com/jesseduffield/lazygit/pkg/config
cpu: Intel(R) Core(TM) i7-8705G CPU @ 3.10GHz
BenchmarkMigrationOnLargeConfiguration-8             913           1147582 ns/op          265811 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8            1316            857591 ns/op          265801 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8            1274            885818 ns/op          265792 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8            1282            920565 ns/op          265789 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8            1347            945753 ns/op          265794 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8            1353            861908 ns/op          265794 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8            1286            852052 ns/op          265796 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8            1401            845480 ns/op          265797 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8            1402            822486 ns/op          265795 B/op       3928 allocs/op
BenchmarkMigrationOnLargeConfiguration-8            1461            820271 ns/op          265799 B/op       3928 allocs/op

New implementation

goos: linux
goarch: amd64
pkg: github.com/jesseduffield/lazygit/pkg/config
cpu: Intel(R) Core(TM) i7-8705G CPU @ 3.10GHz
BenchmarkMigrationOnLargeConfiguration-8             530           2237977 ns/op          597309 B/op       7034 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             560           2119147 ns/op          597395 B/op       7035 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             559           2224450 ns/op          597347 B/op       7035 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             570           2302126 ns/op          597431 B/op       7035 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             601           1952700 ns/op          597492 B/op       7035 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             607           1969020 ns/op          597366 B/op       7035 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             565           2019408 ns/op          597369 B/op       7035 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             613           1949337 ns/op          597403 B/op       7035 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             592           1997986 ns/op          597363 B/op       7035 allocs/op
BenchmarkMigrationOnLargeConfiguration-8             603           1955303 ns/op          597433 B/op       7035 allocs/op

I'm still happy with new implementation though, since it'll make future migrations cost basically nothing.

@ChrisMcD1 ChrisMcD1 changed the title Refactor migrations to only marshall yaml once Refactor migrations to only marshall yaml ~once~ twice Feb 25, 2025
@ChrisMcD1 ChrisMcD1 changed the title Refactor migrations to only marshall yaml ~once~ twice Refactor migrations to only marshall yaml twice Feb 25, 2025
@stefanhaller
Copy link
Collaborator

Pushed a new version that does the unmarshalling twice. That's the only way I know to do a deep copy! Alternatively happily welcome.

Might be worth looking into https://github.com/brunoga/deep instead? I have no experience with it myself, just found it with a cursory google search.

@jesseduffield
Copy link
Owner

Something that just occurred to me: what if we just had some check at the start to see if the current yaml satisfies the schema? if so, we don't need to do any migrating. Then it won't matter how long it takes to do the migration because it will only need to run occasionally (after a user updates lazygit)

@ChrisMcD1
Copy link
Contributor Author

@jesseduffield That worked quite smoothly! I spun it off into its own PR, because it ends up being a totally separate section of code: #4334

I've removed the benchmark commit from this branch, since it is no longer a relevant thing to be tracking.

I still think this refactor stands on its own merits. It feels more elegant for me for the individual migrations to be expressed in terms of acting on the root yaml.Node, instead of on a []byte, and each of them no longer need to have error handling for invalid YAML marshalling or unmarshalling. See ChrisMcD1#1 for what the N+1 of a new migration looks like if the interface is in terms of a yaml node. (That PR will be turned into a PR on this repo once we decide the fate of this PR, as it is on a stacked branch)

@stefanhaller
Copy link
Collaborator

Something that just occurred to me: what if we just had some check at the start to see if the current yaml satisfies the schema? if so, we don't need to do any migrating. Then it won't matter how long it takes to do the migration because it will only need to run occasionally (after a user updates lazygit)

I'm against this approach, because I'm not sure we can really trust the schema validation enough for this. Here's one example: the schema declares the context of a custom command as a string, because we can't do any better; conceptually it's a bunch of enums separated by commas. Now suppose we decide to rename one of the enum entries (e.g. we find a better name for "commitFiles"). So we write a migrator that splits the context by comma, and renames every occurrence of "commitFiles" to that better name. Schema validation doesn't catch this, because it's still a string before and after.

Maybe this is a contrived example, but it does illustrate the problem. I'd much rather we use the actual migrations that we perform to tell us whether anything happened.

I still find the deepCopy/deepEquals approach the most promising.

@stefanhaller
Copy link
Collaborator

@jesseduffield That worked quite smoothly! I spun it off into its own PR, because it ends up being a totally separate section of code: #4334

@ChrisMcD1 I'm confused, can you explain more what you're doing in that PR? It doesn't seem to have anything to do with Jesse's suggestion as far as I can see.

@jesseduffield
Copy link
Owner

Good points @stefanhaller

@stefanhaller
Copy link
Collaborator

Oh and btw, I wouldn't be surprised if doing a schema validation is actually more expensive than running the migrators and seeing if they changed something. That's just a guess though, would have to be measured of course.

@ChrisMcD1
Copy link
Contributor Author

I'm now realizing the idea of #4334 doesn't really work in the general case, as you point out with the string example. And also doesn't even work in many normal cases. The case where it did work, and the one I manually tested it on was:

  1. User has a deprecated commitPrefix entry that is a nested object, instead of a sequence.
  2. This passes the yaml.Unmarshal because it is still valid YAML, but returns an error on node.Decode(base), since the actual type, a mapping, does not match the expected type, a sequence.
  3. We enter the error path, do the migration, and on the 2nd decoding it returns happy.

In my head, node.Decode(base) was a much stronger guarantee that the actual document matched the schema object, but I imagine it doesn't guarantee that all really, and will do sensible decoding behaviors like silently skipping over unexpected keys.

Closing that PR, cause it makes no sense!

@ChrisMcD1
Copy link
Contributor Author

Added back the benchmark, since this is now in the normal startup path.

Copy link
Owner

@jesseduffield jesseduffield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending one comment

if !reflect.DeepEqual(rootNode, originalCopy) {
newContent, err := yaml_utils.YamlMarshal(&rootNode)
if err != nil {
return nil, fmt.Errorf("Failed to remarsall!")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("Failed to remarsall!")
return nil, fmt.Errorf("Failed to remarshal!")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, and added a %w for the error as well

@stefanhaller stefanhaller added the maintenance For refactorings, CI changes, tests, version bumping, etc label Feb 28, 2025
@stefanhaller
Copy link
Collaborator

Ok, looks good to me too. Nice improvement.

I'd still find it interesting to play with using a deep copy instead of unmarshalling twice, but we can do that later if we find it important enough. Merging for now.

@stefanhaller stefanhaller merged commit a7b0ccf into jesseduffield:master Feb 28, 2025
14 of 15 checks passed
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Mar 6, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [jesseduffield/lazygit](https://github.com/jesseduffield/lazygit) | minor | `v0.47.2` -> `v0.48.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>jesseduffield/lazygit (jesseduffield/lazygit)</summary>

### [`v0.48.0`](https://github.com/jesseduffield/lazygit/releases/tag/v0.48.0)

[Compare Source](jesseduffield/lazygit@v0.47.2...v0.48.0)

<!-- Release notes generated using configuration in .github/release.yml at v0.48.0 -->

#### What's Changed

##### Enhancements 🔥

-   Custom commands submenus by [@&#8203;stefanhaller](https://github.com/stefanhaller) in jesseduffield/lazygit#4324

##### Maintenance ⚙️

-   Refactor migrations to only marshall yaml twice by [@&#8203;ChrisMcD1](https://github.com/ChrisMcD1) in jesseduffield/lazygit#4318

**Full Changelog**: jesseduffield/lazygit@v0.47.2...v0.48.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4xODIuNCIsInVwZGF0ZWRJblZlciI6IjM5LjE4Mi40IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance For refactorings, CI changes, tests, version bumping, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants