watcher refactor #3471

Byron · 2024-04-09T19:31:57Z

This PR attempts to reduce the watcher implementation to only the complexity it actually needs.

Based on the analysis in #3447.

Tasks

Notes for the Reviewer

The delta computation consumes the most CPU, and if its result is truly unused, I think it should be removed, at least from the handler.
I removed the rate limit, as it never kicked in during my testing because multiple paths are handled at once. It was also an issue for me because simply skipping a computation would have meant that sometimes the result is incorrect.
I am fairly sure that I never got to test the fetch and push GB repo functionality, and possibly much more. It's best to test everything that is triggered by the watchers.
`LOG_LEVEL=info` should probably be used when testing the performance of `git checkout @~100 -- src/` or similar, as otherwise many debug messages from deep inside core are still emitted. This makes it seem like a lot is happening, even though that's no longer the case compared to before.
I adjusted the log level of a few traces in core so the application can be launched in DEBUG mode without being spammed with low-level information. DEBUG is now used for select information, and everything else was demoted to TRACE. That's probably just how things go: what's useful at DEBUG one day is better as TRACE another day, as the focus changes.

Application Bugs 📋

These bugs were encountered during testing.

Adding a file is registered, but removing it often is not. Creating and removing empty files often doesn't register the removal (MacOS) #3511
Worktrees probably aren't supported: judging by the assumption in file_monitor.rs, .git has to be a directory, which isn't the case for linked worktrees (there, .git is a file pointing to the actual git directory).
- This still needs to be formally reproduced, but "Add new project" in a worktree fails because the objects directory can't be read. #3062 already talks about this, and what's mentioned here is definitely related.
- Possible fix: detect this case and install a separate notification stream on the extra git directory.
Trying to open the Linux kernel freezes the app for a long time, probably because it wants to download Gravatar images for roughly 30k contributors. See Adding the Linux Kernel as project takes more than an hour to complete #3512.
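A minimal sketch of the worktree case, assuming the watcher only needs to know where the git directory lives. In a linked worktree, `.git` is a file containing a `gitdir: <path>` line instead of a directory. `resolve_git_dir` and `dirs_to_watch` are hypothetical names, not taken from file_monitor.rs, and a real fix would also have to follow the `commondir` file to reach the shared objects directory, which this sketch does not do.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: return the directory holding git metadata for `worktree`.
/// In the main worktree `.git` is a directory; in a linked worktree it is a file
/// with a line such as `gitdir: /repo/.git/worktrees/feature`.
fn resolve_git_dir(worktree: &Path) -> io::Result<PathBuf> {
    let dot_git = worktree.join(".git");
    if fs::metadata(&dot_git)?.is_dir() {
        return Ok(dot_git);
    }
    // `.git` is a file, so read the target path from its `gitdir:` line.
    let contents = fs::read_to_string(&dot_git)?;
    let target = contents
        .lines()
        .find_map(|line| line.strip_prefix("gitdir:"))
        .map(str::trim)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "no `gitdir:` line in .git file"))?;
    let target = Path::new(target);
    // A relative `gitdir` is relative to the directory containing the `.git` file.
    Ok(if target.is_absolute() {
        target.to_path_buf()
    } else {
        worktree.join(target)
    })
}

/// Hypothetical: the directories a watcher would need to observe. When the git
/// directory lives outside the worktree, it needs its own notification stream.
fn dirs_to_watch(worktree: &Path) -> io::Result<Vec<PathBuf>> {
    let git_dir = resolve_git_dir(worktree)?;
    let mut dirs = vec![worktree.to_path_buf()];
    if !git_dir.starts_with(worktree) {
        dirs.push(git_dir);
    }
    Ok(dirs)
}
```

For the main worktree this yields just the worktree itself; for a linked worktree it adds the external git directory as a second path to watch.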

Out of Scope but TODO

git::credential::Helpers::from_path() is used in tests, but it depends on the actual HOME environment variable, so it isn't isolated.
- Maybe check other from_path() implementations as well.
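To illustrate the isolation problem, here is a hypothetical sketch (not the actual `Helpers::from_path()` code): a lookup that reads `HOME` itself always sees the developer's real files, whereas one that takes the home directory as a parameter can be pointed at a temporary directory in tests. The function names and the `.gitconfig` path are illustrative assumptions.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Not isolated: reads the real `HOME`, so tests see the developer's own files.
fn global_config_path_from_env() -> Option<PathBuf> {
    env::var_os("HOME").map(|home| global_config_path(Path::new(&home)))
}

/// Isolated: the home directory is injected. Production code passes the real
/// home directory, tests pass a temporary one.
fn global_config_path(home: &Path) -> PathBuf {
    home.join(".gitconfig")
}
```

The env-reading variant can stay as a thin convenience wrapper for production, while tests only ever call the injected one.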

My Notes

Definitely not a mandatory read to review this PR.

There seem to be redundant calls to is_path_ignored() in many places, given that the dispatcher filter already skips ignored files (and can do so much more efficiently). However, we must ensure that it always has the most recent information from .gitignore files, as the watcher is long-running.
In theory, since it communicates with the outside via channels, there should be no need to worry about parallelism at all, so none of the Handlers should need any protection against it. One can hope they are isolated, but they probably aren't, as they are all managed by tauri. This should probably be rolled back as much as possible.
Send+Sync, shared state, and Clone are poison, and a way to feel unsafe even without the `unsafe` keyword. In such an environment one has no guarantees anymore, so one wants as little code as possible to deal with it.
- This is probably the entry point to unravelling Send+Sync in core as well, even though it might also be a good hint at how these core instances are used. My feeling is that many will need a tauri-specific wrapper, so that core can stay pure.
There is a large amount of boilerplate just to deal with all the blanket 'protections' needed for Send+Sync; many handlers are just a couple of lines, and one handler effectively just sends an event.
The current event queue system is actually very cleverly multi-threading everything. Each incoming event is processed in its own (or a pooled) thread, and as each event can spawn new events, these are likewise queued and run in parallel. This is why everything in the watcher is Send + Sync.
Seeing the .git/GB_Flush file, I wonder if this file-based notification system is truly necessary, or if it could be kept within tauri. Maybe the plan is to support multiple windows open on the same repository (rather than preventing it), which would probably make this file-based approach necessary, and maybe even the simplest solution if filesystem events can be assumed to be reliable.
There is definitely some business logic in the watcher that will probably want to move to core at some point.
There are generally a lot of rather generic, string-only context() calls, and it feels like they are 'just there to be there', even though the callee could probably provide decent error messages in the first place (and maybe already does).
Tests are definitely easier to write with the Events system as side-effects are nicely decoupled via returned events. Now that everything is coupled again, it becomes clear how much state is required for certain functions, making the initial test setup more involved.

The idea is that we no longer parallelize over a channel, but instead process filesystem events one at a time. This allows each handler to become a function that gets its state passed in and makes all the necessary calls verbatim, which in turn makes it easy to follow what's happening. If anything becomes too slow due to the serialized event processing, selective parallelization can be re-added.
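A minimal sketch of that idea using only `std::sync::mpsc`, assuming hypothetical `FsEvent` and `State` types: a single loop owns the state, receives one event at a time, and passes `&mut State` to plain handler functions, so nothing needs to be shared or wrapped for Send + Sync. The loop ends once every sender has been dropped.

```rust
use std::path::{Path, PathBuf};
use std::sync::mpsc;

/// Hypothetical filesystem event, already classified by location.
#[derive(Debug)]
enum FsEvent {
    Worktree(PathBuf),
    GitDir(PathBuf),
}

/// State owned by the single processing loop; it is never shared.
#[derive(Default, Debug)]
struct State {
    worktree_changes: Vec<PathBuf>,
    git_changes: usize,
}

/// Handlers are plain functions that receive the state they need.
fn on_worktree_change(state: &mut State, path: &Path) {
    state.worktree_changes.push(path.to_path_buf());
}

fn on_git_change(state: &mut State, _path: &Path) {
    state.git_changes += 1;
}

/// Process events strictly one at a time until every sender has been dropped.
fn run(events: mpsc::Receiver<FsEvent>) -> State {
    let mut state = State::default();
    for event in events {
        match event {
            FsEvent::Worktree(path) => on_worktree_change(&mut state, &path),
            FsEvent::GitDir(path) => on_git_change(&mut state, &path),
        }
    }
    state
}
```

Since the state is owned by one thread, the handlers need no locks; re-adding parallelism later would be a local decision inside a single handler.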

Previously, each file change, both in `.git` and in the worktree, would cause a complete recomputation. This computation included opening a git repository at least once (probably more often) to perform an 'is-ignored' check. The latter is very expensive in `git2` and gets more expensive the more files there are. Now the repository is opened only when needed, and it is reused for all applicable file paths.
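A sketch of "open when needed, reuse for all paths", assuming a hypothetical `Repo` type standing in for something expensive like `git2::Repository` and a toy ignore rule in place of real `.gitignore` matching. A `std::cell::OnceCell` opens the repository lazily on the first path that needs an ignore check and reuses it for the rest of the batch; paths under `.git` skip the check, so a batch of only `.git` changes never opens it at all.

```rust
use std::cell::OnceCell;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for an expensive handle such as a `git2::Repository`.
struct Repo {
    ignored_extension: &'static str,
}

impl Repo {
    /// Toy ignore rule standing in for a real `.gitignore` check.
    fn is_path_ignored(&self, path: &Path) -> bool {
        path.extension().map_or(false, |ext| ext == self.ignored_extension)
    }
}

/// Keep the paths that still need handling. The repository is opened at most
/// once per batch, and only if some worktree path needs an ignore check.
fn paths_to_handle(paths: &[PathBuf], open_repo: impl Fn() -> Repo) -> Vec<PathBuf> {
    let repo = OnceCell::new();
    paths
        .iter()
        .filter(|path| {
            // Changes inside `.git` are never subject to `.gitignore`.
            path.starts_with(".git") || !repo.get_or_init(&open_repo).is_path_ignored(path)
        })
        .cloned()
        .collect()
}
```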

Previously it would watch every registered project, which could cause more work than necessary across all parts of the application. Now the UI sends an event indicating which project is active, allowing the watch to be set up at that very moment. Note that the previously watched project is automatically deregistered.
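One way to express "only the active project is watched" is to let the watch handle's `Drop` stop the watch, so replacing the current handle automatically deregisters the previous project. This is a hypothetical sketch: `Watchers`, `WatchHandle`, and the `AtomicBool` stop flag are assumptions, not the PR's types, and a real watch would run on another thread and check the flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical handle for a running watch; dropping it stops the watch.
struct WatchHandle {
    project_id: String,
    running: Arc<AtomicBool>,
}

impl Drop for WatchHandle {
    fn drop(&mut self) {
        // A real watch thread would check this flag and shut down.
        self.running.store(false, Ordering::SeqCst);
    }
}

/// Holds at most one watch: the one for the project the UI reports as active.
#[derive(Default)]
struct Watchers {
    current: Option<WatchHandle>,
}

impl Watchers {
    /// Called when the UI reports the active project. Assigning a new handle
    /// drops the old one, which deregisters the previously watched project.
    fn set_active(&mut self, project_id: &str) -> Arc<AtomicBool> {
        if let Some(handle) = &self.current {
            if handle.project_id == project_id {
                // Already watching this project; keep the existing watch.
                return Arc::clone(&handle.running);
            }
        }
        let running = Arc::new(AtomicBool::new(true));
        self.current = Some(WatchHandle {
            project_id: project_id.to_owned(),
            running: Arc::clone(&running),
        });
        running
    }
}
```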

app/src/routes/[projectId]/+layout.ts

crates/gitbutler-core/src/deltas/writer.rs

crates/gitbutler-tauri/src/watcher/handler/calculate_deltas.rs

Qix- · 2024-04-15T11:20:08Z

crates/gitbutler-tauri/Cargo.toml

@@ -29,6 +29,7 @@ backoff = "0.4.0"
 backtrace = { version = "0.3.71", optional = true }
 chrono = { version = "0.4.37", features = ["serde"] }
 console-subscriber = "0.2.0"
+crossbeam-channel = "0.5.12"


Is there a reason for crossbeam channels vs using tokio's channels?

Actually, tokio channels might be usable here; I just didn't try them. It would certainly be preferable if they worked.

Let's try tokio then, lmk if it doesn't work for some reason. Just trying to reduce our number of dependencies.

I did take a look and realised that tokio::sync::broadcast can indeed provide the multi-producer, multi-consumer case. I probably didn't look there last time because it requires async, whereas this code is all non-async.

crossbeam-channel was already in the dependency graph via 5 other dependencies, so it doesn't add to the compile time.

Do you still think crossbeam-channel should be replaced with tokio::sync::broadcast? If so, I'd probably use block_on where needed, as this implementation relies on std::thread::scope(). Turning it into proper async for parallelisation would be more effort.

That's probably fine, we can move more toward async over time.

Also TIL about std::thread::scope() - neat!
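For reference, a minimal sketch of how `std::thread::scope()` can parallelize a per-file computation such as the delta calculation without `'static` bounds or `Arc`: scoped threads borrow the input slice directly and are all joined before `scope` returns. `line_delta` is a deliberately simplistic, hypothetical stand-in for the real delta computation, and the work is split into chunks to bound the number of threads.

```rust
use std::thread;

/// Hypothetical stand-in for a per-file delta: the number of lines that differ.
fn line_delta(old: &str, new: &str) -> usize {
    let old: Vec<&str> = old.lines().collect();
    let new: Vec<&str> = new.lines().collect();
    let changed = old.iter().zip(&new).filter(|(a, b)| a != b).count();
    changed + old.len().abs_diff(new.len())
}

/// Compute deltas for `(old, new)` file contents on up to `threads` threads.
/// Scoped threads borrow `files` directly and are joined before `scope` returns.
fn deltas_in_parallel(files: &[(String, String)], threads: usize) -> Vec<usize> {
    let threads = threads.max(1);
    // Ceiling division; `chunks` panics on 0, hence the final `max(1)`.
    let chunk_size = ((files.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|(old, new)| line_delta(old, new)).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order keeps results in the same order as `files`.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("delta thread panicked"))
            .collect()
    })
}
```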

crates/gitbutler-tauri/src/events.rs

crates/gitbutler-tauri/src/projects.rs

crates/gitbutler-tauri/tests/watcher/handler/push_project_to_gitbutler.rs

Byron · 2024-04-15T12:40:22Z

Thanks for starting the review, @Qix- !

I recommend checking out this PR with `gh pr checkout 3471` and pushing fixes and changes directly to it. For bigger questions, I will answer anything that comes up here. Thanks again.

PS: @mtsgrd I replied on top of your replies in some comments because what I saw there was out of date, not because I thought I had to add to them.

crates/gitbutler-tauri/src/watcher/file_monitor.rs

Byron

Thanks for the review! I think I have addressed all comments thus far, please let me know if anything else comes up.

Qix- · 2024-04-15T15:59:03Z

Thanks!

Byron force-pushed the watcher-refactor branch 24 times, most recently from b025e4e to fd0ee99 on April 13, 2024 21:08

Byron added 6 commits April 13, 2024 23:09

remove try_new() in favor of new() in watcher.rs

de6fd55

`try_new()` here is used as constructor, which is what `new` is for with less boilerplate.

Don't consume instances that are Send+Sync+Clone

52c6375

They don't actually need it.

simplify dispatcher around the idea of a single channel

954d100

That way, all objects go away and it will be nothing more than a task around a channel.

Use the new dispatcher and make sure everything still works

efe03a9

Avoid managing every piece of the watcher, only manage what's currently needed

1476ff0

Turn WatcherInner into Watcher

cdae683

As `Watcher` really adds nothing.
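As a sketch of the `try_new()` to `new()` change: in Rust, a constructor called `new` may itself return `Result`, so a separate `try_new` adds nothing when it is the only constructor. The `Monitor` struct and the `NotADirectory` error below are hypothetical and only illustrate the naming convention.

```rust
use std::error::Error;
use std::fmt;
use std::path::{Path, PathBuf};

/// Hypothetical error for the example.
#[derive(Debug, PartialEq)]
struct NotADirectory(PathBuf);

impl fmt::Display for NotADirectory {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} is not a directory", self.0.display())
    }
}

impl Error for NotADirectory {}

/// Hypothetical type whose only constructor can fail.
struct Monitor {
    root: PathBuf,
}

impl Monitor {
    /// Fallible, yet still just `new`: the `Result` already says it can fail.
    fn new(root: impl Into<PathBuf>) -> Result<Self, NotADirectory> {
        let root = root.into();
        if root.is_dir() {
            Ok(Self { root })
        } else {
            Err(NotADirectory(root))
        }
    }

    fn root(&self) -> &Path {
        &self.root
    }
}
```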

Byron force-pushed the watcher-refactor branch 4 times, most recently from 070e3e1 to cd8e450 on April 14, 2024 13:21

This was referenced Apr 14, 2024

Watcher Refactor #3447

Closed

UI freeze when watcher cannot find folder #3001

Closed

Byron force-pushed the watcher-refactor branch 2 times, most recently from 50565d7 to 1cb6451 on April 14, 2024 16:49

Byron marked this pull request as ready for review April 14, 2024 17:03

Byron added 6 commits April 15, 2024 07:11

make events private and publicly export Event instead.

af225bd

That way, we get `tauri::Event`, without the somewhat confusing module name `events`.

remove rate-limit as it never kicks in, and also, we'd want the computation for correctness

5664283

parallelize the delta computation

c173d80

Byron commented Apr 15, 2024

app/src/routes/[projectId]/+layout.ts

crates/gitbutler-core/src/deltas/writer.rs

crates/gitbutler-tauri/src/watcher/handler/calculate_deltas.rs

Byron force-pushed the watcher-refactor branch from 1cb6451 to e2ef2dc on April 15, 2024 05:25

Qix- reviewed Apr 15, 2024

crates/gitbutler-tauri/src/events.rs

Qix- reviewed Apr 15, 2024

crates/gitbutler-tauri/src/projects.rs (outdated)

Qix- reviewed Apr 15, 2024

crates/gitbutler-tauri/tests/watcher/handler/push_project_to_gitbutler.rs (outdated)

Qix- reviewed Apr 15, 2024

crates/gitbutler-tauri/src/watcher/file_monitor.rs (outdated)

Byron added 3 commits April 15, 2024 16:23

avoid &str in place where ProjectId could be used

62b1c49

remove '_pure' functions in favor of creating a full handler in tests.

fb9db89

The `pure` functions were from a time where a `Handler` couldn't be instantiated in full for tests, but that is not the case anymore, so there isn't any use for the added complexity.

address misc review comments

a3fd068

- turn `static` into `const`
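Regarding "avoid &str in place where ProjectId could be used": a newtype lets the compiler reject arbitrary strings where a project id is expected. This is a hypothetical sketch; the real `ProjectId` type in gitbutler-core may be defined differently, and the `parse` validation rule here is invented for illustration.

```rust
use std::fmt;

/// Hypothetical newtype; the real `ProjectId` may be defined differently.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ProjectId(String);

impl ProjectId {
    /// Illustrative validation rule: non-empty, ASCII alphanumerics and '-'.
    fn parse(s: &str) -> Option<Self> {
        let valid = !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
        valid.then(|| Self(s.to_owned()))
    }
}

impl fmt::Display for ProjectId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

/// Takes `&ProjectId` instead of `&str`, so an arbitrary string (a path, a
/// branch name) can no longer be passed by accident.
fn watch_project(id: &ProjectId) -> String {
    format!("watching {id}")
}
```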

Byron commented Apr 15, 2024

Qix- merged commit f9f1f3d into gitbutlerapp:master Apr 15, 2024
34 checks passed

Byron deleted the watcher-refactor branch April 15, 2024 18:44
