Skip to content

VFS watcher impl #556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 24 commits into from
Closed

VFS watcher impl #556

wants to merge 24 commits into from

Conversation

vemoo
Copy link
Contributor

@vemoo vemoo commented Jan 15, 2019

Implements #433

For now I ended up referencing my forked notify with the fix until it's merged and released.

There are some things left to do, like what to do on Rescan or Error events, but the rest it's working.

root,
path: path.clone(),
filter: Box::new(filter),
};
res.worker.inp.send(task).unwrap();
res.watcher.watch(path).unwrap();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add error handling and log a warning for errors.

@vemoo
Copy link
Contributor Author

vemoo commented Jan 16, 2019

I was missing the Task:HandleChange that lets the Vfs decide which changes should be loaded. And it was trying to load everything.

After fixing that I'm still seing high CPU usage and the ra_lsp_server sometimes hangs and doesn't close after closing the editor. I will investigate, but I think it's not ready to merge yet.

@matklad
Copy link
Member

matklad commented Jan 19, 2019

Note that I've had to rewrite the history recently, so expect storms while rebasing: #566 (comment)

Sorry about that: it's definitively my mistake 🙁

@vemoo
Copy link
Contributor Author

vemoo commented Jan 19, 2019

Didn't run into any problems when rebasing.

I found out what the issue was. Using the default recursive watching ment also watching the target folder, and during builds there's a lot of activity there. It was so bad that after running cargo build and then cargo clean my system started swapping after ra_lsp_server took all the memory.

I've changed the watching to non recursive and implemented recursive watching taking .gitignore into account, using the ignore crate. I'm not sure if that's the right way, since that wouldn't work on projects that are not on git. Maybe excluding build folders would be better, but I don't know if there's an easy way to do it.

@matklad
Copy link
Member

matklad commented Jan 20, 2019

Huh, that is getting more complex that I have anticipated...

I think “watch only specific stuff” is the right approach, but we must share that same filtering logic betweeen initial scan and subsequent watching.

Specifically RootFilter is the thing we want to use here:
https://github.com/rust-analyzer/rust-analyzer/blob/11a77d76a9759a24cfbe507550db262e22f830ad/crates/ra_vfs/src/lib.rs#L38

Conceptually, SourceRoot is the set of files in the directory, and RootFilter captures this concept. It currently only filters by .rs extension, but it should filter out /target directory as well. I think it’s fine to filter out target by default.

Using gitignore is also a good idea, but I think it should be read by the clients of vfs and supplied at a construction time as a RootFilter. Perhaps a RootGlob would be a better game even?

@vemoo
Copy link
Contributor Author

vemoo commented Jan 21, 2019

Ok I think it's almost there.

I have implemented recursive watching using the RootFilter but without .gitignore for now.

Right now in rust-analyzer there are around 218 roots (between cargo registry and workspace crates) so each time there's a change in a watched dir we iterate all of them to check if it belogs to any of them.
So right know even for changes in the .git folder or node_modules we do that, but when .gitignore is implemented that won't happen because we won't be watching those directories.
Still, I wonder if it's worth it to store the roots in a hierarchy to avoid always iterating on all of them.

I think this should be usable now. Maybe the .gitignore support can be implemented in a separate PR?

@matklad
Copy link
Member

matklad commented Jan 22, 2019

So right know even for changes in the .git folder or node_modules we do that

I'd just hard-code those for now as well :)

Maybe the .gitignore support can be implemented in a separate PR?

Makes sense to me! I'll also be fine with just hard-coding worst offenders and punting on this until notify 5.0

Still, I wonder if it's worth it to store the roots in a hierarchy to avoid always iterating on all of them.

Sure, long-term we need faster data structures everywhere. Everything in rust-analyzer currently uses a dumb linear scan and recomputed every time. However, I think that before starting with "obvious" optimizations, we need to put some integration profiling infrastructure in place. That is, I'd like to see info like "we've spent 5 CPU seconds on traversing the roots in the last minute of rust-analyzer usage".

@vemoo
Copy link
Contributor Author

vemoo commented Jan 22, 2019

Ok, I've hardcoded .git and node_modules in any path component aswell.

Sure, long-term we need faster data structures everywhere. Everything in rust-analyzer currently uses a dumb linear scan and recomputed every time. However, I think that before starting with "obvious" optimizations, we need to put some integration profiling infrastructure in place. That is, I'd like to see info like "we've spent 5 CPU seconds on traversing the roots in the last minute of rust-analyzer usage".

Ok, that makes sense. But I think it's related to how ignoring files should be implemented. Using this project as an example there's a .gitignore file at the root level that affets all of the crates. With the current structure all of the RootFilters should know about it. But if there's some hierarchy only the outer root should know about it.

What I'm thinking is a tree with nodes like:

struct Node {
    ignores: Vec<Ignore>,
    path: PathBuf,
    root: Option<VfsRoot>,
    children: Vec<Node>,
}

root is optional to be able to create nodes for common paths like cargo registry or the other ignore files are.

root: VfsRoot,
path: PathBuf,
root_filter: Arc<RootFilter>,
nested_roots: Vec<PathBuf>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think nested_roots should just be a part of RootFilter, with a more generic name like excluded_dirs.

}
Task::HandleChange(change) => {
// forward as is because Vfs has to decide if we should load it
TaskResult::HandleChange(change)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this looks like one elaborate control flow! I wonder if we could simplify this... At least when I originally thought about it, I've thought of io as a separate actor which completely encapsulates file system access and gives very "clean" TaskResults outside:

enum TaskResult {
    BulkLoadRoot { root: VfsRoot, files: Vec<(Path, String)> },
    AddSingleFile { root: VfsRoot, path: Path, text: String},
    ChangeSingleFile { ... },
    RemoveSingleFile { ... },
}

That is, io handles all loading and watching of roots, and vfs itself then decides if it should ignore ChangeSingleFile due to memap or not. Watcher becomes and implementation detail of IO.

I think that probably Worker will need to know the set of all roots for this to work, so perhaps we should encapsulate roots: Arena<VfsRoot, Arc<RootFilter>>, into Roots struct and share it via an Arc between vfs and Worker?

Obviously, I don't know the specifics of notify API very well, so perhaps this wont' work for some reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, yes, that sounds better and I think it should work, I will try to implement it tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Roots struct be in a RwLock? At the moment it can be implemented without it but if the vfs will support adding roots after initialization it will be needed.

Copy link
Member

@matklad matklad Jan 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Roots struct be in a RwLock? At the moment it can be implemented without it but if the vfs will support adding roots after initialization it will be needed.

I'd lean towards sharing an immutable snapshot for starters.

EDIT: that is, just Arc<Roots>, without RwLock. I think adding dynamically reconfigurable roots will be a giant can of worms, which actually might completely change the design here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now implemented.

I had to move the watcher to inside the io workers thread:
https://github.com/rust-analyzer/rust-analyzer/blob/1692a11d09cc3d70dcde10cc896c6f7ebb9eb3c3/crates/ra_vfs/src/io.rs#L50-L51
to have access to the output sender and be able to send the TaskResults directly from the watcher.

I've added excluded_dirs to RootFilter and also
https://github.com/rust-analyzer/rust-analyzer/blob/1692a11d09cc3d70dcde10cc896c6f7ebb9eb3c3/crates/ra_vfs/src/lib.rs#L63
to make using it with walkdir cleaner.

The rest is pretty much as you described it, which is a lot cleaner.

One thing I have found is that WalkDir by default doesn't follow symlinks which means that entry.file_type().is_[dir|file]() and entry.path().is_[dir|file]() won't necessarily match. I'm thinking we should probably follow symlinks but i'm not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this now looks superb! I’ll do a manual smoke test tomorrow and merge!

I think it’s ok not to follow simlnks for now

@matklad matklad mentioned this pull request Jan 26, 2019
bors bot added a commit that referenced this pull request Jan 26, 2019
665: Pr 556 r=matklad a=matklad

Rebased #556 

Thanks @vemoo, now I can change branches without reopening VS Code!

Co-authored-by: Bernardo <berublan@gmail.com>
@matklad
Copy link
Member

matklad commented Jan 26, 2019

Merged in #665!

@matklad matklad closed this Jan 26, 2019
@matklad
Copy link
Member

matklad commented Jan 30, 2019

Hm, I still get occasional 100% CPU hangs. Need to investigate this

@vemoo
Copy link
Contributor Author

vemoo commented Jan 30, 2019

In my case it happened when watching directories with a lot of notifications, and was accompanied by high memory usage. Maybe try adding logging for the all the notify events and see if there's something unusual there.

@vemoo
Copy link
Contributor Author

vemoo commented Feb 6, 2019

I've found something that may be related notify-rs/notify#177

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants