-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
red-knot: VfsFile
input ingredient and a Vfs
#11802
Conversation
File
input ingredient and a Vfs
File
input ingredient and a Vfs
crates/ruff_db/src/vfs.rs
Outdated
permission, | ||
}) | ||
} | ||
VfsPath::Vendored(_) => todo!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexWaygood what I have in mind here is that the implementation calls into the vendored
crate to read the file metadata (vendored_path.last_modification_time()
, vendored_path.exists()
, and ruff_venored::read_to_string(vendored_path)
?)
c730d27
to
9c450dd
Compare
|
File
input ingredient and a Vfs
VfsFile
input ingredient and a Vfs
crates/ruff_db/src/lib.rs
Outdated
/// Interns a path to a vendored file and returns a salsa `File` ingredient. | ||
fn vendored_file(&self, path: &camino::Utf8Path) -> Option<VfsFile>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexWaygood I think we ultimately want two different entry functions for regular files, and vendored files. Mainly because I think it's a bit annoying if you're working with a vendored path and you then need to convert it to a VfsPath
just to get the VfsFile
. Funnily, the first thing that the implementation would do is to dispatch on the path type.
fa3bb86
to
95d1254
Compare
// TODO support untitled files for the LSP use case. Wrap a `str` and `String` | ||
// The main question is how `as_std_path` would work for untitled files, that can only exist in the LSP case | ||
// but there's no compile time guarantee that a [`OsFileSystem`] never gets an untitled file path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC: @snowsignal I don't plan to support this as part of this PR but something we need to figure out for red-knot/LSP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pinging me on this!
95d1254
to
a98db3d
Compare
CodSpeed Performance ReportMerging #11802 will not alter performanceComparing Summary
|
Okay, not having a I have to figure out how we handle the signature difference between vendored paths and regular paths. I would very much like the possibility that I plan to address this as a separate PR to reduce my rebasing work. Edit: This is solved in #11826 |
VfsFile
input ingredient and a Vfs
VfsFile
input ingredient and a Vfs
[salsa 1..]
VfsFile
input ingredient and a Vfs
[salsa 1..]VfsFile
input ingredient and a Vfs
VfsFile
input ingredient and a Vfs
VfsFile
input ingredient and a Vfs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strong +1 on the decision to make this its own crate. I think in general making new crates for red-knot functionality should be preferred over squeezing it into existing crates; I think the latter will cause more confusion. In particular I don't think we should try to put red-knot's semantic model into the existing ruff_python_semantic
crate.
crates/ruff_db/src/vfs.rs
Outdated
/// The unix permissions of the file. Only supported on unix systems. Always 0 on Windows | ||
/// or when the file has been deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 or None
?
Should this be an Option<NonZeroU32>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be None
.
The FS api uses a u32
as return type. I'm not aware that 0 is ever a valid permission but I want to avoid conversion errors and using a NonZeroU32
doesn't give us much here (other than the size savings)
/// ## Why do both the [`Vfs`] and [`FileSystem`](crate::FileSystem) trait exist? | ||
/// | ||
/// It would have been an option to define [`FileSystem`](crate::FileSystem) in a way that all its operation accept | ||
/// a [`VfsPath`]. This would have allowed to unify most of [`Vfs`] and [`FileSystem`](crate::FileSystem). The reason why they are | ||
/// separate is that not all operations are supported for all [`VfsPath`]s: | ||
/// | ||
/// * The only relevant operations for [`VendoredPath`]s are testing for existence and reading the content. | ||
/// * The vendored file system is immutable and doesn't support writing nor does it require watching for changes. | ||
/// * There's no requirement to walk the vendored typesystem. | ||
/// | ||
/// The other reason is that most operations know if they are working with vendored or file system paths. | ||
/// Requiring them to convert the path to an `VfsPath` to test if the file exist is cumbersome. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do find this layering pretty confusing in trying to understand the code. I think partly it's just the naming: what we call FileSystem
is already "virtual" in that it can represent on-disk, memory, etc. And then we have another "virtual file system" layer on top of that.
I'm also not sure of the accuracy of this statement:
The other reason is that most operations know if they are working with vendored or file system paths.
Maybe it will turn out to be the case? It's just not clear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can say from using the API that it's working pretty well so far. But I admit that the terminology is confusing. Do you have any suggestions on how the naming could be improved or what specifically you find confusing.
I can try to improve the documentation. The way I think about FileSystem is less that it is a virtual file system. It's rather more how we access it. Mwhat would you think of FileSystemDriver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some more time looking at it and I think it's fine, if it seems to you to be working out well. I'm not sure that a rename of FileSystem
to FileSystemDriver
would make much difference; I think "Driver" ends up being kind of an empty filler word there that doesn't really communicate much.
It was hard for me to figure out the model for how these two things interact, since neither one encapsulates the other. It seems the model is that a Db has both a Vfs and a FileSystem, and it will use the FileSystem for looking up filesystem paths, and manage looking up vendored paths itself, though that hasn't been implemented yet, just stubbed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe what's confusing here is the naming of Vfs
? Maybe it should just be VfsFiles
?
Anyway, I do think your concern is valid and I went back and forth a couple of times between FileSystem
handling all VfsPath
s and the FileSystem
only handling specific paths. And maybe my reasoning that vendored paths are different enough to not pass them through FileSystem
is unjustified, considering that calling write_file
with a path pointing to a directory also fails. So there's anyway some extra care that needs to be taken when handling paths that aren't checked at compile time. But I find it kind of nice if we can catch some of them at compile time.
The way I think about it is that FileSystem
is a replacement for std::fs
, that's it. It doesn't add support for using multiple file systems at once (vendored and the regular one), which is what Vfs
does.
An entirely different design (not fully thought through) would be to ditch the FileSystem
trait all together. The main motivation for it is that we can support unsaved files and untitled files in the LSP use case. But we could support this in another way by adding a open_file
and close_file
function to Vfs
that allows to manually override the content of a file.
The only functionality we would looe is that we can't "mock out" the file system during tests with an in memory file system. But maybe that's not worth going through all that trouble. The WASM integration story would then require to provide a "stub" FS implementation. It's not a big detail and something that e.g. emscripten provides out of the box but it comes with a bit more boilerplate code than an option to just use a memory file system.
Either way, I don't consider the Vfs / FileSystem design that I proposed here as the final design. It requires some more iterating and I'm open to redesigning it completely.
- We merge what we have now, with the risk that we might need to refactor some of the calling code
- We strip out the
FileSystem
trait and revisit the design when implementing LSP support.
I'm leaning towards keeping what we have here. There's not a lot of downstream code depending on FileSystem
, so that a refactor should be fairly painless.
I'm a bit surprised by this comment because that's what I proposed in discord and I didn't see any objection to doing this for ruff_python_semantic I can see how it can cause confusion. Mainly for imports when you get both Symbols (the ruff and red_knot one). I don't think we can't avoid mixing them long term. The red knot crates at least will have to depend on their corresponding duff crates to use their functionality and it then becomes less clear to me what the benefit of keeping them separate really is. I also think that it is a motivation function to integrate early and more often compared to building this out entirely in different crates |
Oh sorry! I think I do remember that Discord comment now; I guess I didn't think very carefully about it then.
I think it will just be generally hard to figure out how to structure things within one crate to make it clear what is red-knot and what is "v1". E.g. there is a top-level
I agree that red knot will probably have to depend on v1, but the dependency should not go in the other direction, and to me this is enough reason to keep them separate. I'm not clear what sort of "integration" we actually envision between the v1 semantic model and the red-knot semantic model that this would encourage. Can you clarify what kind of integration you are thinking of? My understanding is that they will always remain separate, and rules will have to explicitly be ported in some way. |
I agree on this and something I have started thinking about as well. I don't think what I've done in my follow up PRs is ideal and I wanted to iterate on it. For now, the red knot code is gated behind the I think what could be improved is that we move all Regardless, this would then be very close to having two separate crates. The difference I see is that moving files in a crate tends to be easier and we can also already make use of the right crate visibilities (we don't need to make anything I overall don't have a strong preference and I think I would just go with one approach for now. We don't need to get it right just now, it's easy to split the code out later.
I at least want to try to come up with a rule API that would work for both |
I'll go ahead and merge this PR. This does not mean that we made a decision on how to proceed with |
VfsFile
input ingredient and a Vfs
VfsFile
input ingredient and a Vfs
I do see that visibilities could be a reason to share a crate, if there's a lot of use of "old" internals from red-knot code. Ultimately I think sharing a crate could work fine, too, as long as we have a structure that makes it clear what is v1 and what is red-knot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the slow review here. It took me a little bit of time to get my head round some of this, but it LGTM. Happy to open my own followup PR for some of my docs nits, unless there are any you particularly disagree with!
|
||
/// Path to a file or directory stored in [`FileSystem`]. | ||
/// | ||
/// The path is guaranteed to be valid UTF-8. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that we won't provide type-checking for Python scripts if their filename contains non-utf8 characters? Is this a limit Ruff already has when linting Python?
I think we can guarantee that vendored files are always going to be valid utf-8 but I'm not sure about non-vendored files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it limits us to UTF8 paths only but Ruff already assumes UTF8 today (we have so many path.to_str().unwrap()
calls. Also, non UTF8 paths are extremely uncommon. I think all modern file system support UTF8 today (or at least, paths can be encoded to UTF8).
This PR adds the foundation for red-knots salsa integration. It starts by defining a virtual filesystem that is Salsa's view of the files on disk (metadata only for now, I'll add support for content in the next PR).
VFS
The
Vfs
supports operating on files from the local filesystem (disk, editor, WASM, in memory) or vendored stub files (this PR only adds a stub implementation). It is virtual because it supports multiple sources and it is Salsa's view of the file system state.The most notable change in terms of what we discussed for vendored files is that this PR exposes two methods on the
Vfs
:file
: To "intern" a file system path.vendored
: To intern a vendored pathThe motivation behind this is that the methods differ in their return-type (and argument, but that's less important). For
file
, it's important that the implementation returns aFile
object even for files that don't exist so that salsa can track the fact that this query needs to be re-executed if such a file gets created later. This isn't necessary for vendored files because the vendored file system is readonly. ReturningNone
in that case reduces the number of dependencies that salsa has to track, and in turn, can result in better performance. Splitting the methods also has the benefit that they're easier to call. I suspect that code paths will either exclusively work with file system or vendored path. It would be cumbersome if the callers need to wrap the path in aVfsPath
just for the interning.FileSystem
This PR also introduces a new
FileSystem
trait that abstracts away how filesystem files are read. The goal of this is that we can support different environments:Crate name
I ended up creating a new crate because I couldn't find an existing crate that fits well. I first wanted to use
ruff_source_file
which is very similar. However,ruff_notebook
depends on it and I suspect that this crate might depend onruff_notebook
in the long term (unless we use someArc<dyn Trait + Eq>
to represent file content).The other reason why I didn't put it in
ruff_source_file
is that it is intended to store low level data structures for files. A Vfs and a dependency on salsa feels a bit overkill for such a crateOpen Questions
walkdir
doesn't support virtual filesystems. Ideally, both the memory file system and the real implementation would use the same dir walking mechanism to avoid differences in behavior.apply_changes
method that takes a mutable self whenever it observed file system changes, so that theVfs
can update its metadata.