Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Add a read_directory() method to the ruff_db::system::System trait #12289

Merged
merged 9 commits into from
Jul 12, 2024
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions crates/ruff_db/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -27,3 +27,4 @@ zip = { workspace = true }

[dev-dependencies]
insta = { workspace = true }
tempfile = { workspace = true }
54 changes: 54 additions & 0 deletions crates/ruff_db/src/system.rs
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,34 @@ pub trait System {
/// Returns the current working directory
fn current_directory(&self) -> &SystemPath;

/// Iterate over the contents of the directory at `path`.
///
/// The returned iterator must have the following properties:
/// - It only iterates over the top level of the directory,
/// i.e., it does not recurse into subdirectories.
/// - It skips the current and parent directories (`.` and `..`
/// respectively).
/// - The iterator yields `std::io::Result<DirEntry>` instances.
/// For each instance, an `Err` variant may signify that the path
/// of the entry was not valid UTF8, in which case it should be an
/// [`std::io::Error`] with the ErrorKind set to
/// [`std::io::ErrorKind::InvalidData`] and the payload set to a
/// [`camino::FromPathBufError`]. It may also indicate that
/// "some sort of intermittent IO error occurred during iteration"
/// (language taken from the [`std::fs::read_dir`] documentation).
///
/// # Errors
/// Returns an error:
/// - if `path` does not exist in the system,
/// - if `path` does not point to a directory,
/// - if the process does not have sufficient permissions to
/// view the contents of the directory at `path`
/// - May also return an error in some other situations as well.
fn read_directory<'a>(
AlexWaygood marked this conversation as resolved.
Show resolved Hide resolved
&'a self,
path: &SystemPath,
) -> Result<Box<dyn Iterator<Item = Result<DirectoryEntry>> + 'a>>;

fn as_any(&self) -> &dyn std::any::Any;
}

Expand Down Expand Up @@ -98,3 +126,29 @@ impl FileType {
matches!(self, FileType::Symlink)
}
}

#[derive(Debug)]
pub struct DirectoryEntry {
path: SystemPathBuf,
file_type: Result<FileType>,
}

impl DirectoryEntry {
pub fn new(path: SystemPathBuf, file_type: Result<FileType>) -> Self {
Self { path, file_type }
}

pub fn path(&self) -> &SystemPath {
&self.path
}

pub fn file_type(&self) -> &Result<FileType> {
&self.file_type
}
}

impl PartialEq for DirectoryEntry {
fn eq(&self, other: &Self) -> bool {
self.path == other.path
}
}
76 changes: 74 additions & 2 deletions crates/ruff_db/src/system/memory_fs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ use std::sync::{Arc, RwLock, RwLockWriteGuard};
use camino::{Utf8Path, Utf8PathBuf};
use filetime::FileTime;

use crate::system::{FileType, Metadata, Result, SystemPath, SystemPathBuf};
use crate::system::{DirectoryEntry, FileType, Metadata, Result, SystemPath, SystemPathBuf};

/// File system that stores all content in memory.
///
Expand Down Expand Up @@ -237,6 +237,34 @@ impl MemoryFileSystem {
let normalized = SystemPath::absolute(path, &self.inner.cwd);
normalized.into_utf8_path_buf()
}

pub fn read_directory(
&self,
path: impl AsRef<SystemPath>,
) -> Result<impl Iterator<Item = Result<DirectoryEntry>> + '_> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to have named return types (over impl) because callers can then name those types (and e.g. store it in a Struct).

I would suggest introducing a ReadDirectory struct similar to std::fs::read_dir that is a thin wrapper around std::vec::IntoIter (or whatever the type of your wild iter chain below is)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the value in doing that generally, but here I wonder if an opaque return type might actually be better? If you're e.g. relying on the exact type being returned from this method in a test (for example), you might get into difficulties if you later switch the test to using the OsSystem and find that a different type is returned from that struct's read_directory() method. I think there's some value in saying that the only API guarantee we give here is that some kind of iterator is returned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the argument is API compatibility with System than the method should also return a Box<dyn...> (and accept &SystemPath instead of AsRef<SystemPath> as the argument). I think it's rare that we would switch between systems in tests and the change would be minimal. But having a concrete type can be helpful for a system implementation that's based on the memory file system (e.g. WASM).

Anyway, I don't feel strongly (except that we shouldn't change the return type to Box<dyn>)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly either. But I propose we add it when we need it

let by_path = self.inner.by_path.read().unwrap();
let normalized = self.normalize_path(path.as_ref());
let entry = by_path.get(&normalized).ok_or_else(not_found)?;
if entry.is_file() {
return Err(not_a_directory());
};
Ok(by_path
.range(normalized.clone()..)
.skip(1)
.take_while(|(path, _)| path.starts_with(&normalized))
.filter_map(|(path, entry)| {
if path.parent()? == normalized {
Some(Ok(DirectoryEntry {
path: SystemPathBuf::from_utf8_path_buf(path.to_owned()),
file_type: Ok(entry.file_type()),
}))
} else {
None
}
})
.collect::<Vec<_>>()
.into_iter())
}
}

impl Default for MemoryFileSystem {
Expand Down Expand Up @@ -268,6 +296,13 @@ impl Entry {
const fn is_file(&self) -> bool {
matches!(self, Entry::File(_))
}

const fn file_type(&self) -> FileType {
match self {
Self::File(_) => FileType::File,
Self::Directory(_) => FileType::Directory,
}
}
}

#[derive(Debug)]
Expand Down Expand Up @@ -349,7 +384,9 @@ mod tests {
use std::io::ErrorKind;
use std::time::Duration;

use crate::system::{MemoryFileSystem, Result, SystemPath};
use crate::system::{
DirectoryEntry, FileType, MemoryFileSystem, Result, SystemPath, SystemPathBuf,
};

/// Creates a file system with the given files.
///
Expand Down Expand Up @@ -612,4 +649,39 @@ mod tests {
let error = fs.remove_directory("a").unwrap_err();
assert_eq!(error.kind(), ErrorKind::Other);
}

#[test]
fn read_directory() {
let fs = with_files(["b.ts", "a/bar.py", "d.rs", "a/foo/bar.py", "a/baz.pyi"]);
let contents: Vec<DirectoryEntry> = fs
.read_directory("a")
.unwrap()
.map(Result::unwrap)
.collect();
let expected_contents = vec![
DirectoryEntry::new(SystemPathBuf::from("/a/bar.py"), Ok(FileType::File)),
DirectoryEntry::new(SystemPathBuf::from("/a/baz.pyi"), Ok(FileType::File)),
DirectoryEntry::new(SystemPathBuf::from("/a/foo"), Ok(FileType::Directory)),
];
assert_eq!(contents, expected_contents)
}

#[test]
fn read_directory_nonexistent() {
let fs = MemoryFileSystem::new();
let Err(error) = fs.read_directory("doesnt_exist") else {
panic!("Expected this to fail");
};
assert_eq!(error.kind(), std::io::ErrorKind::NotFound);
}

#[test]
fn read_directory_on_file() {
let fs = with_files(["a.py"]);
let Err(error) = fs.read_directory("a.py") else {
panic!("Expected this to fail");
};
assert_eq!(error.kind(), std::io::ErrorKind::Other);
assert!(error.to_string().contains("Not a directory"));
}
}
91 changes: 90 additions & 1 deletion crates/ruff_db/src/system/os.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
use crate::system::{FileType, Metadata, Result, System, SystemPath, SystemPathBuf};
use crate::system::{
DirectoryEntry, FileType, Metadata, Result, System, SystemPath, SystemPathBuf,
};
use filetime::FileTime;
use std::any::Any;
use std::sync::Arc;
Expand Down Expand Up @@ -68,6 +70,17 @@ impl System for OsSystem {
fn as_any(&self) -> &dyn Any {
self
}

fn read_directory(
&self,
path: &SystemPath,
) -> Result<Box<dyn Iterator<Item = Result<DirectoryEntry>>>> {
Ok(Box::new(
path.as_utf8_path()
.read_dir_utf8()?
.map(|res| res.map(DirectoryEntry::from)),
))
}
}

impl From<std::fs::FileType> for FileType {
Expand All @@ -81,3 +94,79 @@ impl From<std::fs::FileType> for FileType {
}
}
}

impl From<camino::Utf8DirEntry> for DirectoryEntry {
fn from(value: camino::Utf8DirEntry) -> Self {
let file_type = value.file_type().map(FileType::from);
Self {
path: SystemPathBuf::from_utf8_path_buf(value.into_path()),
file_type,
}
}
}

#[cfg(test)]
mod tests {
use tempfile::TempDir;

use super::*;

#[test]
fn read_directory() {
let tempdir = TempDir::new().unwrap();
let tempdir_path = tempdir.path();
std::fs::create_dir_all(tempdir_path.join("a/foo")).unwrap();
let files = &["b.ts", "a/bar.py", "d.rs", "a/foo/bar.py", "a/baz.pyi"];
for path in files {
std::fs::File::create(tempdir_path.join(path)).unwrap();
}

let tempdir_path = SystemPath::from_std_path(tempdir_path).unwrap();
let fs = OsSystem::new(tempdir_path);

let mut sorted_contents: Vec<DirectoryEntry> = fs
.read_directory(&tempdir_path.join("a"))
.unwrap()
.map(Result::unwrap)
.collect();
sorted_contents.sort_by(|a, b| a.path.cmp(&b.path));

let expected_contents = vec![
DirectoryEntry::new(tempdir_path.join("a/bar.py"), Ok(FileType::File)),
DirectoryEntry::new(tempdir_path.join("a/baz.pyi"), Ok(FileType::File)),
DirectoryEntry::new(tempdir_path.join("a/foo"), Ok(FileType::Directory)),
];
assert_eq!(sorted_contents, expected_contents)
}

#[test]
fn read_directory_nonexistent() {
let fs = OsSystem::new("");
let result = fs.read_directory(SystemPath::new("doesnt_exist"));
assert!(result.is_err_and(|error| error.kind() == std::io::ErrorKind::NotFound));
}

#[test]
fn read_directory_on_file() {
let tempdir = TempDir::new().unwrap();
let tempdir_path = tempdir.path();
std::fs::File::create(tempdir_path.join("a.py")).unwrap();

let tempdir_path = SystemPath::from_std_path(tempdir_path).unwrap();
let fs = OsSystem::new(tempdir_path);
let result = fs.read_directory(&tempdir_path.join("a.py"));
let Err(error) = result else {
panic!("Expected the read_dir() call to fail!");
};

// We can't assert the error kind here because it's apparently an unstable feature!
// https://github.com/rust-lang/rust/issues/86442
// assert_eq!(error.kind(), std::io::ErrorKind::NotADirectory);

// We can't even assert the error message on all platforms, as it's different on Windows,
// where the message is "The directory name is invalid" rather than "Not a directory".
if cfg!(unix) {
assert!(error.to_string().contains("Not a directory"));
}
}
}
6 changes: 6 additions & 0 deletions crates/ruff_db/src/system/path.rs
Original file line number Diff line number Diff line change
Expand Up @@ -303,6 +303,12 @@ impl SystemPath {
self.0.as_std_path()
}

/// Returns the [`Utf8Path`] for the file.
#[inline]
pub fn as_utf8_path(&self) -> &Utf8Path {
&self.0
}

pub fn from_std_path(path: &Path) -> Option<&SystemPath> {
Some(SystemPath::new(Utf8Path::from_path(path)?))
}
Expand Down
14 changes: 13 additions & 1 deletion crates/ruff_db/src/system/test.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
use crate::files::File;
use crate::system::{MemoryFileSystem, Metadata, OsSystem, System, SystemPath};
use crate::system::{
DirectoryEntry, MemoryFileSystem, Metadata, OsSystem, Result, System, SystemPath,
};
use crate::Db;
use std::any::Any;

Expand Down Expand Up @@ -85,6 +87,16 @@ impl System for TestSystem {
fn as_any(&self) -> &dyn Any {
self
}

fn read_directory<'a>(
&'a self,
path: &SystemPath,
) -> Result<Box<dyn Iterator<Item = Result<DirectoryEntry>> + 'a>> {
match &self.inner {
TestFileSystem::Os(fs) => fs.read_directory(path),
TestFileSystem::Stub(fs) => Ok(Box::new(fs.read_directory(path)?)),
}
}
}

/// Extension trait for databases that use [`TestSystem`].
Expand Down
Loading