File IO with read_to_end and read_to_string is slower than possible #35823

Closed
@Mark-Simulacrum

Description

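Rust's current implementation of read_to_end reads into exponentially larger chunks of the buffer, but caps the chunk size at 8192 bytes. Measured in syscalls, this makes reading any file larger than 16KB slow: once the chunk size reaches the cap, every additional 8192 bytes of the file costs another read call.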
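To see why the cap matters, here is a simplified model of a capped-doubling read loop. It is not std's actual code: the starting chunk of 32 bytes is an assumption, and the model treats every read as returning a full chunk except at the end of the file. Each read requests twice as much as the previous one, up to max_chunk, and a final read returning 0 signals EOF. simulated_read_calls is a hypothetical name.

```rust
/// Simplified model (not the real std implementation) of a read_to_end-style
/// loop: each read requests a chunk that doubles in size up to `max_chunk`,
/// and reading stops when a read returns 0 bytes (EOF).
fn simulated_read_calls(file_len: usize, initial_chunk: usize, max_chunk: usize) -> usize {
    assert!(initial_chunk > 0 && max_chunk > 0, "chunk sizes must be non-zero");
    let mut remaining = file_len;
    let mut chunk = initial_chunk.min(max_chunk);
    let mut calls = 0;
    loop {
        // A read returns at most `chunk` bytes, fewer once the file runs out.
        let n = chunk.min(remaining);
        calls += 1;
        if n == 0 {
            return calls; // this read hit EOF
        }
        remaining -= n;
        chunk = (chunk * 2).min(max_chunk);
    }
}

fn main() {
    for &len in &[16 * 1024, 1024 * 1024] {
        // Assumed std-like behaviour: start at 32 bytes, cap at 8192.
        let capped = simulated_read_calls(len, 32, 8192);
        // First chunk as large as the whole file, as a size hint would allow.
        let presized = simulated_read_calls(len, len, len);
        println!("{} bytes: {} reads capped, {} reads pre-sized", len, capped, presized);
    }
}
```

Under this model a 16 KiB file takes 11 reads and a 1 MiB file takes 137, versus 2 (one data read plus the EOF check) when the first chunk covers the whole file. The real counts depend on the actual implementation and on short reads, so these numbers illustrate the scaling rather than predict a trace.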

NodeJS reads the entire file in one go, making 4 syscalls: open, fstat, read, and close.
For the same file, Rust currently makes 58 syscalls, not counting mmap/madvise calls, which may be incidental (not related to IO). The full sequence is: open, ioctl, read ×37, mmap, madvise, read ×18, madvise ×14, close.
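A syscall tracer such as strace gives the real numbers. As a rough in-process approximation, one can wrap a reader and count how often read_to_end calls read on it. The sketch below does this with a hypothetical CountingReader over an in-memory Cursor, with and without pre-sizing the output Vec. The counts it prints depend on the read_to_end strategy of the Rust version in use, so they are not expected to match the trace above.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical wrapper that counts how many times `read` is called on the
/// inner reader. Read::read_to_end's default implementation ends up calling
/// `read`, so every read it performs is counted.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Reads all of `data` with read_to_end and returns (bytes read, read calls).
/// With `presize`, the output Vec starts with capacity for all of `data`.
fn count_read_calls(data: &[u8], presize: bool) -> io::Result<(usize, usize)> {
    let mut reader = CountingReader { inner: Cursor::new(data), calls: 0 };
    let mut out = if presize { Vec::with_capacity(data.len()) } else { Vec::new() };
    let n = reader.read_to_end(&mut out)?;
    Ok((n, reader.calls))
}

fn main() -> io::Result<()> {
    let data = vec![b'x'; 64 * 1024];
    for &presize in &[false, true] {
        let (n, calls) = count_read_calls(&data, presize)?;
        println!("presize = {}: {} bytes in {} read calls", presize, n, calls);
    }
    Ok(())
}
```

Whatever the version, a non-empty input needs at least two calls: one that returns data and one that returns 0 to confirm EOF.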

I would expect functions such as read_to_end and read_to_string to do the minimal work necessary to read the entire file.
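A minimal sketch of the open/fstat/read/close pattern described for NodeJS, written in Rust. read_whole_file is a hypothetical helper, not a std API. It trusts the metadata length: if the file grows between the metadata query and the read, the extra bytes are not read, and if it shrinks, read_exact fails with ErrorKind::UnexpectedEof. read_exact also retries on short reads, so a single read syscall is the common case for a regular file, not a guarantee.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper: open the file, ask for its size, then fill a buffer of
/// exactly that size. The file is closed when `file` is dropped.
fn read_whole_file(path: &Path) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    // Metadata query on the already-open file (an fstat-style call on Unix).
    let len = file.metadata()?.len() as usize;
    let mut buf = vec![0u8; len];
    // Loops internally on short reads; fails with UnexpectedEof if the file
    // turns out to be shorter than `len`.
    file.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("read_whole_file_demo.bin");
    std::fs::write(&path, vec![b'a'; 20_000])?;
    let data = read_whole_file(&path)?;
    println!("read {} bytes", data.len()); // prints "read 20000 bytes"
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Skipping the final EOF-probing read is what keeps this at one read call; the trade-off is that it cannot notice a file that changes size while being read.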

The diff below is what it currently takes to get Rust down to 5 syscalls (open, fstat, read, read, close). I think it would be reasonable to include similar code (perhaps via specialization) in the standard library's file-reading code.

Timings for the two versions (with caches dropped):
Fast version:

```text
real    2m45.157s
user    0m11.484s
sys     0m1.516s
```

Current, slower version:

```text
real    3m9.654s
user    0m11.580s
sys     0m1.956s
```
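Converted to seconds, the two runs differ as follows (a back-of-the-envelope reading of the numbers above, not a separate measurement):

```math
\begin{aligned}
\Delta_{\text{real}} &= 189.654\,\mathrm{s} - 165.157\,\mathrm{s} = 24.497\,\mathrm{s} \quad (\approx 12.9\%\ \text{of the current version}) \\
\Delta_{\text{user}} &= 11.580\,\mathrm{s} - 11.484\,\mathrm{s} = 0.096\,\mathrm{s} \\
\Delta_{\text{sys}}  &= 1.956\,\mathrm{s} - 1.516\,\mathrm{s} = 0.440\,\mathrm{s} \quad (\approx 22.5\%)
\end{aligned}
```

The wall-clock saving is far larger than the change in user plus sys time, which suggests (but does not prove) that fewer, larger reads mainly reduce time spent waiting on the disk when caches are cold.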

```diff
-            let mut file = File::open(entry.path())?;
-            let mut file_contents = String::new();
+
+            let file = File::open(entry.path())?;
+            let capacity = match file.metadata() {
+                Ok(metadata) => metadata.len(),
+                Err(_) => 0,
+            };
+            let mut reader = BufReader::with_capacity(capacity as usize, file);
+            let mut file_contents = String::with_capacity(capacity as usize);
+            let len = match reader.fill_buf() {
+                Ok(buf) => {
+                    match ::std::str::from_utf8(buf) {
+                        Ok(s) => file_contents.push_str(s),
+                        Err(_) => {
+                            skipped += 1;
+                            continue;
+                        }
+                    };
+                    buf.len()
+                },
+                Err(_) => {
+                    skipped += 1;
+                    continue;
+                }
+            };
+            reader.consume(len);
+            reader.read_to_string(&mut file_contents)?;
+
             // Skip files whose size is 0.
-            if file.read_to_string(&mut file_contents)? == 0 {
+            if file_contents.is_empty() {
```
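The diff gets down to a few syscalls by sizing the BufReader buffer from the file's metadata, so the first fill_buf can fetch the whole file in one read. A sketch of the same idea without the BufReader, using a hypothetical helper read_to_string_presized: ask for the whole file in the first read, then let read_to_end pick up anything a short read left behind and confirm EOF. Decoding the bytes once at the end also sidesteps a corner case in the diff: if fill_buf returns a short read that ends in the middle of a multi-byte UTF-8 character, from_utf8 on that first chunk fails and a valid file is counted as skipped.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper: read a whole file into a String, using the metadata
/// length to size the first read.
fn read_to_string_presized(path: &Path) -> io::Result<String> {
    let mut file = File::open(path)?;
    // Size hint from metadata; fall back to 0 if unavailable, like the diff.
    let hint = file.metadata().map(|m| m.len() as usize).unwrap_or(0);
    let mut bytes = vec![0u8; hint];
    // First read: ask for the whole file at once (it may return fewer bytes).
    let n = file.read(&mut bytes)?;
    bytes.truncate(n);
    // Read whatever is left (short read, or the file grew) until EOF.
    file.read_to_end(&mut bytes)?;
    // Validate UTF-8 once over the full contents.
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("read_to_string_presized_demo.txt");
    std::fs::write(&path, "héllo, wörld\n")?;
    let contents = read_to_string_presized(&path)?;
    // Mirrors the diff's "skip files whose size is 0" check.
    if contents.is_empty() {
        println!("skipped empty file");
    } else {
        print!("{}", contents);
    }
    std::fs::remove_file(&path)?;
    Ok(())
}
```

For a regular file this typically means one read that returns the data and one that returns 0. A caller that wants the diff's skipping behaviour can treat an ErrorKind::InvalidData error as "skip this file".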

Metadata

    Labels

    I-slow (Issue: Problems and improvements with respect to performance of generated code.)
    T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue.)
