Description
Rust's current implementation of read_to_end
reads into exponentially larger chunks of the buffer, but only up to 8192 bytes per read. Counted in syscalls, this is slow when reading a file larger than 16 KB (the doubling chunk sizes sum to roughly 16 KB, after which every further read is capped at 8192 bytes).
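For illustration, here is a simplified sketch of a read loop with that shape (not the actual standard-library source; the function name and starting chunk size are made up), showing why the cap dominates for large files:

```rust
use std::io::Read;

// Simplified sketch of a read loop that grows its per-call chunk size
// exponentially but caps it at 8192 bytes. The doubling steps sum to roughly
// 16 KB, so every byte past that costs one read(2) per 8192 bytes: a 1 MiB
// file takes well over a hundred read syscalls instead of one.
fn read_all_capped<R: Read>(reader: &mut R, out: &mut Vec<u8>) -> std::io::Result<usize> {
    let start = out.len();
    let mut chunk = 32; // illustrative starting size
    loop {
        let len = out.len();
        out.resize(len + chunk, 0);            // make room for the next chunk
        let n = reader.read(&mut out[len..])?; // one syscall per iteration
        out.truncate(len + n);
        if n == 0 {
            return Ok(out.len() - start);      // EOF
        }
        chunk = (chunk * 2).min(8192);         // exponential growth, capped
    }
}
```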
NodeJS reads the entire file in one go, making four syscalls: open, fstat, read, and close.
Rust currently makes 58 syscalls for the same file, not counting mmap/madvise, which may be circumstantial (unrelated to I/O). Specifically: open, ioctl, read * 37, mmap, madvise, read * 18, madvise * 14, close.
I would expect functions such as read_to_end and read_to_string to do the minimal work necessary to read the entire file.
The diff below is what it currently takes to get Rust down to five syscalls (open, fstat, read, read, close). I think it is reasonable that similar code could be included (perhaps with specialization) in the filesystem reading code.
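As a minimal sketch of what such a specialized path might look like (the helper name is hypothetical, and whether this actually reaches the minimal syscall count depends on read_to_end making use of the reserved capacity):

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

// Hypothetical shape of a specialized filesystem read: ask fstat for the size
// up front and reserve once, instead of growing the buffer in capped steps.
fn fs_read_to_string(path: &Path) -> std::io::Result<String> {
    let mut file = File::open(path)?;                         // open
    let size = file.metadata().map(|m| m.len()).unwrap_or(0); // fstat
    let mut contents = String::with_capacity(size as usize);
    file.read_to_string(&mut contents)?;                      // read(s)
    Ok(contents)                                              // close on drop
}
```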
The difference between the two (after dropping caches):

Fast version:

```
real    2m45.157s
user    0m11.484s
sys     0m1.516s
```

Current, slower version:

```
real    3m9.654s
user    0m11.580s
sys     0m1.956s
```
```diff
-let mut file = File::open(entry.path())?;
-let mut file_contents = String::new();
+
+let file = File::open(entry.path())?;
+// Size the buffer from fstat so the whole file fits in one read.
+let capacity = match file.metadata() {
+    Ok(metadata) => metadata.len(),
+    Err(_) => 0,
+};
+let mut reader = BufReader::with_capacity(capacity as usize, file);
+let mut file_contents = String::with_capacity(capacity as usize);
+// fill_buf performs a single large read into the pre-sized buffer.
+let len = match reader.fill_buf() {
+    Ok(buf) => {
+        match ::std::str::from_utf8(buf) {
+            Ok(s) => file_contents.push_str(s),
+            Err(_) => {
+                skipped += 1;
+                continue;
+            }
+        };
+        buf.len()
+    },
+    Err(_) => {
+        skipped += 1;
+        continue;
+    }
+};
+reader.consume(len);
+// Drain anything past the first buffer; this also confirms EOF.
+reader.read_to_string(&mut file_contents)?;
+
 // Skip files whose size is 0.
-if file.read_to_string(&mut file_contents)? == 0 {
+if file_contents.is_empty() {
```
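For reference, here is the approach from the diff pulled into a self-contained function (the name read_file_fast is illustrative, and UTF-8 and I/O failures are surfaced as errors instead of skipping the file):

```rust
use std::fs::File;
use std::io::{BufRead, BufReader, Read};
use std::path::Path;

// Self-contained sketch of the diff above: size a BufReader from fstat so
// fill_buf() pulls the whole file in one read(2), then let read_to_string()
// drain anything past the initial buffer and confirm EOF.
fn read_file_fast(path: &Path) -> std::io::Result<String> {
    let file = File::open(path)?;                                 // open
    let capacity = file.metadata().map(|m| m.len()).unwrap_or(0); // fstat
    let mut reader = BufReader::with_capacity(capacity as usize, file);
    let mut contents = String::with_capacity(capacity as usize);
    let len = {
        let buf = reader.fill_buf()?;                             // read #1
        let s = std::str::from_utf8(buf)
            .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
        contents.push_str(s);
        buf.len()
    };
    reader.consume(len);
    reader.read_to_string(&mut contents)?;                        // read #2 (EOF)
    Ok(contents)                                                  // close on drop
}
```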