- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.7k
          Use BufReader for LocalFileReader to revert performance regression in parquet reading
          #1366
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
BufRead for ChunkObjectReader to improve performanceBufRead for ChunkObjectReader to revert performance regression in parquet reading
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Dandandan  -- I am not sure about the need for the BufRead trait, but otherwise looks good to me.
Very nice 🕵️ work
| use std::fs::{self, File, Metadata}; | ||
| use std::io::{Read, Seek, SeekFrom}; | ||
| use std::io::{BufRead, BufReader, Read, Seek, SeekFrom}; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL BufRead trait
| file.seek(SeekFrom::Start(start))?; | ||
| Ok(Box::new(file.take(length as u64))) | ||
|  | ||
| let file = BufReader::new(file.take(length as u64)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this is the actual fix, right? Is the change to require the BufRead trait needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. Rerunning some benchmarks now without the trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - looks like BufRead wasn't needed 🎉
BufRead for ChunkObjectReader to revert performance regression in parquet readingBufReader for ChunkObjectReader to revert performance regression in parquet reading
      BufReader for ChunkObjectReader to revert performance regression in parquet readingBufReader for LocalFileReader to revert performance regression in parquet reading
      | 🎉 | 
| Thanks @Dandandan ! Can you quickly explain what the reason for the slowdown was exactly? | 
| 
 As far as I can explain: The earlier code used the  By introducing the object storage abstraction, we were directly reading from a  By wrapping the  Maybe a potential improvement would be having a bit more control, such as setting the capacity of the buffer. | 
Which issue does this PR close?
Closes #1363
Rationale for this change
Parquet reading was slow. This recovers the performance regression in the TPC-H benchmark.
There is still a slowdown in query 10 - and other queries, but this is unrelated to Parquet reading #1367 (and performance still improves from roughly 10 to 7s on that query).
What changes are included in this PR?
Use theBufReadtrait instead ofRead`.Are there any user-facing changes?