Improve text file reading performance #961
Update read_lines using binary reading
I tried to read text files in … Using binary reading ditches the encoding formatting process, and while the original …
That's a 30.65% speedup, which I think is worth celebrating.
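For readers who want to see the idea in code, here is a minimal sketch of reading a text file as raw bytes in one go. It is not the code from this PR. The file name `sample.txt`, the program name and the variable names are made up for illustration, and it assumes the processor reports the file size in bytes.

```fortran
program read_all_demo
    implicit none
    character(len=:), allocatable :: content
    integer :: unit, nbytes, nlines, i

    ! Write a small sample file so the sketch is self-contained
    ! (hypothetical file name).
    open(newunit=unit, file='sample.txt', status='replace', action='write')
    write(unit, '(a)') 'program hello'
    write(unit, '(a)') '  print *, "hi"'
    write(unit, '(a)') 'end program hello'
    close(unit)

    ! Open the file as an unformatted stream and ask for its size.
    open(newunit=unit, file='sample.txt', status='old', action='read', &
         access='stream', form='unformatted')
    inquire(unit=unit, size=nbytes)

    ! One read transfers the whole file; no per-line format processing.
    allocate(character(len=nbytes) :: content)
    read(unit) content
    close(unit)

    ! Lines can then be located by scanning for the line-feed character.
    nlines = 0
    do i = 1, nbytes
        if (content(i:i) == new_line('a')) nlines = nlines + 1
    end do

    print '(a,i0)', 'bytes read:  ', nbytes
    print '(a,i0)', 'lines found: ', nlines
end program read_all_demo
```

A real implementation would still have to split `content` into separate lines and deal with CRLF line endings, which this sketch leaves out.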
LGTM, thanks @zoziha.
This PR changes how fpm reads text files: instead of reading characters line by line, it reads all bytes at once in binary mode. This may reduce the time it takes to read files, and it doesn't change much of fpm's other behavior.
There is nothing left to update in this PR. If the change in the way files are read is considered beneficial, then this PR is acceptable.
@zoziha Is this PR ready to merge? I have resolved the conflicts.
Thanks @zoziha, looks good to me.
Thanks for reviewing, @henilp105. Okay, nothing more to add, let's merge it.
Description
getline;
read_lines using binary reading;
Use smaller buffer size in getline
I'm trying to improve the efficiency of reading text files:
number_of_rows routine;
advance='yes' read.
Local data shows that all three of them improve read efficiency to some extent. However, none of them yields an order-of-magnitude improvement.
Among them, using a smaller buffer size requires the smallest change to the fpm code. I tested on Windows and Ubuntu Linux, and the trends are basically the same in both environments.
[Figure: time consumed to read a 177-line *.f90 file 1000 times, on Windows and on Ubuntu Linux]
Compared to 32768, a smaller line-length buffer such as 1024 (toml-f uses 4096) is more in line with fpm's common file-reading scenarios, and it also gives a 26% to 52% improvement in read performance. (Win: Windows OS; GFortran: GCC Fortran; IFX: Intel oneAPI ifx)
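To show where the buffer size comes into play, here is a minimal sketch of a chunked `getline` built on non-advancing reads. It is not fpm's actual source. The module name `getline_sketch`, the constant `buffer_size` and the sample file are assumptions made for illustration. The value 1024 is the one suggested above.

```fortran
module getline_sketch
    implicit none
    private
    public :: getline

    ! Chunk size for each non-advancing read (1024 as suggested above).
    integer, parameter :: buffer_size = 1024

contains

    !> Read one complete line of arbitrary length from a formatted unit.
    subroutine getline(unit, line, iostat)
        integer, intent(in) :: unit
        character(len=:), allocatable, intent(out) :: line
        integer, intent(out) :: iostat

        character(len=buffer_size) :: buffer
        integer :: nread

        line = ''
        do
            nread = 0
            read(unit, '(a)', advance='no', iostat=iostat, size=nread) buffer
            if (iostat > 0) exit                  ! genuine I/O error
            line = line // buffer(:nread)         ! keep what was transferred
            if (iostat < 0) then
                ! End of record means the line is complete: report success.
                if (is_iostat_eor(iostat)) iostat = 0
                exit                              ! otherwise end of file
            end if
        end do
    end subroutine getline

end module getline_sketch

program getline_demo
    use getline_sketch, only: getline
    implicit none
    character(len=:), allocatable :: line
    integer :: unit, stat, nlines

    ! Sample file with one line longer than the buffer (hypothetical name).
    open(newunit=unit, file='sample.f90', status='replace', action='write', &
         access='stream', form='formatted')
    write(unit, '(a)') 'program hello'
    write(unit, '(a)') repeat('!', 3000)
    write(unit, '(a)') 'end program hello'
    close(unit)

    open(newunit=unit, file='sample.f90', status='old', action='read')
    nlines = 0
    do
        call getline(unit, line, stat)
        if (stat /= 0) exit
        nlines = nlines + 1
        print '(a,i0,a,i0)', 'line ', nlines, ' has length ', len(line)
    end do
    close(unit)
end program getline_demo
```

Lines longer than the buffer are simply assembled over several loop iterations, so a smaller buffer does not limit the line length. It only changes how much is read per call for the short lines typical of source files.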
Pseudocode
Also see this repo.