Improve multibyte-split byte-range performance #16019
The benchmark showed only a minor improvement.
But running this on a 14 GB file in 10 chunks showed a more significant improvement.
The first value in the parentheses is the chunk's starting position (byte offset) within the file. This is the output of the same code using
Here loading each chunk takes roughly the same amount of time.
/merge
Description
Changes the `cudf::io::text::multibyte_split()` function to use `std::ifstream::seekg()` to skip bytes instead of `std::ifstream::ignore()` for a file input source. The `seekg()` function is significantly faster for large files.

Also fixed the multibyte-split benchmark to correctly access the chars buffer after generating an input strings column.
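
To illustrate the difference between the two ways of skipping bytes, here is a minimal standalone sketch. It is not the libcudf implementation: the helper names `skip_with_ignore`, `skip_with_seekg`, and `read_byte_range` are hypothetical and exist only for this example. `ignore()` extracts and discards characters, so its cost grows with the number of bytes skipped, whereas `seekg()` only repositions the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: skip `offset` bytes by extracting and discarding them.
// The stream has to consume every skipped byte, so cost grows with `offset`.
inline void skip_with_ignore(std::ifstream& in, std::streamoff offset)
{
  assert(offset >= 0);
  in.ignore(static_cast<std::streamsize>(offset));
}

// Hypothetical helper: skip `offset` bytes by moving the read position
// relative to the current position, without reading the skipped bytes.
inline void skip_with_seekg(std::ifstream& in, std::streamoff offset)
{
  assert(offset >= 0);
  in.seekg(offset, std::ios_base::cur);
}

// Hypothetical helper: read up to `size` bytes starting at byte `offset`.
// Returns fewer bytes (possibly none) if the range runs past the end of the file.
inline std::string read_byte_range(std::string const& path,
                                   std::streamoff offset,
                                   std::streamsize size)
{
  std::ifstream in(path, std::ios::binary);
  skip_with_seekg(in, offset);
  std::string chunk(static_cast<std::size_t>(size), '\0');
  in.read(&chunk[0], size);
  // gcount() reports how many bytes the last read actually extracted.
  chunk.resize(static_cast<std::size_t>(in.gcount()));
  return chunk;
}
```

Both helpers leave the stream at the same position when the offset lies within the file; only the amount of work done to get there differs. This would explain why the later chunks of a large file benefit most from the change.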