Seems to be 9x slower than data.table on a 6 core machine #45
The R data.table version seems to be fairly memory efficient as well, using only about 30 GB of RAM. But it runs through the data twice: once to count rows and a second time to populate them.
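As a rough illustration of that two-pass idea (a minimal Julia sketch, not data.table's actual C implementation; it assumes a two-column Int/Float64 file with one header line):

```julia
# Hypothetical two-pass reader: pass 1 counts rows so pass 2 can
# fill preallocated vectors instead of growing them incrementally.
function twopass_read(path)
    # Pass 1: count data rows (subtract one for the assumed header line).
    nrows = open(io -> count(_ -> true, eachline(io)) - 1, path)
    a = Vector{Int}(undef, nrows)
    b = Vector{Float64}(undef, nrows)
    # Pass 2: parse each line into the preallocated columns.
    open(path) do io
        readline(io)                      # skip header
        for (i, line) in enumerate(eachline(io))
            f1, f2 = split(line, ',')
            a[i] = parse(Int, f1)
            b[i] = parse(Float64, f2)
        end
    end
    return (a = a, b = b)
end
```

The first scan costs extra I/O, but each column is allocated exactly once at its final size.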
The memory usage difference may come from the difference in integer encoding: Julia uses 64-bit integers, but R uses 32-bit integers, doesn't it?
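For a sense of scale, the per-column cost doubles going from 32-bit to 64-bit storage. A quick check in the Julia REPL:

```julia
# Same column length, two integer widths.
n = 1_000_000
v64 = Vector{Int64}(undef, n)
v32 = Vector{Int32}(undef, n)
sizeof(v64)  # 8_000_000 bytes
sizeof(v32)  # 4_000_000 bytes -- half the memory, matching R's 32-bit default
```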
Estimating the number of records in the first scan would reduce the required memory, since it makes it possible to preallocate vectors rather than repeatedly expanding them.
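A minimal sketch of that idea, assuming the row count can be extrapolated from the average line length in the first chunk of the file (`estimate_rows` here is a hypothetical helper, not part of this package):

```julia
# Estimate the total row count by sampling average bytes per line,
# so columns can be sizehint!-ed once instead of regrown by push!.
function estimate_rows(path; samplebytes = 1 << 20)
    total = filesize(path)
    nlines, seen = 0, 0
    open(path) do io
        while !eof(io) && seen < samplebytes
            seen += sizeof(readline(io)) + 1  # +1 approximates the stripped newline
            nlines += 1
        end
    end
    nlines == 0 && return 0
    return ceil(Int, total / (seen / nlines))
end

est = estimate_rows("big.csv")
col = Float64[]
sizehint!(col, est)   # reserve capacity up front; push! then rarely reallocates
```

The estimate need not be exact: even a slight over-allocation avoids the repeated copy-and-grow cycles that dominate incremental appends.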
I have uploaded the largest CSV I could find in the wild, and it takes 500 seconds to read. In data.table it takes about 50 seconds, so it's about 9-10 times slower. But my machine has only 6 cores, so I was expecting data.table to be only around 6x faster. Hopefully this will be useful for tuning performance on large files.