Vectorized Array writing improves its speed #6
Conversation
Thanks for the PR! I can confirm that I'm seeing roughly a 60-fold speed-up with this fix, which is quite surprising, since the old version did not copy the data, while this change does. I thought maybe there was a speed-up for

How we should probably handle this is to allocate a large but reasonably sized buffer array of type

Would you like to tackle this? If not, I can open an alternate PR.
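A rough sketch of the buffered approach described above, assuming the `T`, `fswap`, and `sz` setup from the `write` method in `src/io.jl` (the function name `write_buffered` and the buffer length are illustrative, not part of the package):

```julia
# Sketch: convert-and-byteswap into a reusable fixed-size buffer, then
# write the whole buffer in one call instead of one write per element.
function write_buffered(io::IO, data, ::Type{T}, fswap; buflen = 4096 ÷ sizeof(T)) where {T}
    buf = Vector{T}(undef, buflen)
    sz = 0
    i = firstindex(data)
    n = lastindex(data)
    while i <= n
        m = min(buflen, n - i + 1)
        @inbounds for j in 1:m
            buf[j] = fswap(T(data[i + j - 1]))
        end
        sz += write(io, view(buf, 1:m))  # single write call per chunk
        i += m
    end
    return sz
end
```

Reusing one buffer keeps the extra allocation constant regardless of array size, unlike allocating a fresh temporary per chunk.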
I came up with splitting the big array into 4 KB (4096-byte) vectors. Why 4 KB? The reason the vectorized operation is faster is that the CPU offers SIMD instructions. As far as I know, these instructions handle 128–256 bytes of data at once, so I guess Julia will fully utilize vectorized instructions once the size is over 512 bytes. Since mmap loads the file in 4 KB increments, which is the default memory page size, doing the vectorized operation on 4 KB vectors seems fair enough. The following is the benchmark of
You can see that the allocation size doubles compared to the simple vectorized operation. I believe this causes a slight performance drop for normal file writing, but it is still acceptable to me. I guess the array-to-subarray operation makes a copy to prevent the array from changing size while writing.
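The chunked vectorized write described above might look roughly like this (a sketch, assuming the `T`, `fswap`, `newio`, and `sz` setup from `src/io.jl`):

```julia
unit_vsize = 4096 ÷ sizeof(T)  # elements per 4 KB chunk
for chunk in Iterators.partition(eachindex(data), unit_vsize)
    # The broadcast conversion + byte swap allocates a small temporary
    # vector per chunk — the extra allocation noted above.
    sz += write(newio, fswap.(T.(view(data, chunk))))
end
```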
For compatibility with Julia 1.3.
Sorry for the delayed review! The approach looks right. I had a few suggestions.
I'm not sure about the logic for writing in 4 KB chunks. As suggested, can you make the chunk size a keyword argument? Also it would be nice to have benchmarks of different values of the chunk size, ranging from small up to e.g. 8 MB, using BenchmarkTools. `@timeit` isn't that reliable for benchmarking. We don't need to benchmark each line, rather just the entire `write` function for some `MRCData`.
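The chunk-size sweep requested above could be set up along these lines (a sketch; `d` is assumed to be an `MRCData` loaded elsewhere, and the `buffer_size` keyword is the hypothetical argument discussed in this thread, not an existing one):

```julia
using BenchmarkTools

# Sweep chunk sizes from 512 B up to 8 MB and report the minimum time
# of the whole `write` call, as suggested in the review.
for p in 9:23
    b = @benchmark write(io, $d; buffer_size = $(2^p)) setup = (io = IOBuffer())
    println(2^p, " B: ", minimum(b).time / 1e6, " ms")
end
```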
Lastly, unfortunately this repo currently doesn't have a good test suite, but could you add a test that, for different chunk sizes, the same data is written to the test file? It could just be a roundtrip test that simulates a map, writes it to a file with two different chunk sizes, then checks that when read back in they are the same.
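Such a roundtrip test might be sketched like this (the `MRCData(array)` constructor, `read(path, MRCData)`, and the `buffer_size` keyword are assumptions based on this thread, not confirmed API):

```julia
using Test

@testset "chunked writes are equivalent" begin
    d = MRCData(rand(Float32, 64, 64, 64))  # simulated map
    files = map((512, 16384)) do bs          # two different chunk sizes
        path = tempname()
        open(io -> write(io, d; buffer_size = bs), path, "w")
        path
    end
    d1, d2 = read.(files, MRCData)
    @test parent(d1) == parent(d2)           # same data regardless of chunking
end
```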
src/io.jl (Outdated)

```diff
@@ -99,8 +99,14 @@ function Base.write(io::IO, d::MRCData; compress = :none)
     T = datatype(h)
     data = parent(d)
     fswap = bswapfromh(h.machst)
-    for i in eachindex(data)
-        @inbounds sz += write(newio, fswap(T(data[i])))
+    unit_vsize = 4096 ÷ sizeof(T)
```
It would be nice if the cache size (either 4096 or `unit_vsize`) was a keyword argument that a user could select. This would also be handy for testing.
Thank you for your feedback! It's a good idea to make the cache size a keyword, and it would also be better for maintenance.
Co-authored-by: Seth Axen <seth.axen@gmail.com>
I have benchmarked on both machines.

Workstation spec

Laptop spec

The x axis is the log scale of `unit_vsize`, in which 12 means 4 KB (2^12 bytes = 4 KB). Let's look at the laptop case first. You can see a peak at 2^14 bytes.
Codecov Report
```
@@            Coverage Diff             @@
##           master       #6      +/-   ##
==========================================
- Coverage   37.17%   36.49%   -0.68%
==========================================
  Files           6        6
  Lines         269      274       +5
==========================================
  Hits          100      100
- Misses        169      174       +5
==========================================
```
Continue to review full report at Codecov.
Hello again!
I recently suffered from slow writing speed for a 1.8 GB 2D-stacked tomography MRC file.
I figured out that iterative array writing was the bottleneck, so I resolved it by using vectorized operations.
For the benchmark, I put `@time` on every single line of the `write` method and compared the results to my vectorized code.
This is the test code
Here is the benchmark for the 1.8 GB 2D-stacked tomography MRC file. I list the last of the three repeated test results:
I have also tested with http://ftp.rcsb.org/pub/emdb/structures/EMD-5778/map/emd_5778.map.gz: