Very different compression times on different machines #125
Hi @thebioengineer, thanks a lot for sharing your benchmarks! At the moment, I have set the minimum blocksize of a compression job at 16 kB to keep a reasonable compression ratio. That means that for your sample, at most 4 threads are used for compression, which would explain why you are not seeing any significant speedup when switching to the Docker instance. Also keep in mind the default compression settings of `compress_fst`:

```r
df <- data.frame(matrix(runif(100000), ncol = 10))
df_raw <- serialize(df, NULL)

timings <- microbenchmark::microbenchmark(
  fst::compress_fst(df_raw, "LZ4")
)

# Speed in GB/s
as.numeric(object.size(df_raw)) / median(timings$time)
#> [1] 1.014846
```

Here, the sample size is increased by a factor of 10 so that you can fully utilize your available cores. I'm curious to see what that benchmark would do on your systems! Thanks for your report and testing.
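As a rough sketch of the arithmetic behind that 4-thread limit (the 16 kB minimum blocksize is taken from the comment above; the ~80 kB sample size is an assumption, corresponding to 10,000 doubles before the 10x increase):

```python
# Illustration only: with a minimum compression blocksize of 16 kB,
# a small raw vector can only be split across a few threads.
MIN_BLOCK = 16 * 1024        # 16 kB minimum blocksize (from the comment above)
sample_bytes = 10_000 * 8    # hypothetical sample: 10,000 doubles ~= 80 kB
max_threads = sample_bytes // MIN_BLOCK
print(max_threads)           # -> 4, so at most 4 threads can do useful work
```

Increasing the sample size by a factor of 10 raises this ceiling well above 32, which is why the larger benchmark can saturate all available cores.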
Thanks @MarcusKlik for getting back to me. Whoops, I mixed up which algorithm was the default. I will rerun and post the times/speeds. However, the main reason I posted was my concern about the large difference in times between my local machine and Docker using `compress_fst`: 869.92 vs. 46155.66 microseconds. The other compression slowed down on Docker too, but not nearly as drastically. Have you found that the time to compress the same object can vary that much across machines? I will post the compression speeds shortly, but I wanted to reiterate the reason I posted.
Hi @thebioengineer, I'm glad you did, the numbers made me flinch there for a moment :-). Can you check that OpenMP is enabled?

```r
fst:::hasopenmp()
#> [1] TRUE
```

If you have that and you are not swapping memory, then at the moment I can think of no reason why the compression would slow down that much. Are you using one of the standard Docker containers (perhaps I can reproduce your result)? Thanks!
The speeds, as you requested, calculated based on your code :)

I just ran `hasopenmp()` and it returned TRUE, and we aren't swapping memory. The Docker container I am using is based on the rocker/rstudio image. I just checked with the base image of that, and speeds are around 0.001291902 GB/s, with a median time of 62.20907 milliseconds. Thanks for looking into this!
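As an aside on the units in the speed calculation: `microbenchmark` reports times in nanoseconds, so dividing a size in bytes by a time in nanoseconds yields GB/s directly (with 1 GB = 1e9 bytes). A small sketch with illustrative numbers close to, but not exactly, those reported above:

```python
# Unit check: bytes / nanoseconds == GB/s, since 1 s = 1e9 ns and 1 GB = 1e9 bytes.
# The size and time below are illustrative, not the exact values from this thread.
size_bytes = 80_000          # hypothetical ~80 kB raw vector
median_ms = 62.0             # hypothetical median time in milliseconds
median_ns = median_ms * 1e6  # convert milliseconds to nanoseconds
speed_gb_s = size_bytes / median_ns
print(speed_gb_s)            # on the order of 0.0013 GB/s, matching the report
```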
Hi @thebioengineer, your issue could also be related to #112. It looks like RStudio Server sometimes forks the session (for example upon re-entering the session), and that can trigger the same behavior. You can check the number of active threads with:

```r
fst::threads_fst()
#> [1] 8
```

You could also 'force' the number of threads before running your benchmark code with:

```r
fst::threads_fst(32)
#> [1] 8
```

(but that doesn't change the fact that the code runs very slowly, even if the thread count is set correctly). Thanks a lot for testing this!
Yes, what I sent you was from exec-ing into the Docker container and running as root. I just checked this morning, and `fst::threads_fst()` is still returning 32. Thank you so much for your investigation into this!
OK, thanks for checking that. I will try to reproduce the error and get back to you on that!
Hey guys! I recently found this package and have been using the compression functions to compress and decompress raw vectors. I had been doing most of my work on a local machine, but recently I moved my code to a Docker container running Linux and noticed a rather large slowdown in compression speeds. I thought that was interesting, considering I moved the work from a 4-core Windows machine to a Docker container with full access to 32 cores and faster, much more plentiful RAM. Since the issue involves different machines, I took screenshots of the results of the two R environments running the same code.
I also included the lz4 package (https://github.com/bwlewis/lz4), which supposedly implements the same LZ4 compression algorithm, as another comparison; it does not slow down anywhere near as drastically.
Local Windows machine: (screenshot)

Docker container on server: (screenshot)
Not sure if this is a bug, expected, or what, but thought I would let you all know!
This is the code I used on both: