Very different compression times on different machines #125

Open
thebioengineer opened this issue Jan 18, 2018 · 7 comments
@thebioengineer

Hey guys! I recently found this package and have been using the compression functions to compress and decompress raw vectors. I had been doing most of my work on a local machine, but I recently moved my code to a Docker container running Linux and noticed a rather large slowdown in compression speed. I found that interesting, considering I moved the work from a 4-core Windows machine to a Docker container with full access to 32 cores and faster, much more plentiful RAM. Since the issue involves different machines, I took screenshots of the results of the two R environments running the same code.

I also included the lz4 package (https://github.com/bwlewis/lz4), which supposedly implements the same lz4 compression algorithm, as another point of comparison; it does not slow down anywhere near as drastically.

Local Windows machine:
[screenshot: local_session]

Docker container on server:
[screenshot: docker_session]

Not sure if this is a bug, expected behavior, or something else, but I thought I would let you all know!

This is the code I used on both:

library(fst)
library(lz4)
library(microbenchmark)

set.seed(9917)
sampleObject <- data.frame(matrix(runif(10000), ncol = 10))
serializedObject <- serialize(sampleObject, NULL)

microbenchmark(
  fstCompression <- compress_fst(serializedObject),
  lz4Compression <- lzCompress(serializedObject),
  baseCompression <- memCompress(serializedObject, "bzip2")
)

sessionInfo()
@MarcusKlik
Collaborator

MarcusKlik commented Jan 18, 2018

Hi @thebioengineer, thanks a lot for sharing your benchmarks! At the moment, I have set the minimum block size of a compression job to 16 kB to keep a reasonable compression ratio. That means that for your sample, at most 4 threads are used for compression, which would explain why you are not seeing any significant speedup when switching to the Docker instance.
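To make that concrete, here is a quick back-of-the-envelope check (using the 16 kB block size mentioned above; the variable names are just illustrative):

# Rough number of 16 kB compression blocks in the serialized sample
sample_df <- data.frame(matrix(runif(10000), ncol = 10))
sample_bytes <- as.numeric(object.size(serialize(sample_df, NULL)))
sample_bytes                         # ~80 kB of doubles plus serialization overhead
ceiling(sample_bytes / (16 * 1024))  # only a handful of blocks, so few threads can help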

Furthermore, the default compression settings of the compress_fst method use the ZSTD compressor at level 0. ZSTD is a stronger but slower compressor (from the same author), as you can probably see from the compression ratios. To compare with the LZ4 compressor in fst, you could use:

df <- data.frame(matrix(runif(100000), ncol = 10))
df_raw <- serialize(df, NULL)

timings <- microbenchmark::microbenchmark(
  fst::compress_fst(df_raw, "LZ4")
)

# Speed in GB/s
as.numeric(object.size(df_raw)) / median(timings$time)
#> [1] 1.014846

Here, the sample size is increased by a factor of 10 so that you can fully utilize your available cores. I'm curious to see what that benchmark does on your systems!
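If you want to see the two compressors in fst side by side, a sketch along the same lines (not output from my machine) could look like this:

# Benchmark both compressors on the same raw vector
timings <- microbenchmark::microbenchmark(
  zstd = fst::compress_fst(df_raw, "ZSTD"),
  lz4  = fst::compress_fst(df_raw, "LZ4")
)
summary(timings)

# Compressed size relative to the raw input, per compressor
length(fst::compress_fst(df_raw, "ZSTD")) / length(df_raw)
length(fst::compress_fst(df_raw, "LZ4")) / length(df_raw)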

Thanks for your report and testing!

@thebioengineer
Author

Thanks @MarcusKlik for getting back to me. Whoops, I mixed up which algorithm was the default! I will rerun and post the times/speeds.

However, the main reason I posted was my concern about the large difference in times between my local machine and Docker using compress_fst: 869.92 vs. 46155.66 microseconds. The other compression methods slowed down on Docker too, but not nearly as drastically. Have you found that the time to compress the same object can vary that much across machines?

I will post the compression speed shortly, but I wanted to reiterate the reason I posted.

@MarcusKlik
Collaborator

Hi @thebioengineer, I'm glad you did; the numbers made me flinch there for a moment :-).
But it should definitely not slow down by a factor that large, very strange. I see you are using the CRAN version; perhaps you could check the availability of OpenMP on your Docker system?

fst:::hasopenmp()
#> [1] TRUE

If you have that and you are not swapping memory, then at the moment I can think of no reason why the compression would slow down that much. Are you using one of the standard Docker containers (perhaps I can reproduce your result)?

Thanks!

@thebioengineer
Author

thebioengineer commented Jan 19, 2018

The speeds you requested, calculated based on your code :)

as.numeric(object.size(serializedObject)) / median(timings$time)

Method | Local machine (GB/s) | Docker container (GB/s)
--- | --- | ---
compress_fst(serializedObject, "LZ4") | 0.7069731 | 0.001799747
memCompress(serializedObject, "bzip2") | 0.01209687 | 0.006614682
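(For completeness, here is the calculation above wrapped in a small helper; the function name is just mine:)

# Hypothetical helper: median compression speed in GB/s.
# object.size() is in bytes and microbenchmark times are in nanoseconds,
# so bytes / nanoseconds comes out directly as GB/s.
speed_gbps <- function(compress_fun, input) {
  timings <- microbenchmark::microbenchmark(compress_fun(input))
  as.numeric(object.size(input)) / median(timings$time)
}

speed_gbps(function(x) fst::compress_fst(x, "LZ4"), serializedObject)
speed_gbps(function(x) memCompress(x, "bzip2"), serializedObject)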

I just ran hasopenmp() and it returned TRUE; we aren't swapping memory either.

The Docker container I am using is based on the rocker/rstudio image. I just checked with the base image of that as well, and the speed is around 0.001291902 GB/s, with a median time of 62.20907 milliseconds.

Thanks for looking into this!

@MarcusKlik
Collaborator

Hi @thebioengineer, your issue could also be related to #112. It looks like RStudio Server sometimes forks the session (for example upon re-entering the session), which triggers fst to switch back to a single thread to avoid other problems that have been reported (on the data.table repository). Do you get identical results if you run your benchmark on the Docker instance itself (without the RStudio Server interface in between)?

You can also check the number of active threads with:

fst::threads_fst()
#> [1] 8

You could also 'force' the number of threads before running your benchmark code with:

fst::threads_fst(32)
#> [1] 8

(But that doesn't change the fact that the code runs very slowly; even if fst is using only a single thread, it should be much faster.)
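Putting that together, a sketch of what you could run directly on the Docker instance (with df_raw as in my earlier snippet):

# Pin the thread count, verify it, then re-run the LZ4 benchmark
fst::threads_fst(32)   # request 32 threads
fst::threads_fst()     # verify the active count afterwards

timings <- microbenchmark::microbenchmark(fst::compress_fst(df_raw, "LZ4"))
as.numeric(object.size(df_raw)) / median(timings$time)  # speed in GB/s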

Thanks a lot for testing this!

@thebioengineer
Author

Yes, what I sent you was from exec-ing into the Docker container and running as root. I just checked this morning, and fst::threads_fst() still returns 32.

Thank you so much for your investigation into this!

@MarcusKlik
Collaborator

OK, thanks for checking. I will try to reproduce the error and get back to you!

@MarcusKlik MarcusKlik added this to the fst v0.8.6 milestone Jan 19, 2018
@MarcusKlik MarcusKlik self-assigned this Jan 19, 2018
@MarcusKlik MarcusKlik added the bug label Jan 19, 2018