Description
I was investigating why FASTX.jl was so slow on gzipped FASTQ files in the benchmark at https://github.com/lh3/biofast. It appears that decompression is at least 2x slower than whatever the benchmark author's C code uses.
I confirmed this by timing this code:
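using CodecZlib  # provides GzipDecompressorStream

# Decompress the whole file into memory and return the number of bytes read.
function bar(path)
    stream = GzipDecompressorStream(open(path))
    v = read(stream)
    close(stream)
    return length(v)
end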
I ran it on a gzipped FASTQ file and compared it with $ gunzip -dc my_file.fq.gz > test. The Julia code took around 2.1 seconds, whereas the shell command took 0.6 seconds. Profiling confirms that nearly 100% of the time is spent in the ccall to libz. Hence, it appears to be a problem with the binary itself.
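For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two timings could be taken from within Julia. The helper names decompressed_length and compare_timings are hypothetical, and it assumes CodecZlib is installed and gunzip is on the PATH. It is not the exact script used for the numbers above.

```julia
using CodecZlib  # provides GzipDecompressorStream

# Decompress the whole file into memory and return the number of bytes.
function decompressed_length(path)
    stream = GzipDecompressorStream(open(path))
    try
        return length(read(stream))
    finally
        close(stream)  # also closes the underlying file
    end
end

# Hypothetical helper: time the Julia path and the gunzip path on the same file.
function compare_timings(path)
    decompressed_length(path)  # warm-up call so JIT compilation is not timed
    t_julia = @elapsed decompressed_length(path)
    # Assumes `gunzip` is on PATH; output is discarded instead of written to disk.
    t_gunzip = @elapsed run(pipeline(`gunzip -dc $path`, stdout=devnull))
    return (julia = t_julia, gunzip = t_gunzip)
end
```

Calling compare_timings("my_file.fq.gz") returns both wall-clock times in seconds. Discarding the gunzip output avoids measuring disk writes, so the gunzip figure may come out slightly lower than with a redirect to a file.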
Jaakko Ruohio on the Julia Slack suggested this might have something to do with how the Libz artifact is compiled. Perhaps it's missing some compile flags which makes it poorly optimized?
Update: JuliaPackaging/Yggdrasil#1051 sped up the Julia code by 2x, but it remains around 2x slower than both gunzip and C code that calls into zlib.h.