Upstream zlib library not well optimized #53

Closed
@jakobnissen


I was investigating why FASTX.jl was so slow on gzipped FASTQ files in the benchmark at https://github.com/lh3/biofast. It appears that decompression is at least 2x slower than whatever the benchmark's C code uses.

I confirmed this by timing this code:

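using CodecZlib

function bar(path)
    # Decompress the whole file into memory and return the number of bytes.
    stream = GzipDecompressorStream(open(path))
    v = read(stream)
    close(stream)  # also closes the underlying file
    return length(v)
end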

I ran this on a gzipped FASTQ file and compared it with $ gunzip -dc my_file.fq.gz > test. The Julia code took around 2.1 seconds, whereas the gunzip command took 0.6 seconds. Profiling confirms that nearly 100% of the time is spent in the ccall to libz. Hence, the problem appears to lie in the libz binary itself rather than in the Julia wrapper code.
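For anyone who wants to repeat the measurement, here is a minimal sketch of a timing harness, assuming CodecZlib.jl is installed. The helper names `decompressed_length` and `time_decompression` and the synthetic FASTQ-like input are hypothetical and only there so the snippet runs without a real file. A warm-up call keeps JIT compilation out of the timing.

```julia
using CodecZlib  # assumed to be installed; provides the gzip codecs

# Same measurement as `bar` above: decompress everything, return the byte count.
function decompressed_length(path)
    stream = GzipDecompressorStream(open(path))
    try
        return length(read(stream))
    finally
        close(stream)  # also closes the underlying file
    end
end

# Hypothetical helper: best-of-n wall-clock time after one warm-up call.
function time_decompression(path; n = 3)
    decompressed_length(path)  # warm-up so compilation is not timed
    return minimum(@elapsed(decompressed_length(path)) for _ in 1:n)
end

# Synthetic FASTQ-like data (an assumption, not the file used in the issue).
record = "@read\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
raw = Vector{UInt8}(record^10_000)
path = tempname() * ".gz"
write(path, transcode(GzipCompressor, raw))

seconds = time_decompression(path)
println("decompressed ", length(raw), " bytes in ", seconds, " s")
```

To compare against gunzip, point `time_decompression` at the same real file that the shell command reads. The synthetic data is highly repetitive and will not give representative throughput.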

Jaakko Ruohio on the Julia Slack suggested this might have something to do with how the Libz artifact is compiled. Perhaps it's missing some compile flags, leaving it poorly optimized?

Update: JuliaPackaging/Yggdrasil#1051 sped up the Julia code by 2x, but it remains around 2x slower than both gunzip and C code that calls into zlib.h.
