(first issue opened here; if I'm off the mark or in the wrong spot, my apologies and please let me know)
JSON-parsing data read from a gzipped file with Zlib::GzipReader is very slow. Parsing the same byte-for-byte data read from a non-gzipped file is not slow. Parsing a copy of the data read from a gzipped file is also not slow.
This gist gives a small program that creates a bunch of arrays-of-arrays and dumps the JSON output into two files, one plain and one gzipped. It then reads the data back in via File.read or Zlib::GzipReader, additionally making a copy of the formerly gzipped data.
Note that I'm not parsing every time -- I'm just getting a copy of the data (as seen in this snippet from the gist):
plain_data = File.read(PLAIN)
previously_gzipped_data = Zlib::GzipReader.open(GZIP) do |f|
  f.read
end
forced_copy = previously_gzipped_data + " "
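For readers without the gist at hand, here is a minimal sketch of the kind of reproduction described above. It is not the gist itself: the file names, the `build_rows` and `load_variants` helpers, the string cell contents, and the iteration count are all assumptions made for illustration. Oj is left out so the sketch depends only on the standard library.

```ruby
require "json"
require "zlib"
require "benchmark"
require "tmpdir"

# Assumed sizes, chosen to match the "1000 20-element arrays" run reported below.
ROWS = 1000
COLS = 20

# Hypothetical helper: build an array of arrays of short strings.
def build_rows(rows, cols)
  Array.new(rows) { |r| Array.new(cols) { |c| "row#{r}-col#{c}" } }
end

# Hypothetical helper: write the same JSON to a plain file and a gzipped file,
# then read both back and make a forced copy of the gzipped data.
def load_variants(dir)
  plain = File.join(dir, "data.json")     # assumed file name
  gzip  = File.join(dir, "data.json.gz")  # assumed file name
  json  = JSON.generate(build_rows(ROWS, COLS))

  File.write(plain, json)
  Zlib::GzipWriter.open(gzip) { |gz| gz.write(json) }

  plain_data = File.read(plain)
  previously_gzipped_data = Zlib::GzipReader.open(gzip) { |f| f.read }
  forced_copy = previously_gzipped_data + " "

  {
    "Plain"               => plain_data,
    "Previously gzipped"  => previously_gzipped_data,
    "Gzipped/forced copy" => forced_copy
  }
end

Dir.mktmpdir do |dir|
  variants = load_variants(dir)
  Benchmark.bm(20) do |bm|
    variants.each do |label, data|
      # Only the parse is timed; the strings were read once, up front.
      bm.report(label) { 10.times { JSON.parse(data) } }
    end
  end
end
```

Timings from a sketch like this will vary with the Ruby implementation and with warm-up, so it is only useful for comparing the three rows against each other within one run.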
Oj doesn't show this same issue. I know benchmarking this stuff is treacherous, but the pattern seems pretty consistent.
truffleruby 20.3.0, like ruby 2.6.6, GraalVM CE JVM [x86_64-darwin]
Testing with an array of 1000 20-element arrays
BEGIN STDLIB JSON
                          user     system      total        real
Plain                 6.761722   0.218166   6.979888 (  1.597190)
Previously gzipped   15.290809   1.759818  17.050627 (  5.812477)
Gzipped/forced copy   0.112019   0.052004   0.164023 (  0.069019)
Plain                 0.060666   0.040497   0.101163 (  0.061761)
Previously gzipped    4.385712   1.001380   5.387092 (  4.677956)
Gzipped/forced copy   0.135513   0.002502   0.138015 (  0.062301)
Plain                 0.126698   0.004074   0.130772 (  0.067788)
Previously gzipped    4.375435   0.766456   5.141891 (  4.427382)
Gzipped/forced copy   0.055923   0.000771   0.056694 (  0.057266)
BEGIN Oj
                          user     system      total        real
Plain                 8.310956   0.346334   8.657290 (  1.903706)
Previously gzipped    0.849619   0.016991   0.866610 (  0.225177)
Gzipped/forced copy   0.753260   0.026104   0.779364 (  0.228311)
Plain                 0.540689   0.011607   0.552296 (  0.206072)
Previously gzipped    0.417603   0.008346   0.425949 (  0.195774)
Gzipped/forced copy   0.396726   0.013893   0.410619 (  0.206867)
Plain                 0.400436   0.010877   0.411313 (  0.218068)
Previously gzipped    0.421178   0.009404   0.430582 (  0.204174)
Gzipped/forced copy   0.34264
My original data was just a file on disk, compressed with gzip, so I don't think this has anything to do with Zlib::GzipWriter -- that's just there to make it a self-contained bug reproduction. I'm assuming that GzipReader is hanging on to the string in some way that makes things nasty?
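One way to start narrowing this down is to compare what is visible from Ruby about the string GzipReader returns and its forced copy. The sketch below is only a diagnostic under assumptions: the `describe` helper and the tiny sample payload are hypothetical, and it works in memory via StringIO rather than on files. It cannot show how the runtime stores the string internally, so it neither confirms nor rules out the guess above; it only checks that the bytes really are identical and shows whether encoding or frozen state differ.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: summarize the properties of a string that Ruby exposes.
def describe(str)
  {
    encoding: str.encoding.name,
    bytesize: str.bytesize,
    frozen:   str.frozen?,
    valid:    str.valid_encoding?
  }
end

json = "[[1,2,3],[4,5,6]]"       # assumed tiny payload, for illustration only
compressed = Zlib.gzip(json)     # in-memory gzip, no files involved

# Read the data back through GzipReader, as in the report.
from_gzip   = Zlib::GzipReader.new(StringIO.new(compressed)).read
forced_copy = from_gzip + " "

puts "original:    #{describe(json)}"
puts "from gzip:   #{describe(from_gzip)}"
puts "forced copy: #{describe(forced_copy)}"
puts "same bytes as original: #{from_gzip.bytes == json.bytes}"
```

If the encodings differ between the rows, that would be worth ruling out as a factor before looking at how the string is represented internally.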
-Bill-