Data read with Zlib::GzipReader (but not copy of same data) slow to parse with JSON.parse #2193

@billdueber

(first issue opened here; if I'm off the mark or in the wrong spot, my apologies and please let me know)

JSON-parsing data read from a gzipped file with Zlib::GzipReader is very slow. Parsing the same byte-for-byte data read from a non-gzipped file is not slow. Parsing a copy of the data read from a gzipped file is also not slow.

This gist gives a small program that creates a bunch of arrays-of-arrays and dumps the JSON output into two files, one plain and one gzipped. It then reads the data back in via File.read or Zlib::GzipReader, and additionally makes a copy of the previously gzipped data.

Note that I'm not parsing every time -- I'm just getting a copy of the data (as seen in this snippet from the gist):

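require 'zlib'

# PLAIN and GZIP are the paths of the two files written earlier in the gist.
plain_data = File.read(PLAIN)

previously_gzipped_data = Zlib::GzipReader.open(GZIP) do |f|
  f.read
end

# Appending a space forces a new string to be allocated.
forced_copy = previously_gzipped_data + " "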

Oj doesn't show this same issue. I know benchmarking this stuff is treacherous, but the pattern seems pretty consistent.
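
For anyone without the gist at hand, here is a minimal sketch of the reproduction described above. It is not the gist itself: the helper names (`write_fixtures`, `load_samples`, `report_parse_times`), the file names, the array contents, and the iteration count are all assumptions made for illustration. It only uses the stdlib `json`, `zlib`, `benchmark`, and `tmpdir` libraries.

```ruby
require 'json'
require 'zlib'
require 'benchmark'
require 'tmpdir'

# Write the same JSON document to a plain file and a gzipped file.
# File names and default sizes are illustrative assumptions.
def write_fixtures(dir, rows: 1000, cols: 20)
  data = Array.new(rows) { |i| Array.new(cols) { |j| i * cols + j } }
  json = JSON.generate(data)
  plain_path = File.join(dir, 'data.json')
  gzip_path = File.join(dir, 'data.json.gz')
  File.write(plain_path, json)
  Zlib::GzipWriter.open(gzip_path) { |gz| gz.write(json) }
  [plain_path, gzip_path]
end

# Read the data back three ways, mirroring the snippet above.
def load_samples(plain_path, gzip_path)
  previously_gzipped = Zlib::GzipReader.open(gzip_path) { |f| f.read }
  {
    'Plain' => File.read(plain_path),
    'Previously gzipped' => previously_gzipped,
    'Gzipped/forced copy' => previously_gzipped + ' '
  }
end

# Time only JSON.parse; the reads above happen once, outside the timing.
def report_parse_times(samples, iterations: 5)
  Benchmark.bm(22) do |bm|
    samples.each do |label, str|
      bm.report(label) { iterations.times { JSON.parse(str) } }
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  Dir.mktmpdir do |dir|
    samples = load_samples(*write_fixtures(dir))
    report_parse_times(samples)
  end
end
```

Running the script under both TruffleRuby and MRI should show whether the "Previously gzipped" row stands out. Swapping `JSON.parse` for another parser inside `report_parse_times` gives the comparison run.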


truffleruby 20.3.0, like ruby 2.6.6, GraalVM CE JVM [x86_64-darwin]
Testing with an array of 1000 20-element arrays


BEGIN STDLIB JSON
                           user     system      total        real

Plain                      6.761722   0.218166   6.979888 (  1.597190)
Previously gzipped        15.290809   1.759818  17.050627 (  5.812477)
Gzipped/forced copy        0.112019   0.052004   0.164023 (  0.069019)

Plain                      0.060666   0.040497   0.101163 (  0.061761)
Previously gzipped         4.385712   1.001380   5.387092 (  4.677956)
Gzipped/forced copy        0.135513   0.002502   0.138015 (  0.062301)

Plain                      0.126698   0.004074   0.130772 (  0.067788)
Previously gzipped         4.375435   0.766456   5.141891 (  4.427382)
Gzipped/forced copy        0.055923   0.000771   0.056694 (  0.057266)


BEGIN Oj
                           user     system      total        real
Plain                      8.310956   0.346334   8.657290 (  1.903706)
Previously gzipped         0.849619   0.016991   0.866610 (  0.225177)
Gzipped/forced copy        0.753260   0.026104   0.779364 (  0.228311)
Plain                      0.540689   0.011607   0.552296 (  0.206072)
Previously gzipped         0.417603   0.008346   0.425949 (  0.195774)
Gzipped/forced copy        0.396726   0.013893   0.410619 (  0.206867)
Plain                      0.400436   0.010877   0.411313 (  0.218068)
Previously gzipped         0.421178   0.009404   0.430582 (  0.204174)
Gzipped/forced copy        0.34264

My original data was just a file on disk compressed with gzip, so I don't think this has anything to do with Zlib::GzipWriter; the writer is only there to make the bug reproduction self-contained. I'm assuming that GzipReader is hanging on to the string in some way that makes things nasty?

-Bill-
