Describe the bug
I have a 266 MB zip file, from which I only need to extract a 1 kB file. The rest of the files in the archive are irrelevant at this stage in the program.
However, opening the zip file using ZipArchive::new(file) takes about 7 seconds. It's a lot faster the second time round, because of Linux's filesystem cache.
I traced the root cause to Zip32CentralDirectoryEnd::find_and_parse, which locates the "end of central directory record" very quickly at the end of the file, but then keeps scanning backwards through the entire file to find another one.
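For illustration, here is a minimal sketch of a bounded search for that record. It is not the zip crate's implementation; find_eocd_offset and the constants are hypothetical names made up for this example. It assumes the standard ZIP layout: the record starts with the signature PK\x05\x06, has a 22-byte fixed part, and ends with a comment of at most 65535 bytes, so only the last 65557 bytes of the file ever need to be read.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Signature that starts an end-of-central-directory (EOCD) record: "PK\x05\x06".
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Size of the fixed part of the EOCD record, in bytes.
const EOCD_MIN_SIZE: u64 = 22;
/// The comment length field is a u16, so the comment is at most 65535 bytes.
const MAX_COMMENT_SIZE: u64 = u16::MAX as u64;

/// Hypothetical helper (not part of the zip crate): returns the offset of the
/// EOCD signature closest to the end of the input, reading at most the final
/// 22 + 65535 bytes.
fn find_eocd_offset<R: Read + Seek>(reader: &mut R) -> io::Result<Option<u64>> {
    let file_len = reader.seek(SeekFrom::End(0))?;
    if file_len < EOCD_MIN_SIZE {
        return Ok(None);
    }

    // Read only the tail of the input, never the whole file.
    let window = file_len.min(EOCD_MIN_SIZE + MAX_COMMENT_SIZE);
    let window_start = file_len - window;
    reader.seek(SeekFrom::Start(window_start))?;
    let mut tail = vec![0u8; window as usize];
    reader.read_exact(&mut tail)?;

    // Scan backwards and stop at the first hit, so the candidate closest to
    // the end of the file wins.
    let last_candidate = tail.len() - EOCD_MIN_SIZE as usize;
    for pos in (0..=last_candidate).rev() {
        if tail[pos..].starts_with(&EOCD_SIGNATURE) {
            return Ok(Some(window_start + pos as u64));
        }
    }
    Ok(None)
}

fn main() -> io::Result<()> {
    // Synthetic input: 1 MiB of padding followed by an empty EOCD record.
    let mut data = vec![0u8; 1 << 20];
    data.extend_from_slice(&EOCD_SIGNATURE);
    data.extend_from_slice(&[0u8; 18]);

    let offset = find_eocd_offset(&mut Cursor::new(data))?;
    println!("EOCD offset: {:?}", offset); // prints "EOCD offset: Some(1048576)"
    Ok(())
}
```

A real parser would still have to validate the candidate (for example, check that the comment length field matches the number of bytes remaining) and handle ZIP64 archives. The sketch only shows that the search itself can be limited to a small window at the end of the file.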
To Reproduce
Have a large zip file:
$ ls -lh archive.zip
-rw-r--r-- 1 thomas thomas 266M Aug 8 12:20 archive.zip
$ cargo build --release
$ echo 3 | sudo tee /proc/sys/vm/drop_caches # Flush filesystem cache (Linux only)
$ time target/release/repro
real 0m6.714s
user 0m0.560s
sys 0m1.293s
Use this as the main program:
fn main() {
    let file = std::fs::File::open("archive.zip").unwrap();
    // Opening the archive is the slow step; no entry is read afterwards.
    let _archive = zip::ZipArchive::new(file).unwrap();
}
Expected behavior
Extracting a single 1 kB file from a large archive should be possible quickly. unzip can do it:
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
$ time unzip -l archive.zip
Archive: archive.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
...
---------                     -------
1228949561                    9 files
real 0m0.012s
user 0m0.005s
sys 0m0.000s
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
$ time unzip archive.zip some_file.txt
Archive: archive.zip
inflating: some_file.txt
real 0m0.012s
user 0m0.000s
sys 0m0.005s
Version
zip 2.1.6. This is also happening in 2.1.4, but not in 2.1.3. I think cb2d7ab or 9bf914d is the cause, but I haven't dug deeper.