Regression: opening large zip files is slow since 2.1.4 because the entire file is scanned #231

Closed
@ttencate

Describe the bug

I have a 266 MB zip file, from which I only need to extract a 1 kB file. The rest of the files in the archive are irrelevant at this stage in the program.

However, opening the zip file using ZipArchive::new(file) takes about 7 seconds with a cold filesystem cache. A second run is much faster, because Linux has cached the file by then.

I traced the root cause to Zip32CentralDirectoryEnd::find_and_parse, which locates the "end of central directory record" very quickly at the end of the file, but then keeps scanning backwards through the entire file to find another one.
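To illustrate why a full scan should not be needed, here is a minimal sketch of a bounded search using only the standard library. It relies on two facts from the ZIP format: the end of central directory record has a 22-byte fixed part, and its trailing comment is at most 65535 bytes long, so the record's signature must lie within the last 65557 bytes of the file. The function name `find_eocd_in_tail` is hypothetical and this is not the zip crate's implementation; it also ignores ZIP64 and does not check that the comment length field is consistent with the file length.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Signature bytes that start the end of central directory (EOCD) record.
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Size of the fixed part of the EOCD record, excluding the comment.
const EOCD_MIN_SIZE: u64 = 22;
/// The comment length is stored as a u16, so the comment is at most 65535 bytes.
const MAX_COMMENT_SIZE: u64 = u16::MAX as u64;

/// Hypothetical helper (not part of the zip crate): returns the offset of the
/// EOCD signature closest to the end of the input, reading at most the final
/// 22 + 65535 bytes instead of the whole file.
fn find_eocd_in_tail<R: Read + Seek>(reader: &mut R) -> io::Result<Option<u64>> {
    let file_len = reader.seek(SeekFrom::End(0))?;
    if file_len < EOCD_MIN_SIZE {
        return Ok(None);
    }

    // Read only the region in which the record can start.
    let tail_len = file_len.min(EOCD_MIN_SIZE + MAX_COMMENT_SIZE);
    let tail_start = file_len - tail_len;
    reader.seek(SeekFrom::Start(tail_start))?;
    let mut tail = vec![0u8; tail_len as usize];
    reader.read_exact(&mut tail)?;

    // Scan backwards so that the match closest to the end of the file wins.
    let last_candidate = tail.len() - EOCD_MIN_SIZE as usize;
    for pos in (0..=last_candidate).rev() {
        if tail[pos..].starts_with(&EOCD_SIGNATURE) {
            return Ok(Some(tail_start + pos as u64));
        }
    }
    Ok(None)
}
```

With a search bounded like this, the amount of data read is independent of the archive size, which would be consistent with the timings unzip shows for the same file.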

To Reproduce

Have a large zip file:

$ ls -lh archive.zip
-rw-r--r-- 1 thomas thomas 266M Aug  8 12:20 archive.zip
$ cargo build --release
$ echo 3 | sudo tee /proc/sys/vm/drop_caches  # Flush filesystem cache (Linux only)
$ time target/release/repro
real	0m6.714s
user	0m0.560s
sys	0m1.293s

Use this as the main program:

fn main() {
    let file = std::fs::File::open("archive.zip").unwrap();
    // Opening the archive is the slow step; no entry is read or extracted.
    let _archive = zip::ZipArchive::new(file).unwrap();
}

Expected behavior

Extracting a single 1 kB file from a large archive should be fast, regardless of the archive's total size. unzip can do it:

$ echo 3 | sudo tee /proc/sys/vm/drop_caches
$ time unzip -l archive.zip
Archive:  archive.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
...
---------                     -------
1228949561                     9 files

real	0m0.012s
user	0m0.005s
sys	0m0.000s
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
$ time unzip archive.zip some_file.txt
Archive:  archive.zip
  inflating: some_file.txt           

real	0m0.012s
user	0m0.000s
sys	0m0.005s

Version

zip 2.1.6. This is also happening in 2.1.4, but not in 2.1.3. I think cb2d7ab or 9bf914d is the cause, but I haven't dug deeper.
