patterns/zip: Change how the end of central directory record is found (…WerWolv#60)

Previously, the zip pattern located the end-of-central-directory (EOCD)
header signature (50 4B 05 06) by searching the entire file for it. This is
*very* slow for large files, and risks false positives since those bytes
can appear by chance in compressed data. I had this happen on the first
large (>2GB) zip file I tried.

I'm now checking for the EOCD signature at exactly 22 bytes from the end of
the file (the common case, where there is no zip comment), and if that fails
I search for it in the last 64KB (plus the 22-byte record) of the file, in
case there *is* a comment at the end of the EOCD; the comment can't be
larger than 64KB. This is much faster, and fixes loading my zip file, where
the old search was spuriously finding the signature in the wrong place.
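
The two-step lookup described above can be sketched outside the pattern language. This is a minimal Python illustration, not the code from this commit: the function name `find_eocd` mirrors the diff, while the constants and the choice to take the whole file as a `bytes` object are assumptions made for the example.

```python
EOCD_SIGNATURE = b"PK\x05\x06"   # the bytes 50 4B 05 06
EOCD_MIN_SIZE = 22               # size of an EOCD record with an empty comment
TAIL_WINDOW = 65536 + 22         # same window the commit searches (assumed constant name)


def find_eocd(data: bytes) -> int:
    """Return the offset of the EOCD signature, or -1 if none is found."""
    size = len(data)
    if size < EOCD_MIN_SIZE:
        return -1

    # Fast path: with no zip comment, the record is the last 22 bytes.
    fast = size - EOCD_MIN_SIZE
    if data[fast:fast + 4] == EOCD_SIGNATURE:
        return fast

    # Slow path: there is probably a comment, so search only the tail window.
    # Like the commit, this takes the first match in the window, which can
    # still be a false positive inside compressed data.
    start = max(0, size - TAIL_WINDOW)
    return data.find(EOCD_SIGNATURE, start)
```

For a file without a comment only four bytes are compared, and for a file with a comment at most about 64KB is scanned, regardless of the archive size.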

This still has a low risk of false positives (what if the comment contains
the 50 4B 05 06 bytes? what if there is a short comment but the signature
appears in the last 64KB of compressed data?), but I don't know what the
"right" way to find the EOCD is, or how proper zip-reading tools handle the
ambiguity...
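
One possible way to narrow the ambiguity, which this commit does not implement, is to search backward from the end of the file and accept a candidate only if its comment-length field (the 16-bit value at offset 20 of the record) makes the record end exactly at the end of the file. The Python sketch below is a hedged illustration; the name `find_eocd_validated` is hypothetical, and a comment crafted to contain a self-consistent fake record could still fool it.

```python
EOCD_SIGNATURE = b"PK\x05\x06"   # the bytes 50 4B 05 06
EOCD_MIN_SIZE = 22               # size of an EOCD record with an empty comment
MAX_COMMENT = 0xFFFF             # the comment length is stored in 16 bits


def find_eocd_validated(data: bytes) -> int:
    """Return the offset of a self-consistent EOCD record, or -1."""
    size = len(data)
    lowest = max(0, size - EOCD_MIN_SIZE - MAX_COMMENT)
    pos = size - EOCD_MIN_SIZE   # highest offset a record can start at

    while pos >= lowest:
        # Find the last signature that starts at or before `pos`.
        pos = data.rfind(EOCD_SIGNATURE, lowest, pos + 4)
        if pos < 0:
            return -1
        comment_len = int.from_bytes(data[pos + 20:pos + 22], "little")
        # A genuine record plus its comment should end exactly at EOF.
        if pos + EOCD_MIN_SIZE + comment_len == size:
            return pos
        pos -= 1                 # reject this candidate and keep looking

    return -1
```

With this check, a stray signature in compressed data is rejected unless the two bytes at its offset 20 happen to match the remaining file length, and a signature inside the comment is rejected for the same reason.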
nicolas17 authored Nov 19, 2022
1 parent fbb6a84 commit 01a1bd0
Showing 1 changed file with 19 additions and 1 deletion.
20 changes: 19 additions & 1 deletion patterns/zip.hexpat
@@ -1,6 +1,7 @@
 #pragma MIME application/zip

 #include <std/mem.pat>
+#include <std/math.pat>

 struct EndOfCentralDirectory {
     u32 headerSignature [[color("00000000")]];
@@ -14,7 +15,24 @@ struct EndOfCentralDirectory {
     char coment[commentLength] [[name("Comment")]];
 };

-EndOfCentralDirectory fileInfo @ std::mem::find_sequence(0,0x50,0x4B,0x05,0x06) [[name("End of Central Directory Record")]];
+fn find_eocd() {
+    // If there is no zip comment, which is the common case,
+    // the end-of-central-directory record will be 22 bytes long
+    // at the end of the file; check if size-22 has the signature.
+    if (std::mem::read_unsigned(std::mem::size()-22, 4, std::mem::Endian::Little) == 0x06054B50) {
+        return std::mem::size()-22;
+    } else {
+        // If it's not there, then there's probably a zip comment;
+        // search the last 64KB of the file for the signature.
+        // This is not entirely reliable, since the signature could
+        // randomly appear in compressed data before the actual EOCD,
+        // but it should be good enough...
+        u128 last64k = std::math::max(0, std::mem::size()-65536-22);
+        return std::mem::find_sequence_in_range(0, last64k, std::mem::size(), 0x50,0x4B,0x05,0x06);
+    }
+};
+
+EndOfCentralDirectory fileInfo @ find_eocd() [[name("End of Central Directory Record")]];

 struct CentralDirectoryFileHeader {
     u32 headerSignature [[color("00000000")]];