compress: add a deflate compressor #10552

Merged
merged 2 commits into ziglang:master from hdorio:deflate on Jan 26, 2022

Conversation

hdorio
Contributor

@hdorio hdorio commented Jan 9, 2022

Replaces the inflate API from `inflateStream(reader: anytype, window_slice: []u8)` to
`decompressor(allocator: mem.Allocator, reader: anytype, dictionary: ?[]const u8)` and
`compressor(allocator: mem.Allocator, writer: anytype, options: CompressorOptions)`

This is a translation of the Go code of the compress/flate package from
https://go.googlesource.com/go/+/refs/tags/go1.17/src/compress/flate/

#213
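
For orientation, here is a minimal round-trip sketch of the proposed API. It assumes the signatures from the final commit message; revisions earlier in this thread instead pass a compression level and dictionary positionally, as in the benchmark code further down.

```zig
const std = @import("std");
const deflate = std.compress.deflate;

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Compress into an in-memory buffer.
    var compressed = std.ArrayList(u8).init(allocator);
    defer compressed.deinit();
    var comp = try deflate.compressor(allocator, compressed.writer(), .{});
    defer comp.deinit();
    try comp.writer().writeAll("hello hello hello hello");
    try comp.close();

    // Decompress it again (null = no preset dictionary).
    var fbs = std.io.fixedBufferStream(compressed.items);
    var decomp = try deflate.decompressor(allocator, fbs.reader(), null);
    defer decomp.deinit();
    const plain = try decomp.reader().readAllAlloc(allocator, 1 << 20);
    defer allocator.free(plain);
    std.debug.print("{s}\n", .{plain});
}
```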

@squeek502
Collaborator

squeek502 commented Jan 10, 2022

I assume you're going to need a lot more reasoning (benchmarks? use-cases? bugs in the old implementation?) for wholesale replacing the previous inflate implementation (note also that the previous inflate implementation was recently fuzzed in order to affirm that it matches puff.c).

In fact, it'd probably make much more sense to split this PR into a compression-only PR and a decompression-only PR.

@hdorio
Contributor Author

hdorio commented Jan 10, 2022

you're going to need a lot more reasoning

Not really; any puff.c-based implementation should be replaced.

puff.c is a simple inflate written to be an unambiguous way to specify the
deflate format.  It is not written for speed but rather simplicity.

benchmarks?

You mean like this one? 41f244b
They are not really useful in my opinion but if I must: https://gist.github.com/hdorio/5e4f26150198a7515d89f01be5e0a45f

// outputs
//
// deflate_old: 3.982s
// deflate_new: 2.08s

The Go implementation is based on https://github.com/madler/zlib/raw/master/doc/algorithm.txt.

I invite you to run your own benchmarks with your own sample data. Just keep in mind that this new implementation does a lot of small reads during decompression, so don't pass in a file reader directly; use a buffered reader or read the whole input into memory to avoid lots of small read() system calls.
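
For instance, a sketch using the PR's proposed decompressor API (the surrounding function and file path are hypothetical):

```zig
const std = @import("std");
const decompressor = std.compress.deflate.decompressor;

// Wrap the file reader before handing it to the decompressor. A buffered
// reader coalesces the decompressor's many small reads into a few large
// read() syscalls; reading the whole file into memory and wrapping it in
// a fixedBufferStream (as the benchmarks below do) works just as well.
fn decompressFile(allocator: std.mem.Allocator, path: []const u8) !void {
    const file = try std.fs.cwd().openFile(path, .{});
    defer file.close();

    var buffered = std.io.bufferedReader(file.reader());
    var decomp = try decompressor(allocator, buffered.reader(), null);
    defer decomp.deinit();

    var buf: [1 << 15]u8 = undefined;
    while (true) {
        const n = try decomp.reader().read(&buf);
        if (n == 0) break;
        // consume buf[0..n] here
    }
}
```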

@squeek502
Collaborator

squeek502 commented Jan 11, 2022

I'm not able to run your benchmark with the latest commit of this branch; did something in the compressor change?

EDIT: Never mind, found the problem: line 40 is using file.writer() when it should be using comp.writer():

https://gist.github.com/hdorio/5e4f26150198a7515d89f01be5e0a45f#file-decompress-zig-L40

Stacktraces from the broken benchmark program

(note: I'm running with ReleaseSafe just to get a stacktrace):

$ zig run bench.zig -O ReleaseSafe
error: InvalidStoredSize
/home/ryan/Programming/zig/zig/lib/std/os.zig:0:5: 0x2176ee in main (bench)
/home/ryan/Programming/zig/zig/lib/std/fs.zig:2055:9: 0x2176f6 in main (bench)
        return result;
        ^
/home/ryan/Programming/zig/zig/lib/std/fs.zig:2036:9: 0x2176fe in main (bench)
        return self.accessZ(&path_c, flags);
        ^
/home/ryan/Programming/zig/zig/lib/std/compress/deflate/decompressor.zig:0:37: 0x21a514 in main (bench)
/home/ryan/Programming/zig/zig/lib/std/compress/deflate_old.zig:0:34: 0x218ed3 in main (bench)
/home/ryan/Programming/zig/zig/lib/std/compress/deflate_old.zig:668:17: 0x218edb in main (bench)
                try self.step();
                ^
/home/ryan/Programming/zig/zig/lib/std/io/reader.zig:26:13: 0x218ee3 in main (bench)
            return readFn(self.context, buffer);
            ^
/home/ryan/Programming/zig/zig/lib/std/os/linux.zig:0:17: 0x219135 in main (bench)
/home/ryan/Programming/zig/zig/lib/std/Thread/Mutex.zig:0:35: 0x21a415 in main (bench)

and with the old deflate commented out, I still get an error when decompressing:

$ zig run bench.zig -O ReleaseSafe
error: CorruptInput
/home/ryan/Programming/zig/zig/lib/std/compress/deflate/decompressor.zig:0:0: 0x224d2f in std.compress.deflate.decompressor.Decompressor(std.io.reader.Reader(*std.io.fixed_buffer_stream.FixedBufferStream([]u8),std.io.fixed_buffer_stream.ReadError,std.io.fixed_buffer_stream.FixedBufferStream([]u8).read)).nextBlock (bench)
/home/ryan/Programming/zig/zig/lib/std/compress/deflate/decompressor.zig:0:0: 0x224d37 in std.compress.deflate.decompressor.Decompressor(std.io.reader.Reader(*std.io.fixed_buffer_stream.FixedBufferStream([]u8),std.io.fixed_buffer_stream.ReadError,std.io.fixed_buffer_stream.FixedBufferStream([]u8).read)).nextBlock (bench)
/home/ryan/Programming/zig/zig/lib/std/compress/deflate/decompressor.zig:0:38: 0x2082a7 in std.start.posixCallMainAndExit (bench)
/home/ryan/Programming/zig/zig/lib/std/io/reader.zig:26:13: 0x2087e7 in std.start.posixCallMainAndExit (bench)
            return readFn(self.context, buffer);
            ^
/home/ryan/Programming/zig/tmp/deflate-bench/bench.zig:0:0: 0x206808 in std.start.posixCallMainAndExit (bench)
/home/ryan/Programming/zig/tmp/deflate-bench/bench.zig:0:0: 0x206481 in std.start.posixCallMainAndExit (bench)

@squeek502
Collaborator

squeek502 commented Jan 11, 2022

Here are the results I'm getting with a modified version of your benchmark to try a few different sizes (sizes shown are the uncompressed sizes):

$ zig run bench.zig -O ReleaseFast

size: 100B
deflate_old: 20.459ms
deflate_new: 38.101us
new is 536.98x faster

size: 1KiB
deflate_old: 30.71us
deflate_new: 15.159us
new is 2.03x faster

size: 1MiB
deflate_old: 2.824ms
deflate_new: 916.366us
new is 3.08x faster

size: 10MiB
deflate_old: 28.298ms
deflate_new: 8.246ms
new is 3.43x faster

size: 22MiB
deflate_old: 62.643ms
deflate_new: 24.133ms
new is 2.60x faster

size: 100MiB
deflate_old: 288.08ms
deflate_new: 107.465ms
new is 2.68x faster

size: 220MiB
deflate_old: 628.471ms
deflate_new: 232.427ms
new is 2.70x faster
The modified benchmark code
const std = @import("std");
const fmt = std.fmt;
const io = std.io;
const math = std.math;

const Allocator = std.mem.Allocator;
const GeneralPurposeAllocator = std.heap.GeneralPurposeAllocator;
const Timer = std.time.Timer;

const deflate = std.compress.deflate;
const compressor = deflate.compressor;
const decompressor = deflate.decompressor;

const old_deflate = @import("old_deflate.zig");

const compressed_filename = "data.deflate";

pub fn create(allocator: Allocator, uncompressed_size: usize) !void {
    const file = try std.fs.cwd().createFile(compressed_filename, .{ .read = true });
    defer file.close();

    var comp = try compressor(allocator, file.writer(), deflate.default_compression, null);
    defer comp.deinit();

    // Store the buffered writer in a variable: taking .writer() of a
    // temporary would leave a dangling pointer, and the buffer must be
    // flushed before closing the compressor (see flush below).
    var buffered = std.io.bufferedWriter(comp.writer());
    const writer = buffered.writer();
    var i: usize = 0;
    while (i < uncompressed_size) : (i += 1) {
        try writer.writeByte(@truncate(u8, @truncate(u16, i) *| i));
    }
    try buffered.flush();
    try comp.close();
}

pub fn deflate_old(allocator: Allocator) !void {
    const file = try std.fs.cwd().openFile(compressed_filename, .{ .read = true });
    defer file.close();
    var data = try file.reader().readAllAlloc(allocator, math.maxInt(usize));
    defer allocator.free(data);
    var fb = io.fixedBufferStream(data);

    var window: [0x8000]u8 = undefined;
    var inflate = old_deflate.inflateStream(fb.reader(), &window);

    const output = try std.fs.cwd().createFile("out.bin", .{ .read = true });
    defer output.close();

    var buf = [1]u8{0} ** (1 << 15);
    var r: usize = 1;
    while (r > 0) {
        r = try inflate.reader().read(&buf);
        try output.writer().writeAll(buf[0..r]);
    }
}

pub fn deflate_new(allocator: Allocator) !void {
    const file = try std.fs.cwd().openFile(compressed_filename, .{ .read = true });
    defer file.close();
    var data = try file.reader().readAllAlloc(allocator, math.maxInt(usize));
    defer allocator.free(data);
    var fb = io.fixedBufferStream(data);

    var decomp = try decompressor(allocator, fb.reader(), null);
    defer decomp.deinit();

    const output = try std.fs.cwd().createFile("out.bin", .{ .read = true });
    defer output.close();

    var buf = [1]u8{0} ** (1 << 15);
    var r: usize = 1;
    while (r > 0) {
        r = try decomp.reader().read(&buf);
        try output.writer().writeAll(buf[0..r]);
    }
}

pub fn main() !void {
    var gpa = GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    var allocator = gpa.allocator();

    const file_sizes = &[_]usize{
        100,
        1024,
        1024 * 1024,
        10 * 1024 * 1024,
        22 * 1024 * 1024,
        100 * 1024 * 1024,
        220 * 1024 * 1024,
    };

    for (file_sizes) |file_size| {
        std.debug.print("\nsize: {}\n", .{std.fmt.fmtIntSizeBin(file_size)});
        try create(allocator, file_size);

        var timer = try Timer.start();
        var start: usize = 0;
        var stop: usize = 0;

        start = timer.lap();
        try deflate_old(allocator);
        stop = timer.lap();
        const old_duration = stop - start;
        std.debug.print("deflate_old: {}\n", .{fmt.fmtDuration(old_duration)});

        start = timer.lap();
        try deflate_new(allocator);
        stop = timer.lap();
        const new_duration = stop - start;
        std.debug.print("deflate_new: {}\n", .{fmt.fmtDuration(new_duration)});

        const faster = if (old_duration > new_duration) "new" else "old";
        const diff: f64 = @intToFloat(f64, std.math.max(old_duration, new_duration)) / @intToFloat(f64, std.math.min(old_duration, new_duration));
        std.debug.print("{s} is {d:.2}x faster\n", .{ faster, diff });
    }
}

Member

@ifreund ifreund left a comment


I didn't get very far through the code review, just some superficial style fixes and general thoughts.

The benchmarks certainly support the fact that this implementation is faster than the status quo. However, I don't think that justifies removing the current implementation from the standard library. There are other aspects of code that can be important beyond just speed. The full text of the comment from puff.c you referenced touches on this:

puff.c is a simple inflate written to be an unambiguous way to specify the
deflate format.  It is not written for speed but rather simplicity.  As a
side benefit, this code might actually be useful when small code is more
important than speed, such as bootstrap applications.  For typical deflate
data, zlib's inflate() is about four times as fast as puff().  zlib's
inflate compiles to around 20K on my machine, whereas puff.c compiles to
around 4K on my machine (a PowerPC using GNU cc).  If the faster decode()
function here is used, then puff() is only twice as slow as zlib's
inflate().

All dynamically allocated memory comes from the stack.  The stack required
is less than 2K bytes.  This code is compatible with 16-bit int's and
assumes that long's are at least 32 bits.  puff.c uses the short data type,
assumed to be 16 bits, for arrays in order to conserve memory.  The code
works whether integers are stored big endian or little endian.

To add on to that, simplicity is also highly desirable when code must be strictly audited.

Anyhow, I don't think removing the current implementation from the standard library is the right call here, perhaps there is room for both. This would also be nice as we could test them against each other.

@hdorio
Contributor Author

hdorio commented Jan 11, 2022

perhaps there is room for both.

Sure, where should I put the slow one? Is compress/deflate/inflate_stream.zig ok?
Which one should be used by zlib.zig and gzip.zig?

@hdorio hdorio requested a review from ifreund January 11, 2022 17:48
@andrewrk
Member

andrewrk commented Jan 11, 2022

Hi @hdorio! Thank you for providing this port. And thanks @squeek502 for the benchmark data.

Based on this data, it is clear that there is room for a more sophisticated implementation such as this. However, I also agree with @ifreund that there is room for a simpler implementation that requires less machine code, at the cost of performance - especially if the simpler implementation does not require dynamic heap memory allocation.

Before I take the time to do a detailed code review, let's figure out the higher level organization:

We have the concept of ReleaseSmall which can be used to select the puff implementation rather than the Go port using conditional compilation, like this:

const builtin = @import("builtin");
const Deflate = switch (builtin.mode) {
    .ReleaseSmall => SimpleDeflate,
    else => FastDeflate,
};

There are two issues with this:

  1. For the corresponding Debug build, when one intends to use ReleaseSmall, we would still probably prefer to use the same implementation rather than a different one, however that intention is not collected from the user, so it would end up using one implementation in Debug and another in ReleaseSmall. This is a separate issue that I will handle; not something I expect this PR to address.
  2. This requires the APIs to be identical, the only difference being the implementations. As I understand it, there is an important difference, however, which is that one implementation requires an allocator parameter, while the other one does not. This interface incompatibility makes it not possible to trivially switch implementations like this.

@hdorio what do you think? Can the ported go code be adjusted to require a constant amount of memory? Or perhaps could the scratch memory buffer be provided by the caller for both implementations, thereby unifying the two interfaces?
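
(One way to picture the second option: the caller hands over a fixed scratch buffer and the allocator-based implementation draws from it through std.heap.FixedBufferAllocator. A sketch of the idea only, not code from this PR, and the helper name is made up:)

```zig
const std = @import("std");

// Hypothetical unified interface: the caller supplies all scratch memory.
// The Go-port decompressor allocates from the caller's buffer; the
// puff-based one could slice its window out of the same buffer.
fn inflateWithScratch(reader: anytype, scratch: []u8) !void {
    var fba = std.heap.FixedBufferAllocator.init(scratch);
    var decomp = try std.compress.deflate.decompressor(fba.allocator(), reader, null);
    defer decomp.deinit();

    var buf: [4096]u8 = undefined;
    while ((try decomp.reader().read(&buf)) != 0) {
        // a real caller would consume the decompressed bytes here
    }
}
```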

To answer your questions directly:

Sure, where should I put the slow one? Is compress/deflate/inflate_stream.zig ok?

First we need names for them. One suggestion for their names:

  • puff - the implementation based on puff.c
  • goflate - the implementation ported from go std lib

Or perhaps they can be named after their desired characteristics rather than tying them to particular implementations:

  • simple
  • fast

Once we have names for them, we can decide their fully qualified names:

  • std.compress.deflate.puff (or std.compress.deflate.simple)
  • std.compress.deflate.goflate (or std.compress.deflate.fast)

After looking at them for a couple minutes I think I prefer the latter naming style. Anyway, that tells you the file paths then:

  • compress/deflate/simple/inflate_stream.zig
  • compress/deflate/fast/inflate_stream.zig

Although I will note that probably the stream code can be shared between them and does not need to be implemented twice.

Which one should be used by zlib.zig and gzip.zig?

They should both use std.compress.deflate, which should do conditional compilation similar to the pseudocode above, to choose the fast or simple implementation based on the selected optimization mode.

@hdorio
Contributor Author

hdorio commented Jan 12, 2022

@andrewrk thanks for taking some time on this.

At first your idea seemed good to me, but after some preliminary tests I would advise against putting the slow implementation under -OReleaseSmall, because:

  • The gain in size is not clear.
  • The performance gap is too big. As a user I don't expect ReleaseSmall to be so aggressive about memory usage for such poor performance.
  • ReleaseSmall does not say anything about fixed/dynamic heap memory allocation or how memory is used (currently it only promises: medium runtime performance, safety checks disabled, slow compilation speed, small binary size, reproducible build).

I think if we want to keep the slow one, we should let the user choose between the two.
Or optimize the slow one further so there is a real gain in binary size.
Or spell out what ReleaseSmall promises about memory allocation.

The gain in size is not clear

When both implementations use an allocator (to read the whole file into memory), the fast implementation is ~3 KB smaller (22984 bytes); when the slow one forgoes the allocator, the slow one is ~2 KB smaller (20144 bytes); and when the fast one uses a custom allocator, the fast one is again ~3 KB smaller (17032 bytes).
So it seems the way the allocator is implemented is the real cost in code size.

  Binary sizes (-OReleaseSmall)
zig build-exe -OReleaseSmall

.strip = strip command was used
.zigstrip = zig build-exe --strip
.custom-alloc = use a fixed buffer allocator of 3 MB from a stack buffer
.no-alloc = don't preload the file into memory
.buffered-reader = use the buffered-reader from std to read the file
.new = fast implementation
.old = slow implementation

-rwxr-xr-x  252584 decompress.new
-rwxr-xr-x  238312 decompress.old
-rwxr-xr-x   22984 decompress.strip.new
-rwxr-xr-x   26640 decompress.strip.old
-rwxr-xr-x   30568 decompress.zigstrip.new
-rwxr-xr-x   32848 decompress.zigstrip.old
-rwxr-xr-x   17032 decompress.zigstrip.strip.custom-alloc.new
-rwxr-xr-x   22712 decompress.zigstrip.strip.new
-rwxr-xr-x  128480 decompress.zigstrip.strip.no-alloc.buffered-reader.old
-rwxr-xr-x   20144 decompress.zigstrip.strip.no-alloc.old
-rwxr-xr-x   26240 decompress.zigstrip.strip.old

The performance gap

decompress.zigstrip.strip.no-alloc.buffered-reader.old vs decompress.zigstrip.strip.custom-alloc.new

I'm sure the slow one is perfect for environments where memory constraints are very harsh: 348 KB (slow) vs 3120 KB (fast) of peak memory is a real win. But 1m2.166s (slow) vs 23.012s (new) for a 660 MB file does not read to me as going from good to medium runtime performance, but from ok to poor (it's subjective). And here's the thing: the fast implementation is not that good either. I expect someone to do much better than this, around 2x the current inflate speed. And if the zlib documentation is correct, I expect that to be doable in a constant amount of memory of about 40 KB, instead of the ~3 MB mine uses.

  Maximum resident set size (time -v)
calgary$ /usr/bin/time -v ./decompress.zigstrip.strip.old
deflate_old: 9.447s
	Command being timed: "./decompress.zigstrip.strip.old"
	User time (seconds): 6.86
	System time (seconds): 1.14
	Percent of CPU this job got: 84%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.44
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1106112
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 578963
	Voluntary context switches: 59
	Involuntary context switches: 865
	Swaps: 0
	File system inputs: 0
	File system outputs: 1360568
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
calgary$ /usr/bin/time -v ./decompress.zigstrip.strip.new
deflate_new: 5.531s
	Command being timed: "./decompress.zigstrip.strip.new"
	User time (seconds): 2.60
	System time (seconds): 1.43
	Percent of CPU this job got: 72%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.53
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1106104
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 598491
	Voluntary context switches: 75
	Involuntary context switches: 417
	Swaps: 0
	File system inputs: 0
	File system outputs: 1360568
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
	
	
calgary$ /usr/bin/time -v ./decompress.zigstrip.strip.no-alloc.old 
deflate_old: 4m446.975ms
	Command being timed: "./decompress.zigstrip.strip.no-alloc.old"
	User time (seconds): 114.90
	System time (seconds): 125.30
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 4:00.44
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 352
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 38
	Voluntary context switches: 78
	Involuntary context switches: 23703
	Swaps: 0
	File system inputs: 0
	File system outputs: 1360568
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
	
calgary$ /usr/bin/time -v ./decompress.zigstrip.strip.no-alloc.buffered-reader.old
deflate_old: 1m2.166s
	Command being timed: "./decompress.zigstrip.strip.no-alloc.buffered-reader.old"
	User time (seconds): 60.80
	System time (seconds): 0.67
	Percent of CPU this job got: 98%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:02.16
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 348
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 42
	Voluntary context switches: 50
	Involuntary context switches: 6079
	Swaps: 0
	File system inputs: 0
	File system outputs: 1360568
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
	
calgary$ /usr/bin/time -v ./decompress.zigstrip.strip.custom-alloc.new
deflate_new: 23.012s
	Command being timed: "./decompress.zigstrip.strip.custom-alloc.new"
	User time (seconds): 7.77
	System time (seconds): 15.00
	Percent of CPU this job got: 98%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:23.01
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3120
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 785
	Voluntary context switches: 47
	Involuntary context switches: 2272
	Swaps: 0
	File system inputs: 0
	File system outputs: 1360568
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

unifying the two interfaces

@hdorio what do you think? Can the ported go code be adjusted to require a constant amount of memory?

Last time I tried, I failed. But it seems technically possible; zlib uses the same algorithm and ends up with this:
https://zlib.net/zlib_tech.html

inflate memory usage (bytes) = (1 << windowBits) + 7 KB
Typically, therefore, inflate() requires no more than 40 KB of storage on a 64-bit machine. This includes the 32768-byte sliding window and approximately 7 KB for the inflated data structure.

Note that the 7KB is dynamic memory.

Or perhaps could the scratch memory buffer be provided by the caller for both implementations, thereby unifying the two interfaces?

I don't know how much memory would be needed for the scratch buffer; it depends on the stream being decompressed.
To the best of my knowledge, it's not possible to have a decent decompressor without dynamic memory allocation.

Also the slow one would need to be modified to be able to take a dictionary.

bikeshedding

  • simple
  • fast

I also prefer simple / fast but what is considered fast today may be considered slow tomorrow.
As I said, I expect someone to do much better than the one in this pull request.

  • compress/deflate/simple/inflate_stream.zig
  • compress/deflate/fast/inflate_stream.zig

The whole shrink/unshrink, implode/explode, deflate/inflate naming trend is cute, but as a non-native English speaker I would much prefer to remember the name of the algorithm (deflate) and then choose decompress_stream or compress_stream, rather than remembering that the opposite of 'deflate' is 'inflate'.

@sreehax
Contributor

sreehax commented Jan 12, 2022

I think it's fair to assume most people would want a faster implementation, so it makes sense to add it to the standard library. Small code size is still more important than "speed" in some cases, so the original implementation should still be kept around. IMO the best way to go about this would be to name them different things like "puff" and "goflate" or "fast" and "small" or whatever, and let the user decide which one to use.

@hdorio
Contributor Author

hdorio commented Jan 13, 2022

Small code size is still more important than "speed" in some cases so the original implementation should still be kept around.

Unless I misunderstand what "small code size" means, and unless my tests were wrong, the original is not smaller.
We are talking about a difference of 260 lines of source code, which in some cases results in more machine code for the original.

https://gist.github.com/hdorio/17332012563cfa022622d26ef63b095b

@hdorio hdorio requested a review from andrewrk January 13, 2022 07:02
@andrewrk
Member

Thanks for the follow-up, @hdorio! Please be patient with me 🙏 I took a vacation last week and now we are swimming in PRs :-)

@hdorio hdorio force-pushed the deflate branch 3 times, most recently from ae223a4 to 2899498 Compare January 19, 2022 01:19
Member

@andrewrk andrewrk left a comment


Thanks for your patience on getting a review, @hdorio. This is some really nice work that you have done. Overall it looks quite reasonable.

I have a few requested changes, along with a bit of a spicy rant against Go - please take it in a lighthearted manner :-)

With all of these requests addressed, I would be satisfied with the code and it can be merged, entirely replacing the puff.c port as the only implementation of deflate in the Zig std lib.

I see some opportunities for further optimizations, but that is a separate concern, and need not distract us from this improvement that you have provided.

@hdorio hdorio force-pushed the deflate branch 2 times, most recently from b1e59f9 to 0f85096 Compare January 21, 2022 12:04
@hdorio hdorio requested a review from andrewrk January 21, 2022 14:29
@squeek502
Collaborator

squeek502 commented Jan 22, 2022

Did a bit of fuzzing of the decompressor (fuzzer implementations here as deflate-new and deflate-new-puff).

Findings:

  • Certain inputs can lead to an infinite loop when decompressing: infinite-loops.zip
  • Certain inputs do not give an error, but puff.c gives error.EndOfStream: errors.zip

Besides that, everything seems good fuzzing-wise (not too comprehensive, though, just a few hours of fuzzing). Haven't tried fuzzing the compressor yet.

@hdorio
Contributor Author

hdorio commented Jan 22, 2022

Thanks a lot @squeek502 👍

  • Certain inputs can lead to an infinite loop
  • Certain inputs do not give an error

Both findings were introduced by me and are a deviation from the Go code.
Go stops with an unexpected EOF error on all inputs from infinite-loops.zip and from errors.zip.

I modified the Zig code to have the same behavior; all inputs produce an error now, like Go.
I added a test in decompressor.zig:1063 (fuzzing section).

  inflate with Go
package main

import (
	"compress/flate"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	file := os.Args[1]

	inputFile, err := os.Open(file)
	if err != nil {
		log.Fatal(err)
	}

	decomp := flate.NewReader(inputFile)

	fmt.Println(" extracting", file)
	output, err := os.Create("out.bin")
	if err != nil {
		log.Fatal(err)
	}
	_, err = io.Copy(output, decomp)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("closing", file)

	err = output.Close()
	if err != nil {
		log.Fatal(err)
	}
	err = inputFile.Close()
	if err != nil {
		log.Fatal(err)
	}
}

@hdorio hdorio force-pushed the deflate branch 2 times, most recently from 4ddf3b3 to 2834cab Compare January 23, 2022 18:27
Read bytes to check expected values instead of reading and hashing them.
Hashing is a waste of time when we can just read and compare.
This also removes a dependency on std.crypto.hash.sha2.Sha256 for tests.
Replaces the inflate API from `inflateStream(reader: anytype, window_slice: []u8)` to
`decompressor(allocator: mem.Allocator, reader: anytype, dictionary: ?[]const u8)` and
`compressor(allocator: mem.Allocator, writer: anytype, options: CompressorOptions)`
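
(To illustrate the first commit's point, checking decompressed bytes directly against the expected values rather than hashing both sides, here is a self-contained round-trip sketch; illustrative only, not the actual test from the PR:)

```zig
const std = @import("std");
const deflate = std.compress.deflate;

test "check decompressed bytes directly instead of hashing" {
    const allocator = std.testing.allocator;
    const expected = "the quick brown fox jumps over the lazy dog " ** 100;

    // Compress known input into an in-memory buffer.
    var compressed = std.ArrayList(u8).init(allocator);
    defer compressed.deinit();
    var comp = try deflate.compressor(allocator, compressed.writer(), .{});
    defer comp.deinit();
    try comp.writer().writeAll(expected);
    try comp.close();

    // Decompress and compare each chunk as it arrives; no Sha256 needed.
    var fbs = std.io.fixedBufferStream(compressed.items);
    var decomp = try deflate.decompressor(allocator, fbs.reader(), null);
    defer decomp.deinit();

    var buf: [4096]u8 = undefined;
    var offset: usize = 0;
    while (true) {
        const n = try decomp.reader().read(&buf);
        if (n == 0) break;
        try std.testing.expectEqualSlices(u8, expected[offset .. offset + n], buf[0..n]);
        offset += n;
    }
    try std.testing.expectEqual(@as(usize, expected.len), offset);
}
```
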
@squeek502
Collaborator

squeek502 commented Jan 24, 2022

Ran a fuzzer for a few hours that does input -> compressor -> decompressor -> output and checks that the output is the same as the input; found no issues. The previous decompressor-specific fuzzers run cleanly now, too.

When running the simple decompressor fuzzer, though, I did notice that there are some inputs that take significantly longer to decompress than others. I haven't investigated why this is, but it might be something worth looking into:

slow-inputs.zip

And here's an input for comparison. It's roughly the same size as most of the slow inputs, but runs way faster (note: it's a single file within a zip so GitHub would allow the upload):

fast-input.zip

(the fast input was generated with the compressor in this PR acting on ~235KiB of random bytes)

@hdorio
Contributor Author

hdorio commented Jan 24, 2022

The previous decompressor-specific fuzzers run cleanly now, too.

@squeek502 👍

I did notice that there are some inputs that take significantly longer to decompress than others.

The time taken to decompress does not depend on the size of the compressed data but on the size of the decompressed output.
The time difference you see is because fast-input decompresses to ~240 KB while all the slow-inputs decompress to ~5 MB or more.

deflate:  1.963ms size: 240640, slow-inputs/fast-input.deflate
deflate: 53.172ms size: 5428752, slow-inputs/id:000019,src:000003,time:3887406,op:havoc,rep:2
deflate: 52.198ms size: 5428752, slow-inputs/id:000020,src:000003,time:3887945,op:havoc,rep:2
deflate: 54.407ms size: 5428752, slow-inputs/id:000020,src:000284+000003,time:3969484,op:MOpt_splice,rep:2
deflate: 52.077ms size: 5428950, slow-inputs/id:000022,src:000310+000003,time:4010816,op:MOpt_core_splice,rep:4
deflate: 48.953ms size: 5428752, slow-inputs/id:000023,src:000323+000003,time:3962286,op:splice,rep:4
deflate: 52.895ms size: 5428752, slow-inputs/id:000024,src:000130+000003,time:4236011,op:splice,rep:2
deflate: 52.443ms size: 5434897, slow-inputs/id:000025,src:000088+000003,time:4279960,op:MOpt_core_splice,rep:2
deflate: 49.961ms size: 5427961, slow-inputs/id:000025,src:000130+000003,time:4236170,op:splice,rep:8
deflate: 69.007ms size: 7309877, slow-inputs/id:000026,src:000072+000003,time:4066371,op:splice,rep:2
deflate: 50.103ms size: 5441154, slow-inputs/id:000026,src:000136+000003,time:6711373,op:MOpt_core_splice,rep:4
deflate: 54.263ms size: 5847114, slow-inputs/id:000026,src:000334+000003,time:4287373,op:splice,rep:16
deflate: 51.619ms size: 5503102, slow-inputs/id:000027,src:000319+000003,time:4239148,op:splice,rep:4
deflate: 52.270ms size: 5425612, slow-inputs/id:000028,src:000141+000003,time:4682784,op:splice,rep:4
deflate: 51.983ms size: 5428752, slow-inputs/id:000028,src:000319+000003,time:4239340,op:splice,rep:8
deflate: 54.148ms size: 5436356, slow-inputs/id:000029,src:000108+000003,time:5796153,op:splice,rep:2
deflate: 52.676ms size: 5424864, slow-inputs/id:000033,src:000208+000003,time:4502965,op:splice,rep:8
deflate: 51.666ms size: 5428752, slow-inputs/id:000035,src:000134+000003,time:5741645,op:splice,rep:4
deflate: 53.454ms size: 5478974, slow-inputs/id:000036,src:000094+000003,time:7209724,op:splice,rep:8
deflate: 67.449ms size: 7357039, slow-inputs/id:000036,src:000108+000343,time:9255797,op:splice,rep:4
deflate: 52.679ms size: 5587524, slow-inputs/id:000038,src:000050+000003,time:9352788,op:splice,rep:2

@andrewrk
Member

Awesome work, thank you both for working together on the fuzzed test cases.

@andrewrk andrewrk merged commit 0c1df96 into ziglang:master Jan 26, 2022
squeek502 added a commit to squeek502/zig-std-lib-fuzzing that referenced this pull request Jan 26, 2022
Now that ziglang/zig#10552 has been merged, we can replace the old fuzzers with the new ones.
@hdorio hdorio deleted the deflate branch January 26, 2022 08:43