compress: add a deflate compressor #10552
Conversation
I assume you're going to need a lot more reasoning (benchmarks? use cases? bugs in the old implementation?) for wholesale replacing the previous inflate implementation (note also that the previous inflate implementation was recently fuzzed in order to affirm that it matches). In fact, it'd probably make much more sense to split this PR into a compression-only PR and a decompression-only PR.
Not really, any puff.c based implementation should be replaced.
You mean like this one? 41f244b
The Go implementation is based on https://github.com/madler/zlib/raw/master/doc/algorithm.txt. I invite you to do your own benchmarks with your own sample data; just keep in mind that this new implementation does a lot of small reads during decompression, so don't pass in a file reader directly. Use a buffer, or read the whole input into memory, to avoid lots of small fread() system calls.
I'm not able to run your benchmark with the latest commit of this branch. Did something in the compressor change? EDIT: Never mind, found the problem; line 40 is using https://gist.github.com/hdorio/5e4f26150198a7515d89f01be5e0a45f#file-decompress-zig-L40. Stack traces from the broken benchmark program (note: I'm running with ReleaseSafe just to get a stack trace):
And with the old deflate commented out, I still get an error when decompressing:
Here are the results I'm getting with a modified version of your benchmark to try a few different sizes (sizes shown are the uncompressed sizes):
The modified benchmark code:

```zig
const std = @import("std");
const fmt = std.fmt;
const io = std.io;
const math = std.math;
const Allocator = std.mem.Allocator;
const GeneralPurposeAllocator = std.heap.GeneralPurposeAllocator;
const Timer = std.time.Timer;
const deflate = std.compress.deflate;
const compressor = deflate.compressor;
const decompressor = deflate.decompressor;
const old_deflate = @import("old_deflate.zig");

const compressed_filename = "data.deflate";

pub fn create(allocator: Allocator, uncompressed_size: usize) !void {
    const file = try std.fs.cwd().createFile(compressed_filename, .{ .read = true });
    defer file.close();
    var comp = try compressor(allocator, file.writer(), deflate.default_compression, null);
    defer comp.deinit();
    const writer = std.io.bufferedWriter(comp.writer()).writer();
    var i: usize = 0;
    while (i < uncompressed_size) : (i += 1) {
        try writer.writeByte(@truncate(u8, @truncate(u16, i) *| i));
    }
    try comp.close();
}

pub fn deflate_old(allocator: Allocator) !void {
    const file = try std.fs.cwd().openFile(compressed_filename, .{ .read = true });
    defer file.close();
    var data = try file.reader().readAllAlloc(allocator, math.maxInt(usize));
    defer allocator.free(data);
    var fb = io.fixedBufferStream(data);
    var window: [0x8000]u8 = undefined;
    var inflate = old_deflate.inflateStream(fb.reader(), &window);
    const output = try std.fs.cwd().createFile("out.bin", .{ .read = true });
    defer output.close();
    var buf = [1]u8{0} ** (1 << 15);
    var r: usize = 1;
    while (r > 0) {
        r = try inflate.reader().read(&buf);
        try output.writer().writeAll(buf[0..r]);
    }
}

pub fn deflate_new(allocator: Allocator) !void {
    const file = try std.fs.cwd().openFile(compressed_filename, .{ .read = true });
    defer file.close();
    var data = try file.reader().readAllAlloc(allocator, math.maxInt(usize));
    defer allocator.free(data);
    var fb = io.fixedBufferStream(data);
    var decomp = try decompressor(allocator, fb.reader(), null);
    defer decomp.deinit();
    const output = try std.fs.cwd().createFile("out.bin", .{ .read = true });
    defer output.close();
    var buf = [1]u8{0} ** (1 << 15);
    var r: usize = 1;
    while (r > 0) {
        r = try decomp.reader().read(&buf);
        try output.writer().writeAll(buf[0..r]);
    }
}

pub fn main() !void {
    var gpa = GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    var allocator = gpa.allocator();

    const file_sizes = &[_]usize{
        100,
        1024,
        1024 * 1024,
        10 * 1024 * 1024,
        22 * 1024 * 1024,
        100 * 1024 * 1024,
        220 * 1024 * 1024,
    };

    for (file_sizes) |file_size| {
        std.debug.print("\nsize: {}\n", .{std.fmt.fmtIntSizeBin(file_size)});
        try create(allocator, file_size);

        var timer = try Timer.start();
        var start: usize = 0;
        var stop: usize = 0;

        start = timer.lap();
        try deflate_old(allocator);
        stop = timer.lap();
        const old_duration = stop - start;
        std.debug.print("deflate_old: {}\n", .{fmt.fmtDuration(old_duration)});

        start = timer.lap();
        try deflate_new(allocator);
        stop = timer.lap();
        const new_duration = stop - start;
        std.debug.print("deflate_new: {}\n", .{fmt.fmtDuration(new_duration)});

        const faster = if (old_duration > new_duration) "new" else "old";
        const diff: f64 = @intToFloat(f64, std.math.max(old_duration, new_duration)) / @intToFloat(f64, std.math.min(old_duration, new_duration));
        std.debug.print("{s} is {d:.2}x faster\n", .{ faster, diff });
    }
}
```
I didn't get very far through the code review, just some superficial style fixes and general thoughts.
The benchmarks certainly support the fact that this implementation is faster than the status quo. However, I don't think that justifies removing the current implementation from the standard library. There are other aspects of code that can be important beyond just speed. The full text of the comment from puff.c you referenced touches on this:
puff.c is a simple inflate written to be an unambiguous way to specify the
deflate format. It is not written for speed but rather simplicity. As a
side benefit, this code might actually be useful when small code is more
important than speed, such as bootstrap applications. For typical deflate
data, zlib's inflate() is about four times as fast as puff(). zlib's
inflate compiles to around 20K on my machine, whereas puff.c compiles to
around 4K on my machine (a PowerPC using GNU cc). If the faster decode()
function here is used, then puff() is only twice as slow as zlib's
inflate().
All dynamically allocated memory comes from the stack. The stack required
is less than 2K bytes. This code is compatible with 16-bit int's and
assumes that long's are at least 32 bits. puff.c uses the short data type,
assumed to be 16 bits, for arrays in order to conserve memory. The code
works whether integers are stored big endian or little endian.
To add on to that, simplicity is also highly desirable when code must be strictly audited.
Anyhow, I don't think removing the current implementation from the standard library is the right call here, perhaps there is room for both. This would also be nice as we could test them against each other.
Sure, where should I put the slow one?
Hi @hdorio! Thank you for providing this port. And thanks @squeek502 for the benchmark data. Based on this data, it is clear that there is room for a more sophisticated implementation such as this. However, I also agree with @ifreund that there is room for a simpler implementation that requires less machine code, at the cost of performance, especially if the simpler implementation does not require dynamic heap memory allocation. Before I take the time to do a detailed code review, let's figure out the higher level organization: We have the concept of ReleaseSmall which can be used to select the puff implementation rather than the Go port using conditional compilation, like this:

```zig
const builtin = @import("builtin");

const Deflate = switch (builtin.mode) {
    .ReleaseSmall => SimpleDeflate,
    else => FastDeflate,
};
```

There are two issues with this:
@hdorio what do you think? Can the ported Go code be adjusted to require a constant amount of memory? Or perhaps could the scratch memory buffer be provided by the caller for both implementations, thereby unifying the two interfaces? To answer your questions directly:
First we need names for them. One suggestion for their names:
Or perhaps they can be named after their desired characteristics rather than tying them to particular implementations:
Once we have names for them, we can decide their fully qualified names:
After looking at them for a couple minutes I think I prefer the latter naming style. Anyway, that tells you the file paths then:
Although I will note that probably the stream code can be shared between them and does not need to be implemented twice.
They should both use
@andrew thanks for taking some time on this. At first, your idea seemed good to me, but after some preliminary tests I would advise against putting the slow implementation under -OReleaseSmall.
I think if we want to keep the slow one, we should let the user choose between the two. The gain in size is not clear: when both implementations use an allocator (for reading the whole file into memory), the fast implementation is smaller by ~3k (22984); when an allocator is not used for the slow one, the slow one is smaller by ~2k (20144); when a custom allocator is used for the fast one, the fast one is smaller by ~3k (17032). Binary sizes (-OReleaseSmall):
The performance gap: decompress.zig (strip, no-alloc, buffered-reader, old) vs decompress.zig (strip, custom-alloc, new). I'm sure the slow one is perfect for some environments where memory constraints are very harsh. 348 kbytes (slow) vs 3120 kbytes (fast) is very good, but for a 660 MB file, 1m2.166s (slow) vs 23.012s (new) is not going from good to medium runtime performance to me, but from OK to poor (it's subjective). Here's the thing: the fast implementation is not that good. I expect someone to do much better than this, around 2x the current inflate speed. And if the zlib documentation is correct, I expect someone to do this in a constant 40kb of memory instead of around 3mb for mine. Maximum resident set size (time -v):
unifying the two interfaces
Last time I tried, I failed. But it seems technically possible; zlib uses the same algorithm and ends up with this:
Note that the 7KB is dynamic memory.
I don't know how much memory would be needed for the scratch memory buffer; this depends on the stream being decompressed. Also, the slow one would need to be modified to be able to take a dictionary.

bikeshedding
I also prefer
The whole shrink/unshrink, implode/explode, deflate/inflate trend is cute, but as a non-native English speaker I would much prefer to have to remember only the name of the algorithm (deflate) and then choose decompress_stream or compress_stream, instead of remembering that the opposite of 'deflate' is 'inflate'.
I think it's fair to assume most people would want a faster implementation, so it makes sense to add it to the standard library. Small code size is still more important than speed in some cases, so the original implementation should still be kept around. IMO the best way to go about this would be to name them different things, like "puff" and "goflate", or "fast" and "small", or whatever, and let the user decide which one to use.
Unless I misunderstand what "small code size" means, and unless my tests were wrong, the original is not smaller. https://gist.github.com/hdorio/17332012563cfa022622d26ef63b095b
Thanks for the follow-up, @hdorio! Please be patient with me 🙏 I took a vacation last week and now we are swimming in PRs :-)
(force-pushed from ae223a4 to 2899498)
Thanks for your patience on getting a review, @hdorio. This is some really nice work that you have done. Overall it looks quite reasonable.
I have a few requested changes, along with a bit of a spicy rant against Go - please take it in a lighthearted manner :-)
With all of these requests addressed, I would be satisfied with the code and it can be merged, entirely replacing the puff.c port as the only implementation of deflate in the Zig std lib.
I see some opportunities for further optimizations, but that is a separate concern, and need not distract us from this improvement that you have provided.
(force-pushed from b1e59f9 to 0f85096)
Did a bit of fuzzing of the decompressor (fuzzer implementations here). Findings:
Besides that, everything seems good fuzzing-wise (not too comprehensive, though, just a few hours of fuzzing). Haven't tried fuzzing the compressor yet.
Thanks a lot @squeek502 👍
This error was introduced by me and is a drift from the Go code. I modified the Zig code to have the same behavior; all tests output an error now, like Go.

inflate with Go:
(force-pushed from 4ddf3b3 to 2834cab)
Read bytes to check expected values instead of reading and hashing them. Hashing is a waste of time when we can just read and compare. This also removes a dependency on std.crypto.hash.sha2.Sha256 for tests.
Replaces the inflate API from `inflateStream(reader: anytype, window_slice: []u8)` to `decompressor(allocator: mem.Allocator, reader: anytype, dictionary: ?[]const u8)` and `compressor(allocator: mem.Allocator, writer: anytype, options: CompressorOptions)`
Ran a fuzzer. When running the simple decompressor fuzzer, though, I did notice that there are some inputs that take significantly longer to decompress than others. I haven't investigated why this is, but it might be something worth looking into. And here's an input for comparison. It's roughly the same size as most of the slow inputs, but runs way faster (note: it's a single file within a zip so GitHub would allow the upload). The fast input was generated with the compressor in this PR acting on ~235KiB of random bytes.
The time taken to decompress does not depend on the size of the compressed data, but on the size of the uncompressed result.
Awesome work, thank you both for working together on the fuzzed test cases.
Now that ziglang/zig#10552 has been merged, we can replace the old fuzzers with the new ones.
Replaces the inflate API from
`inflateStream(reader: anytype, window_slice: []u8)`
to
`decompressor(allocator: mem.Allocator, reader: anytype, dictionary: ?[]const u8)`
and
`compressor(allocator: mem.Allocator, writer: anytype, options: CompressorOptions)`
This is a translation of the Go code of the compress/flate package from
https://go.googlesource.com/go/+/refs/tags/go1.17/src/compress/flate/
#213