Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zlib: Allow zlib.gunzip features with zlib.unzip #5884

Closed

Conversation

addaleax
Copy link
Member

Pull Request check-list

  • Does make -j8 test (UNIX) or vcbuild test nosign (Windows) pass with
    this change (including linting)?
  • Is the commit message formatted according to CONTRIBUTING.md?
  • If this change fixes a bug (or a performance problem), is a regression
    test (or a benchmark) included?

Affected core subsystem(s)

zlib

Description of change

Detect whether a gzip file is being passed to unzip* by testing the first bytes for the gzip magic bytes, and setting the decompression mode to GUNZIP or INFLATE according to the result.

This enables gzip-only features like multi-member support to be used together with the unzip* autodetection support and thereby makes gunzip* and unzip* return identical results for gzip input again.

This is more or less a follow-up change to #5120.

@Fishrock123 Fishrock123 added the zlib Issues and PRs related to the zlib subsystem. label Mar 24, 2016
@addaleax addaleax changed the title Allow zlib.gunzip features with zlib.unzip zlib: Allow zlib.gunzip features with zlib.unzip Mar 24, 2016
@jasnell jasnell added the semver-minor PRs that contain new features and should be released in the next minor version. label Mar 24, 2016
@bnoordhuis
Copy link
Member

LGTM. CI: https://ci.nodejs.org/job/node-test-pull-request/2051/

I asked you in the other PR to remove the defines but that was before I'd seen this PR.

@addaleax
Copy link
Member Author

Sorry, even if CI passes, I’ll have to update this with a somewhat different approach and more tests.
On the JS side, currently the assumption is made that availInAfter > 0 implies availOutAfter === 0, but with the break; statement introduced here that would no longer be the case.

@addaleax
Copy link
Member Author

Okay, added a new commit with tests for the case that even the magic bytes don’t come in a single chunk (I know it’s quite an unlikely edge case, but I guess even that should be supported). The new test will require the bugfix from #5883 to pass, though.

@kthelgason
Copy link
Contributor

This looks like something that should obviously be supported. The only relevant bytes here are the first two bytes of the file, right? When would these be in separate chunks?

@addaleax
Copy link
Member Author

Yes, these are the only inspected bytes. And honestly, I have no idea how they might get split into separate chunks. 😄 But it could, because we don’t have control about where the data comes from, and I think that means either not supporting this feature at all or creating an edge case in which it does not work.
And if you have a better idea on how to implement this, I’m all for it! I’m pretty sure there must be more beautiful solutions than the nested switch in node_zlib.cc.

@kthelgason
Copy link
Contributor

I guess if someone set the chunksize to 1 when constructing a Gunzip object it could happen, whether that's a very clever thing to do is another matter! In any case I agree that we should try to support every case that can occur or none at all.

@addaleax
Copy link
Member Author

> zlib.createGunzip({chunkSize:1})
Error: Invalid chunk size: 1

I guess you’re talking about something like receiving Z_STREAM_END from zlib exactly when the input buffer ends with a gzip member? I don’t think that’s actually a problem, Z_STREAM_END gets pretty much treated the same as Z_OK if I read the code correctly.

@kthelgason
Copy link
Contributor

Actually, your example is exactly what I was thinking of, but I'm happy to see that it isn't possible. But obviously one can construct pathological cases where this would happen, like passing one byte at a time to unzip.write.

Anyway, I give this a big 👍, thanks for improving on this 😃

@addaleax
Copy link
Member Author

I think that should be covered by the test cases here, but I guess you can feel free to create more of them.

And again, good to have you on board with this! 😄

@@ -236,6 +239,40 @@ class ZCtx : public AsyncWrap {
ctx->err_ = deflate(&ctx->strm_, ctx->flush_);
break;
case UNZIP:
next_expected_header_byte = ctx->strm_.next_in;

switch (ctx->gzip_id_bytes_read_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question: would it be necessary to check that ctx->strm_.avail_in > 0 before trying to read the first byte?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that vein, shouldn't there be a similar check for GZIP_HEADER_ID2?

Another question: what happens when the state machines sees GZIP_HEADER_ID1 followed by a byte that's not GZIP_HEADER_ID2? Is the first byte lost?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, you’re right – updated again. And no, no bytes are lost in that case; there is always the fall-through in the outer switch from case UNZIP: to case INFLATE:, so all input ends up being passed to inflate().

@addaleax addaleax force-pushed the allow-gunzip-features-with-unzip branch from fd10a48 to 2598a0f Compare March 31, 2016 12:04
@addaleax
Copy link
Member Author

Rebased and updated with your suggestion; The tests are passing now that #5883 is merged.

@addaleax addaleax force-pushed the allow-gunzip-features-with-unzip branch from 2598a0f to a388a5e Compare March 31, 2016 12:34
@jasnell
Copy link
Member

jasnell commented Apr 1, 2016

LGTM if @bnoordhuis is happy.

@jasnell
Copy link
Member

jasnell commented Apr 1, 2016

@bnoordhuis
Copy link
Member

LGTM with the style nit that comments should be capitalized and punctuated.

Detect whether a gzip file is being passed to `unzip*` by
testing the first bytes for the gzip magic bytes, and setting
the decompression mode to `GUNZIP` or `INFLATE` according to
the result.

This enables gzip-only features like multi-member support
to be used together with the `unzip*` autodetection support
and thereby makes `gunzip*` and `unzip*` return identical
results for gzip input again.

Add a simple test for checking that features specific to
`zlib.gunzip`, notably support for multiple members, also work
when using `zlib.unzip`.
@addaleax addaleax force-pushed the allow-gunzip-features-with-unzip branch from a388a5e to a8e1fb5 Compare April 2, 2016 01:03
@addaleax
Copy link
Member Author

addaleax commented Apr 2, 2016

Updated the comments. :)

@bnoordhuis
Copy link
Member

Thanks Anna, landed in 2d7e316.

@bnoordhuis bnoordhuis closed this Apr 5, 2016
bnoordhuis pushed a commit that referenced this pull request Apr 5, 2016
Detect whether a gzip file is being passed to `unzip*` by
testing the first bytes for the gzip magic bytes, and setting
the decompression mode to `GUNZIP` or `INFLATE` according to
the result.

This enables gzip-only features like multi-member support
to be used together with the `unzip*` autodetection support
and thereby makes `gunzip*` and `unzip*` return identical
results for gzip input again.

Add a simple test for checking that features specific to
`zlib.gunzip`, notably support for multiple members, also work
when using `zlib.unzip`.

PR-URL: #5884
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
@addaleax addaleax deleted the allow-gunzip-features-with-unzip branch April 5, 2016 18:05
MylesBorins pushed a commit that referenced this pull request Apr 20, 2016
Detect whether a gzip file is being passed to `unzip*` by
testing the first bytes for the gzip magic bytes, and setting
the decompression mode to `GUNZIP` or `INFLATE` according to
the result.

This enables gzip-only features like multi-member support
to be used together with the `unzip*` autodetection support
and thereby makes `gunzip*` and `unzip*` return identical
results for gzip input again.

Add a simple test for checking that features specific to
`zlib.gunzip`, notably support for multiple members, also work
when using `zlib.unzip`.

PR-URL: #5884
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Apr 21, 2016
@MylesBorins MylesBorins mentioned this pull request Apr 21, 2016
MylesBorins pushed a commit that referenced this pull request Apr 21, 2016
Detect whether a gzip file is being passed to `unzip*` by
testing the first bytes for the gzip magic bytes, and setting
the decompression mode to `GUNZIP` or `INFLATE` according to
the result.

This enables gzip-only features like multi-member support
to be used together with the `unzip*` autodetection support
and thereby makes `gunzip*` and `unzip*` return identical
results for gzip input again.

Add a simple test for checking that features specific to
`zlib.gunzip`, notably support for multiple members, also work
when using `zlib.unzip`.

PR-URL: #5884
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-minor PRs that contain new features and should be released in the next minor version. zlib Issues and PRs related to the zlib subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants