-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Detect truncated GZip streams #61768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to this area: @dotnet/area-system-io-compression Issue DetailsFor context, please see:
TL;DR: The I initially used the approach of relying on the zlib But, after more troubleshooting with the help from @marcin-krystianc, it seems we could do without the new API if we don't rely on zlib buf errors. Some of the existing unit tests should actually be changed to account for the new behavior (e.g. throw on invalid data) as I've done in mfkl@196783b I'm eager to hear your feedback on this as I'm no compression expert.
|
src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/DeflateStream.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/DeflateStream.cs
Outdated
Show resolved
Hide resolved
196783b
to
6adb821
Compare
Hmm the failed jobs look unrelated to this PR. This is ready for review /cc @adamsitnik @danmoseley @carlossanlop |
{ | ||
ThrowGenericInvalidData(); | ||
} | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the 'else' needed here - eg codegen improves because it does not realize that the throw method never returns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to properly check this but I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I checked things correctly (by codegen you meant IL and not x86, right?), this is the result. With the else
// [286 28 - 286 54]
IL_00bc: call void System.IO.Compression.DeflateStream::ThrowGenericInvalidData()
IL_00c1: nop
// [287 25 - 287 26]
IL_00c2: nop
IL_00c3: br.s IL_00c8
// [289 25 - 289 26]
IL_00c5: nop
// [290 29 - 290 35]
IL_00c6: br.s IL_0120
// [292 21 - 292 22]
IL_00c8: nop
IL_00c9: br.s IL_00fc
Without the else
// [286 28 - 286 54]
IL_00bc: call void System.IO.Compression.DeflateStream::ThrowGenericInvalidData()
IL_00c1: nop
// [287 25 - 287 26]
IL_00c2: nop
// [290 29 - 290 35]
IL_00c3: br.s IL_011a
So it seems you are correct! I updated the PR accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By codegen we generally mean assembly code. Usually the quickest way to verify that is with sharplab.io. Not sure how much the IL correlates. But it makes sense to change as you suggest.
data = compressed.ToArray(); | ||
} | ||
|
||
for (var i = 1; i < data.Length; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this manage to hit all the 4 new throw statements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it does.
@@ -17,6 +17,7 @@ internal sealed class Inflater : IDisposable | |||
private const int MinWindowBits = -15; // WindowBits must be between -8..-15 to ignore the header, 8..15 for | |||
private const int MaxWindowBits = 47; // zlib headers, 24..31 for GZip headers, or 40..47 for either Zlib or GZip | |||
|
|||
private bool _nonZeroInput; // Whether there is any non zero input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, would nonEmptyInput be better name? a zero input might be a buffer of zeroes.
or perhaps this means nonEmptyOutput?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to nonEmptyInput.
cc @adamsitnik @carlossanlop @jozkee are the full compression crew. |
9c54663
to
def1924
Compare
As mentioned in #61768 (comment), CI failure is unrelated, please feel free to restart a CI build, thanks! |
@mfkl you can restart CI by close and reopen actually .. |
Anything I can do to help move the review process forward? Even though it is quite an edge case, it'd be nice to get this merged in, as it fixes one really old bug with regard to gzip streams in dotnet. |
Hey there @danmoseley @jozkee, any updates from your side on this? |
if (!_deflateStream._inflater.Finished()) | ||
{ | ||
ThrowGenericInvalidData(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These conditions still hold true in CopyTo[Async].
if (!_deflateStream._inflater.Finished()) | |
{ | |
ThrowGenericInvalidData(); | |
} | |
if (!_deflateStream._inflater.Finished()) | |
{ | |
Debug.Assert(_deflateStream._inflater.NonEmptyInput()); | |
Debug.Assert(_deflateStream._inflater.AvailableOutput > 0); | |
ThrowGenericInvalidData(); | |
} |
@@ -418,6 +422,10 @@ async ValueTask<int> Core(Memory<byte> buffer, CancellationToken cancellationTok | |||
int n = await _stream.ReadAsync(new Memory<byte>(_buffer, 0, _buffer.Length), cancellationToken).ConfigureAwait(false); | |||
if (n <= 0) | |||
{ | |||
if (!_inflater.Finished() && _inflater.NonEmptyInput() && _inflater.AvailableOutput > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the condition as "if we haven't finished inflating AND there's partial input set AND there's already output available; we are in the bad state and we should throw".
Is it possible (i.e: an scenario) where there could be partial input but not available output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please double check test-code-coverage of the three conditions. If I remove _inflater.AvailableOutput > 0
all tests still pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, without _inflater.AvailableOutput > 0
, the following test fails.
System.IO.InvalidDataException : Found invalid data while decoding.
Stack Trace:
DeflateStream.ThrowGenericInvalidData() line 348
DeflateStream.ReadCore(Span`1 buffer) line 280
DeflateStream.Read(Byte[] buffer, Int32 offset, Int32 count) line 235
ZipFileTestBase.ReadAllBytes(Stream stream, Byte[] buffer, Int32 offset, Int32 count) line 83
ZipFileTestBase.StreamsEqual(Stream ast, Stream bst, Int32 blocksToRead) line 142
ZipFileTestBase.StreamsEqual(Stream ast, Stream bst) line 111
ZipFile_Create.CreateFromDirectory_IncludeBaseDirectory() line 42
This is the reason I added an async version of the test in this PR, to cover the async code path.
src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/Inflater.cs
Show resolved
Hide resolved
continue; | ||
} | ||
|
||
throw new XunitException($"Truncated stream was decompressed successfully: length={i}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside of truncation tests, could we test changing byte values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I'll re-use the test from the original SO thread, if that is fine with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mfkl it's fine to reuse that from SO per the SO license. If you end up essentially pasting as-is, please add a few lines to our THIRD-PARTY-NOTICES to ensure that the poster is recognized. If you write something that effectively does the same thing, that's probably not necessary.
Yes, I believe so @danmoseley. |
src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs
Outdated
Show resolved
Hide resolved
// - Inflation is not finished yet. | ||
// - Provided input wasn't completely empty | ||
// In such case, we are dealing with a truncated input stream. | ||
if (!buffer.IsEmpty && !_inflater.Finished() && _inflater.NonEmptyInput()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been rethinking it and I think it makes sense to not throw for buffer.IsEmpty
, we don't pinvoke inflate
in such case so there's no way to tell if there's still available output on the inflater.
src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs
Outdated
Show resolved
Hide resolved
try | ||
{ | ||
while(ZipFileTestBase.ReadAllBytes(s, buffer, 0, buffer.Length) != 0) { }; | ||
|
||
Assert.Equal(source, buffer); | ||
} | ||
catch (InvalidDataException) | ||
{ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will eat-up false positives. You can do this instead:
Assert.Throws<InvalidDataException>(() => { while (ZipFileTestBase.ReadAllBytes(s, buffer, 0, buffer.Length) != 0) ; });
If there are cases where a corruption can't be expected (e.g: changing a header byte), we should skip them, but they should be deliberated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I did this (code may not be elegant though, let me know if you have a more elegant version in mind).
Skipping though means that we do not check the decompression process was successful in these cases (e.g. comparing the decompressed buffer with the initial source). It may or may not be relevant to do so, I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure what you mean here, but I think the way you have it in the latest commits is the way to go, we should always assert for the exception when desired, not just ignore it (unless there's a reason for it).
runtime/src/libraries/System.IO.Compression/tests/CompressionStreamUnitTests.ZLib.cs
Lines 49 to 52 in e4c1d7f
Assert.Throws<InvalidDataException>(() => | |
{ | |
while (ZipFileTestBase.ReadAllBytes(decompressor, buffer, 0, buffer.Length) != 0); | |
}); |
other types of stream can't detect corruption properly
5f52914
to
0c492ab
Compare
zlib should be able to detect corruption as with gzip currently (but not brotli). I'm looking for a way to add this test without duplicating code (as some types of streams are unable to detect such corruption). |
For the truncated error case, should a new error message be created to more precisely describe the problem? Currently Aside from this, I think it's OK now. Let me know! |
// corrupting these bytes goes undetected by gzip, skip them | ||
int[] byteToSkip = { 3, 4, 5, 6, 7, 8, 9 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add, as a comment, the reason why these bytes don't corrupt the compressed data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done.
@@ -16,5 +18,44 @@ public class ZLibStreamUnitTests : CompressionStreamUnitTestBase | |||
public override Stream CreateStream(Stream stream, CompressionLevel level, bool leaveOpen) => new ZLibStream(stream, level, leaveOpen); | |||
public override Stream BaseStream(Stream stream) => ((ZLibStream)stream).BaseStream; | |||
protected override string CompressedTestFile(string uncompressedPath) => Path.Combine("ZLibTestData", Path.GetFileName(uncompressedPath) + ".z"); | |||
|
|||
[Fact] | |||
public void StreamCorruption_IsDetected() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we add this test to CompressionStreamUnitTests.Deflate.cs
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly this is not really an option as DeflateStream
does not do CRC, hence fails to detect stream corruption (unlike zlib and gzip streams). Truncation detection works fine though.
I think we can do that, feel free to add your proposed error message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @mfkl
Added When you commit this breaking change:
Tagging @dotnet/compat for awareness of the breaking change. |
Thanks for the review feedback @jozkee :) |
Reverted in #72742 |
@ericstj, I've removed the breaking change labels as this was reverted. |
Thank you @stephentoub |
For context, please see:
TL;DR: The
System.IO.Compression.GZipStream
implementation over zlib does not currently do strict native error reporting when it comes to truncated streams. Other zlib .NET managed implementation or bindings over the native API do report issues on corrupt data, but they rely on an error code returned by zlib, which should be ignored sometimes. This PR tries to address this edge case.I initially used the approach of relying on the zlib
buf errors
to implement this (as some other community wrappers have done), but it failed some of your tests. So, I decided to add a new API parameter to enable "strict validation" as an opt-in.But, after more troubleshooting with the help from @marcin-krystianc, it seems we could do without the new API if we don't rely on zlib buf errors.
Some of the existing unit tests should actually be changed to account for the new behavior (e.g. throw on invalid data) as I've done in mfkl@196783bI'm eager to hear your feedback on this as I'm no compression expert.