Detect truncated GZip streams #61768

mfkl · 2021-11-18T08:26:28Z

For context, please see:

TL;DR: The System.IO.Compression.GZipStream implementation over zlib does not currently report an error when a stream is truncated. Other managed .NET zlib implementations, and bindings over the native API, do report errors on corrupt data, but they rely on an error code returned by zlib (Z_BUF_ERROR) that should sometimes be ignored. This PR tries to address this edge case.

I initially relied on zlib's buffer errors to implement this (as some other community wrappers do), but that approach failed some of the existing tests. So I added a new API parameter to enable "strict validation" as an opt-in.

However, after more troubleshooting with the help of @marcin-krystianc, it seems we can do without the new API if we don't rely on zlib's buffer errors. ~~Some of the existing unit tests should actually be changed to account for the new behavior (e.g. throw on invalid data) as I've done in mfkl@196783b~~

I'm eager to hear your feedback on this as I'm no compression expert.
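For readers new to the issue, here is a minimal sketch of the scenario this PR targets: gzip-compress a buffer, cut the compressed bytes in half (as a broken download would), and decompress. The helper names `GzipCompress` and `TryGzipDecompress` are hypothetical. Which branch prints depends on the runtime version (this PR was later reverted in #72742), so the sketch does not assume either outcome; the point is that a silently shorter result is not a useful answer for a truncated stream.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Gzip-compress a buffer; disposing the GZipStream writes the final deflate block
// and the 8-byte gzip trailer (CRC-32 + ISIZE).
static byte[] GzipCompress(byte[] data)
{
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
    {
        gzip.Write(data, 0, data.Length);
    }
    return output.ToArray();
}

// Decompress everything; returns null if the decoder threw InvalidDataException.
static byte[]? TryGzipDecompress(byte[] compressed)
{
    try
    {
        using var input = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress);
        using var output = new MemoryStream();
        input.CopyTo(output);
        return output.ToArray();
    }
    catch (InvalidDataException)
    {
        return null;
    }
}

var original = new byte[64 * 1024];
new Random(42).NextBytes(original);                          // random data keeps the compressed stream large
byte[] compressed = GzipCompress(original);
byte[] truncated = compressed[..(compressed.Length / 2)];   // simulate a cut-off download

byte[]? result = TryGzipDecompress(truncated);
Console.WriteLine(result is null
    ? "truncation detected (InvalidDataException)"
    : $"truncation NOT detected: {result.Length} of {original.Length} bytes returned");
```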

ghost · 2021-11-18T08:26:36Z

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Author:	mfkl
Assignees:	-
Labels:	`area-System.IO.Compression`
Milestone:	-

src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/DeflateStream.cs

mfkl · 2021-11-26T03:22:29Z

Hmm, the failed jobs look unrelated to this PR. This is ready for review. /cc @adamsitnik @danmoseley @carlossanlop

danmoseley · 2021-11-26T23:17:37Z

src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/DeflateStream.cs

+                        {
+                            ThrowGenericInvalidData();
+                        }
+                        else


Is the `else` needed here? E.g., does codegen improve without it, since the compiler does not realize that the throw method never returns?

I'm not sure how to properly check this but I'll look into it.

If I checked things correctly (by codegen you meant IL and not x86, right?), this is the result with the `else`:

// [286 28 - 286 54]
IL_00bc: call void System.IO.Compression.DeflateStream::ThrowGenericInvalidData()
IL_00c1: nop
// [287 25 - 287 26]
IL_00c2: nop
IL_00c3: br.s IL_00c8
// [289 25 - 289 26]
IL_00c5: nop
// [290 29 - 290 35]
IL_00c6: br.s IL_0120
// [292 21 - 292 22]
IL_00c8: nop
IL_00c9: br.s IL_00fc

Without the `else`:

// [286 28 - 286 54]
IL_00bc: call void System.IO.Compression.DeflateStream::ThrowGenericInvalidData()
IL_00c1: nop
// [287 25 - 287 26]
IL_00c2: nop
// [290 29 - 290 35]
IL_00c3: br.s IL_011a

So it seems you are correct! I updated the PR accordingly.

By codegen we generally mean assembly code. Usually the quickest way to verify that is with sharplab.io. I'm not sure how closely the IL correlates, but the change you suggest makes sense.
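A small sketch of the throw-helper shape discussed here. `ThrowGenericInvalidData` mirrors the name in the PR but this is a local stand-in, and `ReturnBytesOrThrow` is hypothetical. In the IL above, the `else` version carries an extra `br.s` because the C# compiler treats the helper call as one that may return. As far as I know, `[DoesNotReturn]` only feeds nullable flow analysis and does not change the emitted IL, so dropping the `else` is what removes the branch; whether the JIT's machine code differs is what sharplab.io would show.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.IO;

// Local stand-in for a throw helper. Keeping the throw out of line keeps the caller small.
// [DoesNotReturn] informs nullable analysis; it does not make the compiler drop branches.
[DoesNotReturn]
static void ThrowGenericInvalidData() =>
    throw new InvalidDataException("Found invalid data while decoding.");

// Hypothetical caller in the "no else" shape: code after the if only runs on the
// success path, because the helper never comes back on the failure path.
static int ReturnBytesOrThrow(int bytesRead, bool sourceExhausted, bool inflaterFinished)
{
    if (bytesRead == 0 && sourceExhausted && !inflaterFinished)
    {
        ThrowGenericInvalidData();
    }
    return bytesRead;
}

Console.WriteLine(ReturnBytesOrThrow(5, sourceExhausted: false, inflaterFinished: false));
```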

danmoseley · 2021-11-26T23:19:18Z

src/libraries/System.IO.Compression/tests/CompressionStreamUnitTests.Gzip.cs

+                data = compressed.ToArray();
+            }
+
+            for (var i = 1; i < data.Length; i++)


does this manage to hit all the 4 new throw statements?

Yes it does.
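A standalone sketch of the same prefix-sweep idea as the test loop above, outside xUnit. `DecompressedLength` is a hypothetical helper returning -1 when `InvalidDataException` is thrown. Note that prefixes which only cut into the 8-byte gzip trailer can still yield the full decompressed output, so comparing lengths alone would miss them; only a check like "inflate not finished when the source ends" catches those. The detected/undetected counts depend on the runtime version, so no specific numbers are assumed.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static byte[] GzipCompress(byte[] data)
{
    using var ms = new MemoryStream();
    using (var gz = new GZipStream(ms, CompressionLevel.Optimal, leaveOpen: true))
    {
        gz.Write(data, 0, data.Length);
    }
    return ms.ToArray();
}

// Decompress the first `length` bytes of `compressed`. Returns the number of bytes
// produced, or -1 if the decoder threw InvalidDataException (truncation detected).
static long DecompressedLength(byte[] compressed, int length)
{
    try
    {
        using var gz = new GZipStream(new MemoryStream(compressed, 0, length), CompressionMode.Decompress);
        using var output = new MemoryStream();
        gz.CopyTo(output);
        return output.Length;
    }
    catch (InvalidDataException)
    {
        return -1;
    }
}

var source = new byte[4096];
new Random(1234).NextBytes(source);
byte[] data = GzipCompress(source);

// Every proper prefix 1..data.Length-1 is a truncated stream.
int detected = 0, undetected = 0;
for (var i = 1; i < data.Length; i++)
{
    if (DecompressedLength(data, i) < 0) detected++;
    else undetected++;
}
Console.WriteLine($"{data.Length - 1} truncated prefixes: {detected} detected, {undetected} undetected");
```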

danmoseley · 2021-11-26T23:21:09Z

src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/Inflater.cs

@@ -17,6 +17,7 @@ internal sealed class Inflater : IDisposable
        private const int MinWindowBits = -15;              // WindowBits must be between -8..-15 to ignore the header, 8..15 for
        private const int MaxWindowBits = 47;               // zlib headers, 24..31 for GZip headers, or 40..47 for either Zlib or GZip

+        private bool _nonZeroInput;                         // Whether there is any non zero input


Nit: would nonEmptyInput be a better name? A "zero input" might be a buffer of zeroes.
Or perhaps this means nonEmptyOutput?

Changed to nonEmptyInput.

danmoseley · 2021-11-26T23:22:26Z

cc @adamsitnik @carlossanlop @jozkee, who are the full compression crew.

mfkl · 2022-01-21T09:34:49Z

As mentioned in #61768 (comment), the CI failure is unrelated; please feel free to restart the CI build. Thanks!

danmoseley · 2022-01-21T17:04:24Z

@mfkl, you can actually restart CI by closing and reopening the PR.

mfkl · 2022-02-07T10:45:25Z

Anything I can do to help move the review process forward?

Even though it is quite an edge case, it'd be nice to get this merged, as it fixes a long-standing bug in .NET's gzip stream handling.

stackedsax · 2022-02-28T18:06:26Z

Hey there @danmoseley @jozkee, any updates from your side on this?

danmoseley · 2022-03-01T04:19:12Z

Apologies for the delay @mfkl . Perhaps @jozkee can set expectations as I know he's juggling several tasks.

jozkee · 2022-04-13T02:28:35Z

src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/DeflateStream.cs

+                    if (!_deflateStream._inflater.Finished())
+                    {
+                        ThrowGenericInvalidData();
+                    }


These conditions still hold true in CopyTo[Async].

Suggested change

-                    if (!_deflateStream._inflater.Finished())
-                    {
-                        ThrowGenericInvalidData();
-                    }
+                    if (!_deflateStream._inflater.Finished())
+                    {
+                        Debug.Assert(_deflateStream._inflater.NonEmptyInput());
+                        Debug.Assert(_deflateStream._inflater.AvailableOutput > 0);
+                        ThrowGenericInvalidData();
+                    }

jozkee · 2022-04-13T02:33:40Z

src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/DeflateStream.cs

@@ -418,6 +422,10 @@ async ValueTask<int> Core(Memory<byte> buffer, CancellationToken cancellationTok
                            int n = await _stream.ReadAsync(new Memory<byte>(_buffer, 0, _buffer.Length), cancellationToken).ConfigureAwait(false);
                            if (n <= 0)
                            {
+                                if (!_inflater.Finished() && _inflater.NonEmptyInput() && _inflater.AvailableOutput > 0)


I understand the condition as "if we haven't finished inflating AND there's partial input set AND there's already output available; we are in the bad state and we should throw".
Is there a possible scenario where there could be partial input but no available output?

Can you please double-check the test code coverage of the three conditions? If I remove `_inflater.AvailableOutput > 0`, all tests still pass.

Yes, without _inflater.AvailableOutput > 0, the following test fails.

System.IO.InvalidDataException : Found invalid data while decoding.
Stack Trace:
   DeflateStream.ThrowGenericInvalidData() line 348
   DeflateStream.ReadCore(Span`1 buffer) line 280
   DeflateStream.Read(Byte[] buffer, Int32 offset, Int32 count) line 235
   ZipFileTestBase.ReadAllBytes(Stream stream, Byte[] buffer, Int32 offset, Int32 count) line 83
   ZipFileTestBase.StreamsEqual(Stream ast, Stream bst, Int32 blocksToRead) line 142
   ZipFileTestBase.StreamsEqual(Stream ast, Stream bst) line 111
   ZipFile_Create.CreateFromDirectory_IncludeBaseDirectory() line 42

This is the reason I added an async version of the test in this PR, to cover the async code path.
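The three-part condition from the async path can be modeled as a pure predicate, which makes its truth table easy to inspect. This is a hypothetical sketch, not the PR's code: the parameters stand in for `_inflater.Finished()`, `_inflater.NonEmptyInput()` and `_inflater.AvailableOutput`. It assumes `AvailableOutput` mirrors zlib's `avail_out` (free space left in the output buffer); under that reading, a value above zero means the last inflate call stopped because it ran out of input rather than because the output buffer was full.

```csharp
using System;

// Hypothetical model of: if (!_inflater.Finished() && _inflater.NonEmptyInput() && _inflater.AvailableOutput > 0)
static bool LooksTruncated(bool finished, bool nonEmptyInput, int availableOutput) =>
    !finished                 // inflate has not reached the end of the compressed stream
    && nonEmptyInput          // the source produced at least some bytes (an empty source is not "truncated")
    && availableOutput > 0;   // assumption: free output space, so inflate stopped for lack of input

// Print the full truth table for a quick sanity check.
foreach (bool finished in new[] { false, true })
foreach (bool nonEmpty in new[] { false, true })
foreach (int avail in new[] { 0, 1 })
{
    Console.WriteLine($"finished={finished,-5} nonEmptyInput={nonEmpty,-5} availableOutput={avail} -> throw={LooksTruncated(finished, nonEmpty, avail)}");
}
```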

src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/Inflater.cs

jozkee · 2022-04-13T02:57:26Z

src/libraries/System.IO.Compression/tests/CompressionStreamUnitTests.Gzip.cs

+                            continue;
+                        }
+
+                        throw new XunitException($"Truncated stream was decompressed successfully: length={i}");


Aside from truncation tests, could we also test changing byte values?

Sure, I'll reuse the test from the original SO thread, if that's fine with you.

@mfkl it's fine to reuse that from SO per the SO license. If you end up essentially pasting as-is, please add a few lines to our THIRD-PARTY-NOTICES to ensure that the poster is recognized. If you write something that effectively does the same thing, that's probably not necessary.
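A self-contained sketch of a byte-corruption sweep (written from scratch, not copied from the SO thread): flip every bit of one compressed byte at a time and record positions where neither an exception nor a data mismatch reveals the damage. `CorruptionObservable` is a hypothetical helper. Flipping the gzip OS byte (index 9) is expected to go unnoticed because zlib stores but does not validate it, while flipping the magic byte at index 0 should always be rejected.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static byte[] GzipCompress(byte[] data)
{
    using var ms = new MemoryStream();
    using (var gz = new GZipStream(ms, CompressionLevel.Optimal, leaveOpen: true))
    {
        gz.Write(data, 0, data.Length);
    }
    return ms.ToArray();
}

// True if the corruption is observable: either the decoder threw
// InvalidDataException or the decompressed bytes differ from the original.
static bool CorruptionObservable(byte[] compressed, byte[] original)
{
    try
    {
        using var gz = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress);
        using var output = new MemoryStream();
        gz.CopyTo(output);
        return !output.ToArray().AsSpan().SequenceEqual(original.AsSpan());
    }
    catch (InvalidDataException)
    {
        return true;
    }
}

var source = new byte[2048];
new Random(2022).NextBytes(source);
byte[] data = GzipCompress(source);

// Flip all bits of one byte at a time and report positions where nothing noticed.
for (int i = 0; i < data.Length; i++)
{
    byte[] corrupted = (byte[])data.Clone();
    corrupted[i] ^= 0xFF;
    if (!CorruptionObservable(corrupted, source))
    {
        Console.WriteLine($"corrupting byte {i} went unnoticed");
    }
}
```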

mfkl · 2022-05-02T08:10:14Z

Yes, I believe so @danmoseley.

src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs

jozkee · 2022-05-02T19:27:20Z

src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/DeflateStream.cs

+                        // - Inflation is not finished yet.
+                        // - Provided input wasn't completely empty
+                        // In such case, we are dealing with a truncated input stream.
+                        if (!buffer.IsEmpty && !_inflater.Finished() && _inflater.NonEmptyInput())


I've been rethinking this, and I think it makes sense not to throw when buffer.IsEmpty: we don't P/Invoke inflate in that case, so there's no way to tell whether the inflater still has output available.
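A minimal model of the later synchronous condition, `!buffer.IsEmpty && !_inflater.Finished() && _inflater.NonEmptyInput()`, with the empty-buffer short-circuit made explicit. `ShouldThrowTruncated` is a hypothetical name and its boolean parameters stand in for the inflater calls; it only illustrates why a zero-length read request is exempt.

```csharp
using System;

// Hypothetical model of the sync check. A zero-length read never calls into native
// inflate, so the inflater state can't say whether output is still pending; the
// check is skipped instead of guessing.
static bool ShouldThrowTruncated(Span<byte> buffer, bool finished, bool nonEmptyInput) =>
    !buffer.IsEmpty && !finished && nonEmptyInput;

Console.WriteLine(ShouldThrowTruncated(Span<byte>.Empty, finished: false, nonEmptyInput: true)); // empty read: exempt
Console.WriteLine(ShouldThrowTruncated(new byte[4096], finished: false, nonEmptyInput: true));    // real read at end of source
```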

src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs

jozkee · 2022-05-02T21:38:14Z

src/libraries/Common/tests/System/IO/Compression/CompressionStreamUnitTestBase.cs

+                        try
+                        {
+                            while(ZipFileTestBase.ReadAllBytes(s, buffer, 0, buffer.Length) != 0) { };
+
+                            Assert.Equal(source, buffer);
+                        }
+                        catch (InvalidDataException)
+                        {
+                        }


This will eat up false positives. You can do this instead:

Assert.Throws<InvalidDataException>(() => { while (ZipFileTestBase.ReadAllBytes(s, buffer, 0, buffer.Length) != 0) ; });

If there are cases where corruption can't be expected to be detected (e.g., changing a header byte), we should skip them, but deliberately.

OK, I did this (the code may not be elegant; let me know if you have a more elegant version in mind).

Skipping, though, means that we don't check that decompression succeeded in these cases (e.g., by comparing the decompressed buffer with the original source). That may or may not be relevant; I'm not sure.

I'm not exactly sure what you mean here, but I think the way you have it in the latest commits is the way to go: we should always assert that the exception is thrown when it's expected, not just ignore it (unless there's a reason to).
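To make the "eats up false positives" point concrete outside xUnit, here is a sketch contrasting the swallowing pattern with an assert-throws helper. `AssertThrows<T>` is a hypothetical stand-in that, like xUnit's `Assert.Throws<T>`, requires the exact exception type (xUnit's `Assert.ThrowsAny<T>` also accepts derived types); `SwallowingCheck` is the anti-pattern reduced to its essence.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for Assert.Throws<T>: succeeds only if `action` throws exactly T.
static T AssertThrows<T>(Action action) where T : Exception
{
    try
    {
        action();
    }
    catch (Exception ex) when (ex.GetType() == typeof(T))
    {
        return (T)ex;
    }
    throw new Exception($"Expected {typeof(T).Name}, but no exception was thrown.");
}

// Anti-pattern from the review: the check "passes" whether or not the exception occurs.
static bool SwallowingCheck(Action action)
{
    try { action(); } catch (InvalidDataException) { }
    return true;
}

// The swallowing version reports success even when decompression silently "succeeded".
Console.WriteLine(SwallowingCheck(() => { }));
var ex = AssertThrows<InvalidDataException>(() => throw new InvalidDataException("Found invalid data while decoding."));
Console.WriteLine(ex.Message);
```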

runtime/src/libraries/System.IO.Compression/tests/CompressionStreamUnitTests.ZLib.cs

Lines 49 to 52 in e4c1d7f

Assert.Throws<InvalidDataException>(() =>
{
    while (ZipFileTestBase.ReadAllBytes(decompressor, buffer, 0, buffer.Length) != 0);
});

other types of stream can't detect corruption properly

mfkl · 2022-05-05T09:15:39Z

zlib should be able to detect corruption just as gzip currently does (but Brotli cannot). I'm looking for a way to add this test without duplicating code (as some types of streams are unable to detect such corruption).

mfkl · 2022-05-06T08:05:22Z

For the truncated error case, should a new error message be created to more precisely describe the problem?

Currently "Found invalid data while decoding" is used (and "Decoder ran into invalid data" for Brotli). It could be "Found truncated data while decoding" to provide a more meaningful hint.

Aside from this, I think it's OK now. Let me know!

jozkee · 2022-05-10T00:30:02Z

src/libraries/System.IO.Compression/tests/CompressionStreamUnitTests.Gzip.cs

+            // corrupting these bytes goes undetected by gzip, skip them
+            int[] byteToSkip = { 3, 4, 5, 6, 7, 8, 9 };


Can you please add, as a comment, the reason why these bytes don't corrupt the compressed data?
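For context on that comment, a sketch that reads the fixed 10-byte gzip header (RFC 1952, section 2.3) from `GZipStream` output. `ReadGzipHeader` is a hypothetical helper. Bytes 0 to 2 (magic number and CM = 8) are validated by the decoder, but bytes 4 to 9 (MTIME, XFL, OS) are informational and are not covered by the CRC-32 trailer, which only protects the uncompressed data. Byte 3 (FLG) is only partly validated by zlib (reserved bits are rejected, while e.g. the FTEXT bit is just a hint), which plausibly explains why it is in the skip list too.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;

// Fields of the fixed gzip member header: ID1 ID2 CM FLG MTIME(4) XFL OS.
static (byte Id1, byte Id2, byte Cm, byte Flg, uint Mtime, byte Xfl, byte Os) ReadGzipHeader(ReadOnlySpan<byte> gz) =>
    (gz[0], gz[1], gz[2], gz[3],
     BinaryPrimitives.ReadUInt32LittleEndian(gz.Slice(4, 4)), // MTIME is little-endian
     gz[8], gz[9]);

byte[] compressed;
using (var ms = new MemoryStream())
{
    using (var gz = new GZipStream(ms, CompressionLevel.Optimal, leaveOpen: true))
    {
        gz.Write(new byte[] { 1, 2, 3, 4 }, 0, 4);
    }
    compressed = ms.ToArray();
}

var h = ReadGzipHeader(compressed);
Console.WriteLine($"ID={h.Id1:x2}{h.Id2:x2} CM={h.Cm} FLG={h.Flg} MTIME={h.Mtime} XFL={h.Xfl} OS={h.Os}");
```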

jozkee · 2022-05-10T00:32:11Z

src/libraries/System.IO.Compression/tests/CompressionStreamUnitTests.ZLib.cs

@@ -16,5 +18,44 @@ public class ZLibStreamUnitTests : CompressionStreamUnitTestBase
        public override Stream CreateStream(Stream stream, CompressionLevel level, bool leaveOpen) => new ZLibStream(stream, level, leaveOpen);
        public override Stream BaseStream(Stream stream) => ((ZLibStream)stream).BaseStream;
        protected override string CompressedTestFile(string uncompressedPath) => Path.Combine("ZLibTestData", Path.GetFileName(uncompressedPath) + ".z");
+
+        [Fact]
+        public void StreamCorruption_IsDetected()


Shouldn't we add this test to CompressionStreamUnitTests.Deflate.cs as well?

Sadly, this isn't really an option: raw deflate (DeflateStream) carries no checksum, so it can't detect stream corruption, unlike the zlib (Adler-32) and gzip (CRC-32) streams. Truncation detection works fine, though.
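The difference between the three formats is visible in their trailers. This sketch compresses the same payload with `ZLibStream` (.NET 6+), `GZipStream` and `DeflateStream`, then checks the zlib trailer against a hand-written Adler-32 (RFC 1950) and the gzip trailer against a hand-written CRC-32 and ISIZE (RFC 1952). `Adler32`, `Crc32` and `CompressWith` are illustrative helpers, not library APIs. Raw deflate output has no trailer at all, so a corrupted stream may decode to wrong data without any error.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;

// Adler-32 (RFC 1950): two running sums modulo 65521.
static uint Adler32(ReadOnlySpan<byte> data)
{
    uint a = 1, b = 0;
    foreach (byte x in data)
    {
        a = (a + x) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

// CRC-32 as used by gzip: reflected polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF.
static uint Crc32(ReadOnlySpan<byte> data)
{
    uint crc = 0xFFFFFFFF;
    foreach (byte x in data)
    {
        crc ^= x;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320 : crc >> 1;
    }
    return ~crc;
}

static byte[] CompressWith(Func<Stream, Stream> wrap, byte[] data)
{
    using var ms = new MemoryStream();
    using (var s = wrap(ms))
    {
        s.Write(data, 0, data.Length);
    }
    return ms.ToArray();
}

var payload = new byte[1000];
new Random(7).NextBytes(payload);

byte[] zlib = CompressWith(ms => new ZLibStream(ms, CompressionLevel.Optimal, leaveOpen: true), payload);
byte[] gzip = CompressWith(ms => new GZipStream(ms, CompressionLevel.Optimal, leaveOpen: true), payload);
byte[] deflate = CompressWith(ms => new DeflateStream(ms, CompressionLevel.Optimal, leaveOpen: true), payload);

// zlib trailer: Adler-32 of the uncompressed data, big-endian (last 4 bytes).
uint zlibAdler = BinaryPrimitives.ReadUInt32BigEndian(zlib.AsSpan(zlib.Length - 4));
// gzip trailer: CRC-32 then ISIZE (length mod 2^32), both little-endian (last 8 bytes).
uint gzipCrc = BinaryPrimitives.ReadUInt32LittleEndian(gzip.AsSpan(gzip.Length - 8, 4));
uint gzipSize = BinaryPrimitives.ReadUInt32LittleEndian(gzip.AsSpan(gzip.Length - 4));

Console.WriteLine($"zlib Adler-32 ok: {zlibAdler == Adler32(payload)}");
Console.WriteLine($"gzip CRC-32 ok: {gzipCrc == Crc32(payload)}, ISIZE ok: {gzipSize == (uint)payload.Length}");
Console.WriteLine($"raw deflate: {deflate.Length} bytes, no checksum trailer");
```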

jozkee · 2022-05-10T00:41:32Z

For the truncated error case, should a new error message be created to more precisely describe the problem?

Currently "Found invalid data while decoding" is used (and "Decoder ran into invalid data" for Brotli). It could be "Found truncated data while decoding" to provide a more meaningful hint.

I think we can do that. Feel free to add your proposed error message to the .\src\libraries\System.IO.Compression\src\Resources\Strings.resx file and use it as needed.

jozkee

LGTM, thanks @mfkl

ghost · 2022-05-16T20:26:35Z

Added needs-breaking-change-doc-created label because this PR has the breaking-change label.

When you commit this breaking change:

Create a matching issue in the dotnet/docs repo using the breaking change documentation template, link it to this PR and the issue, then remove this needs-breaking-change-doc-created label.
Ask a committer to mail the .NET Breaking Change Notification DL.

Tagging @dotnet/compat for awareness of the breaking change.

mfkl · 2022-05-17T06:13:44Z

Thanks for the review feedback @jozkee :)

stephentoub · 2022-07-24T15:33:39Z

Reverted in #72742

stephentoub · 2022-09-23T17:24:17Z

@ericstj, I've removed the breaking change labels as this was reverted.

ericstj · 2022-09-23T17:37:52Z

Thank you @stephentoub

ghost added community-contribution area-System.IO.Compression and removed community-contribution labels Nov 18, 2021

mfkl commented Nov 18, 2021

src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/DeflateStream.cs (outdated)

mfkl commented Nov 18, 2021

src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/DeflateStream.cs (outdated)

mfkl force-pushed the gzip-strict-validation branch from 196783b to 6adb821 November 19, 2021 07:22

danmoseley reviewed Nov 26, 2021

danmoseley mentioned this pull request Nov 26, 2021

System.Diagnostics.Tests.StopwatchTests.GetTimestamp test fails in the CI #62021 (Open)

mfkl added 5 commits December 4, 2021 20:40

System.IO.Compression tests: add StreamTruncation_IsDetected

f3c171c

System.IO.Compression: detect and throw on truncated streams

984e58c

account for edge case

576ef2e

account for edge case in async version

0503aaf

rename nonZeroInput to nonEmptyInput

def1924

mfkl force-pushed the gzip-strict-validation branch from 9c54663 to def1924 December 6, 2021 10:34

remove else to improve codegen

49bbcf4

jeffhandley assigned jozkee Jan 3, 2022

danmoseley closed this Jan 21, 2022

danmoseley reopened this Jan 21, 2022

jozkee reviewed Apr 13, 2022

jozkee reviewed May 2, 2022

mfkl added 5 commits May 5, 2022 13:03

review feedback - cosmetics

b0785b0

make BrotliStream detect truncation

fe84a6a

make StreamCorruption_IsDetected run for gzip only

5ae1a50

other types of stream can't detect corruption properly

skip byte corruption which results in correct decompression

931f49e

code style

0c492ab

mfkl force-pushed the gzip-strict-validation branch from 5f52914 to 0c492ab May 5, 2022 08:57

add zlib corruption test, no skipping needed

e4c1d7f

jozkee reviewed May 10, 2022

mfkl added 2 commits May 13, 2022 09:55

clarify why we skip bytes in gzip test

1acd822

add and use truncated error data message

19f9b75

jozkee approved these changes May 16, 2022

jozkee added breaking-change needs-breaking-change-doc-created labels May 16, 2022

jozkee merged commit e71a46b into dotnet:main May 16, 2022

jozkee mentioned this pull request May 16, 2022

Add strict validation to System.IO.Compression.GZipStream #47563 (Closed)

mfkl deleted the gzip-strict-validation branch May 17, 2022 06:13

ghost locked as resolved and limited conversation to collaborators Jun 16, 2022

stephentoub removed breaking-change needs-breaking-change-doc-created labels Sep 23, 2022
