Skip to content

Comments

Add CRC32 validation when reading zip archive entries#124766

Open
alinpahontu2912 wants to merge 1 commit intodotnet:mainfrom
alinpahontu2912:crc32_validation
Open

Add CRC32 validation when reading zip archive entries#124766
alinpahontu2912 wants to merge 1 commit intodotnet:mainfrom
alinpahontu2912:crc32_validation

Conversation

@alinpahontu2912
Copy link
Member

Add CrcValidatingReadStream to ZipCustomStreams that wraps decompressed entry streams and validates CRC32 checksums as data is read. The stream computes a running CRC32 over all bytes read and compares it against the expected CRC from the zip entry header when the expected number of bytes has been read.

  • Seeking is delegated to the base stream but abandons CRC tracking, since CRC validation requires sequential reading from the start.
  • Flush throws NotSupportedException consistent with read-only streams.
  • Corrupted entries (tampered uncompressed size) now throws InvalidDataException when CRC mismatch is detected.

Update existing corrupted stream tests to expect InvalidDataException from CRC validation instead of silently accepting tampered data.

Add CrcValidatingReadStream to ZipCustomStreams that wraps decompressed
entry streams and validates CRC32 checksums as data is read. The stream
computes a running CRC32 over all bytes read and compares it against the
expected CRC from the zip entry header when the expected number of bytes
has been read.

Key design decisions:
- Seeking is delegated to the base stream but abandons CRC tracking,
  since CRC validation requires sequential reading from the start.
- Flush throws NotSupportedException consistent with read-only streams.
- Corrupted entries (tampered uncompressed size) now correctly throw
  InvalidDataException when CRC mismatch is detected.

Update existing corrupted stream tests to expect InvalidDataException
from CRC validation instead of silently accepting tampered data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @karelz, @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds CRC32 validation when reading ZIP archive entries to detect data corruption. A new CrcValidatingReadStream wrapper class is introduced that computes a running CRC32 checksum as data is read and validates it against the expected CRC from the ZIP entry header once all expected bytes have been read.

Changes:

  • Added CrcValidatingReadStream class that wraps decompressed entry streams and validates CRC32 checksums
  • Updated ZipArchiveEntry to wrap entry streams with CrcValidatingReadStream in read mode
  • Updated corrupted stream tests to expect InvalidDataException from CRC validation instead of silently accepting tampered data
  • Added CrcMismatch resource string for the validation error message

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCustomStreams.cs Added new CrcValidatingReadStream class that validates CRC32 checksums as data is read from ZIP entry streams
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs Updated OpenInReadMode and OpenInReadModeGetDataCompressor to return CrcValidatingReadStream and wrap decompressed streams with CRC validation
src/libraries/System.IO.Compression/src/Resources/Strings.resx Added CrcMismatch error message for CRC validation failures
src/libraries/System.IO.Compression/tests/ZipArchive/zip_InvalidParametersAndStrangeFiles.cs Updated corrupted stream tests to expect InvalidDataException when reading tampered entries, simplified tests to focus on CRC validation

Comment on lines +727 to +732
public CrcValidatingReadStream(Stream baseStream, uint expectedCrc, long expectedLength)
{
_baseStream = baseStream;
_expectedCrc = expectedCrc;
_expectedLength = expectedLength;
}
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider validating that expectedLength is non-negative in the constructor. Negative values don't make sense for a length and could cause unexpected behavior in the comparison at line 808 and 822.

Copilot uses AI. Check for mistakes.
Comment on lines +727 to +732
public CrcValidatingReadStream(Stream baseStream, uint expectedCrc, long expectedLength)
{
_baseStream = baseStream;
_expectedCrc = expectedCrc;
_expectedLength = expectedLength;
}
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor should validate that baseStream is not null. Without this check, a null baseStream would result in a NullReferenceException when any stream operation is attempted, rather than an ArgumentNullException at construction time.

Copilot uses AI. Check for mistakes.

public CrcValidatingReadStream(Stream baseStream, uint expectedCrc, long expectedLength)
{
_baseStream = baseStream;
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _runningCrc field should be explicitly initialized to 0 in the constructor for clarity, even though the default value for uint is 0. This makes the intent clear that CRC32 calculation starts with an initial value of 0, consistent with the pattern used in CheckSumAndSizeWriteStream (line 503).

Suggested change
_baseStream = baseStream;
_baseStream = baseStream;
_runningCrc = 0;

Copilot uses AI. Check for mistakes.
Comment on lines +808 to +811
if (_totalBytesRead >= _expectedLength)
{
ValidateCrc();
}
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition should check for equality (_totalBytesRead == _expectedLength) instead of greater-than-or-equal. The current implementation allows ValidateCrc() to be called on every read after _expectedLength is reached, even though it will only validate once when exactly equal. Using equality would ensure validation happens exactly once and makes the logic clearer. Additionally, if _totalBytesRead somehow exceeds _expectedLength (which shouldn't happen given the decompressor limits), it would be better to detect and handle this case explicitly.

Suggested change
if (_totalBytesRead >= _expectedLength)
{
ValidateCrc();
}
if (_totalBytesRead == _expectedLength)
{
ValidateCrc();
}
else if (_totalBytesRead > _expectedLength)
{
throw new InvalidDataException(SR.CrcMismatch);
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant