Skip to content

Conversation

tmds
Copy link
Member

@tmds tmds commented Jul 8, 2025

The current implementation is trimming spaces and nulls from the end.

Instead we need to find the terminating null from the start, and we shouldn't trim spaces at the end.

Fixes #117331.

@ericstj @ViktorHofer ptal.

cc @bruce965

The current implementation is trimming spaces and nulls from the end.

Instead we need to find the terminating null from the start,
and we shouldn't trim spaces at the end.
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 8, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-formats-tar
See info in area-owners.md if you want to be subscribed.

@tmds
Copy link
Member Author

tmds commented Jul 8, 2025

I have not included a test because the PR is removing some special handling and not introducing any.
All tests are expected to continue to pass with these changes.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates how null-terminated fields are handled by stopping at the first null byte rather than trimming trailing nulls and spaces, and it adjusts all consumers accordingly.

  • Introduced TrimNullTerminated to handle null-terminated spans
  • Updated ParseOctal to use the new trimming logic and only ignore spaces
  • Replaced all GetTrimmed* calls in TarHeader.Read.cs with ParseUtf8String

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
TarHelpers.cs Added TrimNullTerminated, updated ParseOctal and ParseUtf8String, and removed old helpers
TarHeader.Read.cs Replaced GetTrimmedUtf8String calls with ParseUtf8String
Comments suppressed due to low confidence (3)

src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHelpers.cs:274

  • Add an XML doc comment for ParseUtf8String explaining that it returns the UTF-8 string up to the first null byte without trimming spaces.
        internal static string ParseUtf8String(ReadOnlySpan<byte> buffer)

src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHelpers.cs:280

  • Add an XML doc comment for TrimNullTerminated to clarify that it slices the span at the first null byte and leaves trailing spaces intact.
        internal static ReadOnlySpan<byte> TrimNullTerminated(ReadOnlySpan<byte> buffer)

src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHelpers.cs:243

  • Add unit tests for ParseOctal and ParseUtf8String covering scenarios with fields that include trailing spaces after the null terminator and fields with no null byte.
            buffer = TrimNullTerminated(buffer);

@ericstj
Copy link
Member

ericstj commented Jul 8, 2025

I have not included a test because the PR is removing some special handling and not introducing any.
All tests are expected to continue to pass with these changes.

We should have a test that validates that we trim correctly since we did that wrong in an observable way before. So a tar with an embedded null should truncate the name at the null. You can add such a file to https://github.com/dotnet/runtime-assets/tree/main/src/System.Formats.Tar.TestData or you could produce a tar and then modify the bytes.

@ericstj ericstj self-assigned this Jul 8, 2025
@ericstj ericstj self-requested a review July 8, 2025 16:08
@tmds
Copy link
Member Author

tmds commented Jul 8, 2025

I'll add a test case. Also, CI found some failing tests which I'll need to look into.

@tmds
Copy link
Member Author

tmds commented Jul 9, 2025

We should have a test that validates that we trim correctly since we did that wrong in an observable way before. So a tar with an embedded null should truncate the name at the null.

@ericstj I've created an archive that looks like:

00000000  20 20 66 69 6c 65 20 20  00 6e 6f 6e 73 65 6e 73  |  file  .nonsens|
00000010  65 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |e...............|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000060  00 00 00 00 30 30 30 30  36 34 34 00 30 30 30 30  |....0000644.0000|
00000070  30 30 30 00 30 30 30 30  30 30 30 00 30 30 30 30  |000.0000000.0000|
00000080  30 30 30 30 30 30 30 00  31 35 30 33 33 34 36 33  |0000000.15033463|
00000090  32 31 35 00 30 31 30 36  30 37 00 20 30 00 00 00  |215.010607. 0...|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  00 75 73 74 61 72 00 30  30 00 00 00 00 00 00 00  |.ustar.00.......|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000600

Note that the filename is file (with 2 spaces at the start and end), and after the terminating zero there is some nonsense in the name field.

The test will extract this entry and validate the name is file .

Is this sufficient?

To use this archive in a test, I need to add it to the https://github.com/dotnet/runtime-assets repo?

@ericstj
Copy link
Member

ericstj commented Jul 9, 2025

To use this archive in a test, I need to add it to the https://github.com/dotnet/runtime-assets repo?

You can either

  1. add it to that repo and it will get packaged up with the rest of the test archives - seems to me like this is what most of the tar tests are doing.
  2. construct it during the test
  3. base-64 encode it as a string and load from source code (if small enough)

We have cases of all of these. My preference would be 2 since it's most readable, but not sure if that's possible. Since it's so small I think 3 is acceptable and avoids the need to modify runtime-assets.

@tmds
Copy link
Member Author

tmds commented Jul 9, 2025

My preference would be 2 since it's most readable

I think this can work. I'll try tomorrow.

@tmds
Copy link
Member Author

tmds commented Jul 10, 2025

@ericstj I've added a test. Can you take another look?

@ericstj ericstj merged commit 0d7b88c into dotnet:main Jul 21, 2025
85 of 88 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 21, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Formats.Tar community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

System.Formats.Tar null-terminated strings not trimmed correctly
2 participants