Respect general-purpose bit flags when decoding ZipArchiveEntry names and comments #103271
Merged
Resolving #92283
If bit 11 in the general purpose bit flags is set, forces the use of UTF-8 instead of the encoding specified in the ZipArchive constructor.
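As a rough illustration of what bit 11 signals, the sketch below uses Python's standard zipfile module rather than .NET. It writes two empty entries to an in-memory archive and reads back each entry's general-purpose flags. The helper name name_flags and the sample file names are made up for this example.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general-purpose bit flags (1 << 11)

def name_flags(names):
    """Write empty entries with the given names; return {name: is bit 11 set?}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        return {
            info.filename: bool(info.flag_bits & UTF8_FLAG)
            for info in archive.infolist()
        }

# Python's writer sets bit 11 only when the name cannot be encoded as ASCII.
flags = name_flags(["plain.txt", "résumé.txt"])
```

A reader that honours the flag decodes the second name as UTF-8 regardless of any fallback encoding it was configured with.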
commit f2723da69a9218468546f0804497c45024aa0ef4
I see you're bringing back a condition similar to the one we used to have in .NET 6, as proposed in the issue description. I can see that it fixes the problem. I'd like to provide some context.
Before this change:
The archive's encoding was set either via the ZipArchive public constructors or the ZipFile Open or Create methods that take an entryNameEncoding argument. This encoding was then passed to the ZipArchiveEntry constructors that take an encoding argument as well. This is important because we need to respect that encoding when the user passes it as an argument.
We would only read the encoding from each entry's general purpose bit when the user did not pass an entryNameEncoding argument or when the value was passed as null, in which case the archive entries were constructed using the internal ZipArchiveEntry constructor that takes an archive and a central directory header as arguments.
What we were not doing was reading this general purpose bit flag value when the public ZipArchiveEntry constructors were called, meaning that when the time came to check the encoding of the entry, we would check if the archive's EntryNameAndCommentEncoding had a value, but it would always be unset (because of the constructor used to create this entry). So the next step was to default to UTF8 and read the file comment using that encoding.
After this change:
We will always first read the entry's general purpose bit flag to see if the value was set. If not, then we will do what we were doing before: check the EntryNameAndCommentEncoding, and if that is unset, we default to UTF8 to read the file comment.
The bit flag will only be set when the internal entry constructor is called, which was the bug. This should not affect entries created with the public constructors, meaning the behavior remains unmodified for them.
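The lookup order described under "After this change" can be sketched as follows. This is a Python illustration of the precedence only, not the actual ZipArchiveEntry code; choose_encoding, decode_comment and the archive_encoding parameter are hypothetical names, with archive_encoding standing in for EntryNameAndCommentEncoding.

```python
UTF8_FLAG = 0x800  # bit 11 (1 << 11): entry name and comment are UTF-8

def choose_encoding(general_purpose_flags, archive_encoding=None):
    """Pick the codec for an entry's name and comment."""
    # 1. The entry's own flag wins when it is set.
    if general_purpose_flags & UTF8_FLAG:
        return "utf-8"
    # 2. Otherwise use the encoding supplied for the archive, if any.
    if archive_encoding is not None:
        return archive_encoding
    # 3. Otherwise default to UTF-8, as the comment above describes.
    return "utf-8"

def decode_comment(raw, general_purpose_flags, archive_encoding=None):
    """Decode raw comment bytes using the precedence above."""
    return raw.decode(choose_encoding(general_purpose_flags, archive_encoding))
```

For example, decode_comment("é".encode("utf-8"), 0x800, "latin-1") decodes as UTF-8 because the flag is set, while decode_comment(b"\xe9", 0, "latin-1") falls back to the supplied encoding.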
Thanks for the context @carlossanlop. I think we've both read the pre-PR code for decoding existing ZipArchiveEntry instances in the same way. I'll re-check and test for the correct behaviour when creating a new ZipArchiveEntry - I can see that L136 of ZipArchiveEntry sets FullName, and this property's setter uses the existing behaviour you've described to convert the value to a byte array. This looks correct to me, but I'll check.
Was prioritising the supplied encoding over the general-purpose bit flags deliberate? The ZIP file spec. seems to indicate that it's the other way around: if bit 11 of the flags is set, the filename must always be UTF-8. I'm looking at the latest version, sections 4.4.4 and appendix D.
Section 4.4.4:
Appendix D:
I'm inclined to sign off. I looked at the change history in this area because I wanted to double-check that we were not regressing this: #87883. Note that I added the tests for that fix in a follow-up PR: #88978
I can see that we did not break them in this PR.