Skip to content

Conversation

@alexdowad
Copy link
Contributor

This bug was found when I was fuzzing a patch related to mb_strpos. In some cases, the legacy text conversion code for UTF-7 (and UTF7-IMAP) would correctly recognize an error for a Base64-encoded section which was not correctly padded with zero bits, but the new (and faster) text conversion code would not.

Specifically, if the input string ended abruptly after the 4th or 7th byte of a Base64-encoded section, the new conversion code would confirm that the trailing padding bits from the previous byte (3rd or 6th) were zeroes, but would not check whether the 4th or 7th byte itself encoded any non-zero bits. The legacy conversion code did perform this check and would treat the input string as invalid.

Actually, even if the 4th or 7th byte does encode only (padding) zero bits, this is still a problem, because there is no reason to have a 4th (or 7th) byte in that case. The UTF-7 string should have ended on the previous byte instead.

Apply the same fix for both UTF-7 and UTF7-IMAP. Also add regression test cases.

FYA @cmb69 @Girgias @nikic

@alexdowad
Copy link
Contributor Author

Hmm, I actually intend to merge this into PHP-8.2 and then down into master, but forgot to set the target branch when opening the PR. Sigh.

…error

This bug was found when I was fuzzing a patch related to mb_strpos.
In some cases, the legacy text conversion code for UTF-7 (and
UTF7-IMAP) would correctly recognize an error for a Base64-encoded
section which was not correctly padded with zero bits, but the new
(and faster) text conversion code would not.

Specifically, if the input string ended abruptly after the 4th or 7th
byte of a Base64-encoded section, the new conversion code would
confirm that the trailing padding bits from the previous byte (3rd or
6th) were zeroes, but would not check whether the 4th or 7th byte
itself encoded any non-zero bits. The legacy conversion code did
perform this check and would treat the input string as invalid.

Actually, even if the 4th or 7th byte does encode only (padding) zero
bits, this is still a problem, because there is no reason to have a
4th (or 7th) byte in that case. The UTF-7 string should have ended
on the previous byte instead.

Apply the same fix for both UTF-7 and UTF7-IMAP.
Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Hmm, I actually intend to merge this into PHP-8.2 and then down into master, but forgot to set the target branch when opening the PR.

You could still change the base branch (Edit button besides the PR title), but you would need to force push as well (other AppVeyor CI may not work). However, I think this is good to be applied to PHP-8.2 right away.

@alexdowad
Copy link
Contributor Author

Thanks so much for the review. Merged.

@alexdowad alexdowad closed this Nov 21, 2022
@alexdowad alexdowad deleted the utf7_fix branch December 14, 2022 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants