mb_detect_encoding() results for UTF-7 differ between PHP 8.0 and 8.1 (if UTF-7 is present in the encodings list and the string contains '+' character) #10192

Closed
@unix-world

Description

This is a very serious bug: it may affect any string that contains a '+' character whenever UTF-7 is in the list of candidate encodings.

The following code:

<?php

ini_set('display_errors', '1');	// display runtime errors
error_reporting(E_ALL & ~E_NOTICE & ~E_STRICT & ~E_DEPRECATED); // error reporting
date_default_timezone_set('UTC');

function detect_encoding($ystr, $csetlist='UTF-8, ISO-8859-1, ISO-8859-15, ISO-8859-2, ISO-8859-9, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-10, ISO-8859-13, ISO-8859-14, ISO-8859-16, UTF-7, ASCII, SJIS, EUC-JP, JIS, ISO-2022-JP, EUC-CN, GB18030, ISO-2022-KR, KOI8-R, KOI8-U') {
	return mb_detect_encoding((string)$ystr, (string)$csetlist, true); // mixed: (bool) FALSE or (string) 'CHARSET'
}

echo detect_encoding('A + B'); // expected: UTF-8; PHP 8.1.x / 8.2.x return UTF-7 when the string contains '+'; PHP 8.0.x and 7.4.x correctly return UTF-8
echo "\n";
echo detect_encoding('A - B'); // expected: UTF-8; correct on PHP 8.2.x / 8.1.x / 8.0.x / 7.4.x

Resulted in this output. On PHP 8.1 and 8.2, a string containing the plus (+) character is detected as UTF-7 instead of the expected UTF-8:

UTF-7
UTF-8

But I expected this output instead, which is what PHP 7.4 and PHP 8.0 correctly produce:

UTF-8
UTF-8

I tested with different PHP versions at https://onlinephp.io/
I also tested on my own computer with PHP 8.1.12 and 8.0.25.
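
A possible workaround, sketched here under assumptions, is to check for valid UTF-8 explicitly before falling back to mb_detect_encoding(). It assumes UTF-8 should win whenever the input is valid UTF-8, which matches the expected output above. The function name detect_encoding_prefer_utf8 and the shortened candidate list are illustrative only and are not part of the code above.

```php
<?php

/**
 * Hypothetical helper (illustrative name): return 'UTF-8' whenever the input
 * is valid UTF-8, and only otherwise ask mb_detect_encoding().
 *
 * @return string|false Detected charset name, or false if nothing matches.
 */
function detect_encoding_prefer_utf8($ystr, $csetlist = 'UTF-8, ISO-8859-1, UTF-7, ASCII') {
	$ystr = (string) $ystr;
	// Pure ASCII text such as 'A + B' is valid UTF-8, so it never reaches the UTF-7 candidate.
	if (mb_check_encoding($ystr, 'UTF-8')) {
		return 'UTF-8';
	}
	// Assumption: for input that is not valid UTF-8, the strict detection result is acceptable.
	return mb_detect_encoding($ystr, (string) $csetlist, true);
}

echo detect_encoding_prefer_utf8('A + B'); // UTF-8
echo "\n";
echo detect_encoding_prefer_utf8('A - B'); // UTF-8
```

The trade-off is that UTF-7 data consists only of ASCII bytes and is therefore always valid UTF-8, so this helper never reports UTF-7. It only suits code that does not expect genuine UTF-7 input.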

PHP Version

8.1.x / 8.2.x

Operating System

All
