[BUG] Identifies UTF-16LE for a pair of ASCII punctuation characters #509
Closed
Description
Describe the bug
Passing plain ASCII text to charset_normalizer.detect returns a UTF-16LE encoding.
To Reproduce
import chardet, charset_normalizer
charset_normalizer.detect(b");") # error also happens with b"(;"
# returns {'encoding': 'utf_16_le', 'language': '', 'confidence': 1.0}
chardet.detect(b");")
# returns {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
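A short sketch, not part of the original report, of why UTF-16LE is a possible reading at all: UTF-16LE groups bytes into 2-byte little-endian code units, so any even-length byte string can decode to something. For example, b");" (bytes 0x29 0x3B) becomes the single code point U+3B29, which lies in the CJK Unified Ideographs Extension A block (U+3400 to U+4DBF), whereas a single byte can never be valid UTF-16LE. The helper name `as_utf16le` is hypothetical and uses only the standard `bytes.decode`; this does not show how charset_normalizer actually scores its candidates.

```python
def as_utf16le(data: bytes) -> str:
    """Hypothetical helper: show `data` read as UTF-16LE, as U+XXXX code points."""
    try:
        # Each pair of bytes (low byte first) forms one UTF-16 code unit.
        text = data.decode("utf_16_le")
    except UnicodeDecodeError as exc:
        # Odd-length input leaves a dangling byte, so decoding fails.
        return f"not valid UTF-16LE ({exc.reason})"
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Inputs mentioned in this issue, plus single punctuation characters.
for sample in (b");", b"(;", b".;", b"()", b"(", b")", b";"):
    print(sample, "->", as_utf16le(sample))
```

Every two-byte sample decodes to exactly one code point, while every one-byte sample is rejected.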
Expected behavior
These are standard ASCII characters, so I expect a UTF-8 encoding.
Desktop:
- macOS 14.5
- Python version 3.12.1 (anaconda build)
- charset_normalizer version 3.3.2
Additional context
Evaluating b"(", b")", b";" or b"()" on its own produces the expected result. Other combinations of punctuation characters produce the same error, e.g. b".;".
I understand this is a very short string, but could detection perhaps default to the minimal character set in this case?
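One possible caller-side workaround, sketched here as an assumption rather than a fix in charset_normalizer itself: if every byte is below 0x80 (checked with the standard `bytes.isascii()`, Python 3.7+), report ASCII directly, which is also valid UTF-8, and only otherwise fall back to `charset_normalizer.detect`. The function name `detect_with_ascii_fallback` and its `detector` parameter are hypothetical; the returned dict simply mirrors the shape of the outputs shown in this report.

```python
from typing import Any, Callable, Dict, Optional


def detect_with_ascii_fallback(
    data: bytes,
    detector: Optional[Callable[[bytes], Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Hypothetical wrapper: short-circuit pure-ASCII input before detection."""
    # Pure ASCII (note: this includes empty input) is reported as "ascii",
    # which is a subset of UTF-8, instead of letting a detector guess.
    if data.isascii():
        return {"encoding": "ascii", "language": "", "confidence": 1.0}
    if detector is None:
        # Imported lazily so the ASCII path works without the package installed.
        import charset_normalizer
        detector = charset_normalizer.detect
    return detector(data)


print(detect_with_ascii_fallback(b");"))
```

Injecting `detector` keeps the sketch testable and lets a caller swap in chardet.detect or another function with the same dict-returning shape.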