gh-101372: Fix unicodedata.is_normalized to properly handle the UCD 3… #101388
All ranges of characters are candidates for testing.
Test script

    import unicodedata

    # Record every code point whose normalized form is not reported as normalized.
    with open('foo.out', 'w') as f:
        for x in range(0x110000):
            for form in ('NFC', 'NFD', 'NFKC', 'NFKD'):
                norm = unicodedata.ucd_3_2_0.normalize(form, chr(x))
                # A normalized string must always pass is_normalized for the same form.
                if not unicodedata.ucd_3_2_0.is_normalized(form, norm):
                    f.write(f'{x},{form}\n')

AS-IS
TO-BE
|
@serhiy-storchaka I will merge this PR by next week; please let me know if any changes are needed. |
I am not happy with the provided tests. Testing the whole range of Unicode characters is slow (a few seconds on my computer), so it should be decorated with … The test for multi-character strings is not what I meant: it should test not only all normalized sequences but also non-normalized sequences. For example, … I tried to write more interesting tests for … I propose to merge your PR without tests. The bugfix itself is obvious, and I will add the tests later. |
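A minimal sketch of the kind of check described above, covering both normalized and non-normalized multi-character sequences. It is not the test that was later added to CPython; the helper name `find_mismatches` and the sample strings are hypothetical, chosen only for illustration. The idea is that `is_normalized(form, s)` should agree with `normalize(form, s) == s` for every input, whether or not `s` is already normalized.

```python
import unicodedata

FORMS = ('NFC', 'NFD', 'NFKC', 'NFKD')


def find_mismatches(ucd, strings):
    """Return (string, form) pairs where is_normalized disagrees with normalize.

    `ucd` is any object exposing normalize() and is_normalized(), for example
    the unicodedata module itself or unicodedata.ucd_3_2_0.
    (Hypothetical helper, not part of the PR.)
    """
    mismatches = []
    for s in strings:
        for form in FORMS:
            # Ground truth: a string is normalized iff normalizing leaves it unchanged.
            expected = ucd.normalize(form, s) == s
            if ucd.is_normalized(form, s) != expected:
                mismatches.append((s, form))
    return mismatches


# Illustrative samples (assumptions): precomposed and decomposed e-acute,
# a compatibility ligature, and combining marks in two different orders.
samples = ['\u00e9', 'e\u0301', '\ufb01', 'a\u0301\u0323', 'a\u0323\u0301']
print(find_mismatches(unicodedata, samples))
```

Passing `unicodedata.ucd_3_2_0` instead of the module runs the same comparison against the UCD 3.2.0 database that this PR concerns.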
Okay, got it. Please let me know once you submit the patch with the tests; I may learn a lot from it. |
Thanks @corona10 for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11. |
GH-101597 is a backport of this pull request to the 3.11 branch. |
… UCD 3… (pythongh-101388) (cherry picked from commit 9ef7e75) Co-authored-by: Dong-hee Na <donghee.na@python.org>
GH-101598 is a backport of this pull request to the 3.10 branch. |
… UCD 3… (pythongh-101388) (cherry picked from commit 9ef7e75) Co-authored-by: Dong-hee Na <donghee.na@python.org>
….2.0