fix(core): normalization segment should end on NFC boundary, not NFD #15506
f2cde17 to 82083f8
When normalizing, we need to stop processing on an NFC boundary, not an NFD boundary, to support normalizations such as in Bengali, where appending `U+09D7` to a context of `U+0995 U+09C7` should result in `U+0995 U+09CC`. The specification is unclear on this; see https://unicode-org.atlassian.net/browse/CLDR-19218

This also updates the ldml keyboard unit test suite to support running in full NFC mode (used in all Engine implementations) as well as retaining the NFD mode (now only used by the debugger).

Side note: the Bengali normalization failure case was picked up by the improvements to the unit test suite, proving once again that good tests are so valuable.

Fixes: #15491
Fixes: #15505
Follows: #15488
Relates-to: CLDR-19218
82083f8 to 766c699
Does this write NFC now?
yes, missed that, good catch!
core/tests/unit/ldml/ldml.cpp
Outdated
    return EXIT_FAILURE;
    return rc;
Suggested change:
-    return EXIT_FAILURE;
-    return rc;
+    return EXIT_FAILURE;
void print_context(std::u16string &text_store, km_core_state *&test_state, std::vector<km_core_context_item> &test_context);

bool g_beep_found = false;
It might be good to set g_beep_found = false; at the beginning of run_test to make things more robust.
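A minimal sketch of what that reset could look like. The `run_test` signature, the `on_beep` handler, and the use of `!` as a beep trigger are all hypothetical stand-ins for illustration, not the actual code in `ldml.cpp`:

```cpp
#include <cassert>
#include <string>

bool g_beep_found = false;  // global flag, as in the snippet under review

// Hypothetical stand-in for whatever records a beep during a test.
void on_beep() { g_beep_found = true; }

// Hypothetical, simplified run_test: reports whether this test produced a beep.
// In this sketch a '!' in the key string stands in for a key that beeps.
bool run_test(const std::string &keys) {
  g_beep_found = false;  // reset first, so a beep from an earlier test cannot leak in
  for (char c : keys) {
    if (c == '!') on_beep();
  }
  return g_beep_found;
}

// Usage: a beeping test followed by a quiet one; the second must not see the first's beep.
void check_no_leak() {
  assert(run_test("a!"));
  assert(!run_test("ab"));
}
```

Without the reset on the first line of `run_test`, the second call in `check_no_leak` would still see the flag set by the first.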
@mcdurdin, could you confirm that U+09CC is supposed to be U+09CB?
The output is supposed to be U+09CC. Apologies, the keystrokes should be deA. My mistake. I will update the test spec too.
Co-authored-by: Darcy Wong <darcy_wong@sil.org> Co-authored-by: Eberhard Beilharz <ermshiperete@users.noreply.github.com>
Test Prerequisites
Test Results
GROUP_WINDOWS: Test Specs:
GROUP_MAC: Test Specs:

Test Results
Tested on Ubuntu 24.04 X11 with Gnome Text Editor.
GROUP_LINUX: Test on Linux
When normalizing, we need to stop processing on an NFC boundary, not an NFD boundary, to support normalizations such as in Bengali, where appending `U+09D7` to a context of `U+0995 U+09C7` should result in `U+0995 U+09CC`. The specification is unclear on this; see https://unicode-org.atlassian.net/browse/CLDR-19218

This also updates the ldml keyboard unit test suite to support running in full NFC mode (used in all Engine implementations) as well as retaining the NFD mode (now only used by the debugger).

Side note: the Bengali normalization failure case was picked up by the improvements to the unit test suite, proving once again that good tests are so valuable.

Fixes: #15491
Fixes: #15505
Follows: #15488
Cherry-pick-of: #15506
Relates-to: CLDR-19218
Changes in this pull request will be available for download in Keyman version 19.0.198-alpha
When normalizing, we need to stop processing on an NFC boundary, not an NFD boundary, to support normalizations such as in Bengali, where appending `U+09D7` to a context of `U+0995 U+09C7` should result in `U+0995 U+09CC`. The specification is unclear on this; see https://unicode-org.atlassian.net/browse/CLDR-19218
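To illustrate the Bengali case, here is a toy sketch of why the segment has to begin on an NFC boundary. It is not the Keyman Core implementation and does not use ICU: the composition table covers only the two canonical compositions that start with `U+09C7`, and the helper names (`compose_pair`, `combines_backward`, `nfc_segment_start`, `append_and_normalize`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy composition table (assumption: only these two Bengali pairs matter here).
// U+09C7 + U+09BE -> U+09CB and U+09C7 + U+09D7 -> U+09CC.
char32_t compose_pair(char32_t first, char32_t second) {
  if (first == 0x09C7 && second == 0x09BE) return 0x09CB;
  if (first == 0x09C7 && second == 0x09D7) return 0x09CC;
  return 0;  // no composition known to this toy table
}

// True if `ch` can combine with a preceding character under NFC, meaning there
// is no NFC boundary before it. Both characters have combining class 0, so from
// an NFD point of view they look like safe places to start a segment.
bool combines_backward(char32_t ch) {
  return ch == 0x09BE || ch == 0x09D7;
}

// Hypothetical helper: index of the last NFC boundary in `text`.
std::size_t nfc_segment_start(const std::u32string &text) {
  if (text.empty()) return 0;
  std::size_t i = text.size() - 1;
  while (i > 0 && combines_backward(text[i])) --i;
  return i;
}

// Appends `ch` to `context` and recomposes only the trailing segment.
std::u32string append_and_normalize(std::u32string context, char32_t ch) {
  context.push_back(ch);
  const std::size_t start = nfc_segment_start(context);
  assert(start < context.size());
  std::u32string out(context, 0, start);  // prefix before the boundary is left untouched
  for (std::size_t i = start; i < context.size(); ++i) {
    const char32_t composed =
        out.size() > start ? compose_pair(out.back(), context[i]) : 0;
    if (composed != 0) {
      out.back() = composed;  // merge into the previous character of the segment
    } else {
      out.push_back(context[i]);
    }
  }
  return out;
}
```

With a context of `U+0995 U+09C7`, calling `append_and_normalize(context, 0x09D7)` backs up to `U+09C7` and yields `U+0995 U+09CC`. If the segment instead started at `U+09D7` itself, the pair would never be seen together and nothing would compose.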
This also updates the ldml keyboard unit test suite to support running in full NFC mode (used in all Engine implementations) as well as retaining the NFD mode (now only used by the debugger).
Side note: the Bengali normalization failure case was picked up by the improvements to the unit test suite, proving once again that good tests are so valuable.
Fixes: #15491
Fixes: #15505
Follows: #15488
Relates-to: CLDR-19218
Build-bot: release:windows,linux,mac
User Testing
Tests should be run with the following keyboard: bn_ldml.zip
GROUP_WINDOWS: Test on Windows
GROUP_MAC: Test on macOS
GROUP_LINUX: Test on Linux
TEST_NORMALIZATION: Type deA. Copy the output and paste it into a character viewer. The output should be `U+09A6 U+09CC`.