fix(core): handle backspace decomposition#15488
Conversation
Ensure that when a single backspace decomposes the last NFC character in the app context, the remainder of the 'cluster' is preserved, matching the implication of the CLDR keyboard specification. This addresses the behavior in #15487 where the cached context became out of sync with the app context after deleting an entire NFC cluster such as ê, which caused a loop ending up with the entire context being deleted, at which point the loop exited with a fail-safe. Note that the LDML keyboard tests (ldml.cpp) do not currently exercise the normalization code; this is a gap that should be addressed to ensure that we are testing final application behavior. Fixes: #15487
User Test ResultsTest specification and instructions
Test Artifacts
|
There was a problem hiding this comment.
Fix is entirely in this file; other changes are additional unit tests and enabling correct logging of keystrokes in the unit test log files.
|
@rc-swag @sgschantz @ermshiperete given this change is a significant fix to Keyman Core, I would like to request review from all of you. @srl295, feel free to ignore; just as you have capacity to take a look. |
| @@expected: abce | ||
| Comment: #15487: bksp will delete just acute (even though it will have combined to NFC, internally we do NFD bksp) | ||
|
|
||
| TODO: this seems to be working NFD on app context side also, do we need to set a flag? this may break other tests so care is needed |
There was a problem hiding this comment.
TODO: this seems to be working NFD on app context side also, do we need to set a flag? this may break other tests so care is needed.
Ensure that when a single backspace decomposes the last NFC character in the app context, the remainder of the 'cluster' is preserved, matching the implication of the CLDR keyboard specification. This addresses the behavior in #15487 where the cached context became out of sync with the app context after deleting an entire NFC cluster such as ê, which caused a loop ending up with the entire context being deleted, at which point the loop exited with a fail-safe. Note that the LDML keyboard tests (ldml.cpp) do not currently exercise the normalization code; this is a gap that should be addressed to ensure that we are testing final application behavior. This cherry-pick only adds the direct normalization patch and unit test and skips the additional changes for logging. Fixes: #15487 Cherry-pick-of: #15488
| if(app_context_nfd == cached_context_string.substr(0, app_context_nfd.length())) { | ||
| context_final = cached_context_string.substr(app_context_nfd.length()); |
There was a problem hiding this comment.
maybe want some more comments here.
What we're trying to say is, if the app context matches the beginning of the cache,
break but remember if there was any content that was added after the cache.
So context_final will now have any 'suffix' that is NOT reflected in the app context.
| */ | ||
|
|
||
| auto output_nfc = output; | ||
| auto output_nfc = context_final + output; |
There was a problem hiding this comment.
Here, any of that 'final' suffix that was NOT already in the app context, needs to be prepended to the output.
so the full picture before the insertion point is:
<app_context><context_final><output>|
|
looks good! i think it could be good to add some comments and even put the case into the comments (the example case) to document what |
Co-authored-by: srl295@gmail.com
|
@mcdurdin For the E-ACUTE test cases, shouldn't the final unicode value be U+00E9? |
Thank you, good catch -- I had started with A-ACUTE... but then wanted to differentiate. |
When normalizing, we need to stop processing on an NFC boundary, not an NFD boundary, to support normalizations such as in Bengali, where appending `U+09D7` to a context of `U+0995 U+09C7` should result in `U+0995 U+09CC`. The specification is unclear on this; see https://unicode-org.atlassian.net/browse/CLDR-19218 This also updates the ldml keyboard unit test suite to support running in full NFC mode (used in all Engine implementations) as well retaining the NFD mode (now only used by the debugger). Side note: the Bengali normalization failure case was picked up by the improvements to the unit test suite, proving once again that good tests are so valuable. Fixes: #15491 Fixes: #15505 Follows: #15488 Relates-to: CLDR-19218
When normalizing, we need to stop processing on an NFC boundary, not an NFD boundary, to support normalizations such as in Bengali, where appending `U+09D7` to a context of `U+0995 U+09C7` should result in `U+0995 U+09CC`. The specification is unclear on this; see https://unicode-org.atlassian.net/browse/CLDR-19218 This also updates the ldml keyboard unit test suite to support running in full NFC mode (used in all Engine implementations) as well retaining the NFD mode (now only used by the debugger). Side note: the Bengali normalization failure case was picked up by the improvements to the unit test suite, proving once again that good tests are so valuable. Fixes: #15491 Fixes: #15505 Follows: #15488 Relates-to: CLDR-19218
When normalizing, we need to stop processing on an NFC boundary, not an NFD boundary, to support normalizations such as in Bengali, where appending `U+09D7` to a context of `U+0995 U+09C7` should result in `U+0995 U+09CC`. The specification is unclear on this; see https://unicode-org.atlassian.net/browse/CLDR-19218 This also updates the ldml keyboard unit test suite to support running in full NFC mode (used in all Engine implementations) as well retaining the NFD mode (now only used by the debugger). Side note: the Bengali normalization failure case was picked up by the improvements to the unit test suite, proving once again that good tests are so valuable. Fixes: #15491 Fixes: #15505 Follows: #15488 Relates-to: CLDR-19218
| test_actions_normalize( | ||
| "One backspace to delete last NFD character (#15487)", | ||
| /* app context pre transform: */ u"abcê", // NFC | ||
| /* cached context post transform: */ u"abce", |
There was a problem hiding this comment.
If the app context pre transform is the same as l.327 (both have NFC \u00EA),
shouldn't cached context post transform match l.328? (u"abce\u0323\u0302")
There was a problem hiding this comment.
No, because the action line is different, for l.319, we have no output (U""), but for l.330, we have U"\u0323\u0302"
Prerequisite
GROUP_MAC:Test Specs
TEST_COMPLIANT_E_ACUTE_VERIFY_CONTEXT (PASSED):
TEST_COMPLIANT_E_ACUTE (PASSED):
TEST_NONCOMPLIANT_E_ACUTE_VERIFY_CONTEXT (PASSED):
TEST_NONCOMPLIANT_E_ACUTE (FAILED):
GROUP_WINDOWS:Test Specs
|
|
@Meng-Heng could you run all the tests for mac with the .kmn based keyboard attached below, and report back on results (not as a user test, just as a comment): I would like to try and isolate the cause of the failure on macOS -- whether it is something in the LDML core implementation or a separate compatibility issue. Thanks! |
|
@mcdurdin, hopefully this is easy to follow:
Keyman 19.0193-alpha-test-15488 |
@Meng-Heng thanks, this is great -- although it's sad that there are failures. I will follow up with @sgschantz; looks like it's probably a mac-specific issue rather than anything relating to this PR. |
Test ResultsGROUP_LINUX: Test on LinuxTests run on Ubuntu 24.04 under X11. Two issues noted on Wayland which do not pertain to this PR:
For compliant app, used Gnome Text Editor 46.3. For non-compliant app, used Chrome 144.0.7559.109.
Performance in non-compliant apps was very slow under X11, but this may have been a configuration issue on my test machine? (@ermshiperete FYI) |
|
Test-bot: retest GROUP_MAC TEST_NONCOMPLIANT_E_ACUTE TEST_COMPLIANT_D_ACUTE TEST_NONCOMPLIANT_A_UMLAUT |
Test ResultsGROUP_MAC: Test on macOS
|
When normalizing, we need to stop processing on an NFC boundary, not an NFD boundary, to support normalizations such as in Bengali, where appending `U+09D7` to a context of `U+0995 U+09C7` should result in `U+0995 U+09CC`. The specification is unclear on this; see https://unicode-org.atlassian.net/browse/CLDR-19218 This also updates the ldml keyboard unit test suite to support running in full NFC mode (used in all Engine implementations) as well retaining the NFD mode (now only used by the debugger). Side note: the Bengali normalization failure case was picked up by the improvements to the unit test suite, proving once again that good tests are so valuable. Fixes: #15491 Fixes: #15505 Follows: #15488 Cherry-pick-of: #15506 Relates-to: CLDR-19218
|
Changes in this pull request will be available for download in Keyman version 19.0.198-alpha |
Ensure that when a single backspace decomposes the last NFC character in the app context, the remainder of the 'cluster' is preserved, matching the implication of the CLDR keyboard specification.
This addresses the behavior in #15487 where the cached context became out of sync with the app context after deleting an entire NFC cluster such as ê, which caused a loop ending up with the entire context being deleted, at which point the loop exited with a fail-safe.
Note that the LDML keyboard tests (ldml.cpp) do not currently exercise the normalization code; this is a gap that should be addressed to ensure that we are testing final application behavior.
Fixes: #15487
Build-bot: release:windows,linux,mac
.
User Testing
Preparation: install the keyboard bksp_ldml.kmp from the archive below.
For these tests, always type the following key sequences:
GROUP_WINDOWS: Test on Windows 11
GROUP_MAC: Test on macOS
GROUP_LINUX: Test on Linux
TEST_COMPLIANT_E_ACUTE_VERIFY_CONTEXT: In a compliant app, type the E-ACUTE sequence. Select the text, copy and paste it into a character viewer and verify that the last character is
U+00E9.TEST_COMPLIANT_E_ACUTE: In a compliant app, type the E-ACUTE sequence. Then press bksp. The result should be
abc e.TEST_NONCOMPLIANT_E_ACUTE_VERIFY_CONTEXT: In a non-compliant app, type the E-ACUTE sequence. Select the text, copy and paste it into a character viewer and verify that the last character is
U+00E9.TEST_NONCOMPLIANT_E_ACUTE: In a non-compliant app, type the E-ACUTE sequence. Then press bksp. The result should be
abc e.TEST_COMPLIANT_D_ACUTE_VERIFY_CONTEXT: In a compliant app, type the D-ACUTE sequence. Select the text, copy and paste it into a character viewer and verify that the last two characters are
U+0064 U+0301.TEST_COMPLIANT_D_ACUTE: In a compliant app, type the D-ACUTE sequence. Then press bksp. The result should be
abc d.TEST_NONCOMPLIANT_D_ACUTE_VERIFY_CONTEXT: In a non-compliant app, type the D-ACUTE sequence. Select the text, copy and paste it into a character viewer and verify that the last two characters are
U+0064 U+0301.TEST_NONCOMPLIANT_D_ACUTE: In a non-compliant app, type the D-ACUTE sequence. Then press bksp. The result should be
abc d.TEST_COMPLIANT_A_UMLAUT_VERIFY_CONTEXT: In a compliant app, type the A-UMLAUT sequence. Select the text, copy and paste it into a character viewer and verify that the last character is
U+00E4.TEST_COMPLIANT_A_UMLAUT: In a compliant app, type the A-UMLAUT sequence. Then press bksp. The result should be
abc a.TEST_NONCOMPLIANT_A_UMLAUT_VERIFY_CONTEXT: In a non-compliant app, type the A-UMLAUT sequence. Select the text, copy and paste it into a character viewer and verify that the last character is
U+00E4.TEST_NONCOMPLIANT_A_UMLAUT: In a non-compliant app, type the A-UMLAUT sequence. Then press bksp. The result should be
abc a.Note on compliant and non-compliant apps