Commit
Fix: error decoding some Korean Hangul graphemes (Mudlet#7431)
#### Brief overview of PR changes/additions
Tightens the test applied to a range of UTF-8 encoded byte sequences so that only half as many code points are rejected as before. The rejected range should now include only the code points reserved for UTF-16 surrogates, U+D800 to U+DFFF (UTF-16, like UTF-8, is a variable-length encoding, and that range is reserved for the values used in pairs to convey code points that do not fit in a single UTF-16 value). It should NOT include the part of the range U+D000 to U+D7FF that is taken by graphemes used particularly for Korean.

#### Motivation for adding to Mudlet
A Korean user noticed that some graphemes in the text of the MUD they were playing were not displayed correctly by Mudlet: they were replaced by the "Replacement Character", whereas a screenshot from a different client showed the proper Korean (Hangul) ones. Testing revealed that a range of code points was being rejected as High or Low Surrogates when they were not.

#### Other info (issues closed, discussion etc)
This should close Mudlet#7429

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
Co-authored-by: Vadim Peretokin <vperetokin@hey.com>
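The narrowed check described above can be illustrated with a minimal sketch. This is not Mudlet's actual code: the helper names `encodesSurrogate` and `decodeThreeBytes` are hypothetical, and the sketch assumes the check is made on the first two bytes of a three-byte UTF-8 sequence. Every code point from U+D000 to U+DFFF is encoded with the lead byte `0xED`, but only a second byte of `0xA0` to `0xBF` yields a surrogate (U+D800 to U+DFFF); a second byte of `0x80` to `0x9F` yields U+D000 to U+D7FF, which includes Hangul syllables.

```cpp
#include <cstdint>

// Hypothetical helper (not taken from Mudlet): returns true only when a
// three-byte UTF-8 sequence starting with these two bytes would decode to
// a UTF-16 surrogate code point, U+D800..U+DFFF.
constexpr bool encodesSurrogate(std::uint8_t lead, std::uint8_t second)
{
    // 0xED 0x80..0x9F -> U+D000..U+D7FF (valid, includes Hangul syllables)
    // 0xED 0xA0..0xBF -> U+D800..U+DFFF (surrogates, must be rejected)
    return lead == 0xED && second >= 0xA0 && second <= 0xBF;
}

// Hypothetical helper: decodes a three-byte UTF-8 sequence without any
// validation; used here only to show which code point the bytes represent.
constexpr char32_t decodeThreeBytes(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2)
{
    return static_cast<char32_t>(((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F));
}
```

For example, the Hangul syllable U+D55C is encoded as the bytes `0xED 0x95 0x9C`. A check that rejects every sequence with lead byte `0xED` would replace it, whereas the narrower test on the second byte accepts it and still rejects `0xED 0xA0 0x80` (U+D800).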