Description
GB18030-2005 already maps one-to-one between Unicode and GB18030, except for
14 characters that were still mapped to the Unicode PUA as of 2005.
All 14 of those characters now have corresponding non-PUA Unicode code points,
so I suggest that the Encoding Standard map them to the regular Unicode characters instead of the PUA characters.
The following 80 characters are the GBK characters that were at some point mapped to the Unicode PUA, along with
the corresponding non-PUA Unicode characters:
Han Character GBK Unicode PUA Unicode non-PUA
FE50 E815 2E81
FE51 E816 20087
FE52 E817 20089
FE53 E818 200CC
FE54 E819 2E84
FE55 E81A 3473
FE56 E81B 3447
FE57 E81C 2E88
FE58 E81D 2E8B
FE59 E81E 9FB4
FE5A E81F 359E
FE5B E820 361A
FE5C E821 360E
FE5D E822 2E8C
FE5E E823 2E97
FE5F E824 396E
FE60 E825 3918
FE61 E826 9FB5
FE62 E827 39CF
FE63 E828 39DF
FE64 E829 3A73
FE65 E82A 39D0
FE66 E82B 9FB6
FE67 E82C 9FB7
FE68 E82D 3B4E
FE69 E82E 3C6E
FE6A E82F 3CE0
FE6B E830 2EA7
FE6C E831 215D7
FE6D E832 9FB8
FE6E E833 2EAA
FE6F E834 4056
FE70 E835 415F
FE71 E836 2EAE
FE72 E837 4337
FE73 E838 2EB3
FE74 E839 2EB6
FE75 E83A 2EB7
FE76 E83B 2298F
FE77 E83C 43B1
FE78 E83D 43AC
FE79 E83E 2EBB
FE7A E83F 43DD
FE7B E840 44D6
FE7C E841 4661
FE7D E842 464C
FE7E E843 9FB9
FE80 E844 4723
FE81 E845 4729
FE82 E846 477C
FE83 E847 478D
FE84 E848 2ECA
FE85 E849 4947
FE86 E84A 497A
FE87 E84B 497D
FE88 E84C 4982
FE89 E84D 4983
FE8A E84E 4985
FE8B E84F 4986
FE8C E850 499F
FE8D E851 499B
FE8E E852 49B7
FE8F E853 49B6
FE90 E854 9FBA
FE91 E855 241FE
FE92 E856 4CA3
FE93 E857 4C9F
FE94 E858 4CA0
FE95 E859 4CA1
FE96 E85A 4C77
FE97 E85B 4CA2
FE98 E85C 4D13
FE99 E85D 4D14
FE9A E85E 4D15
FE9B E85F 4D16
FE9C E860 4D17
FE9D E861 4D18
FE9E E862 4D19
FE9F E863 4DAE
FEA0 E864 9FBB
The following 14 characters are the GB18030-2005 characters that are still mapped to the Unicode PUA.
I suggest that the Encoding Standard map them to non-PUA Unicode, because there is no need
to wait for GB18030 to update its spec just for these 14 characters, and we can be sure that their
corresponding non-PUA Unicode characters have been settled.
Han Character GBK Unicode PUA Unicode non-PUA
FE51 E816 20087
FE52 E817 20089
FE53 E818 200CC
FE59 E81E 9FB4
FE61 E826 9FB5
FE66 E82B 9FB6
FE67 E82C 9FB7
FE6C E831 215D7
FE6D E832 9FB8
FE76 E83B 2298F
FE7E E843 9FB9
FE90 E854 9FBA
FE91 E855 241FE
FEA0 E864 9FBB
With these mappings, we can decode strings in any encoding of the GBK family to non-PUA Unicode.
Beyond that, we still need to convert all the historical Unicode PUA characters
to the proper GBK (GB18030) characters.
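
As a rough illustration of both directions for the 14 characters, here is a minimal Python sketch. It is not taken from any existing decoder or from the Encoding Standard. The names `ROWS`, `replace_pua`, and `encode_special` are hypothetical, and the values are copied from the 14-row table above. It assumes that "converting historical PUA characters" means that both the old PUA code point and the new non-PUA code point should encode to the same GBK bytes.

```python
from typing import Optional

# (GBK two-byte code, Unicode PUA code point, Unicode non-PUA code point)
# Values copied from the 14-row table in this issue.
ROWS = [
    (0xFE51, 0xE816, 0x20087),
    (0xFE52, 0xE817, 0x20089),
    (0xFE53, 0xE818, 0x200CC),
    (0xFE59, 0xE81E, 0x9FB4),
    (0xFE61, 0xE826, 0x9FB5),
    (0xFE66, 0xE82B, 0x9FB6),
    (0xFE67, 0xE82C, 0x9FB7),
    (0xFE6C, 0xE831, 0x215D7),
    (0xFE6D, 0xE832, 0x9FB8),
    (0xFE76, 0xE83B, 0x2298F),
    (0xFE7E, 0xE843, 0x9FB9),
    (0xFE90, 0xE854, 0x9FBA),
    (0xFE91, 0xE855, 0x241FE),
    (0xFEA0, 0xE864, 0x9FBB),
]

# Decoding side: rewrite PUA code points that an older decoder produced.
PUA_TO_STANDARD = {pua: std for _, pua, std in ROWS}


def replace_pua(text: str) -> str:
    """Replace the 14 historical PUA code points with their non-PUA equivalents."""
    return text.translate(PUA_TO_STANDARD)


# Encoding side (assumed behaviour): the PUA and the non-PUA code point
# both map to the same two GBK bytes.
CODE_POINT_TO_GBK = {}
for gbk, pua, std in ROWS:
    encoded = gbk.to_bytes(2, "big")
    CODE_POINT_TO_GBK[pua] = encoded
    CODE_POINT_TO_GBK[std] = encoded


def encode_special(ch: str) -> Optional[bytes]:
    """Return the GBK bytes for one of the 14 characters, or None for any other character."""
    return CODE_POINT_TO_GBK.get(ord(ch))
```

A real encoder would consult a lookup like `encode_special` before falling back to its normal index; the sketch leaves that fallback out.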