Description
is_valid_char returns false for values which are valid Unicode codepoints.
This is due to a misunderstanding of the way the 66 Unicode "non character" codepoints are supposed to be handled. See: "FAQ - Private-Use Characters, Noncharacters, and Sentinels"
Here are the relevant sections:
Q: Are noncharacters invalid in Unicode strings and UTFs?
A: Absolutely not. Noncharacters do not cause a Unicode string to be ill-formed in any UTF. This can be seen explicitly in the table above, where every noncharacter code point has a well-formed representation in UTF-32, in UTF-16, and in UTF-8. An implementation which converts noncharacter code points between one UTF representation and another must preserve these values correctly. The fact that they are called "noncharacters" and are not intended for open interchange does not mean that they are somehow illegal or invalid code points which make strings containing them invalid.
Q: So how should libraries and tools handle noncharacters?
A: Library APIs, components, and tool applications (such as low-level text editors) which handle all Unicode strings should also handle noncharacters. Often this means simple pass-through, the same way such an API or tool would handle a reserved unassigned code point. Such APIs and tools would not normally be expected to interpret the semantics of noncharacters, precisely because the intended use of a noncharacter is internal. But an API or tool should also not arbitrarily filter out, convert, or otherwise discard the value of noncharacters, any more than they would do for private-use characters or reserved unassigned code points.
[@jiahao - edited formatting of hyperlink]