
Unicode decode error for Non-English characters #430

Closed
@csoni111

Description


I am emitting some non-English characters (in the Hindi language) in JSON format from my Node.js socket.io server to an Android client. Everything works fine over a websocket connection, but with polling the non-English characters are turned into garbage values.
Digging into the code, I found this happens because both transports call the decodePacket() function. For websocket the boolean utf8decode is passed as false, whereas for polling it is passed as true, which ultimately calls UTF8.decode(data).
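For illustration, here is a minimal sketch (not the library's actual code) of how this kind of garbling typically looks: a multi-byte UTF-8 character gets mangled when each of its bytes is treated as a standalone one-byte character.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // 'न' (Devanagari NA, U+0928) encodes to three UTF-8 bytes: 0xE0 0xA4 0xA8.
        byte[] utf8 = "न".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 3

        // Interpreting each byte as its own character (Latin-1 here) is the
        // kind of mismatch that turns one Hindi character into several
        // garbage characters on the wire.
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled); // à¤¨ — three garbage characters instead of one
    }
}
```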

In UTF8.java, the decoder first builds an array of code points for all the characters in the message string.
Then, for each code point in the array, it evaluates decodeSymbol(), which returns the converted code point value. In my case the code point values for the non-English characters are greater than 255 (actually > 2000), so the function should process the sequence up to the third byte (I am not sure exactly what this algorithm is doing). But it exits at #L100, returning only the value of byte1.
This is what turns my characters into garbage values.
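To show why all three bytes matter, here is a minimal hand-rolled decode of a 3-byte UTF-8 sequence (pattern 1110xxxx 10xxxxxx 10xxxxxx). This is a sketch of what a correct decodeSymbol-style routine must do, not the library's actual implementation; the method name decodeThreeByte is made up for illustration.

```java
public class Utf8DecodeSketch {
    // Reassemble a code point from a 3-byte UTF-8 sequence by combining
    // the payload bits of the lead byte and both continuation bytes.
    static int decodeThreeByte(int b1, int b2, int b3) {
        return ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F);
    }

    public static void main(String[] args) {
        // 'न' (U+0928 = 2344) is the byte sequence 0xE0 0xA4 0xA8 in UTF-8.
        int codePoint = decodeThreeByte(0xE0, 0xA4, 0xA8);
        System.out.println(codePoint); // 2344

        // Returning only the first byte instead would give 0xE0 = 224,
        // which is exactly the kind of garbage value described above.
    }
}
```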

Could someone help me understand why this UTF-8 decode-then-encode step is required at all, or at least what exactly it is doing?
