Description
I am emitting some non-English characters (in Hindi) as JSON from my Node.js socket.io server to an Android client. It all works fine over a websocket connection, but over polling the non-English characters come through as garbage values.
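To illustrate the kind of garbling involved (this is a generic mojibake demo, not the actual socket.io code path): if the UTF-8 bytes of a Hindi string are re-decoded with a single-byte charset, each multi-byte character splinters into several Latin-1 characters.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String hindi = "नमस्ते"; // "namaste" in Devanagari
        // Encode to UTF-8 bytes -- this is what actually travels over the wire.
        byte[] utf8 = hindi.getBytes(StandardCharsets.UTF_8);
        // Misinterpreting those bytes as one-char-per-byte Latin-1 text
        // produces garbage of the kind described above.
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(hindi)); // false: mojibake, not the original
        // Decoding with the correct charset round-trips cleanly.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(hindi)); // true
    }
}
```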
Digging into the code, I found this happens because both transports call the decodePacket() function. The websocket transport passes the boolean utf8decode as false, whereas the polling transport passes it as true, which ultimately calls UTF8.decode(data).
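A byte-oriented decode like this only makes sense when every char of the input string stands for one raw byte (value ≤ 255). The sketch below (my own minimal decoder in the spirit of UTF8.decode, not the library's actual code) shows that the scheme round-trips correctly for a chars-as-bytes string, but the precondition is violated as soon as the string has already been decoded once and contains real Unicode chars:

```java
import java.nio.charset.StandardCharsets;

public class ByteOrientedDecodeDemo {
    // Minimal byte-oriented UTF-8 decoder in the spirit of UTF8.decode:
    // each char of the input is assumed to be one raw byte (0..255).
    static String decodeUtf8FromChars(String data) {
        byte[] bytes = new byte[data.length()];
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c > 0xFF) {
                // Precondition violated: this string was already decoded once.
                throw new IllegalArgumentException("char > 255 at index " + i);
            }
            bytes[i] = (byte) c;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hindi = "हिन्दी";
        // Expected input: UTF-8 bytes smuggled through a char-per-byte string.
        byte[] utf8 = hindi.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) sb.append((char) (b & 0xFF));
        System.out.println(decodeUtf8FromChars(sb.toString()).equals(hindi)); // true

        // Wrong input: the already-decoded string itself (chars > 255).
        try {
            decodeUtf8FromChars(hindi);
        } catch (IllegalArgumentException e) {
            System.out.println("fails: " + e.getMessage());
        }
    }
}
```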
In UTF8.java, decode() first builds an array of code points, one for each character of the message string. For each code point it then calls decodeSymbol(), which returns the converted code point value. In my case the code point values of the non-English characters are greater than 255 (actually > 2000), so the function should process up to the third byte (I am not sure exactly what this algorithm is doing), but instead it exits at #L100, returning only the value of byte1.
This is what turns my characters into garbage values.
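As a worked example of the early return described above (my reading of the behavior, sketched rather than copied from UTF8.java): the decoder masks each char to a single byte before branching, so a char that is really a full code point > 255 is silently truncated and taken down the one-byte ASCII path, never reaching the multi-byte continuation logic.

```java
public class DecodeSymbolSketch {
    public static void main(String[] args) {
        char c = 'ह';                  // U+0939, code point 2361 (> 255)
        int byte1 = c & 0xFF;          // truncated to 0x39 before any branching
        if ((byte1 & 0x80) == 0) {
            // High bit is clear, so this "looks like" a plain ASCII byte:
            // the decoder returns here and never reads continuation bytes.
            System.out.println((char) byte1); // prints '9', not 'ह'
        }
    }
}
```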
Can someone help me understand why this UTF-8 decode-then-encode is required at all, or at least what it is doing?