Content is corrupted in case of utf8 multibyte characters #743

Biktop · 2019-11-15T23:15:33Z

I found that in version 5.1.0 a downloaded CSV is parsed incorrectly if the source data contains UTF-8 multibyte characters (5.0.2 didn't have this issue). I noticed that PapaParse tries to download a second chunk even when there are no extra bytes left:

First chunk:

< Content-Length: 184027
< Content-Range: bytes 0-184026/184027

Second chunk (unnecessary):

> Range: bytes=183546-5426425
< Content-Length: 481
< Content-Range: bytes 183546-184026/184027

After some digging, I think the problem is here:
this._start += xhr.responseText.length;

Content-Length can be greater than the text length, because some characters take 2 or more bytes.
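To illustrate the mismatch, here is a minimal sketch (not PapaParse code) comparing a string's `length` with its UTF-8 byte count. The sample string is made up for illustration; the two numbers at the end are taken from the headers quoted in this report.

```javascript
// String.prototype.length counts UTF-16 code units, not UTF-8 bytes.
const sample = 'caf\u00e9'; // hypothetical sample; U+00E9 takes 2 bytes in UTF-8

const charCount = sample.length;
const byteCount = new TextEncoder().encode(sample).length;
console.log(charCount, byteCount); // 4 5

// Numbers from the headers in this report: the file is 184027 bytes, but the
// second request started at offset 183546 (the length of the decoded text).
const totalBytes = 184027;
const textLength = 183546;
const overlap = totalBytes - textLength;
console.log(overlap); // 481, the Content-Length of the unnecessary second chunk
```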

I think it should be something like this:
this._start += +xhr.getResponseHeader('Content-Length');
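The difference between the two ways of advancing the offset can be reproduced with a small simulation. This is only a sketch under stated assumptions: `fetchRange` and `download` are hypothetical helpers that stand in for ranged HTTP requests over an in-memory buffer, the total size is assumed to be known up front, and none of this is PapaParse's actual code.

```javascript
// Simulation only: fetchRange stands in for an HTTP Range request and is a
// hypothetical helper, not part of PapaParse.
const original = 'na\u00efve,caf\u00e9\n\u00fcber,se\u00f1or\n'; // 22 chars, 26 bytes
const bytes = new TextEncoder().encode(original);

function fetchRange(start, end) {
  // HTTP ranges are inclusive; the end is clamped to the file size here.
  const body = bytes.subarray(start, Math.min(end + 1, bytes.length));
  return { text: new TextDecoder().decode(body), contentLength: body.length };
}

function download(chunkSize, advance) {
  let start = 0;
  let text = '';
  let requests = 0;
  while (start < bytes.length) {
    const res = fetchRange(start, start + chunkSize - 1);
    text += res.text;
    start += advance(res); // the line under discussion
    requests++;
  }
  return { text, requests };
}

// Advance by decoded character count (the behaviour reported above).
const byChars = download(1024, (res) => res.text.length);
// Advance by byte count (the behaviour proposed above).
const byBytes = download(1024, (res) => res.contentLength);

console.log(byChars.requests, byChars.text === original); // 2 false
console.log(byBytes.requests, byBytes.text === original); // 1 true
```

With the character count, the offset stops short of the end of the data, so a second request re-reads the tail and appends it to the result. The sketch deliberately fetches everything in one chunk; with smaller chunks a multibyte character could also be split across a chunk boundary, which this example does not handle.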


pokoli · 2019-11-19T14:23:14Z

I think this should be fixed by #745. Could you please test it?

Biktop · 2019-11-19T18:54:40Z

I've tested it and it seems that it works!


pokoli · 2019-11-20T08:01:17Z

OK, so I've merged #745, which fixes this.

Biktop · 2019-12-17T23:50:51Z

Are you going to publish this fix soon?

pokoli · 2019-12-18T11:53:43Z

I've just published 5.1.1 which contains this fix!

pokoli pushed a commit that referenced this issue Nov 20, 2019

Use chunk size to determine the processed length

ae73d2a

Fixes #736 #743

pokoli closed this as completed Nov 20, 2019
