Multi-byte UTF-8 characters break when parsing in Node-style #908

PapaParse breaks multi-byte UTF-8 characters when their bytes are split across two `Buffer` chunks.
For example, `ç` becomes `��`.
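
The corruption can be shown without PapaParse at all. The sketch below only uses Node's built-in `Buffer`; the variable names are illustrative. It assumes that each chunk is decoded on its own with `toString()`, which is what this issue describes.

```js
// `ç` (U+00E7) is encoded as two bytes in UTF-8: 0xC3 0xA7
const bytes = Buffer.from('ç', 'utf8')

// Decoding each byte separately, as happens when a chunk boundary
// falls in the middle of the character
const firstHalf = bytes.subarray(0, 1).toString()
const secondHalf = bytes.subarray(1).toString()

// Each incomplete sequence is replaced by U+FFFD (the replacement character)
const broken = firstHalf + secondHalf
console.log(JSON.stringify(broken))
```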

To reproduce:

```js
const Papa = require('papaparse')
const {PassThrough} = require('stream')

const csvFileString = 'first_name,last_name\nFrançois,Mitterrand\n'

const input = new PassThrough()
const parser = Papa.parse(Papa.NODE_STREAM_INPUT, {header: true})

input.pipe(parser)

parser.on('data', row => console.log(row))

// `ç` takes two bytes (offsets 25 and 26); split the buffer between them
const bytes = Buffer.from(csvFileString)
input.write(bytes.slice(0, 26))
input.write(bytes.slice(26))
input.end()
```

```js
{ first_name: 'Fran��ois', last_name: 'Mitterrand' }
```

A workaround is to decode the UTF-8 stream before it reaches PapaParse, using `string_decoder` (a built-in Node module), the WHATWG `TextDecoder`, or `iconv-lite` (a userland dependency).
A better fix would be for PapaParse itself to use `string_decoder` or `TextDecoder` in place of `chunk.toString()`.
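
As a rough sketch of the `string_decoder` workaround: the helper below is hypothetical (`createUtf8Realigner` is not part of PapaParse). It re-chunks the byte stream so that every chunk ends on a character boundary, which should make a later `chunk.toString()` safe. It has only been reasoned about against plain Node streams, not verified against PapaParse internals.

```js
const { Transform } = require('stream')
const { StringDecoder } = require('string_decoder')

// Hypothetical helper: emits the same bytes as it receives, but never
// ends a chunk in the middle of a multi-byte UTF-8 character.
function createUtf8Realigner () {
  const decoder = new StringDecoder('utf8')

  return new Transform({
    transform (chunk, encoding, callback) {
      // write() returns only complete characters and keeps the trailing
      // bytes of an incomplete character for the next call
      const text = decoder.write(chunk)
      if (text.length > 0) this.push(Buffer.from(text, 'utf8'))
      callback()
    },
    flush (callback) {
      // end() returns whatever is still buffered when the input ends
      const rest = decoder.end()
      if (rest.length > 0) this.push(Buffer.from(rest, 'utf8'))
      callback()
    }
  })
}
```

With the reproduction above, this would be used as `input.pipe(createUtf8Realigner()).pipe(parser)` instead of `input.pipe(parser)`.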

Related to #751 
