Description
Referencing #421 which has been closed:
I just opened the referenced file in Excel, and it seems to do a good job of distinguishing quotes that mark cell boundaries from quotes that are part of the text.
Out of curiosity, how does this work? It seems quite complicated to write a regex that reliably handles all such cases, and it's a bit of a chicken-and-egg problem to build a CSV parser that works on these kinds of strings if the quote: true option is enabled.
If I want to gracefully handle unescaped / unmatched quotes in the middle of a cell value, what options do I have? I really appreciate your advice!
For reference, right now I'm doing something like this:
    import { Readable, Transform, TransformCallback } from 'node:stream';

    function replaceEmbeddedQuotes(readStream: Readable): Readable {
      // Create a transform stream to process the data
      const transformer = new Transform({
        objectMode: true,
        transform(
          chunk: Buffer | string,
          encoding: BufferEncoding,
          callback: TransformCallback,
        ) {
          // Convert chunk to string if it's a buffer
          // (note: a chunk is not necessarily a single complete line)
          const line = chunk instanceof Buffer ? chunk.toString() : chunk;
          // Double any quote that has whitespace on both sides
          // (a rough heuristic for a quote in the middle of a field)
          const processedLine = line.replaceAll(/(\s)"(\s)/g, '$1""$2');
          // Push the processed line to the output stream
          this.push(processedLine);
          callback();
        },
      });
      // Pipe the input stream through our transformer
      return readStream.pipe(transformer);
    }
I then pass that readable to csv-parse, but this doesn't handle some cases, such as when the extra quote is not surrounded by spaces on both sides. I assume there are also cases where a valid closing quote is surrounded by spaces on both sides, like
"a " , "b","c"