handling invalid quotes gracefully #453

Open
@mmarkell

Description

Referencing #421 which has been closed:

I just opened the referenced file in Excel, and it does a good job of distinguishing the quotes that mark cell boundaries from the quotes that are part of the text.

Out of curiosity, how does this work? It seems quite complicated to write a regex that reliably handles all such cases, and it's a bit of a chicken-and-egg problem to build a CSV parser that works on these kinds of strings when the quote: true option is enabled.

If I want to gracefully handle unescaped / unmatched quotes in the middle of a cell value, what options do I have? I really appreciate your advice!
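To make the question concrete, here is a minimal sketch of one possible heuristic: inside a quoted field, treat a quote as the closing quote only when it is immediately followed by the delimiter or the end of the line, and treat any other lone quote as literal text. This is an assumption for illustration, not a description of what Excel or csv-parse actually do. `parseLineLenient` is a hypothetical helper; it handles a single line with a single-character delimiter and does not support newlines inside fields.

```typescript
// Hypothetical helper, not part of csv-parse: splits ONE line into fields.
// Assumes a single-character delimiter and no newlines inside fields.
function parseLineLenient(line: string, delimiter = ","): string[] {
    const fields: string[] = [];
    let i = 0;
    for (;;) {
        let value = "";
        if (line[i] === '"') {
            i++; // consume the opening quote
            while (i < line.length) {
                const ch = line[i];
                if (ch !== '"') {
                    value += ch;
                    i++;
                    continue;
                }
                const atEnd = i + 1 >= line.length;
                if (!atEnd && line[i + 1] === '"') {
                    // Doubled quote: standard CSV escape for one literal quote
                    value += '"';
                    i += 2;
                } else if (atEnd || line[i + 1] === delimiter) {
                    // Quote followed by delimiter or end of line: closing quote
                    i++;
                    break;
                } else {
                    // Any other lone quote is kept as literal text
                    value += '"';
                    i++;
                }
            }
        } else {
            // Unquoted field: read up to the next delimiter
            while (i < line.length && line[i] !== delimiter) {
                value += line[i];
                i++;
            }
        }
        fields.push(value);
        if (i >= line.length) return fields;
        i++; // consume the delimiter
    }
}

console.log(parseLineLenient('"a "b" c",d'));    // [ 'a "b" c', 'd' ]
console.log(parseLineLenient('"say ""hi""",x')); // [ 'say "hi"', 'x' ]
```

The heuristic is still ambiguous when a stray quote happens to sit directly before a delimiter, and it does not tolerate whitespace between a closing quote and the delimiter, so it narrows the problem rather than solving it.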

For reference, right now I'm doing something like this:

import { Readable, Transform, TransformCallback } from "node:stream";

function replaceEmbeddedQuotes(
    readStream: Readable,
): Readable {
    // Create a transform stream to process the data
    const transformer = new Transform({
        objectMode: true,
        transform(
            chunk: Buffer | string,
            encoding: BufferEncoding,
            callback: TransformCallback
        ) {
            // Convert chunk to string if it's a buffer.
            // Note: a chunk is an arbitrary slice of the input, not necessarily one line.
            const text = chunk instanceof Buffer ? chunk.toString() : chunk;

            // If a quote has whitespace on both sides, double it
            const processedText = text.replaceAll(/(\s)"(\s)/g, '$1""$2');

            // Push the processed text to the output stream
            this.push(processedText);
            callback();
        },
    });

    // Pipe the input stream through our transformer
    return readStream.pipe(transformer);
}

I then pass that readable to csv-parse, but this doesn't handle some cases, such as an extra quote that is not surrounded by whitespace on both sides. I assume there are also cases where a valid closing quote is surrounded by whitespace on both sides, like
"a " , "b","c"
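
Both failure modes can be reproduced in isolation. The snippet below applies the same regex to single strings; `doubleSpacedQuotes` is a hypothetical name used only for this illustration, and `replace` with the `g` flag is used here as an equivalent of `replaceAll`.

```typescript
// Same regex as in the transform above, applied to one string at a time.
const doubleSpacedQuotes = (text: string): string =>
    text.replace(/(\s)"(\s)/g, '$1""$2');

// Intended case: a stray quote with whitespace on both sides gets doubled.
console.log(doubleSpacedQuotes('"a " b",c'));      // "a "" b",c

// Missed case: the stray quote after 5 has no whitespace before it.
console.log(doubleSpacedQuotes('"5" pipe",x'));    // "5" pipe",x  (unchanged)

// False positive: the valid closing quote after `a ` is doubled.
console.log(doubleSpacedQuotes('"a " , "b","c"')); // "a "" , "b","c"
```

In the last case the first field's closing quote now reads as an escaped quote, so the transform corrupts input that was arguably valid.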
