Reading CSV file returns incorrect line break content #9797

Closed
@jeffvalk

Description

When multiple consecutive newline characters appear inside a quoted CSV field, Pandoc coalesces them into a single SoftBreak in the resulting AST. According to RFC 4180, this appears to be incorrect behavior: the RFC's grammar treats CR and LF like any other character inside a quoted field.

Shouldn't the CSV reader return individual LineBreaks for \r\n\r\n\r\n rather than a single SoftBreak?

At minimum, I would think there should be no information loss during the read, which means encoding the original number of line breaks in some way. Currently, it's not possible to reconstruct the input data accurately from the AST.
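
For comparison, here is a minimal sketch of what a lossless read could look like, using Python's standard `csv` module rather than Pandoc. The sample data and variable names are made up for illustration; the sketch only shows that a reader can keep every CR and LF inside a quoted field, so the original input can be rebuilt from the parsed rows.

```python
import csv
import io

# Hypothetical sample: one quoted field containing three consecutive CRLF pairs.
data = 'id,note\r\n1,"first\r\n\r\n\r\nsecond"\r\n'

# newline="" keeps StringIO from translating line endings before csv sees them.
rows = list(csv.reader(io.StringIO(data, newline="")))
note = rows[1][1]

# Each line break inside the quoted field is kept as-is.
print(repr(note))
print(note.count("\r\n"))

# Writing the parsed rows back out allows a comparison with the original input.
out = io.StringIO(newline="")
csv.writer(out).writerows(rows)
round_trip_ok = out.getvalue() == data
print(round_trip_ok)
```

This says nothing about how Pandoc should represent the breaks in its AST; it only illustrates the "no information loss" expectation described above.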

Tested with Pandoc 3.1.13
