Closed
Description
When multiple sequential newline characters appear inside a quoted CSV field, Pandoc coalesces them into a single SoftBreak in the resulting AST. According to RFC 4180, this appears to be incorrect behavior: the RFC's grammar treats CR and LF like any other character inside a quoted field.
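For comparison, here is a minimal sketch of how another CSV parser handles the same input. It uses Python's standard `csv` module, not Pandoc, and the sample row is made up for illustration. It only shows that a reader can keep every CR and LF inside a quoted field.

```python
import csv
import io

# Made-up sample: one record whose second field contains three CRLF pairs.
raw = 'a,"x\r\n\r\n\r\ny"\r\n'

# newline='' disables newline translation so the csv module sees the
# original line endings inside the quoted field.
rows = list(csv.reader(io.StringIO(raw, newline='')))

field = rows[0][1]
print(repr(field))
print(field.count('\r\n'))
```

In this sketch the quoted field comes back with all three line breaks intact, which is the behavior the question below asks about for Pandoc's CSV reader.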
Shouldn't individual LineBreaks be returned for \r\n\r\n\r\n rather than a single SoftBreak by the CSV reader?
At minimum, I would think there should be no information loss during the read, which means encoding the original number of line breaks in some way. Currently, it's not possible to reconstruct the input data accurately from the AST.
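To make the "no information loss" idea concrete, here is a rough sketch of one possible encoding. It is not Pandoc's implementation: `field_to_inlines` and `inlines_to_field` are hypothetical helpers, the tuples only loosely mimic Pandoc's `Str` and `LineBreak` inlines, and the sketch assumes the line-ending style is known when writing back.

```python
import re

def field_to_inlines(field):
    """Hypothetical helper: emit one ("LineBreak",) per line break in the field."""
    inlines = []
    # The capturing group keeps each line break as its own list element.
    for part in re.split(r"(\r\n|\r|\n)", field):
        if part in ("\r\n", "\r", "\n"):
            inlines.append(("LineBreak",))
        elif part:
            # Simplification: the text between breaks is kept as a single Str.
            inlines.append(("Str", part))
    return inlines

def inlines_to_field(inlines, eol="\r\n"):
    """Hypothetical inverse: rebuild the field, assuming a single EOL style."""
    return "".join(eol if item[0] == "LineBreak" else item[1] for item in inlines)

original = "x\r\n\r\n\r\ny"
inlines = field_to_inlines(original)
print(inlines)
print(inlines_to_field(inlines) == original)
```

With one element per line break, the number of breaks survives the round trip. Distinguishing CRLF from bare LF would need extra information that this sketch does not record.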
Tested with Pandoc 3.1.13