Skip to content

Conversation

@KamilBalwierz
Copy link
Contributor

I have encountered issue with multiline parsing.

str_getcsv for line ending with single instance of encloser will return as correct content, so if field starts by newline this causes parser to stop assuming it read full correct field

fgetcsv correctly looks for further input in case of following input:

one,"
two
",three

For now I am just checking for edgecase (single instance of encloser preceeded by delimiter) to handle this, but it looks like proper solution will have to stop relying on str_getcsv and do full text parsing instead.

@KamilBalwierz
Copy link
Contributor Author

Looked at internal implementation of str_getcsv and it just uses fgetcsv inside - but even if we were to swap it out issue of premature end of stream (as our buffer may not have read it all yet) exists as fgetcsv will not exit with any state encountering unexpected end of stream

src/Decoder.php Outdated
Comment on lines 112 to 114
\reset($data);
if ($last === "\n" && ($newline === 1 || $this->buffer[$newline - 1] !== $this->enclosure)) {
$edgeCase = \substr($this->buffer, $newline - 2, 3);
if ($last === "\n" && ($newline === 1 || $this->buffer[$newline - 1] !== $this->enclosure || $edgeCase === $this->delimiter . $this->enclosure ."\n")) {
Copy link
Contributor

@SimonFrings SimonFrings Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KamilBalwierz Thanks for looking into this 👍
Your casing seems a bit off, just a minor fix?

The rest looks fine to me :)

@SimonFrings
Copy link
Contributor

Thx for the update 👍. The only thing I have to note is that I don't think it's necessary to have two commits for these changes, maybe you could squash these two together?

@SimonFrings
Copy link
Contributor

Nice work 👍

@SimonFrings SimonFrings requested a review from clue October 20, 2021 11:06
@clue clue added the new feature New feature or request label Oct 21, 2021
@clue clue added this to the v1.2.0 milestone Oct 21, 2021
Copy link
Owner

@clue clue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KamilBalwierz Thank you for looking into this, changes LGTM! :shipit:

I agree that this is starting to look more complex than I'd like, but I agree that this would be a good first step to support parsing empty lines. Let's get this in and review future code improvements in a future PR 👍

@clue clue changed the title Multiline parsing issue Support parsing multiline values starting with quoted newline Oct 21, 2021
@clue clue merged commit 073ca2a into clue:master Oct 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants