stripquoted doesn't know to strip quoted whitespace when space is the delimiter #115

Closed
nickrobinson251 opened this issue Apr 22, 2022 · 1 comment · Fixed by #116

@nickrobinson251
Collaborator

julia> str = "{hey, there }"
"{hey, there }"

expected behaviour, with delim=',' (and default wh1, wh2):

julia> res = Parsers.xparse(String, str; openquotechar='{', closequotechar='}', stripquoted=true, delim=',')
Parsers.Result{PosLen}(37, 13, PosLen(0x000000000020000a))

julia> Parsers.getstring(str, res.val, 0x22)
"hey, there"

unexpected behaviour, with delim=' ' (and wh1 changed, since it's required):

julia> res = Parsers.xparse(String, str; openquotechar='{', closequotechar='}', stripquoted=true, delim=' ', wh1=0x00)
Parsers.Result{PosLen}(37, 13, PosLen(0x000000000020000b))

julia> Parsers.getstring(str, res.val, 0x22)  # has trailing whitespace
"hey, there " 

The trailing whitespace in the quoted string isn't stripped in the second case, because we explicitly set wh1 to a value other than ' ' (since ' ' is the delimiter). But there's no way to tell Parsers "inside quotes, treat ' ' as whitespace rather than as the delimiter", just as it already treats ',' as a regular character inside quotes even when comma is the delimiter.

One way to fix this would be to hardcode certain characters as always being whitespace when quoted, e.g.:

-                if options.stripquoted && b != options.wh1 && b != options.wh2
+                if options.stripquoted && b != options.wh1 && b != options.wh2 && b != UInt8(' ') && b != UInt8('\t')
                     lastnonwhitespacepos = pos
                 end

Parsers.jl/src/strings.jl

Lines 100 to 102 in 462fb55

if options.stripquoted && b != options.wh1 && b != options.wh2
    lastnonwhitespacepos = pos
end
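A minimal standalone sketch of the proposed rule, assuming the quoted content has already been isolated as bytes. `last_nonwhitespace` and `strip_quoted` are hypothetical helpers, not part of Parsers.jl; the keyword names `wh1`/`wh2` only mirror the Parsers options. Like the diff, it only deals with trailing whitespace, the case reported here:

```julia
# Hypothetical sketch of the proposed rule; NOT Parsers.jl's actual implementation.
# Returns the index of the last byte that is not whitespace, where "whitespace"
# means wh1, wh2, or the hardcoded ' ' and '\t' (the proposed change).
function last_nonwhitespace(bytes::AbstractVector{UInt8};
                            wh1::UInt8=UInt8(' '), wh2::UInt8=UInt8('\t'))
    lastpos = 0
    for (pos, b) in enumerate(bytes)
        if b != wh1 && b != wh2 && b != UInt8(' ') && b != UInt8('\t')
            lastpos = pos
        end
    end
    return lastpos
end

# Hypothetical helper: strip trailing whitespace from already-unquoted content.
function strip_quoted(s::AbstractString; kw...)
    bytes = codeunits(s)
    return String(bytes[1:last_nonwhitespace(bytes; kw...)])
end
```

With `wh1=0x00` (as in the space-delimited call above), `strip_quoted("hey, there "; wh1=0x00)` still returns `"hey, there"`, because ' ' is treated as whitespace regardless of `wh1`. The trade-off of hardcoding is that a caller can no longer preserve a quoted trailing space or tab when `stripquoted=true`.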

@nickrobinson251
Collaborator Author

nickrobinson251 commented Apr 22, 2022

In case you're wondering, the data I'm dealing with looks like:

    8 'ABC     ' 138.00 1      .000      .000   1   1 1.00000      .0000     1
    9 'ABCDEFGH'  69.00 1      .000      .000   1   1 1.00000      .0000     1

where space is the delimiter and values are padded so the columns line up. For quoted values the padding occurs inside the quotes, i.e. strings are quoted and padded with trailing whitespace to the length of the longest string, but we want to strip that padding when parsing.
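To make the desired result concrete, here is a hypothetical sketch in plain Julia (not Parsers.jl) that splits a row like the ones above on runs of spaces, keeps each single-quoted field as one value, and strips the padding stored inside the quotes. It assumes `'` is the quote character, well-formed rows, and no escaped quotes:

```julia
# Hypothetical sketch, not Parsers.jl: split on runs of spaces, except inside
# quotes, and strip trailing padding from quoted fields.
function split_padded(line::AbstractString; quote_char::Char='\'')
    fields = String[]
    buf = IOBuffer()
    inquote = false   # currently between an open and close quote
    infield = false   # currently accumulating an unquoted field
    for c in line
        if inquote
            if c == quote_char
                inquote = false
                # Closing quote: drop the padding that lived inside the quotes.
                push!(fields, String(rstrip(String(take!(buf)))))
            else
                write(buf, c)
            end
        elseif c == quote_char
            inquote = true
        elseif c == ' '
            # A space outside quotes ends an unquoted field; runs of spaces collapse.
            if infield
                push!(fields, String(take!(buf)))
                infield = false
            end
        else
            write(buf, c)
            infield = true
        end
    end
    infield && push!(fields, String(take!(buf)))
    return fields
end
```

For the first row above this yields 11 fields with `"ABC"` (no padding) as the second value, which is the behaviour `stripquoted=true` with `delim=' '` is expected to give.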
