Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[css-transforms] Ambiguous syntax for translate, scale, rotate and matrix in SVG #4455

Open
mclegrand opened this issue Oct 27, 2019 · 8 comments
Labels

Comments

@mclegrand
Copy link
Member

mclegrand commented Oct 27, 2019

translate and scale have number ( comma-wsp? number )? which allows for number number 💥 without separators, which confusers parsers and seems like a bug, which should be number ( comma-wsp number )?. Same for rotate : ( comma-wsp? number comma-wsp? number )? should be ( comma-wsp number comma-wsp number )?

(The ambiguity is : transform(12) should not be parsed as transform(1 2) just because "1" and "2" are numbers and hence the parsing can be done that way. )

edit : same for matrix which has comma-wsp**?** between all attributes

@mclegrand mclegrand changed the title [css-transforms] Ambiguous syntax for translate, scale, and rotate [css-transforms] Ambiguous syntax for translate, scale, and rotate in SVG Oct 27, 2019
@mclegrand mclegrand changed the title [css-transforms] Ambiguous syntax for translate, scale, and rotate in SVG [css-transforms] Ambiguous syntax for translate, scale, rotate and matrix in SVG Oct 27, 2019
mclegrand added a commit to mclegrand/csswg-drafts that referenced this issue Oct 28, 2019
Fixes w3c#4455

Spacing between consecutive numbers should not be optional
@tabatkins
Copy link
Member

I assume that the SVG abstract parser is described in sufficient detail to guarantee a "greedy" approach, so that example would necessarily parse as a single 12, right? (I'm saying this without trying to dig down into the parser details...)

That's definitely the case for CSS, at least.

@mclegrand
Copy link
Member Author

mclegrand commented Oct 28, 2019

The problem arose when implementing a ragel parser for svg transforms (in Inkscape), watching it cut floats in absurd places, and when debugging, seeing that the tree allowed it, then seeing that the spec allows it.

I "fixed" the problem for me by removing the ? but I think the only non-ambiguous cases where a space is not necessary are too absurd to care and that the ? should always be dropped in the spec in those cases (like scale(1..1) == scale(1,0.1), rotate(1e11e11e1)==rotate(10,10,10) [edit : note that a greedy approach here won't work as reading 1e11 prevents reading a valid float after it, even though there exist a valid parsing] , etc, are technically allowed currently, but the spec is currently unclear on what happens on ambiguous cases like translate(1.1.1) and those horrors should better be left invalid than implementation-dependent )

Hence this bug and MR

@tabatkins
Copy link
Member

And hm, looking into the spec, I don't find any mention of whether the parsing is greedy-unto-failure like CSS or if it allows backtracking like a regex. None of the mentions of BNF or links to ABNF/EBNF seem to make that detail clear, unless I'm missing a detail somewhere.

@mclegrand
Copy link
Member Author

mmh, reading further, maybe if I also implement a CSS lexer/tokenizer closely following https://www.w3.org/TR/css-syntax-3/#consume-number (which is sort of greedy) I will get a non-ambiguous result, instead of "just" using the BNF for numbers

@tabatkins
Copy link
Member

All the attributes specified using CSS syntax are indeed well-specified in this regard; CSS tokenization is greedy (and is generic, with no influence from grammars or context), and then parsing over those tokens is non-greedy (well, as greedy as a standard regex).

But the transform attribute's syntax is specified in BNF with no clarification in this regard, unfortunately. I know the intention is that numbers parse greedily, tho.

@mclegrand
Copy link
Member Author

OK, thanks!
In light of this, with the precision I don't think the spec has ambiguous syntax, just that the parsing could be much more simpler to do without the ? as it would not require a tokenizer anymore 🙂 (this issue can be closed if that's ok)

@AmeliaBR
Copy link
Contributor

Making the separator optional was an intentional change based on browser behavior, see #2623 (Unfortunately, it looks like no one thought to test Inkscape at the time!)

(For reference: The original syntax in SVG 11 required the separator. )

Part of that discussion covers the similarity with CSS and also SVG path parsing when it comes to optional separators if tokenization is unambiguous. (@mclegrand, you might want to check those sections of the Inkscape codebase to see how the other parsers handle the tokenization!) It should match how those other parsers handle the unseparated tokens: e.g., 5-5 same as 5 -5, and 0.1.2 same as 0.1 0.2.

I definitely don't want to start talking about using non-greedy parsing to try to avoid parsing errors. The matrix() function requires six parameters, should the parser split up a single six-digit number to make it pass??? If you have any suggestions on wording to clarify that part, Tab, please contribute them. But otherwise I think we can close this issue.

@tabatkins tabatkins reopened this Oct 30, 2019
@tabatkins
Copy link
Member

Re-opening because the Transforms spec is ambiguous here. It specifies parsing of transform just by reference to BNF, which doesn't specify whether "123456" can match "six digit+ sequences" or not!

Impls, and agreement with CSS, dictates that it shouldn't - that's a single digit+ - but it's not clear in the spec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants