java: Use a more consistent definition of whitespace #442
🤔 What's changed?
Define whitespace as any character whose Unicode category is `Zs` or whose bidirectional class is `WS`, `B`, or `S`.
⚡️ What's your motivation?
Gherkin lines were trimmed according to a regex pattern matching `\s+` plus NEL and NBSP, while comments on tag lines were assumed to be delimited by just `\s#`. This leads to inconsistent behaviour: adding a comment to the end of a tag line can make the tag line invalid.
Each Gherkin implementation uses a different definition of whitespace.
These can be roughly categorized as using Unicode:
- `Zs`, `Zl` and `Zp`, plus `\t`, `\v`, `\f`, `\r` and NEL[3].
- `\s`, which matches the set used by C + BOM[4].
- `Zs` or bidirectional class `WS`, `B`, or `S`[5].
And the other category:
- ` `, `\f`, `\n`, `\r`, `\t` and `\v`[2].
- ` ` and `\t`.
Within the Unicode categorization there is significant overlap. So for Java I have chosen to match the Python definition of whitespace, as it is completely defined in Unicode terms.
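For illustration, the definition above (category `Zs`, or bidirectional class `WS`, `B`, or `S`) could be expressed with the standard `java.lang.Character` API roughly as follows. This is a minimal sketch, not the code in this PR: the class name `WhitespaceSketch` and the methods `isWhitespace` and `trim` are hypothetical names chosen for the example.

```java
// Hypothetical sketch; class and method names are assumptions, not the PR's code.
final class WhitespaceSketch {

    private WhitespaceSketch() {
    }

    // True if the code point is in Unicode category Zs,
    // or has bidirectional class WS, B, or S.
    public static boolean isWhitespace(int codePoint) {
        if (Character.getType(codePoint) == Character.SPACE_SEPARATOR) {
            return true; // general category Zs
        }
        byte bidi = Character.getDirectionality(codePoint);
        return bidi == Character.DIRECTIONALITY_WHITESPACE              // WS
                || bidi == Character.DIRECTIONALITY_PARAGRAPH_SEPARATOR // B
                || bidi == Character.DIRECTIONALITY_SEGMENT_SEPARATOR;  // S
    }

    // Strips leading and trailing whitespace using the predicate above,
    // walking by code point rather than by char.
    public static String trim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end) {
            int cp = s.codePointAt(start);
            if (!isWhitespace(cp)) {
                break;
            }
            start += Character.charCount(cp);
        }
        while (end > start) {
            int cp = s.codePointBefore(end);
            if (!isWhitespace(cp)) {
                break;
            }
            end -= Character.charCount(cp);
        }
        return s.substring(start, end);
    }
}
```

Using one predicate for both line trimming and comment detection is what would keep the two consistent: under this sketch NEL (U+0085) and NBSP (U+00A0) are treated as whitespace in both places, alongside space and tab.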
🏷️ What kind of change is this?
📋 Checklist: