Limit custom character class ranges to single scalars #422
Open: natecook1000 wants to merge 3 commits into swiftlang:main from natecook1000:single-scalar-ranges
Handle case insensitivity properly in CCC ranges
The prior implementation didn't make a lot of sense, and couldn't handle cases like `/(?i)[X-c]/`. This new approach uses simple case matching to test if the character is within the range, then tests if the uppercase or lowercase mappings are within the range. Fixes #395
commit b716d50c504fb1c37888db230584afee1f9f0180
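The two-step check described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `rangeContains` is a hypothetical helper, and it compares single scalars only, whereas the diff compares the case mappings against a `String` range.

```swift
// Hypothetical helper (not part of the PR): tests whether `scalar` falls in
// `low...high`, optionally ignoring case.
func rangeContains(
  _ scalar: Unicode.Scalar,
  low: Unicode.Scalar,
  high: Unicode.Scalar,
  caseInsensitive: Bool
) -> Bool {
  let range = low...high

  // Step 1: simple comparison of the scalar against the range.
  if range.contains(scalar) { return true }
  guard caseInsensitive else { return false }

  // Step 2: retry with the lowercase and uppercase mappings.
  // Assumption of this sketch: only single-scalar mappings are accepted.
  let props = scalar.properties
  for mapping in [props.lowercaseMapping, props.uppercaseMapping] {
    let scalars = mapping.unicodeScalars
    if scalars.count == 1, let mapped = scalars.first, range.contains(mapped) {
      return true
    }
  }
  return false
}

// The `/(?i)[X-c]/` case from the commit message:
let matchesLowerX = rangeContains("x", low: "X", high: "c", caseInsensitive: true)
let matchesUpperA = rangeContains("A", low: "X", high: "c", caseInsensitive: true)
let matchesLowerD = rangeContains("d", low: "X", high: "c", caseInsensitive: true)
```

Here `x` matches through its uppercase mapping and `A` through its lowercase mapping, while `d` matches under neither, since both `d` and `D` lie outside `X...c`.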
@@ -276,54 +276,48 @@ extension DSLTree.CustomCharacterClass.Member {
}
return c
case let .range(low, high):
// TODO:
guard let lhs = low.literalCharacterValue, lhs.hasExactlyOneScalar else {
throw Unsupported("\(low) in range")
}
guard let rhs = high.literalCharacterValue, rhs.hasExactlyOneScalar else {
throw Unsupported("\(high) in range")
}
guard lhs <= rhs else {
throw Unsupported("Invalid range \(low)-\(high)")
}

let isCaseInsensitive = opts.isCaseInsensitive
let isCharacterSemantic = opts.semanticLevel == .graphemeCluster

if opts.isCaseInsensitive {
let lhsLower = lhs.lowercased()
let rhsLower = rhs.lowercased()
guard lhsLower <= rhsLower else { throw Unsupported("Invalid range \(lhs)-\(rhs)") }
return { input, bounds in
// TODO: check for out of bounds?
let curIdx = bounds.lowerBound
if isCharacterSemantic {
guard input[curIdx].hasExactlyOneScalar else { return nil }
if (lhsLower...rhsLower).contains(input[curIdx].lowercased()) {
return input.index(after: curIdx)
}
} else {
if (lhsLower...rhsLower).contains(input.unicodeScalars[curIdx].properties.lowercaseMapping) {
return input.unicodeScalars.index(after: curIdx)
}
}
return { input, bounds in
// TODO: check for out of bounds?
let curIdx = bounds.lowerBound
let nextIndex = isCharacterSemantic
? input.index(after: curIdx)
: input.unicodeScalars.index(after: curIdx)
if isCharacterSemantic && !input[curIdx].hasExactlyOneScalar {
Review comment: What does this do to non-NFC characters?
return nil
}
} else {
guard lhs <= rhs else { throw Unsupported("Invalid range \(lhs)-\(rhs)") }
return { input, bounds in
// TODO: check for out of bounds?
let curIdx = bounds.lowerBound
if isCharacterSemantic {
guard input[curIdx].hasExactlyOneScalar else { return nil }
if (lhs...rhs).contains(input[curIdx]) {
return input.index(after: curIdx)
}
} else {
if (lhs...rhs).contains(Character(input.unicodeScalars[curIdx])) {
return input.unicodeScalars.index(after: curIdx)
}
}
let scalar = input.unicodeScalars[curIdx]
let scalarRange = lhs.unicodeScalars.first! ... rhs.unicodeScalars.first!
if scalarRange.contains(scalar) {
return nextIndex
}
if !isCaseInsensitive {
return nil
}

let stringRange = String(lhs)...String(rhs)
if (scalar.properties.changesWhenLowercased
&& stringRange.contains(scalar.properties.lowercaseMapping))
|| (scalar.properties.changesWhenUppercased
&& stringRange.contains(scalar.properties.uppercaseMapping)) {
return nextIndex
Review comment: Isn't this going back to lexicographical contains?
}

return nil
}

case let .custom(ccc):
return try ccc.generateConsumer(opts)
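The two inline review questions on this diff (non-NFC characters and lexicographical `contains`) can be illustrated with a small standalone sketch. It is illustrative only: `unicodeScalars.count == 1` stands in for the `hasExactlyOneScalar` helper seen in the diff, and `singleScalarMappingInRange` is a hypothetical stricter check, not something the PR defines.

```swift
// 1. Non-NFC input: the same character in composed and decomposed form.
let composed: Character = "\u{E9}"       // é as a single scalar
let decomposed: Character = "e\u{301}"   // e followed by a combining acute accent
let sameCharacter = composed == decomposed                 // canonical equivalence
let composedIsSingle = composed.unicodeScalars.count == 1
let decomposedIsSingle = decomposed.unicodeScalars.count == 1

// 2. Lexicographic containment: ClosedRange<String> compares whole strings,
//    so a multi-scalar string can fall "inside" a single-letter range.
let stringRange: ClosedRange<String> = "A"..."Z"
let multiScalarInStringRange = stringRange.contains("SS")  // "A" <= "SS" <= "Z"

// Hypothetical stricter check: accept a case mapping only when it is exactly
// one scalar and that scalar lies in the scalar range.
func singleScalarMappingInRange(
  _ mapping: String,
  in range: ClosedRange<Unicode.Scalar>
) -> Bool {
  let scalars = mapping.unicodeScalars
  guard scalars.count == 1, let only = scalars.first else { return false }
  return range.contains(only)
}

let strictMulti = singleScalarMappingInRange("SS", in: "A"..."Z")
let strictSingle = singleScalarMappingInRange("s", in: "a"..."z")
```

The two forms of é compare equal as `Character` values, but only the composed form has exactly one scalar, so a single-scalar guard under grapheme semantics would treat them differently. Likewise, a scalar whose full case mapping has several scalars (U+00DF maps to "SS" under Unicode's special casing rules) could pass a `String` range check that a scalar range check would reject.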
Really feel like this should be a helper function or fully refactored. It seems ripe for bugs if vigilance is required to remember to support scalar semantics
I'll make it a helper function, but the real issue is that the two views share the same index type, and we can't enforce that you call the helper function. Maybe we could look at designing a wrapper type that would distinguish between the two? Seems like it would add a lot of friction…
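One possible shape for the wrapper-type idea, as a rough sketch only: `ScalarIndex`, `CharacterIndex`, and the `index(after:)` overloads below are hypothetical names, not existing API in the standard library or in this project.

```swift
// Hypothetical wrappers that record which view an index belongs to.
struct ScalarIndex { let base: String.Index }
struct CharacterIndex { let base: String.Index }

extension String {
  // Advances by one Unicode scalar.
  func index(after i: ScalarIndex) -> ScalarIndex {
    ScalarIndex(base: unicodeScalars.index(after: i.base))
  }

  // Advances by one Character (grapheme cluster).
  func index(after i: CharacterIndex) -> CharacterIndex {
    CharacterIndex(base: index(after: i.base))
  }
}

let sample = "e\u{301}x"   // decomposed é followed by x
let nextScalar = sample.index(after: ScalarIndex(base: sample.startIndex))
let nextCharacter = sample.index(after: CharacterIndex(base: sample.startIndex))
```

With distinct types the compiler picks the matching overload, so a scalar-semantics position cannot be advanced as a character by accident. The friction mentioned above shows up in having to wrap and unwrap `base` at every boundary with existing `String` API.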
What I mean is: rather than have a bunch of innermost ifs, what would the code look like with an outer if? That is a more bottom-up refactoring.
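A rough sketch of what the outer-if structure might look like, deciding on the semantic level once and returning a specialised closure per branch. `Consumer` and `makeRangeConsumer` are hypothetical names, case insensitivity is omitted, and the bounds check is an addition of this sketch (the diff leaves it as a TODO).

```swift
// Hypothetical consumer shape: returns the index after the match, or nil.
typealias Consumer = (String, Range<String.Index>) -> String.Index?

func makeRangeConsumer(
  _ range: ClosedRange<Unicode.Scalar>,
  graphemeSemantics: Bool
) -> Consumer {
  if graphemeSemantics {
    // Character view: only single-scalar characters can match.
    return { input, bounds in
      guard bounds.lowerBound < bounds.upperBound else { return nil }
      let scalars = input[bounds.lowerBound].unicodeScalars
      guard scalars.count == 1, let only = scalars.first,
            range.contains(only) else { return nil }
      return input.index(after: bounds.lowerBound)
    }
  } else {
    // Unicode scalar view: compare and advance one scalar at a time.
    return { input, bounds in
      guard bounds.lowerBound < bounds.upperBound else { return nil }
      let scalar = input.unicodeScalars[bounds.lowerBound]
      guard range.contains(scalar) else { return nil }
      return input.unicodeScalars.index(after: bounds.lowerBound)
    }
  }
}

let text = "e\u{301}x"   // decomposed é followed by x
let whole = text.startIndex..<text.endIndex
let byCharacter = makeRangeConsumer("a"..."z", graphemeSemantics: true)(text, whole)
let byScalar = makeRangeConsumer("a"..."z", graphemeSemantics: false)(text, whole)
```

Each branch uses one view consistently for both reading and advancing, so there is no per-line choice between `input.index(after:)` and `input.unicodeScalars.index(after:)` to get wrong.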