-
Notifications
You must be signed in to change notification settings - Fork 81
Add parser for case-iterable, string raw-representable values #176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
extension Parsers { | ||
public struct CaseIterableRawRepresentableParser< | ||
Input: Collection, Output: CaseIterable & RawRepresentable, Prefix: Collection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been able to relax some constraints by adding a separate generic for the prefix type that is used to test the starts(with:)
logic in parse
. So the type is a lil more intense than in the printer branch...
I just added a commit that brings over the enum Person: Int, CaseIterable {
case blob = 4
case blobJr = 42
}
let peopleParser = Many {
Person.parser()
} separator: {
",".utf8
} terminator: {
End()
}
var input = "4,42"[...].utf8
XCTAssertEqual(try peopleParser.parse(&input), [.blob, .blobJr]) I still need to add docs the the raw value integer parsers. |
Sources/Parsing/Documentation.docc/Articles/Parsers/CaseIterable.md
Outdated
Show resolved
Hide resolved
…le.md Co-authored-by: Thomas Grapperon <35562418+tgrapperon@users.noreply.github.com>
@@ -0,0 +1,133 @@ | |||
extension CaseIterable where Self: RawRepresentable, RawValue == Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbrandonw would RawValue: LosslessStringConvertible & Comparable
also work here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iampatbrown If I'm not mistaken, the restriction can theoretically be narrowed to being expressible by an integer or floating-point literal, because that's ultimately what will be allowed for RawRepresentable
enums (other than those with RawValue
being ExpressibleByStringLiteral)
. So we can cover every situation with 3 overloads (ExpressibleBy(Integer/Float/String)Literal
), but I don't know if we can easily extract parser/printers from it.
Being Comparable
allows to reach UInt
s for example, but it doesn't help with custom made types. For example, this compiles:
struct Value: RawRepresentable, Equatable {
var rawValue: Int
init(rawValue: Int) { self.rawValue = rawValue }
}
extension Value: ExpressibleByIntegerLiteral {
init(integerLiteral value: Int) {
self.rawValue = value
}
}
enum Enum: Value, CaseIterable {
case first = 1
}
String
and Int
are already covering the vast majority of the cases (!) though, but I 'm afraid that LosslessStringConvertible & Compable
will only partially improve the situation (the Enum
above would pass through).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tgrapperon if I'm not mistaken, the reason RawValue == Int
works is because Int
conforms to LosslessStringConvertible & Comparable
. The String.init
being used in toPrefix: { String($0) }
is init<T>(_ value: T) where T : LosslessStringConvertible
.
RawValue: LosslessStringConvertible & Comparable
should work for FixedWidthInteger
, Double
and Substring
. As for printing, I'm pretty sure that would remain the same using toPrefix
:
input.prepend(contentsOf: self.toPrefix(output.rawValue))
Maybe I'm missing something though :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iampatbrown You're right about LosslessStringConvertible
. I was only reasoning at the RawRepresentable
enum level, without considering how we can extract parsers/printers from the RawValue
later.
You're probably right for the constraint, and the example I've provided is quite exotic (and a dedicated parser can always be implemented if it ever happens in the wild). Furthermore, the new RawValue
types that can be reached are all comparable, so it directly works with respect to the comment I made about sorting below.
I didn't check if the compiler was working the same with Int
/String
or LosslessStringConvertible&Comparable
, but if it does, it looks like a better alternative.
) { | ||
self.toPrefix = toPrefix | ||
self.areEquivalent = areEquivalent | ||
self.cases = Output.allCases.sorted(by: { $0.rawValue > $1.rawValue }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be sufficient to sort by length instead? This would allow dropping the Comparable
requirement, isn't it?
Edit: Nevermind, Output.RawValue
isn't a Collection
to begin with, and right now this PR only supports Int
and String
RawValue
s. Maybe we can sort only when RawValue
is a Collection
, and have another route for incomparable RawValue
s (It can happen if RawValue
types other than String
or Int
are involved, so it may be preferable to defer support for this in a later PR if any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tgrapperon as in something like this?
self.cases = Output.allCases.sorted(by: {
toPrefix($0.rawValue).endIndex > toPrefix($1.rawValue).endIndex
})
What if the rawValues have the same length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it doesn't really matter if they have the same length. The main issue would be if a shorter value is ordered before a longer one. This probably means that comparing .endIndex
would probably work for Double
but comparing .rawValue
might not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tgrapperon actually the current implementation would fail for negative Int
s too. Maybe using endIndex
is actually the way to go. You can change the current test to negative values to test it out.
enum Person: Int, CaseIterable {
case blob = -4
case blobJr = -42
}
var input = "-4,-42"[...].utf8
XCTAssertEqual(try peopleParser.parse(&input), [.blob, .blobJr]) // XCTAssertEqual failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Edit, this was posted as the previous comment from @iampatbrown was written. Failing for negative ints is definitely a concern! I was responding to his comment above]
@iampatbrown With the current state of the PR, this is not really a concern. Sorting is useless but mostly harmless for numeric RawValue
s [Edit: wrong], and the other types are collections where sorting by count
would be equivalent to bin the lexical
sort.
So it's working indeed, but it performs a little more work than needed. The Comparable
requirement is fortuitously fulfilled by all supported RawValue
types. Being LosslessStringConvertible
automatically provides a canonical way to sort values by projecting them to .description
, so it can maybe be possible to drop the Comparable
requirement for a simple LosslessStringConvertible
, and perform the sort via .description
only when needed if we manage to exclude numbers (maybe by checking if they're IntegerLiteralType
or FloatLiteralType
, or using explicit types directly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually the current implementation would fail for negative Ints too.
Good catch @iampatbrown !
As I'm getting it, it would work using .description
, isn't it? I didn't have the time to check out the branch yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iampatbrown Ok, I've just checked, and reverse sorting lexically by .description
seems to do the trick too, but the toPrefix
seems more efficient/elegant. It doesn't seem to cause issues with same-length strings because they're always different numbers(e.g., "-4" vs "44").
The endIndex
seems to be enough to mesure the length of the prefix
. If it's possible to manually conform structs to CaseIterable
, this PR cleary aims enums, and I don't think it is possible to craft Substring
s that will not sort correctly when defining them using string literals (which is a requirement of the enum being RawRepresentable
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the sort happens inside the init of the parser where RawValue
has no constraints apart from being comparable. The LosslessStringConvertible
was only for a static initilizer. I think you're right about using length though. The sort is to ensure the parser tries to match longer cases first. For RawValue == String
using Comparable
works fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iampatbrown That's a chance you did spot the negative Int
case. This is a very common situation!
@stephencelis @mbrandonw Here is a summary of the possible changes from the discussion @tgrapperon and I had:
self.cases = Output.allCases
.sorted(by: { toPrefix($0.rawValue).count > toPrefix($1.rawValue).count })
I've included the changes in the below branch if needed. case-iterable-raw-representable...iampatbrown:case-iterable-raw-representable |
@iampatbrown Since |
@tgrapperon Yep. It could be dropped. The main reason I didn't suggest it was the redundant call to |
Edit: |
Using |
@tgrapperon for some additional context here is the implementation from swift/String.swift ...
/// Creates an instance from the description of a given
/// `LosslessStringConvertible` instance.
@inlinable @inline(__always)
public init<T: LosslessStringConvertible>(_ value: T) {
self = value.description
}
}
extension String: CustomStringConvertible {
/// The value of this string.
///
/// Using this property directly is discouraged. Instead, use simple
/// assignment to create a new constant or variable equal to this string.
@inlinable
public var description: String { return self }
} I'd say using the lossless initializer for |
From String.swift:L899 and below, it looks as optimal as it can be. |
I'll add an edit to the summary. |
As I also wonder up to which point comparing indices from different In any case, we can always use the distance from Footnotes
|
I'm not sure how sort could access |
If The same argument could be used about |
Just want to clarify, right now the sorting occurs inside the init of
I'm not exactly sure of the inner workings of extension String.Index {
@inlinable @inline(__always)
internal var orderingValue: UInt64 { return _rawBits &>> 14 }
}
extension String.Index: Comparable {
@inlinable @inline(__always)
public static func < (lhs: String.Index, rhs: String.Index) -> Bool {
return lhs.orderingValue < rhs.orderingValue
}
} I don't think there's any issue using count though. I don't think it matters if it's O(n), I'm pretty sure |
Ah, sorry, I was thinking this internal parser could be constrained too without creating restrictions, as I supposed that all routes creating it are already constrained by
Yes, it would work in our case, but Furthermore, we need to go through "👨👨👧".count == "👨👨👧👧".count // == 1
"👨👨👧👧".starts(with: "👨👨👧") // false
"👨👨👧".utf8.count != "👨👨👧👧".utf8.count // 18 != 25
"👨👨👧👧".utf8.starts(with: "👨👨👧".utf8) // true If we sort by I'm sorry for the amount of discussion over this very benign matter! It is fine like you summarized, maybe with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix some typos.
Sources/Parsing/Documentation.docc/Articles/Parsers/CaseIterable.md
Outdated
Show resolved
Hide resolved
Sources/Parsing/Documentation.docc/Articles/Parsers/CaseIterable.md
Outdated
Show resolved
Hide resolved
…le.md Co-authored-by: Kth <K2968220169@outlook.com>
…le.md Co-authored-by: Kth <K2968220169@outlook.com>
@iampatbrown @tgrapperon Thanks for the in-depth conversation! Brandon and I spoke about this earlier today. We definitely want to ensure negative numbers are appropriately parsed, so thanks for calling that edge case out! I've pushed a test case and fix. As far as generalizing to enum Code: Double, CaseIterable {
case a = 5
...
}
Code.parser().parse("5.0") // ✅
Code.parser().parse("5") // ❌ Technically this is even a problem for enum Code: Int, CaseIterable {
case a = 5
...
}
Code.parser().parse("5") // ✅
Code.parser().parse("+5") // ❌ We think both of the above could be fixed if If either of you come up with a way of supporting |
@stephencelis @mbrandonw Good call. I think
Sounds like that could require passing a custom parser that incorporates Double.parser()/Int.parser() but also checks the different representations before moving on to the next case. Interesting problem....
Agreed. Nice work! |
Definitely down for those generalizations! Would you want to explore them? |
Backporting and updating this from another branch.