Add parser for case-iterable, string raw-representable values #176

stephencelis · 2022-03-04T21:31:11Z

Backporting and updating this from another branch.

stephencelis · 2022-03-04T21:32:19Z

Sources/Parsing/Parsers/CaseIterableRawRepresentable.swift

+
+extension Parsers {
+  public struct CaseIterableRawRepresentableParser<
+    Input: Collection, Output: CaseIterable & RawRepresentable, Prefix: Collection


I've been able to relax some constraints by adding a separate generic for the prefix type that is used to test the starts(with:) logic in parse. So the type is a lil more intense than in the printer branch...

mbrandonw · 2022-03-04T23:48:15Z

I just added a commit that brings over the CaseIterable.md article from the printer branch, and I realized we could make this stuff work when the raw value is an integer too:

enum Person: Int, CaseIterable {
  case blob = 4
  case blobJr = 42
}

let peopleParser = Many {
  Person.parser()
} separator: {
  ",".utf8
} terminator: {
  End()
}

var input = "4,42"[...].utf8
XCTAssertEqual(try peopleParser.parse(&input), [.blob, .blobJr])

I still need to add docs the the raw value integer parsers.

Sources/Parsing/Documentation.docc/Articles/Parsers/CaseIterable.md

…le.md Co-authored-by: Thomas Grapperon <35562418+tgrapperon@users.noreply.github.com>

iampatbrown · 2022-03-05T00:37:45Z

Sources/Parsing/Parsers/CaseIterableRawRepresentable.swift

@@ -0,0 +1,133 @@
+extension CaseIterable where Self: RawRepresentable, RawValue == Int {


@mbrandonw would RawValue: LosslessStringConvertible & Comparable also work here?

@iampatbrown If I'm not mistaken, the restriction can theoretically be narrowed to being expressible by an integer or floating-point literal, because that's ultimately what will be allowed for RawRepresentable enums (other than those with RawValue being ExpressibleByStringLiteral). So we can cover every situation with 3 overloads (ExpressibleBy(Integer/Float/String)Literal), but I don't know if we can easily extract parser/printers from it.

Being Comparable allows to reach UInts for example, but it doesn't help with custom made types. For example, this compiles:

struct Value: RawRepresentable, Equatable { var rawValue: Int init(rawValue: Int) { self.rawValue = rawValue } } extension Value: ExpressibleByIntegerLiteral { init(integerLiteral value: Int) { self.rawValue = value } } enum Enum: Value, CaseIterable { case first = 1 }

String and Int are already covering the vast majority of the cases (!) though, but I 'm afraid that LosslessStringConvertible & Compable will only partially improve the situation (the Enum above would pass through).

@tgrapperon if I'm not mistaken, the reason RawValue == Int works is because Int conforms to LosslessStringConvertible & Comparable. The String.init being used in toPrefix: { String($0) } is init<T>(_ value: T) where T : LosslessStringConvertible.

RawValue: LosslessStringConvertible & Comparable should work for FixedWidthInteger, Double and Substring. As for printing, I'm pretty sure that would remain the same using toPrefix:

input.prepend(contentsOf: self.toPrefix(output.rawValue))

Maybe I'm missing something though :P

@iampatbrown You're right about LosslessStringConvertible. I was only reasoning at the RawRepresentable enum level, without considering how we can extract parsers/printers from the RawValue later.

You're probably right for the constraint, and the example I've provided is quite exotic (and a dedicated parser can always be implemented if it ever happens in the wild). Furthermore, the new RawValue types that can be reached are all comparable, so it directly works with respect to the comment I made about sorting below.

I didn't check if the compiler was working the same with Int/String or LosslessStringConvertible&Comparable, but if it does, it looks like a better alternative.

tgrapperon · 2022-03-05T03:01:27Z

Sources/Parsing/Parsers/CaseIterableRawRepresentable.swift

+    ) {
+      self.toPrefix = toPrefix
+      self.areEquivalent = areEquivalent
+      self.cases = Output.allCases.sorted(by: { $0.rawValue > $1.rawValue })


Wouldn't it be sufficient to sort by length instead? This would allow dropping the Comparable requirement, isn't it?

Edit: Nevermind, Output.RawValue isn't a Collection to begin with, and right now this PR only supports Int and String RawValues. Maybe we can sort only when RawValue is a Collection, and have another route for incomparable RawValues (It can happen if RawValue types other than String or Int are involved, so it may be preferable to defer support for this in a later PR if any.

@tgrapperon as in something like this?

self.cases = Output.allCases.sorted(by: { toPrefix($0.rawValue).endIndex > toPrefix($1.rawValue).endIndex })

What if the rawValues have the same length?

I guess it doesn't really matter if they have the same length. The main issue would be if a shorter value is ordered before a longer one. This probably means that comparing .endIndex would probably work for Double but comparing .rawValue might not

@tgrapperon actually the current implementation would fail for negative Ints too. Maybe using endIndex is actually the way to go. You can change the current test to negative values to test it out.

enum Person: Int, CaseIterable { case blob = -4 case blobJr = -42 } var input = "-4,-42"[...].utf8 XCTAssertEqual(try peopleParser.parse(&input), [.blob, .blobJr]) // XCTAssertEqual failed

[Edit, this was posted as the previous comment from @iampatbrown was written. Failing for negative ints is definitely a concern! I was responding to his comment above]

@iampatbrown With the current state of the PR, this is not really a concern. Sorting is useless but mostly harmless for numeric RawValues [Edit: wrong], and the other types are collections where sorting by count would be equivalent to bin the lexical sort.
So it's working indeed, but it performs a little more work than needed. The Comparable requirement is fortuitously fulfilled by all supported RawValue types. Being LosslessStringConvertible automatically provides a canonical way to sort values by projecting them to .description, so it can maybe be possible to drop the Comparable requirement for a simple LosslessStringConvertible, and perform the sort via .description only when needed if we manage to exclude numbers (maybe by checking if they're IntegerLiteralType or FloatLiteralType, or using explicit types directly).

actually the current implementation would fail for negative Ints too.

Good catch @iampatbrown !
As I'm getting it, it would work using .description, isn't it? I didn't have the time to check out the branch yet.

@iampatbrown Ok, I've just checked, and reverse sorting lexically by .description seems to do the trick too, but the toPrefix seems more efficient/elegant. It doesn't seem to cause issues with same-length strings because they're always different numbers(e.g., "-4" vs "44").

The endIndex seems to be enough to mesure the length of the prefix. If it's possible to manually conform structs to CaseIterable, this PR cleary aims enums, and I don't think it is possible to craft Substrings that will not sort correctly when defining them using string literals (which is a requirement of the enum being RawRepresentable).

Well the sort happens inside the init of the parser where RawValue has no constraints apart from being comparable. The LosslessStringConvertible was only for a static initilizer. I think you're right about using length though. The sort is to ensure the parser tries to match longer cases first. For RawValue == String using Comparable works fine.

@iampatbrown That's a chance you did spot the negative Int case. This is a very common situation!

iampatbrown · 2022-03-05T21:15:21Z

@stephencelis @mbrandonw Here is a summary of the possible changes from the discussion @tgrapperon and I had:

Sort by count instead of rawValue and drop Comparable requirement. Comparable works for String but not for Int because negative numbers will not get sorted by descending length.

self.cases = Output.allCases
  .sorted(by: { toPrefix($0.rawValue).count > toPrefix($1.rawValue).count })

RawValue: LosslessStringConvertible could be used in place of RawValue == Int. It would add support for additional types and would work with the current RawValue == Int implementation (if sorted by count)
Edit: RawValue == String could also be dropped in favor of RawValue: LosslessStringConvertible. Thanks @tgrapperon.

I've included the changes in the below branch if needed.

case-iterable-raw-representable...iampatbrown:case-iterable-raw-representable

tgrapperon · 2022-03-05T21:19:25Z

@iampatbrown Since StringProtocol: LosslessStringConvertible, aren't we be able to drop the RawValue: String specialization too?

iampatbrown · 2022-03-05T21:39:43Z

@tgrapperon Yep. It could be dropped. The main reason I didn't suggest it was the redundant call to String.init in toPrefix: { String($0) }. That's probably insignificant though.

tgrapperon · 2022-03-05T21:44:31Z

~~Wouldn't toPrefix: { $0 is String ? $0 as! String : String($0) } work? (but I feel we're entering the bikeshedding zone).~~

Edit: String($0) is almost certainly insignificant. These are small enum's RawValue which memory is probably reused if I'm getting Swift's management of String correctly.

iampatbrown · 2022-03-05T21:52:57Z

Using toPrefix: { String($0) } would be fine and would probably be the same as toPrefix: { $0 } in this case.

iampatbrown · 2022-03-05T21:57:05Z

@tgrapperon for some additional context here is the implementation from swift/String.swift

  ...
  /// Creates an instance from the description of a given
  /// `LosslessStringConvertible` instance.
  @inlinable @inline(__always)
  public init<T: LosslessStringConvertible>(_ value: T) {
    self = value.description
  }
}

extension String: CustomStringConvertible {
  /// The value of this string.
  ///
  /// Using this property directly is discouraged. Instead, use simple
  /// assignment to create a new constant or variable equal to this string.
  @inlinable
  public var description: String { return self }
}

I'd say using the lossless initializer for String would pretty much be the same. So I don't see anything wrong with dropping RawValue == String as well.

tgrapperon · 2022-03-05T21:57:16Z

From String.swift:L899 and below, it looks as optimal as it can be.
Edit: I was posting the same!

iampatbrown · 2022-03-05T21:58:49Z

I'll add an edit to the summary.

tgrapperon · 2022-03-06T01:17:42Z

As value.description is hit anyway, I wonder if we couldn't use this directly instead of going through toPrefix to sort the cases.
Are there representations where the ordering could be different, that is where a.description.endIndex < b.description.endIndex doesn't implies that toPrefix(String(a)).endIndex < toPrefix(String(b)).endIndex?
This is mostly by curiosity, and going through toPrefix is definitely a safe and acceptable solution.

I also wonder up to which point comparing indices from different Strings gives a meaningful result. I vaguely remember someone from the core team pointing this, but I can't figure out where yet¹, nor if it ever happened. It would make sense though.

In any case, we can always use the distance from .startIndex (that is .count), but the complexity is O(n) for String (but we're talking about enums' RawValues, not War and Peace).

Edit: The only occurence I was able to find yet is what Becca Royal-Gordon implies when pitching a partial order version of Comparable (4th bullet point from the end):
"For instance, with the <> operator in place, String.Index could be made incomparable with indices from other strings, but all String.Indexes would still have a total order. That design wouldn't be possible with an isIncomparable property." ↩

iampatbrown · 2022-03-06T02:23:41Z

As value.description is hit anyway, I wonder if we couldn't use this directly instead of going through toPrefix to sort the cases.

I'm not sure how sort could access .description unless some type info was passed to the parser init. That being said. I don't think using toPrefix($0.rawValue).endIndex is 100% either. Using toPrefix($0.rawValue).count is probably safer. We could deliberately break .endIndex if we really wanted to with a custom toPrefix, however, the parser init is not public so right now that shouldn't happen.

tgrapperon · 2022-03-06T03:55:34Z

If RawValue is LosslessStringConvertible, it has a .description that generates a String we can measure for sorting. But the more I think about it, and given the implementation from LosslessStringConvertible's String.init, I don't think it makes any difference in terms of performance, as all toPrefix implementations are trivial. It is probably safer like you proposed, through toPrefix, as we then "sort like we parse". I don't have a counterexample where this would be required instead of sorting the rawValues's descriptions, but it looks more resilient like this in any case.

The same argument could be used about count. From what I can see, String.Index comparison is only meaningful in the same String. They internally sort using some 2bits integer called transcodedIndex along with another integer called encodedIndex. I don't really grasp the correspondence with the internal representation, but I guess it's safer to assume that encodedIndex is string-dependent. In this specific use case, I can't see how this could lead to two rawValues having the same prefix and a wrong ordering, but it's most likely a bad practice in the general case. For this reason, I think it's worth the small expense of sorting using .count. The current implementation of the PR is using Comparable which is not better in terms of complexity.

iampatbrown · 2022-03-06T04:57:04Z

If RawValue is LosslessStringConvertible, it has a .description that generates a String we can measure for sorting.

Just want to clarify, right now the sorting occurs inside the init of CaseIterableRawRepresentableParser where there is no LosslessStringConvertible constraint. That's the reason for using toPrefix and why I keep saying .description can't be used (without making changes to the init).

From what I can see, String.Index comparison is only meaningful in the same String

I'm not exactly sure of the inner workings of String.Index but the following from StringIndex.swift makes me think it would work:

extension String.Index {
  @inlinable @inline(__always)
  internal var orderingValue: UInt64 { return _rawBits &>> 14 }
}

extension String.Index: Comparable {
  @inlinable @inline(__always)
  public static func < (lhs: String.Index, rhs: String.Index) -> Bool {
    return lhs.orderingValue < rhs.orderingValue
  }
}

I don't think there's any issue using count though. I don't think it matters if it's O(n), I'm pretty sure toPrefix is going to be O(n) for LosslessStringConvertible anyway (apart from when RawValue == String)

tgrapperon · 2022-03-06T06:22:42Z

Just want to clarify, right now the sorting occurs inside the init of CaseIterableRawRepresentableParser where there is no LosslessStringConvertible constraint.

Ah, sorry, I was thinking this internal parser could be constrained too without creating restrictions, as I supposed that all routes creating it are already constrained by LosslessStringConvertible (String variants were already seen as such in my mind). But this is not an issue, as sorting using simply.description is a bad idea (see below).

I'm not exactly sure of the inner workings of String.Index but the following from StringIndex.swift makes me think it would work

Yes, it would work in our case, but "👨‍👨‍👧‍👧".endIndex > "AB".endIndex shows it's not a reliable indicator to sort Strings by length. However, I can't form an example with the same prefixes and "wrong" endIndex'es, so endIndex doesn't seem to be problematic for our intent. But as you said, .count is not where time will be lost there.

Furthermore, we need to go through toPrefix to measure the cases:

"👨‍👨‍👧".count == "👨‍👨‍👧‍👧".count // == 1
"👨‍👨‍👧‍👧".starts(with: "👨‍👨‍👧") // false

"👨‍👨‍👧".utf8.count != "👨‍👨‍👧‍👧".utf8.count // 18 != 25
"👨‍👨‍👧‍👧".utf8.starts(with: "👨‍👨‍👧".utf8) // true

If we sort by .description which are Strings, we will get ambiguous results because both have the same count (albeit different endIndex'es!), and if we then parse utf8 views, we may match 👨‍👨‍👧 instead of 👨‍👨‍👧‍👧.
However, because cases go through toPrefix, they're correctly measured and ordered, so this shows that we must indeed "sort like we parse". This example answers one of my questions above.

I'm sorry for the amount of discussion over this very benign matter! It is fine like you summarized, maybe with .count instead of .endIndex for the sake of clarity.

KeithBird

Fix some typos.

Sources/Parsing/Documentation.docc/Articles/Parsers/CaseIterable.md

…le.md Co-authored-by: Kth <K2968220169@outlook.com>

stephencelis · 2022-03-07T19:17:37Z

@iampatbrown @tgrapperon Thanks for the in-depth conversation!

Brandon and I spoke about this earlier today. We definitely want to ensure negative numbers are appropriately parsed, so thanks for calling that edge case out! I've pushed a test case and fix.

As far as generalizing to LosslessStringConvertible goes, we agree that it would be nice, but are worried that we can't do so as simply as proposed because the formatted string of the value doesn't always have a single representation. For integers, this isn't super problematic, but take an enum that is raw-representable by a Double: it would need to be parsed exactly as it is formatted:

enum Code: Double, CaseIterable {
  case a = 5
  ...
}

Code.parser().parse("5.0") // ✅
Code.parser().parse("5")   // ❌

Technically this is even a problem for Ints if we wanted to be lenient enough to support parsing a leading "+" sign, though this is arguably a lot less problematic:

enum Code: Int, CaseIterable {
  case a = 5
  ...
}

Code.parser().parse("5")  // ✅
Code.parser().parse("+5") // ❌

We think both of the above could be fixed if Code.parser() somehow knew to call out to Double.parser() and Int.parser() under the hood instead, which might be easier now that we unified all floating-point parsers in #177. For now, though, we think this is in good enough shape to merge.

If either of you come up with a way of supporting Double (or if you think you have a reasonable solution for generally parsing any LosslessStringConvertible values), please start a discussion or PR 😄

iampatbrown · 2022-03-07T20:04:09Z

As far as generalizing to LosslessStringConvertible goes, we agree that it would be nice, but are worried that we can't do so as simply as proposed because the formatted string of the value doesn't always have a single representation.

@stephencelis @mbrandonw Good call. I think String and Int gives decent coverage. Do you think using StringProtocol and FixedWidthInteger respectively would work as the constrains or does that have similar issues?

We think both of the above could be fixed if Code.parser() somehow knew to call out to Double.parser() and Int.parser() under the hood instead

Sounds like that could require passing a custom parser that incorporates Double.parser()/Int.parser() but also checks the different representations before moving on to the next case. Interesting problem....

we think this is in good enough shape to merge.

Agreed. Nice work!

stephencelis · 2022-03-07T22:06:49Z

@stephencelis @mbrandonw Good call. I think String and Int gives decent coverage. Do you think using StringProtocol and FixedWidthInteger respectively would work as the constrains or does that have similar issues?

Definitely down for those generalizations! Would you want to explore them?

Add parser for case-iterable, string raw-representable values

c4fff54

stephencelis requested a review from mbrandonw March 4, 2022 21:31

stephencelis commented Mar 4, 2022

View reviewed changes

wip

69a1eb8

tgrapperon reviewed Mar 5, 2022

View reviewed changes

Sources/Parsing/Documentation.docc/Articles/Parsers/CaseIterable.md Outdated Show resolved Hide resolved

Update Sources/Parsing/Documentation.docc/Articles/Parsers/CaseIterab…

9c6ba0f

…le.md Co-authored-by: Thomas Grapperon <35562418+tgrapperon@users.noreply.github.com>

iampatbrown reviewed Mar 5, 2022

View reviewed changes

tgrapperon reviewed Mar 5, 2022

View reviewed changes

KeithBird suggested changes Mar 6, 2022

View reviewed changes

Sources/Parsing/Documentation.docc/Articles/Parsers/CaseIterable.md Outdated Show resolved Hide resolved

Sources/Parsing/Documentation.docc/Articles/Parsers/CaseIterable.md Outdated Show resolved Hide resolved

stephencelis and others added 3 commits March 7, 2022 09:48

Update Sources/Parsing/Documentation.docc/Articles/Parsers/CaseIterab…

3ead044

…le.md Co-authored-by: Kth <K2968220169@outlook.com>

negatives

98c7749

Update Sources/Parsing/Documentation.docc/Articles/Parsers/CaseIterab…

08d23cc

…le.md Co-authored-by: Kth <K2968220169@outlook.com>

docs

7a4b71b

mbrandonw approved these changes Mar 7, 2022

View reviewed changes

stephencelis merged commit 1a0050f into main Mar 7, 2022

stephencelis deleted the case-iterable-raw-representable branch March 7, 2022 19:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add parser for case-iterable, string raw-representable values #176

Add parser for case-iterable, string raw-representable values #176

stephencelis commented Mar 4, 2022

stephencelis Mar 4, 2022

mbrandonw commented Mar 4, 2022 •

edited

Loading

iampatbrown Mar 5, 2022

tgrapperon Mar 5, 2022

iampatbrown Mar 5, 2022

tgrapperon Mar 5, 2022 •

edited

Loading

tgrapperon Mar 5, 2022 •

edited

Loading

iampatbrown Mar 5, 2022 •

edited

Loading

iampatbrown Mar 5, 2022

iampatbrown Mar 5, 2022

tgrapperon Mar 5, 2022 •

edited

Loading

tgrapperon Mar 5, 2022 •

edited

Loading

tgrapperon Mar 5, 2022 •

edited

Loading

iampatbrown Mar 5, 2022

tgrapperon Mar 5, 2022

iampatbrown commented Mar 5, 2022 •

edited

Loading

tgrapperon commented Mar 5, 2022 •

edited

Loading

iampatbrown commented Mar 5, 2022

tgrapperon commented Mar 5, 2022 •

edited

Loading

iampatbrown commented Mar 5, 2022

iampatbrown commented Mar 5, 2022 •

edited

Loading

tgrapperon commented Mar 5, 2022 •

edited

Loading

iampatbrown commented Mar 5, 2022

tgrapperon commented Mar 6, 2022 •

edited

Loading

iampatbrown commented Mar 6, 2022 •

edited

Loading

tgrapperon commented Mar 6, 2022 •

edited

Loading

iampatbrown commented Mar 6, 2022 •

edited

Loading

tgrapperon commented Mar 6, 2022 •

edited

Loading

KeithBird left a comment

stephencelis commented Mar 7, 2022

iampatbrown commented Mar 7, 2022

stephencelis commented Mar 7, 2022

		@@ -0,0 +1,133 @@
		extension CaseIterable where Self: RawRepresentable, RawValue == Int {

Add parser for case-iterable, string raw-representable values #176

Add parser for case-iterable, string raw-representable values #176

Conversation

stephencelis commented Mar 4, 2022

Choose a reason for hiding this comment

mbrandonw commented Mar 4, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tgrapperon Mar 5, 2022 • edited Loading

Choose a reason for hiding this comment

tgrapperon Mar 5, 2022 • edited Loading

Choose a reason for hiding this comment

iampatbrown Mar 5, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tgrapperon Mar 5, 2022 • edited Loading

Choose a reason for hiding this comment

tgrapperon Mar 5, 2022 • edited Loading

Choose a reason for hiding this comment

tgrapperon Mar 5, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iampatbrown commented Mar 5, 2022 • edited Loading

tgrapperon commented Mar 5, 2022 • edited Loading

iampatbrown commented Mar 5, 2022

tgrapperon commented Mar 5, 2022 • edited Loading

iampatbrown commented Mar 5, 2022

iampatbrown commented Mar 5, 2022 • edited Loading

tgrapperon commented Mar 5, 2022 • edited Loading

iampatbrown commented Mar 5, 2022

tgrapperon commented Mar 6, 2022 • edited Loading

Footnotes

iampatbrown commented Mar 6, 2022 • edited Loading

tgrapperon commented Mar 6, 2022 • edited Loading

iampatbrown commented Mar 6, 2022 • edited Loading

tgrapperon commented Mar 6, 2022 • edited Loading

KeithBird left a comment

Choose a reason for hiding this comment

stephencelis commented Mar 7, 2022

iampatbrown commented Mar 7, 2022

stephencelis commented Mar 7, 2022

mbrandonw commented Mar 4, 2022 •

edited

Loading

tgrapperon Mar 5, 2022 •

edited

Loading

tgrapperon Mar 5, 2022 •

edited

Loading

iampatbrown Mar 5, 2022 •

edited

Loading

tgrapperon Mar 5, 2022 •

edited

Loading

tgrapperon Mar 5, 2022 •

edited

Loading

tgrapperon Mar 5, 2022 •

edited

Loading

iampatbrown commented Mar 5, 2022 •

edited

Loading

tgrapperon commented Mar 5, 2022 •

edited

Loading

tgrapperon commented Mar 5, 2022 •

edited

Loading

iampatbrown commented Mar 5, 2022 •

edited

Loading

tgrapperon commented Mar 5, 2022 •

edited

Loading

tgrapperon commented Mar 6, 2022 •

edited

Loading

iampatbrown commented Mar 6, 2022 •

edited

Loading

tgrapperon commented Mar 6, 2022 •

edited

Loading

iampatbrown commented Mar 6, 2022 •

edited

Loading

tgrapperon commented Mar 6, 2022 •

edited

Loading