Skip to content

Add parser for case-iterable, string raw-representable values #176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 7, 2022

Conversation

stephencelis
Copy link
Member

Backporting and updating this from another branch.

@stephencelis stephencelis requested a review from mbrandonw March 4, 2022 21:31

extension Parsers {
public struct CaseIterableRawRepresentableParser<
Input: Collection, Output: CaseIterable & RawRepresentable, Prefix: Collection
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been able to relax some constraints by adding a separate generic for the prefix type that is used to test the starts(with:) logic in parse. So the type is a lil more intense than in the printer branch...

@mbrandonw
Copy link
Member

mbrandonw commented Mar 4, 2022

I just added a commit that brings over the CaseIterable.md article from the printer branch, and I realized we could make this stuff work when the raw value is an integer too:

enum Person: Int, CaseIterable {
  case blob = 4
  case blobJr = 42
}

let peopleParser = Many {
  Person.parser()
} separator: {
  ",".utf8
} terminator: {
  End()
}

var input = "4,42"[...].utf8
XCTAssertEqual(try peopleParser.parse(&input), [.blob, .blobJr])

I still need to add docs the the raw value integer parsers.

…le.md

Co-authored-by: Thomas Grapperon <35562418+tgrapperon@users.noreply.github.com>
@@ -0,0 +1,133 @@
extension CaseIterable where Self: RawRepresentable, RawValue == Int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbrandonw would RawValue: LosslessStringConvertible & Comparable also work here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iampatbrown If I'm not mistaken, the restriction can theoretically be narrowed to being expressible by an integer or floating-point literal, because that's ultimately what will be allowed for RawRepresentable enums (other than those with RawValue being ExpressibleByStringLiteral). So we can cover every situation with 3 overloads (ExpressibleBy(Integer/Float/String)Literal), but I don't know if we can easily extract parser/printers from it.

Being Comparable allows to reach UInts for example, but it doesn't help with custom made types. For example, this compiles:

struct Value: RawRepresentable, Equatable {
  var rawValue: Int
  init(rawValue: Int) { self.rawValue = rawValue }
}

extension Value: ExpressibleByIntegerLiteral {
  init(integerLiteral value: Int) {
    self.rawValue = value
  }
}

enum Enum: Value, CaseIterable {
  case first = 1
}

String and Int are already covering the vast majority of the cases (!) though, but I 'm afraid that LosslessStringConvertible & Compable will only partially improve the situation (the Enum above would pass through).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgrapperon if I'm not mistaken, the reason RawValue == Int works is because Int conforms to LosslessStringConvertible & Comparable. The String.init being used in toPrefix: { String($0) } is init<T>(_ value: T) where T : LosslessStringConvertible.

RawValue: LosslessStringConvertible & Comparable should work for FixedWidthInteger, Double and Substring. As for printing, I'm pretty sure that would remain the same using toPrefix:

input.prepend(contentsOf: self.toPrefix(output.rawValue))

Maybe I'm missing something though :P

Copy link
Contributor

@tgrapperon tgrapperon Mar 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iampatbrown You're right about LosslessStringConvertible. I was only reasoning at the RawRepresentable enum level, without considering how we can extract parsers/printers from the RawValue later.

You're probably right for the constraint, and the example I've provided is quite exotic (and a dedicated parser can always be implemented if it ever happens in the wild). Furthermore, the new RawValue types that can be reached are all comparable, so it directly works with respect to the comment I made about sorting below.

I didn't check if the compiler was working the same with Int/String or LosslessStringConvertible&Comparable, but if it does, it looks like a better alternative.

) {
self.toPrefix = toPrefix
self.areEquivalent = areEquivalent
self.cases = Output.allCases.sorted(by: { $0.rawValue > $1.rawValue })
Copy link
Contributor

@tgrapperon tgrapperon Mar 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be sufficient to sort by length instead? This would allow dropping the Comparable requirement, isn't it?

Edit: Nevermind, Output.RawValue isn't a Collection to begin with, and right now this PR only supports Int and String RawValues. Maybe we can sort only when RawValue is a Collection, and have another route for incomparable RawValues (It can happen if RawValue types other than String or Int are involved, so it may be preferable to defer support for this in a later PR if any.

Copy link
Contributor

@iampatbrown iampatbrown Mar 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgrapperon as in something like this?

self.cases = Output.allCases.sorted(by: {
  toPrefix($0.rawValue).endIndex > toPrefix($1.rawValue).endIndex
})

What if the rawValues have the same length?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it doesn't really matter if they have the same length. The main issue would be if a shorter value is ordered before a longer one. This probably means that comparing .endIndex would probably work for Double but comparing .rawValue might not

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgrapperon actually the current implementation would fail for negative Ints too. Maybe using endIndex is actually the way to go. You can change the current test to negative values to test it out.

enum Person: Int, CaseIterable {
  case blob = -4
  case blobJr = -42
}

var input = "-4,-42"[...].utf8
XCTAssertEqual(try peopleParser.parse(&input), [.blob, .blobJr]) // XCTAssertEqual failed

Copy link
Contributor

@tgrapperon tgrapperon Mar 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Edit, this was posted as the previous comment from @iampatbrown was written. Failing for negative ints is definitely a concern! I was responding to his comment above]

@iampatbrown With the current state of the PR, this is not really a concern. Sorting is useless but mostly harmless for numeric RawValues [Edit: wrong], and the other types are collections where sorting by count would be equivalent to bin the lexical sort.
So it's working indeed, but it performs a little more work than needed. The Comparable requirement is fortuitously fulfilled by all supported RawValue types. Being LosslessStringConvertible automatically provides a canonical way to sort values by projecting them to .description, so it can maybe be possible to drop the Comparable requirement for a simple LosslessStringConvertible, and perform the sort via .description only when needed if we manage to exclude numbers (maybe by checking if they're IntegerLiteralType or FloatLiteralType, or using explicit types directly).

Copy link
Contributor

@tgrapperon tgrapperon Mar 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually the current implementation would fail for negative Ints too.

Good catch @iampatbrown !
As I'm getting it, it would work using .description, isn't it? I didn't have the time to check out the branch yet.

Copy link
Contributor

@tgrapperon tgrapperon Mar 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iampatbrown Ok, I've just checked, and reverse sorting lexically by .description seems to do the trick too, but the toPrefix seems more efficient/elegant. It doesn't seem to cause issues with same-length strings because they're always different numbers(e.g., "-4" vs "44").

The endIndex seems to be enough to mesure the length of the prefix. If it's possible to manually conform structs to CaseIterable, this PR cleary aims enums, and I don't think it is possible to craft Substrings that will not sort correctly when defining them using string literals (which is a requirement of the enum being RawRepresentable).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the sort happens inside the init of the parser where RawValue has no constraints apart from being comparable. The LosslessStringConvertible was only for a static initilizer. I think you're right about using length though. The sort is to ensure the parser tries to match longer cases first. For RawValue == String using Comparable works fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iampatbrown That's a chance you did spot the negative Int case. This is a very common situation!

@iampatbrown
Copy link
Contributor

iampatbrown commented Mar 5, 2022

@stephencelis @mbrandonw Here is a summary of the possible changes from the discussion @tgrapperon and I had:

  1. Sort by count instead of rawValue and drop Comparable requirement. Comparable works for String but not for Int because negative numbers will not get sorted by descending length.
self.cases = Output.allCases
  .sorted(by: { toPrefix($0.rawValue).count > toPrefix($1.rawValue).count })
  1. RawValue: LosslessStringConvertible could be used in place of RawValue == Int. It would add support for additional types and would work with the current RawValue == Int implementation (if sorted by count)
    Edit: RawValue == String could also be dropped in favor of RawValue: LosslessStringConvertible. Thanks @tgrapperon.

I've included the changes in the below branch if needed.

case-iterable-raw-representable...iampatbrown:case-iterable-raw-representable

@tgrapperon
Copy link
Contributor

tgrapperon commented Mar 5, 2022

@iampatbrown Since StringProtocol: LosslessStringConvertible, aren't we be able to drop the RawValue: String specialization too?

@iampatbrown
Copy link
Contributor

@tgrapperon Yep. It could be dropped. The main reason I didn't suggest it was the redundant call to String.init in toPrefix: { String($0) }. That's probably insignificant though.

@tgrapperon
Copy link
Contributor

tgrapperon commented Mar 5, 2022

Wouldn't toPrefix: { $0 is String ? $0 as! String : String($0) } work? (but I feel we're entering the bikeshedding zone).

Edit: String($0) is almost certainly insignificant. These are small enum's RawValue which memory is probably reused if I'm getting Swift's management of String correctly.

@iampatbrown
Copy link
Contributor

Using toPrefix: { String($0) } would be fine and would probably be the same as toPrefix: { $0 } in this case.

@iampatbrown
Copy link
Contributor

iampatbrown commented Mar 5, 2022

@tgrapperon for some additional context here is the implementation from swift/String.swift

  ...
  /// Creates an instance from the description of a given
  /// `LosslessStringConvertible` instance.
  @inlinable @inline(__always)
  public init<T: LosslessStringConvertible>(_ value: T) {
    self = value.description
  }
}

extension String: CustomStringConvertible {
  /// The value of this string.
  ///
  /// Using this property directly is discouraged. Instead, use simple
  /// assignment to create a new constant or variable equal to this string.
  @inlinable
  public var description: String { return self }
}

I'd say using the lossless initializer for String would pretty much be the same. So I don't see anything wrong with dropping RawValue == String as well.

@tgrapperon
Copy link
Contributor

tgrapperon commented Mar 5, 2022

From String.swift:L899 and below, it looks as optimal as it can be.
Edit: I was posting the same!

@iampatbrown
Copy link
Contributor

I'll add an edit to the summary.

@tgrapperon
Copy link
Contributor

tgrapperon commented Mar 6, 2022

As value.description is hit anyway, I wonder if we couldn't use this directly instead of going through toPrefix to sort the cases.
Are there representations where the ordering could be different, that is where a.description.endIndex < b.description.endIndex doesn't implies that toPrefix(String(a)).endIndex < toPrefix(String(b)).endIndex?
This is mostly by curiosity, and going through toPrefix is definitely a safe and acceptable solution.

I also wonder up to which point comparing indices from different Strings gives a meaningful result. I vaguely remember someone from the core team pointing this, but I can't figure out where yet1, nor if it ever happened. It would make sense though.

In any case, we can always use the distance from .startIndex (that is .count), but the complexity is O(n) for String (but we're talking about enums' RawValues, not War and Peace).

Footnotes

  1. Edit: The only occurence I was able to find yet is what Becca Royal-Gordon implies when pitching a partial order version of Comparable (4th bullet point from the end):
    "For instance, with the <> operator in place, String.Index could be made incomparable with indices from other strings, but all String.Indexes would still have a total order. That design wouldn't be possible with an isIncomparable property."

@iampatbrown
Copy link
Contributor

iampatbrown commented Mar 6, 2022

As value.description is hit anyway, I wonder if we couldn't use this directly instead of going through toPrefix to sort the cases.

I'm not sure how sort could access .description unless some type info was passed to the parser init. That being said. I don't think using toPrefix($0.rawValue).endIndex is 100% either. Using toPrefix($0.rawValue).count is probably safer. We could deliberately break .endIndex if we really wanted to with a custom toPrefix, however, the parser init is not public so right now that shouldn't happen.

@tgrapperon
Copy link
Contributor

tgrapperon commented Mar 6, 2022

If RawValue is LosslessStringConvertible, it has a .description that generates a String we can measure for sorting. But the more I think about it, and given the implementation from LosslessStringConvertible's String.init, I don't think it makes any difference in terms of performance, as all toPrefix implementations are trivial. It is probably safer like you proposed, through toPrefix, as we then "sort like we parse". I don't have a counterexample where this would be required instead of sorting the rawValues's descriptions, but it looks more resilient like this in any case.

The same argument could be used about count. From what I can see, String.Index comparison is only meaningful in the same String. They internally sort using some 2bits integer called transcodedIndex along with another integer called encodedIndex. I don't really grasp the correspondence with the internal representation, but I guess it's safer to assume that encodedIndex is string-dependent. In this specific use case, I can't see how this could lead to two rawValues having the same prefix and a wrong ordering, but it's most likely a bad practice in the general case. For this reason, I think it's worth the small expense of sorting using .count. The current implementation of the PR is using Comparable which is not better in terms of complexity.

@iampatbrown
Copy link
Contributor

iampatbrown commented Mar 6, 2022

If RawValue is LosslessStringConvertible, it has a .description that generates a String we can measure for sorting.

Just want to clarify, right now the sorting occurs inside the init of CaseIterableRawRepresentableParser where there is no LosslessStringConvertible constraint. That's the reason for using toPrefix and why I keep saying .description can't be used (without making changes to the init).

From what I can see, String.Index comparison is only meaningful in the same String

I'm not exactly sure of the inner workings of String.Index but the following from StringIndex.swift makes me think it would work:

extension String.Index {
  @inlinable @inline(__always)
  internal var orderingValue: UInt64 { return _rawBits &>> 14 }
}

extension String.Index: Comparable {
  @inlinable @inline(__always)
  public static func < (lhs: String.Index, rhs: String.Index) -> Bool {
    return lhs.orderingValue < rhs.orderingValue
  }
}

I don't think there's any issue using count though. I don't think it matters if it's O(n), I'm pretty sure toPrefix is going to be O(n) for LosslessStringConvertible anyway (apart from when RawValue == String)

@tgrapperon
Copy link
Contributor

tgrapperon commented Mar 6, 2022

Just want to clarify, right now the sorting occurs inside the init of CaseIterableRawRepresentableParser where there is no LosslessStringConvertible constraint.

Ah, sorry, I was thinking this internal parser could be constrained too without creating restrictions, as I supposed that all routes creating it are already constrained by LosslessStringConvertible (String variants were already seen as such in my mind). But this is not an issue, as sorting using simply.description is a bad idea (see below).

I'm not exactly sure of the inner workings of String.Index but the following from StringIndex.swift makes me think it would work

Yes, it would work in our case, but "👨‍👨‍👧‍👧".endIndex > "AB".endIndex shows it's not a reliable indicator to sort Strings by length. However, I can't form an example with the same prefixes and "wrong" endIndex'es, so endIndex doesn't seem to be problematic for our intent. But as you said, .count is not where time will be lost there.

Furthermore, we need to go through toPrefix to measure the cases:

"👨‍👨‍👧".count == "👨‍👨‍👧‍👧".count // == 1
"👨‍👨‍👧‍👧".starts(with: "👨‍👨‍👧") // false

"👨‍👨‍👧".utf8.count != "👨‍👨‍👧‍👧".utf8.count // 18 != 25
"👨‍👨‍👧‍👧".utf8.starts(with: "👨‍👨‍👧".utf8) // true

If we sort by .description which are Strings, we will get ambiguous results because both have the same count (albeit different endIndex'es!), and if we then parse utf8 views, we may match 👨‍👨‍👧 instead of 👨‍👨‍👧‍👧.
However, because cases go through toPrefix, they're correctly measured and ordered, so this shows that we must indeed "sort like we parse". This example answers one of my questions above.

I'm sorry for the amount of discussion over this very benign matter! It is fine like you summarized, maybe with .count instead of .endIndex for the sake of clarity.

Copy link
Contributor

@KeithBird KeithBird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix some typos.

stephencelis and others added 3 commits March 7, 2022 09:48
…le.md

Co-authored-by: Kth <K2968220169@outlook.com>
…le.md

Co-authored-by: Kth <K2968220169@outlook.com>
@stephencelis
Copy link
Member Author

@iampatbrown @tgrapperon Thanks for the in-depth conversation!

Brandon and I spoke about this earlier today. We definitely want to ensure negative numbers are appropriately parsed, so thanks for calling that edge case out! I've pushed a test case and fix.

As far as generalizing to LosslessStringConvertible goes, we agree that it would be nice, but are worried that we can't do so as simply as proposed because the formatted string of the value doesn't always have a single representation. For integers, this isn't super problematic, but take an enum that is raw-representable by a Double: it would need to be parsed exactly as it is formatted:

enum Code: Double, CaseIterable {
  case a = 5
  ...
}

Code.parser().parse("5.0") // ✅
Code.parser().parse("5")   // ❌

Technically this is even a problem for Ints if we wanted to be lenient enough to support parsing a leading "+" sign, though this is arguably a lot less problematic:

enum Code: Int, CaseIterable {
  case a = 5
  ...
}

Code.parser().parse("5")  // ✅
Code.parser().parse("+5") // ❌

We think both of the above could be fixed if Code.parser() somehow knew to call out to Double.parser() and Int.parser() under the hood instead, which might be easier now that we unified all floating-point parsers in #177. For now, though, we think this is in good enough shape to merge.

If either of you come up with a way of supporting Double (or if you think you have a reasonable solution for generally parsing any LosslessStringConvertible values), please start a discussion or PR 😄

@stephencelis stephencelis merged commit 1a0050f into main Mar 7, 2022
@stephencelis stephencelis deleted the case-iterable-raw-representable branch March 7, 2022 19:43
@iampatbrown
Copy link
Contributor

As far as generalizing to LosslessStringConvertible goes, we agree that it would be nice, but are worried that we can't do so as simply as proposed because the formatted string of the value doesn't always have a single representation.

@stephencelis @mbrandonw Good call. I think String and Int gives decent coverage. Do you think using StringProtocol and FixedWidthInteger respectively would work as the constrains or does that have similar issues?

We think both of the above could be fixed if Code.parser() somehow knew to call out to Double.parser() and Int.parser() under the hood instead

Sounds like that could require passing a custom parser that incorporates Double.parser()/Int.parser() but also checks the different representations before moving on to the next case. Interesting problem....

we think this is in good enough shape to merge.

Agreed. Nice work!

@stephencelis
Copy link
Member Author

@stephencelis @mbrandonw Good call. I think String and Int gives decent coverage. Do you think using StringProtocol and FixedWidthInteger respectively would work as the constrains or does that have similar issues?

Definitely down for those generalizations! Would you want to explore them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants