-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regex#each_name(&) and Regex#names #9760
base: master
Are you sure you want to change the base?
Conversation
Could this get a 2nd approval? |
These were never discussed before, sorry. I don't understand their usefulness so I won't approve them. |
This needs a rebase |
c2f3c1c
to
878094a
Compare
@straight-shoota done. |
Can you provide a use case? Also, UInt16 should be replace with Int32. Then, I don't understand why names is useful for, and why there needs to be an unsorted and sorted variants. This makes the API bigger but I can't see how it can be used for anything useful. |
@asterite These new methods are documented and I consider that enough for explaining the use cases.
That would be probably better, yes.
Agreed, although unsorted version is there for efficiency reasons, yet if deemed unnecessary I can leave only sorted version. |
@asterite ping 👋 |
Sorry, I don't have much more time to review or decide things. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep the API as small as possible. Regex is inefficient by itself, so I don't see much reason to provide a super-efficient overload for unsorted names, which seems a rather obscure use-case.
src/regex.cr
Outdated
def name_table : Hash(UInt16, String) | ||
Hash(UInt16, String).new.tap do |name_table| | ||
each_name do |name, capture_number| | ||
name_table[capture_number] = name | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key type should most certainly not be UInt16
but Int32
.
I'm also wondering if an array would be a better data structure. With many unnamed groups it would occupy more memory, but lookup is more performant. And looking up items by index is a classic job for Indexable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC you can't use the Array
since the index keys might not be correspondent the to array indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that this method already exists in the current API with this return type. So we shouldn't break it.
But nevertheless, it's bad. We shouldn't expose UInt16
. That's just an implementation detail.
So I'd suggest to deprecate that method and add another one with a different name and correct return type.
Yes, in an array the indices of unnamed capture groups must be nil values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example:
/(.)(?<foo>.)(.)(?<bar>.)(.)/.name_table # => {4 => "bar", 2 => "foo"}
/(.)(?<foo>.)(.)(?<bar>.)(.)/.name_array # => [nil, "foo", nil, "bar"]
878094a
to
0799228
Compare
We can't just change the return type of I've also some doubts if |
https://github.com/search?q=name_table+language%3ACrystal&type=code There's literally no single case there which would necessitate adding a new method.
I don't get the point. If you call
There's no other way to get the list of named captures, so that's enough for the use case for me, and this kind of - already accessible - feature, shouldn't force you to monkey-patch stdlib. |
That just means no single use published at GitHub. And it doesn't matter. It's a breaking change.
You could expect the named captures to be indexed by their capture group index.
That's not a use case at all. There's no point in adding a method unless there is a real use case. |
Hi @Sija , thanks for taking the time to write the PR. But we need to ask you a little bit extra:
The documentation explains the use of the method, but does not provide enough evidence that the change is necessary. What we're asking is what makes you want to do this change. What couldn't be done before, or how inefficient or ugly was to do it, and what is better with your proposal? Ideally, you should first open an issue stating the problem, and perhaps drafting a solution, and then move to coding when we all agree on your proposal. Since we're discussing it here now, can you provide an answer to this question in this PR? Thanks! |
@beta-ziliani My use case was the replacement/escaping named group values without knowing the group names in advance. Without |
|
Yeah, name is up for the debate.
This I'm not sure I follow. Why is it redundant? You could pose the same argument for |
fwiw
Regex#names
exists in Ruby.Regex#each_name
was added as a helper and a way to access group names without additional allocations.