-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
bugfixes for rsplit (for unicode and empty strings) #26238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@StefanKarpinski, is Update: From this comment, "We want indices to always be 1 through ncodeunits(s)." I will update the PR accordingly. |
CI failures seem unrelated. Timeouts on FreeBSD and Win64, and Okay to merge? |
@@ -275,9 +275,9 @@ split(str::T, splitter::Char; | |||
_split(str, equalto(splitter), limit, keep, T <: SubString ? T[] : SubString{T}[]) | |||
|
|||
function _split(str::AbstractString, splitter, limit::Integer, keep_empty::Bool, strs::Array) | |||
i = start(str) | |||
@assert (i = firstindex(str)) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to assert this? Why do it here and not elsewhere? I think implementations which get this wrong will realize it quite soon even without assertions given that may things will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm asserting it because the subsequent code implicitly relies on it. Basically, I think of this as a comment with some added value. Assertions are cheap so we shouldn't hesitate to add them, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(i = firstindex(str))
though goes against the recently added documention that use of @assert
might not work in the future (#25610)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really since the argument here is to delete it which is exactly what the compiler would do. On the contrary, this is exactly the way that @assert
should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@StefanKarpinski, the point is that you can't completely delete this code because it contains an assignment. Hence this is not correct usage of @assert
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. That's true. I missed that bit.
Fixes #26225 (
rsplit
failure for strings with multi-byte Unicode characters).Also, I noticed that
rsplit
was broken in both 0.6 and 0.7 for empty splitter strings:split("abc", "")
gives["a","b","c"]
(correct, I think — and there was a test for this so it was clearly the intended behavior), butrsplit("abc", "")
on 0.6 incorrectly gives["ab","c"]
. In this PR, they both give["a","b","c"]
, and the case withlimit=2
is also tested.The new
rsplit
implementation is both more correct (I think) and shorter.I also noticed that both
rsplit
andsplit
were incorrectly(?) usingstart(string)
to getfirstindex
. Or is this always guaranteed to return1
?Finally, I simplified the
split
/rsplit
tests by replacingisequal
with==
, since the two should be equivalent for strings.==
is more readable, allowed me to use chained comparisons, and gives better diagnostic output for@test
failures.