-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Improve allunique performance for strings, avoid computing length several times for iterators #50644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ok I guess then the issue is resolved |
@oscardssmith There is failing test on CI: Error in testset abstractarray:
--
| Error During Test at /cache/build/default-amdci4-6/julialang/julia-master/julia-a96fdcc811/share/julia/test/abstractarray.jl:762
| Test threw exception
| Expression: #= /cache/build/default-amdci4-6/julialang/julia-master/julia-a96fdcc811/share/julia/test/abstractarray.jl:762 =# @inferred(cat3v(As)) == zeros(2, 2, 2)
| return type Array{Float64, 3} does not match inferred return type Array{_A, 3} where _A I'm not sure how is it related to the changes made in this PR. I tried to execute the test locally and it passed just in REPL. julia> using Test
julia> As = [zeros(2, 2) for _ = 1:2]
2-element Vector{Matrix{Float64}}:
[0.0 0.0; 0.0 0.0]
[0.0 0.0; 0.0 0.0]
julia> @test @inferred(cat(As...; dims=Val(3))) == zeros(2, 2, 2)
Test Passed
julia> cat3v(As) = cat(As...; dims=Val(3))
cat3v (generic function with 1 method)
julia> @test @inferred(cat3v(As)) == zeros(2, 2, 2)
Test Passed
julia> @test @inferred(cat(As...; dims=Val((1,2)))) == zeros(4, 4)
Test Passed I couldn't run the whole testset locally with There is also a whitecheck failing, says there is a trailing whitespace. Is there a command to automatically fix all code-style related problems? |
allunique(A::StridedArray) = length(A) < 32 ? _indexed_allunique(A) : _hashed_allunique(A) | ||
function allunique(S::String) | ||
nunits = ncodeunits(S) # computing `length` for a string might be expensive | ||
nunits < 32 ? _indexed_allunique(S, nunits) : _hashed_allunique(S, nunits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first glance this looks dangerous, but _indexed_allunique
doesn't actually index. I wonder if it should be called _iterate_allunique
or _quadratic_allunique
or something?
# strings | ||
@test !allunique("aaaa") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At present we have:
julia> allunique("óá")
true
julia> allunique("óá")
false
julia> collect("óá")
2-element Vector{Char}:
'ó': Unicode U+00F3 (category Ll: Letter, lowercase)
'á': Unicode U+00E1 (category Ll: Letter, lowercase)
julia> collect("óá")
4-element Vector{Char}:
'o': ASCII/Unicode U+006F (category Ll: Letter, lowercase)
'́': Unicode U+0301 (category Mn: Mark, nonspacing)
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
'́': Unicode U+0301 (category Mn: Mark, nonspacing)
What answers do we want? These ones match unique
.
If they are right, they should probably be tested.
(I know this isn't what this PR wanted to do, but it does add a String method.)
ping @mcabbott |
This is not your fault
Not that I know of, but the check should also list the line number. |
What is the current status of this PR? I think the interesting example brought by @mcabbott with |
My two biggest concerns are
|
This PR fixes #50360 .
master:
This PR: