Allow types other than Int in randstring #54402

Open
wants to merge 1 commit into
base: master
7 changes: 4 additions & 3 deletions stdlib/Random/src/misc.jl
@@ -71,13 +71,14 @@ let b = UInt8['0':'9';'A':'Z';'a':'z']
     global randstring

     function randstring(r::AbstractRNG, chars=b, n::Integer=8)
+        _n = convert(Int, n)
Member

I am not sure it is worth doing in this specific instance but presumably structuring it like:

f(x::Integer) = f(convert(Int, x))
f(x::Int) = ...

would reduce compilation time since you would not compile the main function body for multiple integer types.
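
To make the suggestion concrete, here is a minimal sketch of that structure. The names `randstring_sketch` and `ALPHABET` are hypothetical and the body is simplified; this is not the code in the PR, only an illustration of an `Int` kernel with a thin `Integer` wrapper in front of it.

```julia
using Random

# Hypothetical names for illustration only; not the PR's implementation.
const ALPHABET = UInt8['0':'9'; 'A':'Z'; 'a':'z']

# Kernel method: the full body exists only for n::Int.
function randstring_sketch(rng::AbstractRNG, chars, n::Int)
    v = Vector{eltype(chars)}(undef, n)
    rand!(rng, v, chars)
    return String(v)
end

# Thin wrapper: any other Integer type is converted and forwarded,
# so only this one-line method is compiled per extra integer type.
randstring_sketch(rng::AbstractRNG, chars, n::Integer) =
    randstring_sketch(rng, chars, convert(Int, n))
```

With this layout, a call such as `randstring_sketch(Random.default_rng(), ALPHABET, UInt8(5))` goes through the wrapper and then reuses the `Int` kernel.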

Contributor Author

For cases where lots of different types are passed to randstring, that's certainly the case, yeah. I'm not sure how common that would be, though; there are lots of better ways to control the length of a generated string than through the type.

Contributor

I think @KristofferC is just referring to the amount of code that is generated, which can be reduced by using function barriers:
https://docs.julialang.org/en/v1/manual/performance-tips/#kernel-functions

Contributor Author

@Seelengrab Seelengrab May 8, 2024

Yes, I'm referring to the likelihood of hitting that and of the additional compilation overhead being a significant slowdown. If this is a bottleneck, I'd first recommend switching to a different scheme for generating the length (with just Int) rather than optimizing the compilation overhead.

Contributor

@sjkelly sjkelly May 8, 2024

Right, and the problem is that this PR would allow someone to have a type unstable call site with various Integer types for length, rather than error as it currently does.

Contributor Author

Why would that inherently be a problem? We don't prevent type instabilities in user code in other places either.

Member

I think Kristoffer's convert approach is a good idea. Yes, it's ok to just allow a different integer type to flow through, but in this case Int covers the entire range of reasonable values and converting avoids additional compilation.
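
As a small illustration of what the conversion does to the length argument, the sketch below uses only `convert` from Base; the variable names are hypothetical. Values that fit become an `Int`, and a value outside the range of `Int` raises an error instead of wrapping around.

```julia
# Sketch: how `convert(Int, n)` treats a few integer inputs.
small = convert(Int, UInt8(5))   # fits, so it becomes the Int 5
same  = convert(Int, 8)          # already an Int, returned unchanged

# A value that does not fit in Int is rejected with an InexactError.
too_big_rejected = try
    convert(Int, typemax(UInt))
    false
catch err
    err isa InexactError
end
```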

Contributor Author

@Seelengrab Seelengrab Jun 11, 2024

There is no other integer type flowing through though; the code in this PR already calls convert(Int, ..), just directly instead of in a function barrier. This function is a total of 11 lines long; I seriously doubt that this leads to a bottleneck in either compilation time or binary size. At that size, I'm willing to bet that the inlining pass costs more than potential dual compilation if anyone calls this with a UInt8.

Member

Ok, but every little bit counts. Why not just change it?

Contributor Author

Because I agree with the comment by Kristoffer that it's probably not worth doing in this case. Is it a problem that I agree with the reviewer? As it stands, this just seems like premature optimization of a hypothetical to me.

         T = eltype(chars)
         if T === UInt8
-            str = Base._string_n(n)
-            GC.@preserve str rand!(r, UnsafeView(pointer(str), n), chars)
+            str = Base._string_n(_n)
+            GC.@preserve str rand!(r, UnsafeView(pointer(str), _n), chars)
             return str
         else
-            v = Vector{T}(undef, n)
+            v = Vector{T}(undef, _n)
             rand!(r, v, chars)
             return String(v)
         end
6 changes: 6 additions & 0 deletions stdlib/Random/test/runtests.jl
@@ -702,6 +702,12 @@ let b = ['0':'9';'A':'Z';'a':'z']
     @test randstring(MersenneTwister(0)) == randstring(MersenneTwister(0), b)
 end

+@testset "`randstring` with $T" for T in (UInt8, UInt16, UInt32, Int8, Int16, Int32, UInt, Int)
+    # clamp it to a small value so that we don't allocate too much unnecessarily
+    n = clamp(rand(T), Int8) % T
+    @test randstring(n) isa String
+end
+
 # this shouldn't crash (#22403)
 @test_throws MethodError rand!(Union{UInt,Int}[1, 2, 3])
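
One observation on the new test, offered as a sketch rather than as part of the PR: for signed `T`, `clamp(rand(T), Int8)` can be negative, so `n` is not always a valid length. The hypothetical helper `small_length` below shows one way to draw a small, non-negative length of each type instead.

```julia
# Hypothetical helper, not part of the PR: draw a length of type T
# from 0:127, which every integer type in the test can represent.
small_length(::Type{T}) where {T<:Integer} = rand(T(0):T(typemax(Int8)))

# For comparison: the clamp-based expression keeps the sign of its input.
negative_example = clamp(Int32(-5), Int8) % Int32
```

Here `negative_example` stays negative, while `small_length(T)` always lies in `0:127` and has type `T`.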