Commit e813f0d

tpapp authored and fredrikekre committed
Expand documentation of custom random samplers. (#31787)
1 parent 48634f9 commit e813f0d

2 files changed

+110
-58
lines changed

stdlib/Random/docs/src/index.md

Lines changed: 74 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -80,18 +80,45 @@ in order to support usual types of generated values.
8080

8181
### Generating random values of custom types
8282

83-
There are two categories: generating values from a type (e.g. `rand(Int)`), or from a collection (e.g. `rand(1:3)`).
84-
The simple cases are explained first, and more advanced usage is presented later.
85-
We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures.
83+
Generating random values for some distributions may involve various trade-offs. *Pre-computed* values, such as an [alias table](https://en.wikipedia.org/wiki/Alias_method) for discrete distributions, or [“squeezing” functions](https://en.wikipedia.org/wiki/Rejection_sampling) for univariate distributions, can speed up sampling considerably. How much information should be pre-computed can depend on the number of values we plan to draw from a distribution. Also, some random number generators can have certain properties that various algorithms may want to exploit.
84+
85+
The `Random` module defines a customizable framework for obtaining random values that can address these issues. Each invocation of `rand` generates a *sampler* which can be customized with the above trade-offs in mind, by adding methods to `Sampler`, which in turn can dispatch on the random number generator, the object that characterizes the distribution, and a suggestion for the number of repetitions. Currently, for the latter, `Val{1}` (for a single sample) and `Val{Inf}` (for an arbitrary number) are used, with `Random.Repetition` an alias for both.
86+
87+
The object returned by `Sampler` is then used to generate the random values, by a method of `rand` defined for this purpose. Samplers can be arbitrary values, but for most applications the following predefined samplers may be sufficient:
88+
89+
1. `SamplerType{T}()` can be used for implementing samplers that draw from type `T` (e.g. `rand(Int)`).
90+
91+
2. `SamplerTrivial(self)` is a simple wrapper for `self`, which can be accessed with `[]`. This is the recommended sampler when no pre-computed information is needed (e.g. `rand(1:3)`).
92+
93+
3. `SamplerSimple(self, data)` also contains the additional `data` field, which can be used to store arbitrary pre-computed values.
94+
95+
We provide examples for each of these. We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures.
96+
97+
```@docs
98+
Random.Sampler
99+
Random.SamplerType
100+
Random.SamplerTrivial
101+
Random.SamplerSimple
102+
```
103+
104+
Decoupling pre-computation from actually generating the values is part of the API, and is also available to the user. As an example, assume that `rand(rng, 1:20)` has to be called repeatedly in a loop: the way to take advantage of this decoupling is as follows:
105+
106+
```julia
107+
rng = MersenneTwister()
108+
sp = Random.Sampler(rng, 1:20) # or Random.Sampler(MersenneTwister, 1:20)
109+
for x in X
110+
    n = rand(rng, sp) # similar to n = rand(rng, 1:20)
111+
    # use n
112+
end
113+
```
114+
115+
This is the mechanism that is also used in the standard library, e.g. by the default implementation of random array generation (like in `rand(1:20, 10)`).
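As a rough sketch of what array generation does with this mechanism (the actual internals may differ), a sampler can be built once and then reused to fill a pre-allocated vector. The seed and the variable names here are illustrative assumptions:

```julia
using Random

rng = MersenneTwister(1234)

# Build the sampler once; Val(Inf) is the default repetition hint.
sp = Random.Sampler(rng, 1:20)

# Reuse the same sampler for every element instead of calling rand(rng, 1:20).
out = Vector{Int}(undef, 10)
for i in eachindex(out)
    out[i] = rand(rng, sp)
end
```

Every element of `out` lies in `1:20`; any pre-computation for the range happens only in the `Sampler` call.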
86116

87117
#### Generating values from a type
88118

89-
Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced.
90-
In order to define random generation of values of type `T`, the following method can be defined:
91-
`rand(rng::AbstractRNG, ::Random.SamplerType{T})` (this should return what `rand(rng, T)` is expected to return).
119+
Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced. `SamplerType` is the *default sampler for types*. In order to define random generation of values of type `T`, the `rand(rng::AbstractRNG, ::Random.SamplerType{T})` method should be defined, and should return what `rand(rng, T)` is expected to return.
92120

93-
Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`.
94-
We want `rand(Die)` to produce a die with a random number of up to 20 sides (and at least 4):
121+
Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`. We want `rand(Die)` to produce a `Die` with a random number of up to 20 sides (and at least 4):
95122

96123
```jldoctest Die
97124
struct Die
@@ -126,12 +153,11 @@ julia> a = Vector{Die}(undef, 3); rand!(a)
126153
Die(8)
127154
```
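The hunk above elides most of the `Die` doctest. A minimal sketch consistent with the description might look as follows; the field name `nsides` is taken from the later `d[].nsides` usage, while the exact method body and the seed are assumptions:

```julia
using Random

struct Die
    nsides::Int  # number of sides, numbered 1..nsides
end

# Drawing from the *type* `Die` dispatches on `SamplerType{Die}`.
# Assumed body: pick a number of sides between 4 and 20.
Random.rand(rng::AbstractRNG, ::Random.SamplerType{Die}) = Die(rand(rng, 4:20))

die = rand(MersenneTwister(0), Die)      # a single random Die
dice = rand(MersenneTwister(0), Die, 3)  # array generation reuses the same method
```

Defining only the `SamplerType{Die}` method is enough for both the scalar and the array form, since array generation goes through the same sampler.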
128155

129-
#### Generating values from a collection
156+
#### A simple sampler without pre-computed data
157+
158+
Here we define a sampler for a collection. If no pre-computed data is required, it can be implemented with a `SamplerTrivial` sampler, which is in fact the *default fallback for values*.
130159

131-
Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced.
132-
In order to define random generation out of objects of type `S`, the following method can be defined:
133-
`rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`.
134-
Continuing the `Die` example, we want now to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides:
160+
In order to define random generation out of objects of type `S`, the following method should be defined: `rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`. Continuing the `Die` example, we now want to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides:
135161

136162
```jldoctest Die; setup = :(Random.seed!(1))
137163
julia> Random.rand(rng::AbstractRNG, d::Random.SamplerTrivial{Die}) = rand(rng, 1:d[].nsides);
@@ -146,38 +172,48 @@ julia> rand(Die(4), 3)
146172
2
147173
```
148174

149-
In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define
150-
`Base.eltype(::Type{Die}) = Int`.
151-
175+
Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced. In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define `Base.eltype(::Type{Die}) = Int`.
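To illustrate the `eltype` remedy, here is a small self-contained sketch. The `Die` definition is repeated so the block runs on its own, and the seed is an arbitrary choice:

```julia
using Random

struct Die
    nsides::Int  # number of sides, numbered 1..nsides
end

# Sampling from a Die *value* yields one of its sides.
Random.rand(rng::AbstractRNG, d::Random.SamplerTrivial{Die}) = rand(rng, 1:d[].nsides)

# Declare the element type so array generation allocates a Vector{Int}
# instead of a Vector{Any}.
Base.eltype(::Type{Die}) = Int

rolls = rand(MersenneTwister(1), Die(6), 5)
```

With the `eltype` method in place, `rolls` is a `Vector{Int}` whose entries all lie in `1:6`.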
152176

153-
#### Generating values for an `AbstractFloat` type
177+
A `SamplerTrivial` does not have to wrap the original object. For example, in `Random`, `AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather in `[0,1)`.
154178

155-
`AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather
156-
in `[0,1)`. The following method should be implemented for `T <: AbstractFloat`:
157-
`Random.rand(::AbstractRNG, ::Random.SamplerTrivial{Random.CloseOpen01{T}})`
179+
Consequently, a method
180+
```julia
181+
Sampler(::Type{RNG}, ::Type{T}, n::Repetition) where {RNG<:AbstractRNG,T<:AbstractFloat} =
182+
    Sampler(RNG, CloseOpen01(T), n)
183+
```
184+
is defined to return `SamplerTrivial` with a `Random.CloseOpen01{T}` type defined for this purpose, which has an appropriate `rand` method defined for it.
158185

186+
#### An optimized sampler with pre-computed data
159187

160-
#### Optimizing generation with cached computation between calls
188+
Consider a discrete distribution, where numbers `1:n` are drawn with given probabilities that sum to one. When many values are needed from this distribution, the fastest method is using an [alias table](https://en.wikipedia.org/wiki/Alias_method). We don't provide the algorithm for building such a table here, but suppose it is available in `make_alias_table(probabilities)` instead, and `draw_number(rng, alias_table)` can be used to draw a random number from it.
161189

162-
When repeatedly generating random values (with the same `rand` parameters), it happens for some types
163-
that the result of a computation is used for each call. In this case, the computation can be decoupled
164-
from actually generating the values. This is the case for example with the default implementation for
165-
`AbstractArray`. Assume that `rand(rng, 1:20)` has to be called repeatedly in a loop: the way to take advantage
166-
of this decoupling is as follows:
190+
Suppose that the distribution is described by
191+
```julia
192+
struct DiscreteDistribution{V <: AbstractVector}
193+
    probabilities::V
194+
end
195+
```
196+
and that we *always* want to build an alias table, regardless of the number of values needed (we learn how to customize this below). The methods
197+
```julia
198+
Random.eltype(::Type{<:DiscreteDistribution}) = Int
167199

200+
function Random.Sampler(::AbstractRNG, distribution::DiscreteDistribution, ::Repetition)
201+
    SamplerSimple(distribution, make_alias_table(distribution.probabilities))
202+
end
203+
```
204+
should be defined to return a sampler with pre-computed data, then
168205
```julia
169-
rng = MersenneTwister()
170-
sp = Random.Sampler(rng, 1:20) # or Random.Sampler(MersenneTwister,1:20)
171-
for x in X
172-
n = rand(rng, sp) # similar to n = rand(rng, 1:20)
173-
# use n
206+
function Random.rand(rng::AbstractRNG, sp::SamplerSimple{<:DiscreteDistribution})
207+
    draw_number(rng, sp.data)
174208
end
175209
```
210+
will be used to draw the values.
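The following runnable sketch shows the same structure end to end. Since the text does not provide `make_alias_table` or `draw_number`, this example swaps in a plain cumulative-probability table with an inverse-CDF lookup. That is not an alias table, only a stand-in to show where the pre-computation lives. The helper names `make_cdf_table` and `draw_from_cdf` are hypothetical:

```julia
using Random

struct DiscreteDistribution{V <: AbstractVector}
    probabilities::V
end

Base.eltype(::Type{<:DiscreteDistribution}) = Int

# Hypothetical stand-in for `make_alias_table`: cumulative probabilities.
make_cdf_table(probabilities) = cumsum(probabilities)

# Hypothetical stand-in for `draw_number`: inverse-CDF lookup by binary search.
draw_from_cdf(rng, table) = searchsortedfirst(table, rand(rng) * last(table))

# Pre-compute the table once, when the sampler is created.
function Random.Sampler(::Type{<:AbstractRNG}, d::DiscreteDistribution, ::Random.Repetition)
    Random.SamplerSimple(d, make_cdf_table(d.probabilities))
end

# Each draw only consults the pre-computed data stored in the sampler.
function Random.rand(rng::AbstractRNG, sp::Random.SamplerSimple{<:DiscreteDistribution})
    draw_from_cdf(rng, sp.data)
end

dist = DiscreteDistribution([0.2, 0.5, 0.3])
xs = rand(MersenneTwister(0), dist, 1000)
```

Here `xs` is a `Vector{Int}` with values in `1:3`. Replacing the two helpers with a real alias-table implementation would leave the `Sampler` and `rand` methods unchanged.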
211+
212+
#### Custom sampler types
213+
214+
The `SamplerSimple` type is sufficient for most use cases with precomputed data. However, in order to demonstrate how to use custom sampler types, here we implement something similar to `SamplerSimple`.
176215

177-
This mechanism is of course used by the default implementation of random array generation (like in `rand(1:20, 10)`).
178-
In order to implement this decoupling for a custom type, a helper type can be used.
179-
Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so
180-
there is an opportunity for this optimization:
216+
Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so there is an opportunity for this optimization. We call our custom sampler `SamplerDie`.
181217

182218
```julia
183219
import Random: Sampler, rand
@@ -194,10 +230,9 @@ Sampler(RNG::Type{<:AbstractRNG}, die::Die, r::Random.Repetition) =
194230
rand(rng::AbstractRNG, sp::SamplerDie) = rand(rng, sp.sp)
195231
```
196232

197-
It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`.
198-
In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice.
233+
It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`. In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice.
199234

200-
This pattern is so frequent that a helper type named `Random.SamplerSimple` is available,
235+
Of course, this pattern is so frequent that the helper type used above, namely `Random.SamplerSimple`, is available,
201236
saving us the definition of `SamplerDie`: we could have implemented our decoupling with:
202237

203238
```julia
@@ -228,8 +263,7 @@ Sampler(RNG::Type{<:AbstractRNG}, die::Die, ::Val{1}) = SamplerDie1(...)
228263
Sampler(RNG::Type{<:AbstractRNG}, die::Die, ::Val{Inf}) = SamplerDieMany(...)
229264
```
230265

231-
Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)`
232-
and `rand(::AbstractRNG, ::SamplerDieMany)`).
266+
Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)` and `rand(::AbstractRNG, ::SamplerDieMany)`). Note that, as usual, `SamplerTrivial` and `SamplerSimple` can be used if custom types are not necessary.
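Because the hunk elides the constructor arguments (`SamplerDie1(...)`, `SamplerDieMany(...)`), the following is only one possible way to fill them in. The field layouts of both sampler types are assumptions made for illustration:

```julia
using Random

struct Die
    nsides::Int
end
Base.eltype(::Type{Die}) = Int

# Assumed layout: the single-draw sampler just keeps the die.
struct SamplerDie1 <: Random.Sampler{Int}
    die::Die
end

# Assumed layout: the many-draws sampler keeps a pre-built range sampler.
struct SamplerDieMany{S <: Random.Sampler} <: Random.Sampler{Int}
    sp::S
end

Random.Sampler(::Type{<:AbstractRNG}, die::Die, ::Val{1}) = SamplerDie1(die)
Random.Sampler(RNG::Type{<:AbstractRNG}, die::Die, r::Val{Inf}) =
    SamplerDieMany(Random.Sampler(RNG, 1:die.nsides, r))

Random.rand(rng::AbstractRNG, sp::SamplerDie1) = rand(rng, 1:sp.die.nsides)
Random.rand(rng::AbstractRNG, sp::SamplerDieMany) = rand(rng, sp.sp)

rng = MersenneTwister(0)
one = rand(rng, Die(6))       # scalar draw: uses the Val(1) sampler
many = rand(rng, Die(6), 10)  # array draw: uses the Val(Inf) sampler
```

The repetition argument is only a hint: both samplers must produce values from the same distribution, and they differ only in how much work is done up front.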
233267

234268
Note: `Sampler(rng, x)` is simply a shorthand for `Sampler(rng, x, Val(Inf))`, and
235269
`Random.Repetition` is an alias for `Union{Val{1}, Val{Inf}}`.
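A quick check of this shorthand, as a sketch. The concrete sampler type returned for a range is an implementation detail, so only the types of the two results are compared:

```julia
using Random

rng = MersenneTwister(0)

sp_default = Random.Sampler(rng, 1:20)            # repetition omitted
sp_inf     = Random.Sampler(rng, 1:20, Val(Inf))  # repetition spelled out

same_type = typeof(sp_default) == typeof(sp_inf)
is_union  = Random.Repetition == Union{Val{1}, Val{Inf}}
```

Both `same_type` and `is_union` evaluate to `true`.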

stdlib/Random/src/Random.jl

Lines changed: 36 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -40,22 +40,6 @@ Supertype for random number generators such as [`MersenneTwister`](@ref) and [`R
4040
"""
4141
abstract type AbstractRNG end
4242

43-
"""
44-
Random.gentype(T)
45-
46-
Determine the type of the elements generated by calling `rand([rng], x)`,
47-
where `x::T`, and `x` is not a type.
48-
The definition `gentype(x) = gentype(typeof(x))` is provided for convenience,
49-
and `gentype(T)` defaults to `eltype(T)`.
50-
NOTE: `rand([rng], X)`, where `X` is a type, is always assumed to produce
51-
an object of type `X`.
52-
53-
# Examples
54-
```jldoctest
55-
julia> gentype(1:10)
56-
Int64
57-
```
58-
"""
5943
gentype(::Type{X}) where {X} = eltype(X)
6044
gentype(x) = gentype(typeof(x))
6145

@@ -137,6 +121,21 @@ const Repetition = Union{Val{1},Val{Inf}}
137121
# Sampler(::AbstractRNG, X, ::Val{Inf}) = Sampler(X)
138122
# Sampler(::AbstractRNG, ::Type{X}, ::Val{Inf}) where {X} = Sampler(X)
139123

124+
"""
125+
Sampler(rng, x, repetition = Val(Inf))
126+
127+
Return a sampler object that can be used to generate random values from `rng` for `x`.
128+
129+
When `sp = Sampler(rng, x, repetition)`, `rand(rng, sp)` will be used to draw random values,
130+
and should be defined accordingly.
131+
132+
`repetition` can be `Val(1)` or `Val(Inf)`, and should be used as a suggestion for deciding
133+
the amount of precomputation, if applicable.
134+
135+
[`Random.SamplerType`](@ref) and [`Random.SamplerTrivial`](@ref) are default fallbacks for
136+
*types* and *values*, respectively. [`Random.SamplerSimple`](@ref) can be used to store
137+
pre-computed values without defining extra types for only this purpose.
138+
"""
140139
Sampler(rng::AbstractRNG, x, r::Repetition=Val(Inf)) = Sampler(typeof(rng), x, r)
141140
Sampler(rng::AbstractRNG, ::Type{X}, r::Repetition=Val(Inf)) where {X} = Sampler(typeof(rng), X, r)
142141

@@ -149,18 +148,30 @@ Sampler(::Type{RNG}, ::Type{X}) where {RNG<:AbstractRNG,X} = Sampler(RNG, X, Val
149148

150149
#### pre-defined useful Sampler types
151150

152-
# default fall-back for types
151+
"""
152+
SamplerType{T}()
153+
154+
A sampler for types, containing no other information. The default fallback for `Sampler`
155+
when called with types.
156+
"""
153157
struct SamplerType{T} <: Sampler{T} end
154158

155159
Sampler(::Type{<:AbstractRNG}, ::Type{T}, ::Repetition) where {T} = SamplerType{T}()
156160

157161
Base.getindex(::SamplerType{T}) where {T} = T
158162

159-
# default fall-back for values
160163
struct SamplerTrivial{T,E} <: Sampler{E}
161164
    self::T
162165
end
163166

167+
"""
168+
SamplerTrivial(x)
169+
170+
Create a sampler that just wraps the given value `x`. This is the default fall-back for
171+
values.
172+
173+
The recommended use case is sampling from values without precomputed data.
174+
"""
164175
SamplerTrivial(x::T) where {T} = SamplerTrivial{T,gentype(T)}(x)
165176

166177
Sampler(::Type{<:AbstractRNG}, x, ::Repetition) = SamplerTrivial(x)
@@ -173,6 +184,13 @@ struct SamplerSimple{T,S,E} <: Sampler{E}
173184
    data::S
174185
end
175186

187+
"""
188+
SamplerSimple(x, data)
189+
190+
Create a sampler that wraps the given value `x` and the `data`.
191+
192+
The recommended use case is sampling from values with precomputed data.
193+
"""
176194
SamplerSimple(x::T, data::S) where {T,S} = SamplerSimple{T,S,gentype(T)}(x, data)
177195

178196
Base.getindex(sp::SamplerSimple) = sp.self
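The `getindex` methods in this file are what make the `sp[]` syntax work for the predefined samplers. A brief sketch, where the wrapped range and the `"cache"` string are arbitrary placeholder values:

```julia
using Random

trivial = Random.SamplerTrivial(1:3)
simple  = Random.SamplerSimple(1:3, "cache")  # any pre-computed value can go in `data`
bytype  = Random.SamplerType{Int}()

wrapped_trivial = trivial[]  # the wrapped value
wrapped_simple  = simple[]   # the wrapped value; the extra data is in simple.data
wrapped_type    = bytype[]   # the type itself
```

Here `wrapped_trivial` and `wrapped_simple` are both `1:3`, `simple.data` is `"cache"`, and `wrapped_type` is `Int`.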
