stdlib/Random/docs/src/index.md
74 additions & 40 deletions
@@ -80,18 +80,45 @@ in order to support usual types of generated values.
80
80
81
81
### Generating random values of custom types
82
82
83
-
There are two categories: generating values from a type (e.g. `rand(Int)`), or from a collection (e.g. `rand(1:3)`).
84
-
The simple cases are explained first, and more advanced usage is presented later.
85
-
We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures.
83
+
Generating random values for some distributions may involve various trade-offs. *Pre-computed* values, such as an [alias table](https://en.wikipedia.org/wiki/Alias_method) for discrete distributions, or [“squeezing” functions](https://en.wikipedia.org/wiki/Rejection_sampling) for univariate distributions, can speed up sampling considerably. How much information should be pre-computed can depend on the number of values we plan to draw from a distribution. Also, some random number generators can have certain properties that various algorithms may want to exploit.
84
+
85
+
The `Random` module defines a customizable framework for obtaining random values that can address these issues. Each invocation of `rand` generates a *sampler* which can be customized with the above trade-offs in mind, by adding methods to `Sampler`, which in turn can dispatch on the random number generator, the object that characterizes the distribution, and a suggestion for the number of repetitions. Currently, for the latter, `Val{1}` (for a single sample) and `Val{Inf}` (for an arbitrary number) are used, with `Random.Repetition` an alias for `Union{Val{1}, Val{Inf}}`.
86
+
87
+
The object returned by `Sampler` is then used to generate the random values, by a method of `rand` defined for this purpose. Samplers can be arbitrary values, but for most applications the following predefined samplers may be sufficient:
88
+
89
+
1. `SamplerType{T}()` can be used for implementing samplers that draw from type `T` (e.g. `rand(Int)`).
90
+
91
+
2. `SamplerTrivial(self)` is a simple wrapper for `self`, which can be accessed with `[]`. This is the recommended sampler when no pre-computed information is needed (e.g. `rand(1:3)`).
92
+
93
+
3. `SamplerSimple(self, data)` also contains the additional `data` field, which can be used to store arbitrary pre-computed values.
94
+
95
+
We provide examples for each of these. We assume here that the choice of algorithm is independent of the RNG, so we use `AbstractRNG` in our signatures.
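As a quick orientation, the three predefined samplers can be constructed directly. This is only a minimal sketch: the wrapped range `1:3` and the string stored in `data` are placeholder values chosen for illustration, not something the framework requires.

```julia
using Random

rng = MersenneTwister(42)

# SamplerType{T}() carries nothing but the type to draw from.
st = Random.SamplerType{Int}()
x = rand(rng, st)              # draws an Int, like rand(rng, Int)

# SamplerTrivial(self) wraps a value; `[]` returns it unchanged.
tr = Random.SamplerTrivial(1:3)
wrapped = tr[]                 # the range that was wrapped

# SamplerSimple(self, data) additionally stores pre-computed data.
ss = Random.SamplerSimple(1:3, "placeholder data")
stored = ss.data               # the value passed as `data`
```

Constructing a sampler by hand like this does not by itself define how values are drawn from it; that is the job of a `rand(rng, sampler)` method.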
96
+
97
+
```@docs
98
+
Random.Sampler
99
+
Random.SamplerType
100
+
Random.SamplerTrivial
101
+
Random.SamplerSimple
102
+
```
103
+
104
+
Decoupling pre-computation from actually generating the values is part of the API, and is also available to the user. As an example, assume that `rand(rng, 1:20)` has to be called repeatedly in a loop. The way to take advantage of this decoupling is as follows:
105
+
106
+
```julia
107
+
rng = MersenneTwister()
108
+
sp = Random.Sampler(rng, 1:20) # or Random.Sampler(MersenneTwister, 1:20)
109
+
for x in X
110
+
    n = rand(rng, sp) # similar to n = rand(rng, 1:20)
111
+
# use n
112
+
end
113
+
```
114
+
115
+
This is the mechanism that is also used in the standard library, e.g. by the default implementation of random array generation (like in `rand(1:20, 10)`).
86
116
87
117
#### Generating values from a type
88
118
89
-
Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced.
90
-
In order to define random generation of values of type `T`, the following method can be defined:
91
-
`rand(rng::AbstractRNG, ::Random.SamplerType{T})` (this should return what `rand(rng, T)` is expected to return).
119
+
Given a type `T`, it's currently assumed that if `rand(T)` is defined, an object of type `T` will be produced. `SamplerType` is the *default sampler for types*. In order to define random generation of values of type `T`, the `rand(rng::AbstractRNG, ::Random.SamplerType{T})` method should be defined, and should return what `rand(rng, T)` is expected to return.
92
120
93
-
Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`.
94
-
We want `rand(Die)` to produce a die with a random number of up to 20 sides (and at least 4):
121
+
Let's take the following example: we implement a `Die` type, with a variable number `n` of sides, numbered from `1` to `n`. We want `rand(Die)` to produce a `Die` with a random number of up to 20 sides (and at least 4):
95
122
96
123
```jldoctest Die
97
124
struct Die
@@ -126,12 +153,11 @@ julia> a = Vector{Die}(undef, 3); rand!(a)
126
153
Die(8)
127
154
```
128
155
129
-
#### Generating values from a collection
156
+
#### A simple sampler without pre-computed data
157
+
158
+
Here we define a sampler for a collection. If no pre-computed data is required, it can be implemented with a `SamplerTrivial` sampler, which is in fact the *default fallback for values*.
130
159
131
-
Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced.
132
-
In order to define random generation out of objects of type `S`, the following method can be defined:
133
-
`rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`.
134
-
Continuing the `Die` example, we want now to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides:
160
+
In order to define random generation out of objects of type `S`, the following method should be defined: `rand(rng::AbstractRNG, sp::Random.SamplerTrivial{S})`. Here, `sp` simply wraps an object of type `S`, which can be accessed via `sp[]`. Continuing the `Die` example, we now want to define `rand(d::Die)` to produce an `Int` corresponding to one of `d`'s sides:
In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define
150
-
`Base.eltype(::Type{Die}) = Int`.
151
-
175
+
Given a collection type `S`, it's currently assumed that if `rand(::S)` is defined, an object of type `eltype(S)` will be produced. In the last example, a `Vector{Any}` is produced; the reason is that `eltype(Die) == Any`. The remedy is to define `Base.eltype(::Type{Die}) = Int`.
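Putting the pieces of the `Die` example together, a self-contained sketch might look as follows. The field name `nsides` is an assumption made for this sketch (the struct body is not shown in this part of the diff), and the method bodies are one plausible way to meet the description above rather than the definitive implementation.

```julia
using Random

struct Die
    nsides::Int   # assumed field name: number of sides
end

# rand(Die): a die with between 4 and 20 sides.
Random.rand(rng::AbstractRNG, ::Random.SamplerType{Die}) = Die(rand(rng, 4:20))

# rand(d::Die): one of the sides of `d`; sp[] unwraps the Die.
Random.rand(rng::AbstractRNG, sp::Random.SamplerTrivial{Die}) = rand(rng, 1:sp[].nsides)

# Declares that drawing from a Die yields Int, so arrays get that element type.
Base.eltype(::Type{Die}) = Int

d = rand(Die)             # a random Die
rolls = rand(Die(6), 3)   # three rolls of a six-sided die
```

With the `eltype` method in place, `rolls` is a `Vector{Int}` rather than a `Vector{Any}`.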
152
176
153
-
#### Generating values for an `AbstractFloat` type
177
+
A `SamplerTrivial` does not have to wrap the original object. For example, in `Random`, `AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather in `[0,1)`.
154
178
155
-
`AbstractFloat` types are special-cased, because by default random values are not produced in the whole type domain, but rather
156
-
in `[0,1)`. The following method should be implemented for `T <: AbstractFloat`:
Sampler(::Type{RNG}, ::Type{T}, n::Repetition) where {RNG<:AbstractRNG,T<:AbstractFloat} =
182
+
Sampler(RNG, CloseOpen01(T), n)
183
+
```
184
+
is defined to return `SamplerTrivial` with a `Random.CloseOpen01{T}` type defined for this purpose, which has an appropriate `rand` method defined for it.
158
185
186
+
#### An optimized sampler with pre-computed data
159
187
160
-
#### Optimizing generation with cached computation between calls
188
+
Consider a discrete distribution, where numbers `1:n` are drawn with given probabilities that sum to one. When many values are needed from this distribution, the fastest method is using an [alias table](https://en.wikipedia.org/wiki/Alias_method). We don't provide the algorithm for building such a table here, but suppose it is available in `make_alias_table(probabilities)`, and that `draw_number(rng, alias_table)` can be used to draw a random number from it.
161
189
162
-
When repeatedly generating random values (with the same `rand` parameters), it happens for some types
163
-
that the result of a computation is used for each call. In this case, the computation can be decoupled
164
-
from actually generating the values. This is the case for example with the default implementation for
165
-
`AbstractArray`. Assume that `rand(rng, 1:20)` has to be called repeatedly in a loop: the way to take advantage
166
-
of this decoupling is as follows:
190
+
Suppose that the distribution is described by
191
+
```julia
192
+
struct DiscreteDistribution{V <: AbstractVector}
193
+
probabilities::V
194
+
end
195
+
```
196
+
and that we *always* want to build an alias table, regardless of the number of values needed (we learn how to customize this below). The methods
197
+
```julia
198
+
Random.eltype(::Type{<:DiscreteDistribution}) = Int
167
199
200
+
function Random.Sampler(::AbstractRNG, distribution::DiscreteDistribution, ::Random.Repetition)
The `SamplerSimple` type is sufficient for most use cases with precomputed data. However, in order to demonstrate how to use custom sampler types, here we implement something similar to `SamplerSimple`.
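For comparison, the `SamplerSimple` route for a discrete distribution can be sketched end to end. Since the alias-table helpers are only supposed to exist, this sketch swaps in a simpler technique: a cumulative-sum table searched by bisection (inverse-CDF sampling), which is slower per draw than an alias table but shows the same pre-compute-once pattern. The type name `WeightedChoice` is hypothetical and chosen to avoid clashing with `DiscreteDistribution`.

```julia
using Random

# Hypothetical type for this sketch; probabilities are assumed to sum to one.
struct WeightedChoice{V <: AbstractVector}
    probabilities::V
end

Base.eltype(::Type{<:WeightedChoice}) = Int

# Pre-compute the cumulative sums once and keep them in the `data` field.
function Random.Sampler(::Type{<:AbstractRNG}, d::WeightedChoice, ::Random.Repetition)
    return Random.SamplerSimple(d, cumsum(d.probabilities))
end

# Draw u in [0, 1) and return the first index whose cumulative sum exceeds u.
function Random.rand(rng::AbstractRNG, sp::Random.SamplerSimple{<:WeightedChoice})
    u = rand(rng)
    i = searchsortedlast(sp.data, u) + 1
    return min(i, length(sp.data))   # guard against rounding in the final sum
end

rng = MersenneTwister(1)
sp = Random.Sampler(rng, WeightedChoice([0.2, 0.5, 0.3]))
draws = [rand(rng, sp) for _ in 1:1000]   # reuses the pre-computed table
```

The table is built once by `Random.Sampler` and reused by every `rand(rng, sp)` call, which is the point of storing it in `data`.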
176
215
177
-
This mechanism is of course used by the default implementation of random array generation (like in `rand(1:20, 10)`).
178
-
In order to implement this decoupling for a custom type, a helper type can be used.
179
-
Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so
180
-
there is an opportunity for this optimization:
216
+
Going back to our `Die` example: `rand(::Die)` uses random generation from a range, so there is an opportunity for this optimization. We call our custom sampler `SamplerDie`.
It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`.
198
-
In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice.
233
+
It's now possible to get a sampler with `sp = Sampler(rng, die)`, and use `sp` instead of `die` in any `rand` call involving `rng`. In the simplistic example above, `die` doesn't need to be stored in `SamplerDie` but this is often the case in practice.
199
234
200
-
This pattern is so frequent that a helper type named `Random.SamplerSimple` is available,
235
+
Of course, this pattern is so frequent that the helper type used above, namely `Random.SamplerSimple`, is available,
201
236
saving us the definition of `SamplerDie`: we could have implemented our decoupling with:
Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)`
232
-
and `rand(::AbstractRNG, ::SamplerDieMany)`).
266
+
Of course, `rand` must also be defined on those types (i.e. `rand(::AbstractRNG, ::SamplerDie1)` and `rand(::AbstractRNG, ::SamplerDieMany)`). Note that, as usual, `SamplerTrivial` and `SamplerSimple` can be used if custom types are not necessary.
233
267
234
268
Note: `Sampler(rng, x)` is simply a shorthand for `Sampler(rng, x, Val(Inf))`, and
235
269
`Random.Repetition` is an alias for `Union{Val{1}, Val{Inf}}`.
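Dispatching on the repetition hint without writing custom sampler types can be sketched as follows. The type `PolyDie` and its field `nsides` are hypothetical names used only here, and the choice of what to pre-compute for each case is an assumption made for illustration.

```julia
using Random

struct PolyDie   # hypothetical type for this sketch
    nsides::Int
end

Base.eltype(::Type{PolyDie}) = Int

# Single draw: no pre-computation, just wrap the die.
Random.Sampler(::Type{<:AbstractRNG}, d::PolyDie, ::Val{1}) =
    Random.SamplerTrivial(d)

# Many draws: build the range sampler once and keep it in `data`.
Random.Sampler(RNG::Type{<:AbstractRNG}, d::PolyDie, r::Val{Inf}) =
    Random.SamplerSimple(d, Random.Sampler(RNG, 1:d.nsides, r))

# One rand method per sampler kind.
Random.rand(rng::AbstractRNG, sp::Random.SamplerTrivial{PolyDie}) = rand(rng, 1:sp[].nsides)
Random.rand(rng::AbstractRNG, sp::Random.SamplerSimple{PolyDie}) = rand(rng, sp.data)

rng = MersenneTwister(0)
single = Random.Sampler(rng, PolyDie(6), Val(1))
many = Random.Sampler(rng, PolyDie(6))   # shorthand for passing Val(Inf)
```

A plain `rand(rng, PolyDie(6))` goes through the `Val{1}` method, while code that builds the sampler explicitly with `Random.Sampler(rng, die)` gets the pre-computed variant.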