Skip to content

Commit e5884e7

Browse files
authored
Merge pull request #40546 from JuliaLang/jb/chethega/tlrng
task-local xoshiro rebased
2 parents 58ffe7e + cfd940e commit e5884e7

File tree

27 files changed

+758
-106
lines changed

27 files changed

+758
-106
lines changed

NEWS.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,8 @@ Language changes
2626
* Destructuring will no longer mutate values on the left hand side while iterating through values on the right hand side. In the example
2727
of an array `x`, `x[2], x[1] = x` will now swap the first and second entry of `x`, whereas it used to fill both entries with `x[1]`
2828
because `x[2]` was mutated during the iteration of `x`. ([#40737])
29+
* The default random number generator has changed, so all random numbers will be different (even with the
30+
same seed) unless an explicit RNG object is used. See the section on the `Random` standard library below.
2931

3032
Compiler/Runtime improvements
3133
-----------------------------
@@ -39,7 +41,12 @@ Command-line option changes
3941

4042
Multi-threading changes
4143
-----------------------
44+
4245
* If the `JULIA_NUM_THREADS` environment variable is set to `auto`, then the number of threads will be set to the number of CPU threads ([#38952])
46+
* Every `Task` object has a local random number generator state, providing reproducible (schedule-independent) execution
47+
of parallel simulation code by default. The default generator is also significantly faster in parallel than in
48+
previous versions.
49+
4350

4451
Build system changes
4552
--------------------
@@ -130,6 +137,9 @@ Standard library changes
130137

131138
#### Random
132139

140+
* The default random number generator has been changed from Mersenne Twister to [Xoshiro256++](https://prng.di.unimi.it/).
141+
The new generator has smaller state, better performance, and superior statistical properties.
142+
This generator is the one used for reproducible Task-local randomness.
133143

134144
#### REPL
135145

doc/src/devdocs/subarrays.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -19,14 +19,14 @@ julia> A = rand(2,3,4);
1919
2020
julia> S1 = view(A, :, 1, 2:3)
2121
2×2 view(::Array{Float64, 3}, :, 1, 2:3) with eltype Float64:
22-
0.200586 0.066423
23-
0.298614 0.956753
22+
0.166507 0.97397
23+
0.754392 0.831383
2424
2525
julia> S2 = view(A, 1, :, 2:3)
2626
3×2 view(::Array{Float64, 3}, 1, :, 2:3) with eltype Float64:
27-
0.200586 0.066423
28-
0.246837 0.646691
29-
0.648882 0.276021
27+
0.166507 0.97397
28+
0.518957 0.0705793
29+
0.503714 0.825124
3030
```
3131
```@meta
3232
DocTestSetup = nothing

doc/src/manual/performance-tips.md

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -77,12 +77,12 @@ julia> function sum_global()
7777
end;
7878
7979
julia> @time sum_global()
80-
0.009639 seconds (7.36 k allocations: 300.310 KiB, 98.32% compilation time)
81-
496.84883432553846
80+
0.026328 seconds (9.30 k allocations: 416.747 KiB, 36.50% gc time, 99.48% compilation time)
81+
508.39048990953665
8282
8383
julia> @time sum_global()
84-
0.000140 seconds (3.49 k allocations: 70.313 KiB)
85-
496.84883432553846
84+
0.000075 seconds (3.49 k allocations: 70.156 KiB)
85+
508.39048990953665
8686
```
8787

8888
On the first call (`@time sum_global()`) the function gets compiled. (If you've not yet used [`@time`](@ref)
@@ -113,12 +113,12 @@ julia> function sum_arg(x)
113113
end;
114114
115115
julia> @time sum_arg(x)
116-
0.006202 seconds (4.18 k allocations: 217.860 KiB, 99.72% compilation time)
117-
496.84883432553846
116+
0.010298 seconds (4.23 k allocations: 226.021 KiB, 99.81% compilation time)
117+
508.39048990953665
118118
119119
julia> @time sum_arg(x)
120120
0.000005 seconds (1 allocation: 16 bytes)
121-
496.84883432553846
121+
508.39048990953665
122122
```
123123

124124
The 1 allocation seen is from running the `@time` macro itself in global scope. If we instead run
@@ -129,7 +129,7 @@ julia> time_sum(x) = @time sum_arg(x);
129129
130130
julia> time_sum(x)
131131
0.000001 seconds
132-
496.84883432553846
132+
508.39048990953665
133133
```
134134

135135
In some situations, your function may need to allocate memory as part of its operation, and this

src/jltypes.c

Lines changed: 12 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2524,7 +2524,7 @@ void jl_init_types(void) JL_GC_DISABLED
25242524
NULL,
25252525
jl_any_type,
25262526
jl_emptysvec,
2527-
jl_perm_symsvec(10,
2527+
jl_perm_symsvec(14,
25282528
"next",
25292529
"queue",
25302530
"storage",
@@ -2534,8 +2534,12 @@ void jl_init_types(void) JL_GC_DISABLED
25342534
"code",
25352535
"_state",
25362536
"sticky",
2537-
"_isexception"),
2538-
jl_svec(10,
2537+
"_isexception",
2538+
"rngState0",
2539+
"rngState1",
2540+
"rngState2",
2541+
"rngState3"),
2542+
jl_svec(14,
25392543
jl_any_type,
25402544
jl_any_type,
25412545
jl_any_type,
@@ -2545,7 +2549,11 @@ void jl_init_types(void) JL_GC_DISABLED
25452549
jl_any_type,
25462550
jl_uint8_type,
25472551
jl_bool_type,
2548-
jl_bool_type),
2552+
jl_bool_type,
2553+
jl_uint64_type,
2554+
jl_uint64_type,
2555+
jl_uint64_type,
2556+
jl_uint64_type),
25492557
0, 1, 6);
25502558
jl_value_t *listt = jl_new_struct(jl_uniontype_type, jl_task_type, jl_nothing_type);
25512559
jl_svecset(jl_task_type->types, 0, listt);

src/julia.h

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1805,6 +1805,10 @@ typedef struct _jl_task_t {
18051805
uint8_t _state;
18061806
uint8_t sticky; // record whether this Task can be migrated to a new thread
18071807
uint8_t _isexception; // set if `result` is an exception to throw or that we exited with
1808+
uint64_t rngState0; // really rngState[4], but more convenient to split
1809+
uint64_t rngState1;
1810+
uint64_t rngState2;
1811+
uint64_t rngState3;
18081812

18091813
// hidden state:
18101814
// saved gc stack top for context switches

src/task.c

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@
3535
#include "julia_internal.h"
3636
#include "threading.h"
3737
#include "julia_assert.h"
38+
#include "support/hashing.h"
3839

3940
#ifdef __cplusplus
4041
extern "C" {
@@ -648,6 +649,63 @@ JL_DLLEXPORT void jl_rethrow_other(jl_value_t *e JL_MAYBE_UNROOTED)
648649
throw_internal(ct, NULL);
649650
}
650651

652+
/* This is xoshiro256++ 1.0, used for tasklocal random number generation in julia.
653+
This implementation is intended for embedders and internal use by the runtime, and is
654+
based on the reference implementation on http://prng.di.unimi.it
655+
656+
Credits go to Sebastiano Vigna for coming up with this PRNG.
657+
658+
There is a pure julia implementation in stdlib that tends to be faster when used from
659+
within julia, due to inlining and more agressive architecture-specific optimizations.
660+
*/
661+
JL_DLLEXPORT uint64_t jl_tasklocal_genrandom(jl_task_t *task) JL_NOTSAFEPOINT
662+
{
663+
uint64_t s0 = task->rngState0;
664+
uint64_t s1 = task->rngState1;
665+
uint64_t s2 = task->rngState2;
666+
uint64_t s3 = task->rngState3;
667+
668+
uint64_t t = s0 << 17;
669+
uint64_t tmp = s0 + s3;
670+
uint64_t res = ((tmp << 23) | (tmp >> 41)) + s0;
671+
s2 ^= s0;
672+
s3 ^= s1;
673+
s1 ^= s2;
674+
s0 ^= s3;
675+
s2 ^= t;
676+
s3 = (s3 << 45) | (s3 >> 19);
677+
678+
task->rngState0 = s0;
679+
task->rngState1 = s1;
680+
task->rngState2 = s2;
681+
task->rngState3 = s3;
682+
return res;
683+
}
684+
685+
void rng_split(jl_task_t *from, jl_task_t *to) JL_NOTSAFEPOINT
686+
{
687+
/* TODO: consider a less ad-hoc construction
688+
Ideally we could just use the output of the random stream to seed the initial
689+
state of the child. Out of an overabundance of caution we multiply with
690+
effectively random coefficients, to break possible self-interactions.
691+
692+
It is not the goal to mix bits -- we work under the assumption that the
693+
source is well-seeded, and its output looks effectively random.
694+
However, xoshiro has never been studied in the mode where we seed the
695+
initial state with the output of another xoshiro instance.
696+
697+
Constants have nothing up their sleeve:
698+
0x02011ce34bce797f == hash(UInt(1))|0x01
699+
0x5a94851fb48a6e05 == hash(UInt(2))|0x01
700+
0x3688cf5d48899fa7 == hash(UInt(3))|0x01
701+
0x867b4bb4c42e5661 == hash(UInt(4))|0x01
702+
*/
703+
to->rngState0 = 0x02011ce34bce797f * jl_tasklocal_genrandom(from);
704+
to->rngState1 = 0x5a94851fb48a6e05 * jl_tasklocal_genrandom(from);
705+
to->rngState2 = 0x3688cf5d48899fa7 * jl_tasklocal_genrandom(from);
706+
to->rngState3 = 0x867b4bb4c42e5661 * jl_tasklocal_genrandom(from);
707+
}
708+
651709
JL_DLLEXPORT jl_task_t *jl_new_task(jl_function_t *start, jl_value_t *completion_future, size_t ssize)
652710
{
653711
jl_task_t *ct = jl_current_task;
@@ -683,6 +741,8 @@ JL_DLLEXPORT jl_task_t *jl_new_task(jl_function_t *start, jl_value_t *completion
683741
t->_isexception = 0;
684742
// Inherit logger state from parent task
685743
t->logstate = ct->logstate;
744+
// Fork task-local random state from parent
745+
rng_split(ct, t);
686746
// there is no active exception handler available on this stack yet
687747
t->eh = NULL;
688748
t->sticky = 1;

stdlib/LinearAlgebra/test/bunchkaufman.jl

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ n = 10
1212
n1 = div(n, 2)
1313
n2 = 2*n1
1414

15-
Random.seed!(12343210)
15+
Random.seed!(12343212)
1616

1717
areal = randn(n,n)/2
1818
aimg = randn(n,n)/2

stdlib/LinearAlgebra/test/cholesky.jl

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,7 +39,7 @@ end
3939
n1 = div(n, 2)
4040
n2 = 2*n1
4141

42-
Random.seed!(12343)
42+
Random.seed!(12344)
4343

4444
areal = randn(n,n)/2
4545
aimg = randn(n,n)/2

stdlib/LinearAlgebra/test/dense.jl

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -15,17 +15,17 @@ n = 10
1515
n1 = div(n, 2)
1616
n2 = 2*n1
1717

18-
Random.seed!(1234321)
18+
Random.seed!(1234323)
1919

2020
@testset "Matrix condition number" begin
2121
ainit = rand(n,n)
2222
@testset "for $elty" for elty in (Float32, Float64, ComplexF32, ComplexF64)
2323
ainit = convert(Matrix{elty}, ainit)
2424
for a in (copy(ainit), view(ainit, 1:n, 1:n))
25-
@test cond(a,1) 4.837320054554436e+02 atol=0.01
26-
@test cond(a,2) 1.960057871514615e+02 atol=0.01
27-
@test cond(a,Inf) 3.757017682707787e+02 atol=0.01
28-
@test cond(a[:,1:5]) 10.233059337453463 atol=0.01
25+
@test cond(a,1) 50.60863783272028 atol=0.5
26+
@test cond(a,2) 23.059634761613314 atol=0.5
27+
@test cond(a,Inf) 45.12503933120795 atol=0.4
28+
@test cond(a[:,1:5]) 5.719500544258695 atol=0.01
2929
@test_throws ArgumentError cond(a,3)
3030
end
3131
end

stdlib/LinearAlgebra/test/eigen.jl

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ n = 10
1111
n1 = div(n, 2)
1212
n2 = 2*n1
1313

14-
Random.seed!(1234321)
14+
Random.seed!(12343219)
1515

1616
areal = randn(n,n)/2
1717
aimg = randn(n,n)/2

0 commit comments

Comments
 (0)