
<random>: Implement modified ziggurat algorithm for normal_distribution #6104

Open
statementreply wants to merge 9 commits into microsoft:main from statementreply:random-ziggurat

Conversation


@statementreply (Contributor) commented Feb 23, 2026

This PR implements a modified ziggurat algorithm for normal_distribution, based on McFarland, C. D. (2015), "A modified ziggurat algorithm for generating exponentially and normally distributed pseudorandom numbers."

The main differences between this PR and the paper:

  • The layers are shifted upwards to reduce the area of the non-rectangular regions, creating a bottommost layer whose rectangular region has area smaller than 1/N.
  • Different criteria are used for the always-sampled and always-rejected regions of the ziggurat overhangs.
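
The rejection idea underlying a ziggurat-style sampler can be sketched as follows. This is an illustrative staircase-majorant sampler only, not the PR's code: a real ziggurat (including this PR's) uses equal-area layers, integer comparisons, and a dedicated tail algorithm, none of which is shown here; all names are hypothetical, and the density is truncated to [0, R] for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Illustrative sketch: rejection sampling from a staircase majorant of the
// half-normal density f(x) = exp(-x*x/2), truncated to [0, R].  Because f is
// decreasing, f at a layer's left edge majorizes it and f at the right edge
// minorizes it, which gives the characteristic cheap "fast accept" test.
double staircase_halfnormal(std::mt19937_64& eng) {
    constexpr int n = 64;
    constexpr double R = 3.0;
    static std::vector<double> edge, cum; // built once, on first call
    if (edge.empty()) {
        for (int i = 0; i <= n; ++i) {
            edge.push_back(R * i / n);
        }
        for (int i = 0; i < n; ++i) {
            const double area = (edge[i + 1] - edge[i]) * std::exp(-edge[i] * edge[i] / 2);
            cum.push_back((cum.empty() ? 0.0 : cum.back()) + area);
        }
    }
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    for (;;) {
        // Pick a layer with probability proportional to its envelope area.
        const double t = u01(eng) * cum.back();
        const int i = static_cast<int>(std::lower_bound(cum.begin(), cum.end(), t) - cum.begin());
        const double x = edge[i] + (edge[i + 1] - edge[i]) * u01(eng);
        const double top    = std::exp(-edge[i] * edge[i] / 2);         // majorant on this layer
        const double bottom = std::exp(-edge[i + 1] * edge[i + 1] / 2); // f is decreasing
        const double u = u01(eng) * top;
        if (u <= bottom || u <= std::exp(-x * x / 2)) { // fast accept, then exact test
            return x;
        }
    }
}
```

The fast-accept branch (`u <= bottom`) is the point of the construction: most samples return without ever evaluating `exp`, which is where the speedup over the Box-Muller-style `std_old` path comes from.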

Benchmark results for x64 on Intel Core Ultra 7 265KF:

| Benchmark | Time | Speed Up |
| --- | --- | --- |
| BM_Generator<std::mt19937_64> | 2.40 ns | |
| BM_Generator<b_r::mt19937_64> | 1.60 ns | |
| BM_Distribution<std::mt19937_64, std_old::normal_distribution<double>> | 7.00 ns | 1.00 |
| BM_Distribution<std::mt19937_64, std_new::normal_distribution<double>> | 3.15 ns | 2.22 |
| BM_Distribution<std::mt19937_64, boost_r::normal_distribution<double>> | 3.94 ns | 1.78 |
| BM_Distribution<b_r::mt19937_64, boost_r::normal_distribution<double>> | 3.57 ns | 1.96 |
| BM_Generator<std::mt19937> | 2.43 ns | |
| BM_Generator<b_r::mt19937> | 1.40 ns | |
| BM_Distribution<std::mt19937, std_old::normal_distribution<float>> | 6.77 ns | 1.00 |
| BM_Distribution<std::mt19937, std_new::normal_distribution<float>> | 3.14 ns | 2.16 |
| BM_Distribution<std::mt19937, boost_r::normal_distribution<float>> | 3.98 ns | 1.70 |
| BM_Distribution<b_r::mt19937, boost_r::normal_distribution<float>> | 3.14 ns | 2.16 |

Removed tests\GH_004618_normal_distribution_avoids_resets, which became moot.

Fixes #1003.

@github-project-automation github-project-automation bot moved this from Initial Review to Work In Progress in STL Code Reviews Feb 23, 2026
@StephanTLavavej StephanTLavavej added the performance Must go faster label Feb 23, 2026

_STD_BEGIN

template <class _Ty, class _Uty, bool _Signed, int _Lx>
Can we remove _Signed since it's always true?

Suggested change
template <class _Ty, class _Uty, bool _Signed, int _Lx>
template <class _Ty, class _Uty, int _Lx>

statementreply (Contributor Author) replied:

It's reserved for exponential_distribution, where _Signed would be false.
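
The author's point can be illustrated with a small sketch (names here are hypothetical, not the PR's): one table template can serve both distributions if the x type is selected by a flag.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the PR's _Xtype selection: a single flag decides
// whether the x coordinate carries a sign bit.
template <class Uty, bool Signed>
using x_type = std::conditional_t<Signed, std::make_signed_t<Uty>, Uty>;

// normal_distribution is symmetric about zero, so it wants a signed x
// (the sign bit selects the left or right half of the distribution):
static_assert(std::is_same_v<x_type<std::uint64_t, true>, std::int64_t>);

// exponential_distribution is one-sided, so it would keep the type unsigned
// and use every bit for magnitude:
static_assert(std::is_same_v<x_type<std::uint64_t, false>, std::uint64_t>);
```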

static_assert(_Lx <= 254, "invalid table size");

using _Uint_type = _Uty;
using _Xtype = conditional_t<_Signed, make_signed_t<_Uty>, _Uty>;

Suggested change
using _Xtype = conditional_t<_Signed, make_signed_t<_Uty>, _Uty>;
using _Xtype = make_signed_t<_Uty>;

Also, would it be better to rename it to _Sint_type?

using _Xtype = conditional_t<_Signed, make_signed_t<_Uty>, _Uty>;

static constexpr int _Layer_num = _Lx;
static constexpr _Ty _Width_scale = (_Signed ? _Ty{1} : _Ty{2}) * _Ty{(numeric_limits<_Uty>::max)() / 2u + 1u};

Let's avoid numeric_limits.

Suggested change
static constexpr _Ty _Width_scale = (_Signed ? _Ty{1} : _Ty{2}) * _Ty{(numeric_limits<_Uty>::max)() / 2u + 1u};
static constexpr _Ty _Width_scale = _Ty{static_cast<_Uty>(-1) / 2u + 1u};
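
A quick sanity check of the suggested rewrite: for any unsigned type, `static_cast<_Uty>(-1)` wraps to the all-ones maximum, so `static_cast<_Uty>(-1) / 2u + 1u` reproduces `numeric_limits<_Uty>::max() / 2u + 1u`, i.e. 2^(bits - 1), without naming numeric_limits.

```cpp
#include <cstdint>
#include <limits>

// -1 converted to an unsigned type is its maximum value (modular wraparound),
// so both spellings of "max / 2 + 1" agree and equal 2^(bits - 1).
static_assert(static_cast<std::uint32_t>(-1) / 2u + 1u
              == (std::numeric_limits<std::uint32_t>::max)() / 2u + 1u);
static_assert(static_cast<std::uint32_t>(-1) / 2u + 1u == 0x80000000u);
static_assert(static_cast<std::uint64_t>(-1) / 2u + 1u == 0x8000000000000000u);
```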

@StephanTLavavej StephanTLavavej moved this from Work In Progress to Initial Review in STL Code Reviews Feb 24, 2026

if (_Regular_layer < _Tables._Layer_num - 1) {
// rectangular region of a middle layer
return static_cast<_Ty>(_Tables._Layer_widths[_Regular_layer + 1] * static_cast<_Fty>(_Xbits));
@MattStephanson (Contributor) commented Feb 26, 2026

In common cases (e.g., CVTSI2SD with ties-to-even) this conversion to floating point is slightly biased toward values where the significand ends in 0. It's also correlated with _Layer because a tie only happens when the lower eight or eleven bits of _Xbits are 1 followed by all 0s. The same occurs below.

I'm unsure how much effect this has, but if it doesn't affect the performance much, maybe it would be cleaner to mask or shift out the lower bits as they're used.

[Edit: I shouldn't say that round-to-even bias happens only with those values of the lower bits. I think it's sufficient but not necessary.]

const _Fty _Ydiff = _Yy1 - _Yy0;

for (;;) {
const _Fty _Dx = _Xdiff * _STD _Nrand_impl<_Fty>(_Eng);

The upper bits of _Xbits could be used on the first iteration. Also for the tail, below.

Comment on lines +2709 to +2714
const _Fty _Xval = -_STD log(_Fty{1.0} - _STD _Nrand_impl<_Fty>(_Eng)) * _Tail_reciprocal;
const _Fty _Yval = _STD _Nrand_impl<_Fty>(_Eng);
if (_Yval < _STD exp(_Xval * _Xval * _Fty{-0.5})) {
const _Fty _Xorig = _Tail + _Xval;
return static_cast<_Ty>(_Xbits < 0 ? -_Xorig : _Xorig);
}

Marsaglia's tail method might be faster if we don't have a fast exponential generator, as it only needs sqrt when returning, instead of an additional log or exp. It can be found in Luc Devroye's book, section IX.1.2, or this DTIC report.
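
A hedged sketch of one common formulation of the exponential-majorant tail method (cf. Devroye IX.1.2; names are illustrative, not the PR's): to sample X from the standard normal tail beyond a, set X = sqrt(a² + 2E) with E exponential and accept with probability a / X. Squaring the acceptance test defers the sqrt to the single accepted sample, which is the reviewer's point.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Sample from the standard normal density restricted to [a, infinity).
// If X = sqrt(a^2 + 2E) with E ~ Exp(1), X has density x*exp(-(x^2-a^2)/2)
// on [a, inf); thinning with probability a/X leaves the normal tail.
double normal_tail(double a, std::mt19937_64& eng) {
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    const double a2 = a * a;
    for (;;) {
        const double e  = -std::log(1.0 - u01(eng)); // Exp(1) variate
        const double x2 = a2 + 2.0 * e;
        const double u  = u01(eng);
        if (u * u * x2 <= a2) {    // accept with probability a / sqrt(x2), no sqrt needed
            return std::sqrt(x2);  // sqrt only on the accepted sample
        }
    }
}
```

Per attempt this costs one log (or none, given a fast exponential generator) and one uniform, but no exp, compared with one log plus one exp in the quoted code.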



Development

Successfully merging this pull request may close these issues.

<random>: normal_distribution is slower than Boost

4 participants