[ENH] Add new methods of generating bounded random integers

**Describe the solution you'd like**
There have been recent developments in generating bounded random integers, in particular see https://github.com/apple/swift/pull/39143

This should be added to RAFT and then used in cugraph (in particular algorithms like random walks), for the reasons outlined in context below.

**Additional context**
There are generally two "simple" methods of generating bounded random integers:
1. Generate some number of random bits (usually 32 or 64), interpret as unsigned integer and take modulo bound.
2. Generate random uniform floating-point number (curand generates in interval `]0, 1]`) and multiply by `~bound`, then center this range and cast to integer.

Method 1 is slow (due to modulo) and can have fairly high bias towards the lower values in `[0, bound[` since the `bound` generally does not divide the number of possible integers generated.
Method 2 seems better for lower values of `bound` but there are a number of issues:
1. It is actually not that easy to get right, since we need to center correctly: we have to multiply by `bound`, otherwise we don't get the right range, but curand may generate 1, so `floor(x * bound)` may generate the value `bound`. We either have to do `floor((x - std::numeric_limits<float>::epsilon()) * bound)` or `floor(x * bound)` then check for `bound` and subtract 1 introducing slight bias towards the largest value. Multiplying by `bound - 1` or `bound - 0.5` always offsets the range with some values having more or less bias.
2. For large values of `bound`, we run into similar (if not bigger) problems with bias than the modulo-based method: with 32-bit float, there are only 2^24 possible values in `]0, 1]`. That's only ~17M. For a large enough bound, we don't have enough floating point numbers to evenly distribute over the possible outputs: for example for a bound of 10M, some integers will get hit twice but some only once meaning that some integers will be 50% more likely to be drawn than others.
3. We could solve the problems above by always generating 64-bit floats for which most possible values of `bound` are small (as long as they are much smaller than 2^53). But generating uniform floating point numbers is slow itself and 64-bit floats are very slow, plus the arithmetic needed afterwards are slower, too.

Examples of issues with floating-point centering:
[current method in random walks] `floor(x * (bound - 1))` : this has a very low likelihood to generate the value `bound - 1`. For `bound := 3`, this means `2` is only generated with a probability of ~2^-24
[centered method]: `floor(x * (bound - 1) + 0.5)` : For `bound := 3`, this would transform the range to ]0.5, 2.5] meaning that 1 is twice more likely to be generated than both 0 and 2.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH] Add new methods of generating bounded random integers #1979

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[ENH] Add new methods of generating bounded random integers #1979

Description

Activity

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions