Description
Proposal
Problem statement
It is a common theme in performance-oriented programming to want to select between two outcomes, but without changing which code runs as this invokes branch prediction. Branch prediction is great when it works but has two downsides:
- It can be significantly slower than branchless code if the branch is unpredictable, like in sorting random data.
- It can have inconsistent performance (even if faster on average), which can be a deal-breaker in latency-sensitive applications.
Currently in Rust if you want to have a good chance of something being branchless you have to write your code in a particular way, and even then it regularly fails to compile to branchless instructions, depending on the mood of the compiler.
Motivating examples or use cases
There are tons of examples, I don't think I have to convince anyone familiar with performance-sensitive code of its use. Binary searches, sorting, filtering of data based on a condition, the applications are nearly endless. In performance-sensitive code, if you write an if
where both sides of the if
are cheap to evaluate and the condition is not 99%+ consistent there's a good chance a branchless select
statement would be faster.
And of course there is the consistency argument even if the branch is 99%+ consistent in most cases, for example audio DSP code might want to ensure a kernel is always consistent in its speed to prevent audio crackling when a rare sequence of unpredictable samples come along, or high-frequency trading code may want to ensure it always has a response in a certain amount of nanoseconds.
Solution sketch
I propose we add the following method to the bool
primitive type:
/// Selects one of two values based on the value of self.
///
/// Unlike an `if` statement both arguments are always evaluated. The compiler
/// will try to emit machine code that avoids control flow for this selection,
/// taking the same amount of time regardless of whether self is true or false.
const fn select<T>(self, true_val: T, false_val: T) -> T;
This interface is similar to the std::simd::Mask::select
API, just for one value.
Alternatives
There is the select_unpredictable
intrinsic. However, it is an intrinsic so it should not really be used. There is no viable alternative, even writing things such as [false_val, true_val][cond as usize]
can and do get rewritten by the compiler to branches, or just generate plain terrible code. One can reach for inline assembly but this pessimizes surrounding code and is not portable.
As for bikeshedding about the name, I considered the following alternatives, in order of preference:
select_predicated
(see predication)select_eager
select_branchless
select_unpredictable
(as the intrinsic was named)
However, I prefer just the simple select
. I think the fact it always eagerly evaluates both arguments (as functions do) is already self-documenting enough to readers to show it is not a control flow structure, but rather a selection operator between two pre-computed values. I think it is fairly natural that this then in turn compiles to a branchless select, even when simply reading cond.select(a, b)
.
My reason for not favoring the current name of the intrinsic select_unpredictable
is that (in my opinion) it confuses a problem with the solution. This can lead to confusing scenarios where the solution is applicable but the problem is not, e.g. someone could make an informed choice to use select_unpredictable
while the boolean condition is in fact predictable - they just wanted consistency or avoid conditional jumps for other reasons.