Optimise floating point is_finite (2x) and is_infinite (1.6x).
#57353
r? @KodrAus (rust_highfive has picked a reviewer for you, use r? to override)
src/libcore/num/f32.rs
Outdated
Why is this necessary, rather than using the existing abs method (which uses the LLVM fabsf32 intrinsic directly)?
That's only available in std. #50145
I'm bikeshedding here, but given typical naming conventions, I think it'd make sense to simply call this abs (which shouldn't cause any conflicts) and add a comment explaining why this method is private for discoverability, e.g.
// FIXME(#50145): `abs` is publicly unavailable in libcore due to concerns
// about portability, so this implementation is for private use internally.
I agree about adding a comment, but I don't think that naming it abs is appropriate. I'd call it something like abs_hack so it's clear that it's not using the proper abs method.
How about abs_private with a comment?
Added that.
These can both rely on IEEE754 semantics to be made faster, by folding
away the sign with an abs (left private for now), and then comparing
to infinity, letting the NaN semantics of a direct float comparison
handle NaN input properly.
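As a rough illustration of the idea above, here is a minimal sketch written as free functions. The free-function form and the `ABS_MASK` constant name are assumptions made for this example (the PR adds a private `abs_private` method on the float types instead); the mask value and the comparisons against infinity are taken from the assembly and benchmark in this description.

```rust
// Hypothetical constant name; the value is the mask shown in the assembly.
const ABS_MASK: u32 = 0x7FFF_FFFF; // every bit except the sign bit

/// Clears the sign bit with a single `and`, without needing the
/// `std`-only `abs` method.
fn abs_private(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & ABS_MASK)
}

/// True for +inf and -inf. Any `==` involving NaN is false,
/// so NaN input yields false with no separate check.
fn is_infinite(x: f32) -> bool {
    abs_private(x) == f32::INFINITY
}

/// True for every value that is neither infinite nor NaN.
/// Any `<` involving NaN is false, so NaN input yields false.
fn is_finite(x: f32) -> bool {
    abs_private(x) < f32::INFINITY
}
```

Both predicates reduce to one mask and one comparison, which is what allows the branch-free code described next.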
The `abs` bit-fiddling is simple (a single `and`), and so these new
forms compile down to a few instructions, without branches, e.g. for
f32:
```asm
is_infinite:
    andps xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF
    ucomiss xmm0, dword ptr [rip + .LCPI2_1] ; 0x7F80_0000
    setae al
    ret
is_finite:
    andps xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF
    movss xmm1, dword ptr [rip + .LCPI1_1] ; 0x7F80_0000
    ucomiss xmm1, xmm0
    seta al
    ret
```
When used in loops/repeatedly, they get even better: the memory
operations (loading the mask 0x7FFFFFFF for abs, and infinity
0x7F80_0000) are likely to be hoisted out of the individual calls, to
be shared, and the `seta`/`setae` are likely to be collapsed into
conditional jumps or moves (or similar).
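To make the loop case concrete, here is a small sketch of the same check applied across a slice, this time for f64. The helper names `is_finite_f64` and `all_finite` and the constant name `ABS_MASK_F64` are hypothetical, and the 64-bit mask is assumed by analogy with the f32 mask quoted above. Whether the constants are actually hoisted depends on the optimiser, so treat this as the shape of code where hoisting can happen rather than a guarantee.

```rust
// Assumed f64 analogue of the f32 mask: every bit except the sign bit.
const ABS_MASK_F64: u64 = 0x7FFF_FFFF_FFFF_FFFF;

/// Mask-then-compare finiteness check for f64 (hypothetical helper).
fn is_finite_f64(x: f64) -> bool {
    f64::from_bits(x.to_bits() & ABS_MASK_F64) < f64::INFINITY
}

/// Scans a slice; the mask and the infinity constant are the same for
/// every element, so the compiler is free to load them once.
fn all_finite(values: &[f64]) -> bool {
    values.iter().all(|&x| is_finite_f64(x))
}
```

This mirrors the `iter().all(...)` pattern used by the benchmark in this description.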
The old `is_infinite` did two comparisons, and the old `is_finite` did
three (with a branch), and both of them had to check the flags after
every one of those comparisons. These functions have had that old
implementation since they were added in
rust-lang@6284190
7 years ago.
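The old code is not quoted here, so the following is only a plausible reconstruction that matches the comparison counts given above: two comparisons for `is_infinite`, and a NaN check plus those two for `is_finite`. The `_old` and `_new` function names are hypothetical, and the exact form of the original may differ.

```rust
/// Assumed old form: two comparisons, one per signed infinity.
fn is_infinite_old(x: f32) -> bool {
    x == f32::INFINITY || x == f32::NEG_INFINITY
}

/// Assumed old form: a NaN check (`x != x`) plus the two comparisons above.
fn is_finite_old(x: f32) -> bool {
    !(x != x || is_infinite_old(x))
}

/// New form: clear the sign bit, then one comparison.
fn is_infinite_new(x: f32) -> bool {
    f32::from_bits(x.to_bits() & 0x7FFF_FFFF) == f32::INFINITY
}

/// New form: clear the sign bit, then one comparison.
fn is_finite_new(x: f32) -> bool {
    f32::from_bits(x.to_bits() & 0x7FFF_FFFF) < f32::INFINITY
}
```

Under these assumptions the two forms agree on zeros, ordinary values, both infinities and NaN, which is the property the rewrite relies on.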
Benchmark (`abs` is the new form, `std` is the old):
```
test f32_is_finite_abs ... bench: 55 ns/iter (+/- 10)
test f32_is_finite_std ... bench: 118 ns/iter (+/- 5)
test f32_is_infinite_abs ... bench: 53 ns/iter (+/- 1)
test f32_is_infinite_std ... bench: 84 ns/iter (+/- 6)
test f64_is_finite_abs ... bench: 52 ns/iter (+/- 12)
test f64_is_finite_std ... bench: 128 ns/iter (+/- 25)
test f64_is_infinite_abs ... bench: 54 ns/iter (+/- 5)
test f64_is_infinite_std ... bench: 93 ns/iter (+/- 23)
```
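For reference, the speedups in the title can be recomputed from the numbers above. This is a small sketch that only divides the `std` time by the `abs` time and ignores the reported +/- spread; the `RESULTS` table and the function names are introduced for this example.

```rust
/// (name, old `std` ns/iter, new `abs` ns/iter), copied from the output above.
const RESULTS: [(&str, f64, f64); 4] = [
    ("f32_is_finite", 118.0, 55.0),
    ("f32_is_infinite", 84.0, 53.0),
    ("f64_is_finite", 128.0, 52.0),
    ("f64_is_infinite", 93.0, 54.0),
];

/// Ratio of old time to new time; values above 1.0 mean the new form is faster.
fn speedup(old_ns: f64, new_ns: f64) -> f64 {
    old_ns / new_ns
}

/// Speedup for each benchmark, in the order listed in `RESULTS`.
fn speedups() -> Vec<(&'static str, f64)> {
    RESULTS
        .iter()
        .map(|&(name, old, new)| (name, speedup(old, new)))
        .collect()
}
```

The ratios come out at roughly 2.1x and 2.5x for `is_finite` and roughly 1.6x and 1.7x for `is_infinite`, consistent with the rounded figures in the title.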
```rust
#![feature(test)]
extern crate test;
use std::{f32, f64};
use test::Bencher;
const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];
#[bench]
fn f32_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite()));
}
#[bench]
fn f32_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs() == f32::INFINITY));
}
#[bench]
fn f32_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite()));
}
#[bench]
fn f32_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY));
}
const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];
#[bench]
fn f64_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite()));
}
#[bench]
fn f64_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY));
}
#[bench]
fn f64_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite()));
}
#[bench]
fn f64_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY));
}
```
6a4473a to 6e742db
Nice catch with the nit. I've updated to fix it.
@bors r+
📌 Commit 6e742db has been approved by
⌛ Testing commit 6e742db with merge ab00b4b23bee84a75a5ee5ceec8d72340c4795f8...
💔 Test failed - status-appveyor
@bors retry
Rollup of 16 pull requests

Successful merges:

- #57351 (Don't actually create a full MIR stack frame when not needed)
- #57353 (Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).)
- #57412 (Improve the wording)
- #57436 (save-analysis: use a fallback when access levels couldn't be computed)
- #57453 (lldb_batchmode.py: try `import _thread` for Python 3)
- #57454 (Some cleanups for core::fmt)
- #57461 (Change `String` to `&'static str` in `ParseResult::Failure`.)
- #57473 (std: Render large exit codes as hex on Windows)
- #57474 (save-analysis: Get path def from parent in case there's no def for the path itself.)
- #57494 (Speed up item_bodies for large match statements involving regions)
- #57496 (re-do docs for core::cmp)
- #57508 (rustdoc: Allow inlining of reexported crates and crate items)
- #57547 (Use `ptr::eq` where applicable)
- #57557 (resolve: Mark extern crate items as used in more cases)
- #57560 (hygiene: Do not treat `Self` ctor as a local variable)
- #57564 (Update the const fn tracking issue to the new metabug)

Failed merges:

r? @ghost