Add --fast-math mode #3155

Merged: 18 commits, Sep 30, 2020
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -15,6 +15,8 @@ full changeset diff at the end of each section.
 Current Trunk
 -------------

+- Add `--fast-math` mode. (#3155)
+
 v97
 ---

5 changes: 5 additions & 0 deletions src/pass.h
@@ -102,6 +102,11 @@ struct PassOptions {
   // many cases.
   bool lowMemoryUnused = false;
   enum { LowMemoryBound = 1024 };
+  // Whether to allow "loose" math semantics, ignoring corner cases with NaNs
+  // and assuming math follows the algebraic rules for associativity and so
+  // forth (which IEEE floats do not, strictly speaking). This is inspired by
+  // gcc/clang's -ffast-math flag.
+  bool fastMath = false;
Member

"Fast math" has many meanings, including ignoring negative zero, ignoring infinities, ignoring NaNs, ignoring signaling NaN, allowing for greater precision, and allowing for reduced precision. GCC and clang have moved to have several different flags for these things, as no single definition of "fast math" works for everyone. I encourage Binaryen to follow GCC and clang here.
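
To make the associativity point from the `fastMath` comment concrete, here is a minimal sketch. The helper name `sumsAgree` is hypothetical and not part of Binaryen. It compares the two groupings of a three-term sum; the `volatile` temporaries are only there to discourage the compiler from keeping extra precision.

```cpp
// Hypothetical helper (not Binaryen code): returns true when both
// groupings of a + b + c produce the same double.
bool sumsAgree(double a, double b, double c) {
  volatile double leftFirst = (a + b) + c;  // group the first two terms
  volatile double rightFirst = a + (b + c); // group the last two terms
  return leftFirst == rightFirst;
}
```

With `a = 1e20`, `b = -1e20`, `c = 1.0` the groupings disagree: `(a + b) + c` is `1.0`, while `b + c` rounds back to `-1e20`, so `a + (b + c)` is `0.0`. Reassociating float additions is therefore one of the things a "fast math" definition has to opt into explicitly.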

Member Author

Agreed, @sunfishcode , that is the plan. This is just the first step.

Member

What is the status of this plan?

Member Author

@sunfishcode So far the use cases for the fast math flag have only been ignoring NaNs, AFAIK. So we've not added more specific flags. I imagine we will when we start to optimize them.

Do you have more use cases or ideas perhaps?

Contributor

@MaxGraey, Oct 19, 2021

With more levels we could do more floating-point optimizations:
#3155 (comment)

Member

A separate ignoreNaNs flag, which could also be enabled by fastMath, would allow users who want to ignore NaNs to do so without having to know this implementation detail about Binaryen, and without opting into unknown optimizations in future versions of Binaryen.
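
One way to picture this suggestion is the sketch below. It is an illustration only: the struct and method names are hypothetical and do not claim to match Binaryen's actual `PassOptions`.

```cpp
// Hypothetical sketch, not Binaryen's real options struct.
struct SketchMathOptions {
  // Umbrella flag: opts into every loose-math assumption, including any
  // added in future versions.
  bool fastMath = false;
  // Narrow flag: only permits assuming that NaNs do not matter.
  bool ignoreNaNs = false;

  // A pass would query this rather than reading fastMath directly, so the
  // narrow flag works on its own and the umbrella flag implies it.
  bool canIgnoreNaNs() const { return fastMath || ignoreNaNs; }
};
```

With this shape, a user who sets only `ignoreNaNs` gets the NaN-related rewrites and nothing else, while `fastMath` keeps its current meaning.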

Member Author

Makes sense.

Is this something you'd use soon @sunfishcode ? If so I can open a PR later today probably. (Or maybe @MaxGraey you'd want to?) If it's not urgent we could open an issue so we don't forget.

Contributor

@kripken If this PR only adds a separate ignoreNaNs flag, then I think it would be better for you to open the PR. I might do a more substantial PR adding a few levels for fastMath a bit later.

Member Author

Ok, I found some time, PR up: #4262

   // Whether to try to preserve debug info through, which are special calls.
   bool debugInfo = false;
   // Arbitrary string arguments from the commandline, which we forward to
19 changes: 11 additions & 8 deletions src/passes/OptimizeInstructions.cpp
@@ -161,7 +161,10 @@ struct OptimizeInstructions
 #endif
   }

+  bool fastMath;
+
   void doWalkFunction(Function* func) {
+    fastMath = getPassOptions().fastMath;
     // first, scan locals
     {
       LocalScanner scanner(localInfo, getPassOptions());
@@ -1402,14 +1405,15 @@ struct OptimizeInstructions
       }
       {
         double value;
-        if (matches(curr, binary(Abstract::Sub, any(), fval(&value))) &&
+        if (fastMath &&
+            matches(curr, binary(Abstract::Sub, any(), fval(&value))) &&
             value == 0.0) {
           // x - (-0.0) ==> x + 0.0
Member

What's wrong with this optimization without fastMath? Is it that it doesn't change non-canonical NaN bits but it should?

Member Author

Hmm, actually, re-reading it now I'm not sure. It doesn't remove a math operation (unlike the others), so it would still change NaNs as expected, I think? I guess we'd need to read the spec (wasm? IEEE?) carefully. If no one knows offhand, the safe thing may be to land this with a TODO for later.

Contributor

Interesting article:
https://754r.ucbtest.org/background/nan-propagation.pdf
See "What should happen when two payloads are combined?"

Contributor

And yes, it seems IEEE 754 doesn't specify how two NaNs with different payloads should be combined. I guess this should be reflected in the wasm spec tests.

Contributor

Btw, the WebAssembly spec specifies NaN-propagation rules for fneg, fabs and fcopysign:
https://webassembly.github.io/spec/core/bikeshed/index.html#nan-propagation%E2%91%A0
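
For the non-NaN side of this question, a small sketch can check bit for bit which zero rewrites are exact. The helper names are hypothetical, and the arithmetic assumes IEEE 754 round-to-nearest with no `-ffast-math` on the host compiler.

```cpp
#include <cstdint>
#include <cstring>

// Raw bit pattern of a double, so that +0.0 and -0.0 compare as different.
std::uint64_t bitsOf(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// True when `x - 0.0` has exactly the same bits as `x`.
bool subPosZeroKeepsBits(double x) {
  volatile double result = x - 0.0;
  return bitsOf(result) == bitsOf(x);
}

// True when `x + (-0.0)` has exactly the same bits as `x`.
bool addNegZeroKeepsBits(double x) {
  volatile double result = x + (-0.0);
  return bitsOf(result) == bitsOf(x);
}

// True when `x + 0.0` has exactly the same bits as `x`.
bool addPosZeroKeepsBits(double x) {
  volatile double result = x + 0.0;
  return bitsOf(result) == bitsOf(x);
}
```

Under those assumptions `x - 0.0` and `x + (-0.0)` return `x` unchanged even for `x = -0.0`, whereas `x + 0.0` turns `-0.0` into `+0.0`. So for non-NaN inputs the rewrites discussed here are exact, which suggests the open question really is only what happens to NaN bits.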

           if (std::signbit(value)) {
             curr->op = Abstract::getBinary(type, Abstract::Add);
             right->value = right->value.neg();
             return curr;
-          } else {
+          } else if (fastMath) {
             // x - 0.0 ==> x
             return curr->left;
           }
@@ -1418,19 +1422,18 @@ struct OptimizeInstructions
       {
         // x + (-0.0) ==> x
         double value;
-        if (matches(curr, binary(Abstract::Add, any(), fval(&value))) &&
+        if (fastMath &&
+            matches(curr, binary(Abstract::Add, any(), fval(&value))) &&
             value == 0.0 && std::signbit(value)) {
           return curr->left;
         }
       }
-      // Note that this is correct even on floats with a NaN on the left,
-      // as a NaN would skip the computation and just return the NaN,
-      // and that is precisely what we do here. but, the same with -1
-      // (change to a negation) would be incorrect for that reason.
       if (matches(curr, binary(Abstract::Mul, any(&left), constant(1))) ||
           matches(curr, binary(Abstract::DivS, any(&left), constant(1))) ||
           matches(curr, binary(Abstract::DivU, any(&left), constant(1)))) {
-        return left;
+        if (curr->type.isInteger() || fastMath) {
+          return left;
+        }
       }
       return nullptr;
     }
8 changes: 7 additions & 1 deletion src/tools/optimization-options.h
@@ -187,7 +187,13 @@ struct OptimizationOptions : public ToolOptions {
         Options::Arguments::Zero,
         [this](Options*, const std::string&) {
           passOptions.lowMemoryUnused = true;
-        });
+        })
+      .add(
+        "--fast-math",
+        "-ffm",
+        "Optimize floats without handling corner cases of NaNs and rounding",
+        Options::Arguments::Zero,
+        [this](Options*, const std::string&) { passOptions.fastMath = true; });
     // add passes in registry
     for (const auto& p : PassRegistry::get()->getRegisteredNames()) {
       (*this).add(
41 changes: 4 additions & 37 deletions src/wasm/literal.cpp
@@ -934,35 +934,10 @@ Literal Literal::mul(const Literal& other) const {
       return Literal(uint32_t(i32) * uint32_t(other.i32));
     case Type::i64:
       return Literal(uint64_t(i64) * uint64_t(other.i64));
-    case Type::f32: {
-      // Special-case multiplication by 1. nan * 1 can change nan bits per the
-      // wasm spec, but it is ok to just return that original nan, and we
-      // do that here so that we are consistent with the optimization of
-      // removing the * 1 and leaving just the nan. That is, if we just
-      // do a normal multiply and the CPU decides to change the bits, we'd
-      // give a different result on optimized code, which would look like
-      // it was a bad optimization. So out of all the valid results to
-      // return here, return the simplest one that is consistent with
-      // our optimization for the case of 1.
-      float lhs = getf32(), rhs = other.getf32();
-      if (rhs == 1) {
-        return Literal(lhs);
-      }
-      if (lhs == 1) {
-        return Literal(rhs);
-      }
-      return Literal(lhs * rhs);
-    }
-    case Type::f64: {
-      double lhs = getf64(), rhs = other.getf64();
-      if (rhs == 1) {
-        return Literal(lhs);
-      }
-      if (lhs == 1) {
-        return Literal(rhs);
-      }
-      return Literal(lhs * rhs);
-    }
+    case Type::f32:
+      return Literal(getf32() * other.getf32());
+    case Type::f64:
+      return Literal(getf64() * other.getf64());
     case Type::v128:
     case Type::funcref:
     case Type::externref:
@@ -1002,10 +977,6 @@ Literal Literal::div(const Literal& other) const {
         case FP_INFINITE: // fallthrough
         case FP_NORMAL: // fallthrough
         case FP_SUBNORMAL:
-          // Special-case division by 1, similar to multiply from earlier.
-          if (rhs == 1) {
-            return Literal(lhs);
-          }
           return Literal(lhs / rhs);
         default:
           WASM_UNREACHABLE("invalid fp classification");
@@ -1034,10 +1005,6 @@ Literal Literal::div(const Literal& other) const {
         case FP_INFINITE: // fallthrough
         case FP_NORMAL: // fallthrough
         case FP_SUBNORMAL:
-          // See above comment on f32.
-          if (rhs == 1) {
-            return Literal(lhs);
-          }
           return Literal(lhs / rhs);
         default:
           WASM_UNREACHABLE("invalid fp classification");
21 changes: 21 additions & 0 deletions test/passes/O_fast-math.txt
@@ -0,0 +1,21 @@
+(module
+ (type $none_=>_f32 (func (result f32)))
+ (export "div" (func $0))
+ (export "mul1" (func $1))
+ (export "mul2" (func $2))
+ (export "add1" (func $1))
+ (export "add2" (func $2))
+ (export "add3" (func $2))
+ (export "add4" (func $2))
+ (export "sub1" (func $1))
+ (export "sub2" (func $2))
+ (func $0 (; has Stack IR ;) (result f32)
+  (f32.const -nan:0x23017a)
+ )
+ (func $1 (; has Stack IR ;) (result f32)
+  (f32.const -nan:0x34546d)
+ )
+ (func $2 (; has Stack IR ;) (result f32)
+  (f32.const -nan:0x74546d)
+ )
+)
57 changes: 57 additions & 0 deletions test/passes/O_fast-math.wast
@@ -0,0 +1,57 @@
+;; with fast-math we can optimize some of these patterns
+(module
+ (func "div" (result f32)
+  (f32.div
+   (f32.const -nan:0x23017a)
+   (f32.const 1)
+  )
+ )
+ (func "mul1" (result f32)
+  (f32.mul
+   (f32.const -nan:0x34546d)
+   (f32.const 1)
+  )
+ )
+ (func "mul2" (result f32)
+  (f32.mul
+   (f32.const 1)
+   (f32.const -nan:0x34546d)
+  )
+ )
+ (func "add1" (result f32)
+  (f32.add
+   (f32.const -nan:0x34546d)
+   (f32.const -0)
+  )
+ )
+ (func "add2" (result f32)
+  (f32.add
+   (f32.const -0)
+   (f32.const -nan:0x34546d)
+  )
+ )
+ (func "add3" (result f32)
+  (f32.add
+   (f32.const -nan:0x34546d)
+   (f32.const 0)
+  )
+ )
+ (func "add4" (result f32)
+  (f32.add
+   (f32.const 0)
+   (f32.const -nan:0x34546d)
+  )
+ )
+ (func "sub1" (result f32)
+  (f32.sub
+   (f32.const -nan:0x34546d)
+   (f32.const 0)
+  )
+ )
+ (func "sub2" (result f32)
+  (f32.sub
+   (f32.const -nan:0x34546d)
+   (f32.const -0)
+  )
+ )
+)
52 changes: 44 additions & 8 deletions test/passes/fuzz-exec_O.txt
@@ -31,29 +31,65 @@
 [fuzz-exec] comparing func_0
 [fuzz-exec] comparing func_1
 [fuzz-exec] calling div
-[fuzz-exec] note result: div => -nan:0x23017a
+[fuzz-exec] note result: div => -nan:0x63017a
 [fuzz-exec] calling mul1
-[fuzz-exec] note result: mul1 => -nan:0x34546d
+[fuzz-exec] note result: mul1 => -nan:0x74546d
 [fuzz-exec] calling mul2
-[fuzz-exec] note result: mul2 => -nan:0x34546d
+[fuzz-exec] note result: mul2 => -nan:0x74546d
+[fuzz-exec] calling add1
+[fuzz-exec] note result: add1 => -nan:0x74546d
+[fuzz-exec] calling add2
+[fuzz-exec] note result: add2 => -nan:0x74546d
+[fuzz-exec] calling add3
+[fuzz-exec] note result: add3 => -nan:0x74546d
+[fuzz-exec] calling add4
+[fuzz-exec] note result: add4 => -nan:0x74546d
+[fuzz-exec] calling sub1
+[fuzz-exec] note result: sub1 => -nan:0x74546d
+[fuzz-exec] calling sub2
+[fuzz-exec] note result: sub2 => -nan:0x74546d
 (module
  (type $none_=>_f32 (func (result f32)))
  (export "div" (func $0))
  (export "mul1" (func $1))
  (export "mul2" (func $1))
+ (export "add1" (func $1))
+ (export "add2" (func $1))
+ (export "add3" (func $1))
+ (export "add4" (func $1))
+ (export "sub1" (func $1))
+ (export "sub2" (func $1))
  (func $0 (; has Stack IR ;) (result f32)
-  (f32.const -nan:0x23017a)
+  (f32.const -nan:0x63017a)
  )
  (func $1 (; has Stack IR ;) (result f32)
-  (f32.const -nan:0x34546d)
+  (f32.const -nan:0x74546d)
  )
 )
 [fuzz-exec] calling div
-[fuzz-exec] note result: div => -nan:0x23017a
+[fuzz-exec] note result: div => -nan:0x63017a
 [fuzz-exec] calling mul1
-[fuzz-exec] note result: mul1 => -nan:0x34546d
+[fuzz-exec] note result: mul1 => -nan:0x74546d
 [fuzz-exec] calling mul2
-[fuzz-exec] note result: mul2 => -nan:0x34546d
+[fuzz-exec] note result: mul2 => -nan:0x74546d
+[fuzz-exec] calling add1
+[fuzz-exec] note result: add1 => -nan:0x74546d
+[fuzz-exec] calling add2
+[fuzz-exec] note result: add2 => -nan:0x74546d
+[fuzz-exec] calling add3
+[fuzz-exec] note result: add3 => -nan:0x74546d
+[fuzz-exec] calling add4
+[fuzz-exec] note result: add4 => -nan:0x74546d
+[fuzz-exec] calling sub1
+[fuzz-exec] note result: sub1 => -nan:0x74546d
+[fuzz-exec] calling sub2
+[fuzz-exec] note result: sub2 => -nan:0x74546d
+[fuzz-exec] comparing add1
+[fuzz-exec] comparing add2
+[fuzz-exec] comparing add3
+[fuzz-exec] comparing add4
 [fuzz-exec] comparing div
 [fuzz-exec] comparing mul1
 [fuzz-exec] comparing mul2
+[fuzz-exec] comparing sub1
+[fuzz-exec] comparing sub2
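
The expected payloads such as `0x63017a` and `0x74546d` differ from the inputs `0x23017a` and `0x34546d` in a single bit. As a sketch of that relationship (the names are hypothetical, and this models only the common behaviour where an arithmetic operation sets the quiet bit and keeps the rest of the payload, which wasm permits but does not require):

```cpp
#include <cstdint>

// Most significant bit of the 23-bit f32 payload; a NaN with this bit set
// is a quiet NaN.
constexpr std::uint32_t kF32QuietBit = 0x00400000u;
// Mask covering the whole 23-bit f32 payload.
constexpr std::uint32_t kF32PayloadMask = 0x007fffffu;

// Hypothetical helper: the payload after a NaN passes through an arithmetic
// operation that quiets it and leaves the other payload bits untouched.
constexpr std::uint32_t quietedPayload(std::uint32_t payload) {
  return (payload | kF32QuietBit) & kF32PayloadMask;
}
```

`quietedPayload(0x23017a)` gives `0x63017a` and `quietedPayload(0x34546d)` gives `0x74546d`, matching the values the interpreter reports once it performs the real operation instead of special-casing `* 1` and `/ 1`.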
45 changes: 40 additions & 5 deletions test/passes/fuzz-exec_O.wast
@@ -22,10 +22,10 @@
 )
 (module
  (func "div" (result f32)
-  (f32.div ;; div by 1 can be removed, leaving this nan
-   (f32.const -nan:0x23017a) ;; as it is. wasm semantics allow nan bits to
-   (f32.const 1) ;; change, but the interpreter should not do so,
-  ) ;; so that it does not fail on that opt.
+  (f32.div
+   (f32.const -nan:0x23017a)
+   (f32.const 1)
+  )
  )
  (func "mul1" (result f32)
   (f32.mul
@@ -39,5 +39,40 @@
    (f32.const -nan:0x34546d)
   )
  )
+ (func "add1" (result f32)
+  (f32.add
+   (f32.const -nan:0x34546d)
+   (f32.const -0)
+  )
+ )
+ (func "add2" (result f32)
+  (f32.add
+   (f32.const -0)
+   (f32.const -nan:0x34546d)
+  )
+ )
+ (func "add3" (result f32)
+  (f32.add
+   (f32.const -nan:0x34546d)
+   (f32.const 0)
+  )
+ )
+ (func "add4" (result f32)
+  (f32.add
+   (f32.const 0)
+   (f32.const -nan:0x34546d)
+  )
+ )
+ (func "sub1" (result f32)
+  (f32.sub
+   (f32.const -nan:0x34546d)
+   (f32.const 0)
+  )
+ )
+ (func "sub2" (result f32)
+  (f32.sub
+   (f32.const -nan:0x34546d)
+   (f32.const -0)
+  )
+ )
 )
