[BE] Use __builtin_overflow_sub when available (pytorch#117015)
Which is faster than the ternary.

The following script
```python
import torch
from timeit import default_timer

global_setup = """
"""
setup = """
c10::SymInt a = c10::SymInt(123);
"""
code = """
-a;
"""

from torch.utils.benchmark import Timer

t = Timer(stmt=code, setup=setup, global_setup=global_setup, language="c++", timer=default_timer)

print(t.blocked_autorange())
```

reports a median time of 4.17 ns before and 3.61 ns after on x86_64 Linux, and 2.02 ns before and 1.91 ns after on Apple M1.

Pull Request resolved: pytorch#117015
Approved by: https://github.com/albanD
malfet authored and pytorchmergebot committed Jan 10, 2024
1 parent a6325ad commit fdfdba7
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions c10/core/SymInt.cpp
@@ -3,6 +3,7 @@
#include <c10/core/SymInt.h>
#include <c10/core/SymNodeImpl.h>
#include <c10/util/intrusive_ptr.h>
#include <c10/util/safe_numerics.h>
#include <functional>

namespace c10 {
@@ -139,8 +140,16 @@ SymInt operator-(const SymInt& s) {
// But on many platforms it equals self + setting Carry/Overflow flags
// Which in optimized code affects results of `check_range` condition
// Workaround by using ternary that avoids altering the flags
#if C10_HAS_BUILTIN_OVERFLOW()
std::decay_t<decltype(val)> out = 0;
if (C10_UNLIKELY(__builtin_sub_overflow(out, val, &out))) {
return SymInt(val);
}
return SymInt(out);
#else
constexpr auto val_min = std::numeric_limits<decltype(val)>::min();
return SymInt(val != val_min ? -val : val_min);
#endif
} else {
return SymInt(s.toSymNodeImplUnowned()->neg());
}
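
For readers unfamiliar with the builtin, here is a minimal standalone sketch of the two negation strategies in the hunk above, written against plain `int64_t` rather than `c10::SymInt`. The function names `negate_ternary`, `negate_builtin`, and `check_negations` are hypothetical, and the compiler check is an assumption standing in for `C10_HAS_BUILTIN_OVERFLOW()`, whose actual definition is not shown in this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: the pre-existing approach. Negating INT64_MIN is
// undefined behavior, so that one value is returned unchanged.
inline int64_t negate_ternary(int64_t val) {
  constexpr auto val_min = std::numeric_limits<int64_t>::min();
  return val != val_min ? -val : val_min;
}

// Hypothetical helper: the approach added by this commit. Computes 0 - val
// with the GCC/Clang builtin, which returns true when the result overflows.
inline int64_t negate_builtin(int64_t val) {
#if defined(__GNUC__) || defined(__clang__)  // assumption: stand-in for the c10 macro
  int64_t out = 0;
  if (__builtin_sub_overflow(out, val, &out)) {
    return val;  // overflow only happens for INT64_MIN; keep it as-is
  }
  return out;
#else
  return negate_ternary(val);
#endif
}

// Checks that both strategies agree, including on the overflow edge case.
inline void check_negations() {
  constexpr auto val_min = std::numeric_limits<int64_t>::min();
  constexpr auto val_max = std::numeric_limits<int64_t>::max();
  assert(negate_builtin(123) == -123);
  assert(negate_builtin(-123) == 123);
  assert(negate_builtin(0) == 0);
  assert(negate_builtin(val_max) == -val_max);
  assert(negate_builtin(val_min) == val_min);
  assert(negate_builtin(val_min) == negate_ternary(val_min));
}
```

Both helpers return the same value for every input; the only difference is how the `INT64_MIN` case is detected, which is what the benchmark in the commit message measures.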