Description
Bugzilla Link | 52593 |
Version | trunk |
OS | All |
CC | @fhahn,@RKSimon |
Extended Description
Suppose we have a function
template<class T>
double foo(const std::vector<double>& v) {
double sum = 0;
T size = v.size();
for (T i = 0; i < size; i += 2) {
sum += v[i];
}
return sum;
}
and call it as foo<uint64_t>
and foo<int64_t>
.
The signed version is 1.7 times faster (both with libstdc++ and libc++)
https://quick-bench.com/q/EQxNHHtqi9497mG3ovtfG7fQa6s
(Note that gcc generates code for both signed and unsigned version approximately as fast as signed clang version https://quick-bench.com/q/lDXyfset7rLDmOG2mGxjVb1h0Zw)
Indeed we can se on godbolt that code generated for different type is different: https://godbolt.org/z/Kjv6q7dsr
But seemingly there's no reason why they should be different.
One can argue that one possible difference is we can assume no overflow for signed version. But actually we can assume no overflow in unsigned version as well at least for three reasons:
- if there's overflow, then the loop is infinite without any side-effects which is UB
- the size guaranteed to be less then v.max_size() which is way less then 2**64
- size is calculated as difference of 2 pointers to 8byte type, so it's bounded by 2**61 or so
Note that I tried few ways to use __builtin_assume() to "proof" to the compiler there's no overflow which didn't help.