Skip to content

Missed optimization when summing every second number in unsigned #51935

Open
@AlexeyDmitriev

Description

@AlexeyDmitriev
Bugzilla Link 52593
Version trunk
OS All
CC @fhahn,@RKSimon

Extended Description

Suppose we have a function

template<class T>
double foo(const std::vector<double>& v) {
    double sum = 0;
    T size = v.size();
    for (T i = 0; i < size; i += 2) {
        sum += v[i];
    }
    return sum;
}

and call it as foo<uint64_t> and foo<int64_t>.

The signed version is 1.7 times faster (both with libstdc++ and libc++)
https://quick-bench.com/q/EQxNHHtqi9497mG3ovtfG7fQa6s

(Note that gcc generates code for both signed and unsigned version approximately as fast as signed clang version https://quick-bench.com/q/lDXyfset7rLDmOG2mGxjVb1h0Zw)

Indeed we can se on godbolt that code generated for different type is different: https://godbolt.org/z/Kjv6q7dsr
But seemingly there's no reason why they should be different.

One can argue that one possible difference is we can assume no overflow for signed version. But actually we can assume no overflow in unsigned version as well at least for three reasons:

  1. if there's overflow, then the loop is infinite without any side-effects which is UB
  2. the size guaranteed to be less then v.max_size() which is way less then 2**64
  3. size is calculated as difference of 2 pointers to 8byte type, so it's bounded by 2**61 or so

Note that I tried few ways to use __builtin_assume() to "proof" to the compiler there's no overflow which didn't help.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugzillaIssues migrated from bugzillaenhancementImproving things as opposed to bug fixing, e.g. new or missing featureloopoptim

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions