When loading a bool atomically in C, clang for RISC-V generates an extra andi instruction, which I believe is extraneous.
Consider this small litmus test (also at https://godbolt.org/z/TzzxGra86):
#include <stdbool.h>

extern bool active;

bool foo1(void) {
    return __atomic_load_n(&active, __ATOMIC_RELAXED);
}

bool foo2(void) {
    return active;
}
With -O1, clang generates the following:
foo1:
.Lpcrel_hi0:
    auipc a0, %got_pcrel_hi(active)
    ld a0, %pcrel_lo(.Lpcrel_hi0)(a0)
    lb a0, 0(a0)
    andi a0, a0, 1
    ret
foo2:
.Lpcrel_hi1:
    auipc a0, %got_pcrel_hi(active)
    ld a0, %pcrel_lo(.Lpcrel_hi1)(a0)
    lbu a0, 0(a0)
    ret
The atomic load version uses lb followed by an extra andi, rather than the single lbu used in the non-atomic version.
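To illustrate why the mask looks redundant, here is a minimal sketch. It assumes the ABI stores a bool as a single byte holding 0 or 1 (checked for size with a static assertion). The local variable active is a stand-in for the extern in the litmus test, and stored_byte is a hypothetical helper that reads the byte zero-extended, roughly what lbu yields. Under that assumption, masking the byte with 1 does not change its value, and both load paths return the same result.

```c
#include <stdbool.h>
#include <string.h>

/* Assumption: bool occupies exactly one byte on this target. */
_Static_assert(sizeof(bool) == 1, "sketch assumes a one-byte bool");

/* Hypothetical stand-in for the extern `active` in the litmus test. */
static bool active;

/* Relaxed atomic load, as in foo1. */
bool load_atomic(void) {
    return __atomic_load_n(&active, __ATOMIC_RELAXED);
}

/* Plain load, as in foo2. */
bool load_plain(void) {
    return active;
}

/* Hypothetical helper: the stored byte of `active`, zero-extended. */
unsigned stored_byte(void) {
    unsigned char raw;
    memcpy(&raw, &active, sizeof raw);
    return raw;
}

/* Hypothetical helper: the same byte after the `andi ..., 1` style mask. */
unsigned masked_byte(void) {
    return stored_byte() & 1u;
}
```

For any value written through a bool lvalue, stored_byte() and masked_byte() should agree, which is the sense in which the andi appears to do no work.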
It seems GCC used to do this as well, but it was fixed (reference below).
References:
- old GCC bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417
Comparisons with the same litmus test as that bug report:
- clang trunk comparison (extra andi): https://godbolt.org/z/1nzdxT64v
- GCC 14.2.0 comparison (identical output): https://godbolt.org/z/75fjrxa45
- GCC 8.2 comparison (also has extra andi): https://godbolt.org/z/9Y6osTqqP