GCC conveniently provides `__builtin_cpu_supports("f16c")` and `__builtin_cpu_supports("avx512fp16")`, used here to detect at runtime whether an x86 CPU supports half-precision (float16) SIMD instructions: the F16C conversion instructions found on AVX2-era CPUs and the native float16 arithmetic of AVX-512 FP16. As this functionality becomes more important for AI workloads, it would be great for Clang to reach parity with recent GCC versions here.
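For reference, a minimal sketch of the kind of GCC-based detection described above. It assumes GCC (not Clang) targeting x86/x86-64, and it assumes the `"avx512fp16"` feature name requires GCC 12 or newer. The helper names `has_f16c` and `has_avx512fp16` are hypothetical, and any other compiler or target conservatively reports "unsupported".

```c
#include <stdbool.h>

/* Use the GCC builtins only where we assume they exist:
   GCC (not Clang) targeting x86/x86-64. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
#define USE_GCC_CPU_BUILTINS 1
#else
#define USE_GCC_CPU_BUILTINS 0
#endif

/* Hypothetical helper: true if __builtin_cpu_supports reports F16C
   (float16 <-> float32 conversion instructions). */
static bool has_f16c(void) {
#if USE_GCC_CPU_BUILTINS
    __builtin_cpu_init(); /* only required before constructors run; harmless otherwise */
    return __builtin_cpu_supports("f16c") != 0;
#else
    return false; /* unknown compiler/target: report no support */
#endif
}

/* Hypothetical helper: true if AVX-512 FP16 arithmetic is reported.
   Assumption: the "avx512fp16" feature name needs GCC 12 or newer. */
static bool has_avx512fp16(void) {
#if USE_GCC_CPU_BUILTINS && __GNUC__ >= 12
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512fp16") != 0;
#else
    return false;
#endif
}
```

A typical use is to pick a kernel once at startup: an AVX-512 FP16 path if `has_avx512fp16()` holds, else an F16C path if `has_f16c()` holds, else a scalar fallback.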
For now, I've switched to inline assembly to support a broader range of compilers. I'd be happy to help patch Clang if more people run into this issue.
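The inline assembly itself isn't shown above. As a rough sketch of what such a compiler-agnostic fallback might look like, assuming x86/x86-64 and GCC/Clang-style extended `asm` (all function names are hypothetical), the same checks can be made directly with `CPUID` and `XGETBV`. Bit positions follow Intel's CPUID documentation: leaf 1 ECX bit 29 is F16C and bit 27 is OSXSAVE; leaf 7 subleaf 0 EDX bit 23 is AVX512-FP16. XCR0 is also read so that a feature is reported only when the OS saves the matching register state.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* Execute CPUID for leaf/subleaf; r = {EAX, EBX, ECX, EDX}. */
static void cpuid_raw(uint32_t leaf, uint32_t subleaf, uint32_t r[4]) {
    __asm__ volatile("cpuid"
                     : "=a"(r[0]), "=b"(r[1]), "=c"(r[2]), "=d"(r[3])
                     : "a"(leaf), "c"(subleaf));
}

/* Read XCR0 (register state enabled by the OS).
   Only valid when CPUID.1:ECX.OSXSAVE (bit 27) is set. */
static uint64_t read_xcr0(void) {
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Hypothetical helper: CPU reports F16C and the OS saves XMM/YMM state. */
static bool asm_has_f16c(void) {
#if defined(__x86_64__) || defined(__i386__)
    uint32_t r[4];
    cpuid_raw(0, 0, r);
    if (r[0] < 1) return false;                  /* leaf 1 not available */
    cpuid_raw(1, 0, r);
    bool osxsave = (r[2] >> 27) & 1;
    bool f16c = (r[2] >> 29) & 1;
    if (!osxsave || !f16c) return false;
    return (read_xcr0() & 0x6) == 0x6;           /* XCR0 bits 1 (SSE), 2 (AVX) */
#else
    return false;
#endif
}

/* Hypothetical helper: CPU reports AVX512-FP16 and the OS saves ZMM/opmask state. */
static bool asm_has_avx512fp16(void) {
#if defined(__x86_64__) || defined(__i386__)
    uint32_t r[4];
    cpuid_raw(0, 0, r);
    if (r[0] < 7) return false;                  /* leaf 7 not available */
    cpuid_raw(1, 0, r);
    if (!((r[2] >> 27) & 1)) return false;       /* no OSXSAVE: XGETBV unusable */
    /* XCR0 bits 1,2 (SSE/AVX) and 5,6,7 (opmask, ZMM_Hi256, Hi16_ZMM) */
    if ((read_xcr0() & 0xE6) != 0xE6) return false;
    cpuid_raw(7, 0, r);
    return (r[3] >> 23) & 1;                     /* EDX bit 23: AVX512-FP16 */
#else
    return false;
#endif
}
```

Unlike the builtins, this compiles unchanged with any compiler that accepts GCC-style extended `asm`, at the cost of hard-coding bit positions that the compiler runtime would otherwise maintain.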