Opened on May 23, 2024
Eventually we will want to be able to make use of SIMD operations for `f16` and `f128`, now that we have primitives to represent them. Possibilities that I know of:
- AArch64 NEON supports `float16x{4,8}`: https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&f:@navigationhierarchiesreturnbasetype=[float]&f:@navigationhierarchieselementbitsize=[16]&q=
- Arm SVE supports `float16x{1,2}`: https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiesreturnbasetype=[float]&f:@navigationhierarchieselementbitsize=[16]&f:@navigationhierarchiessimdisa=[sve2,sve]&q=
- RISC-V apparently has both f16 and f128 in its vector extension: https://five-embeddev.com/riscv-user-isa-manual/riscv-user-2.2/v.html
- NVIDIA PTX has f16 SIMD
  - Implementation: NVPTX: Add f16 SIMD intrinsics (stdarch#1626)
  - Submodule update: Update stdarch submodule (#128866)
  - Tracking issue: Tracking Issue for NVPTX arch intrinsics (#111199)
- x86 with `+avx512fp16`
  - Implementation: Implement AVX512_FP16 (stdarch#1605)
  - Submodule update: Update the stdarch submodule (#128466)
  - Tracking issue: Tracking Issue for AVX512_FP16 intrinsics (#127213)
- Portable SIMD should eventually be able to support these operations
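As background for the scalar primitive the intrinsics above operate on: each f16 lane is an IEEE 754 binary16 value (1 sign bit, 5 exponent bits, 10 mantissa bits). A bit-level `f32` → binary16 conversion with round-to-nearest-even can be sketched in stable Rust; `f32_to_f16_bits` is an illustrative helper written for this sketch, not a stdarch or standard-library API:

```rust
/// Illustrative helper (not a stdarch API): convert an `f32` to the
/// IEEE 754 binary16 bit pattern, rounding to nearest, ties to even.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32;
    let mant = bits & 0x007f_ffff;

    if exp == 0xff {
        // Infinity or NaN; keep NaNs quiet by setting a payload bit.
        return sign | 0x7c00 | if mant != 0 { 0x0200 } else { 0 };
    }

    // Re-bias the exponent from f32 (bias 127) to f16 (bias 15).
    let half_exp = exp - 127 + 15;

    if half_exp >= 0x1f {
        return sign | 0x7c00; // Overflows to signed infinity.
    }
    if half_exp <= 0 {
        if half_exp < -10 {
            return sign; // Underflows to signed zero.
        }
        // Subnormal result: restore the implicit leading 1, then shift.
        let mant = mant | 0x0080_0000;
        let shift = (14 - half_exp) as u32; // 14..=24
        let half_mant = (mant >> shift) as u16;
        // Round to nearest even: round bit set, plus a sticky or LSB bit.
        let round_bit = 1u32 << (shift - 1);
        let sticky_or_lsb = (round_bit << 1) | (round_bit - 1);
        if (mant & round_bit) != 0 && (mant & sticky_or_lsb) != 0 {
            return sign | (half_mant + 1);
        }
        return sign | half_mant;
    }

    // Normal result: keep the top 10 mantissa bits, then round to
    // nearest even (bit 12 is the round bit, 0x2fff covers LSB + sticky).
    let half = sign | ((half_exp as u16) << 10) | ((mant >> 13) as u16);
    if (mant & 0x1000) != 0 && (mant & 0x2fff) != 0 {
        return half + 1; // A carry into the exponent is still correct.
    }
    half
}
```

For example, `f32_to_f16_bits(1.0)` yields `0x3c00` and `f32_to_f16_bits(65504.0)` yields `0x7bff` (the largest finite f16). The SIMD intrinsics listed above perform this kind of conversion, plus arithmetic, on whole vectors of lanes in hardware.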
Probably some work/research overlap with adding assembly support (#125398).
Tracking issue: #116909