osx-arm64 optimal code generation #41128

Open
@sdmaclea

The Apple Silicon dev kit reports the following hardware features.

$ sysctl -a | grep hw.optional
hw.optional.floatingpoint: 1
hw.optional.watchpoint: 4
hw.optional.breakpoint: 6
hw.optional.neon: 1
hw.optional.neon_hpfp: 1
hw.optional.neon_fp16: 1
hw.optional.armv8_1_atomics: 1
hw.optional.armv8_crc32: 1
hw.optional.armv8_2_fhm: 0
hw.optional.amx_version: 0
hw.optional.ucnormal_mem: 0
hw.optional.arm64: 1

I believe there is at least draft support for armv8_1_atomics. Since atomic operations are performance critical, we should make sure these instructions are actually used in any perf-critical code.
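
To make the atomics point concrete, here is a minimal C11 sketch (not the runtime's code). The names `hits`, `record_hit`, and `update_max` are hypothetical. The assumption is that a compiler targeting ARMv8.1-A or later (for example GCC or Clang with `-march=armv8.1-a`) can lower these operations to the single-instruction atomics such as `LDADD` and `CAS`, rather than a load-exclusive/store-exclusive retry loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared counter, for illustration only. */
static _Atomic uint64_t hits = 0;

/* Atomically increments the counter and returns the value it held before.
 * Assumption: on an ARMv8.1-A target this can become a single LDADD. */
uint64_t record_hit(void) {
    return atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

/* Raises *max to candidate if candidate is larger, using a compare-and-swap
 * loop. Assumption: on an ARMv8.1-A target the exchange can use CAS. */
void update_max(_Atomic uint64_t *max, uint64_t candidate) {
    assert(max != NULL);
    uint64_t seen = atomic_load_explicit(max, memory_order_relaxed);
    while (seen < candidate &&
           !atomic_compare_exchange_weak_explicit(
               max, &seen, candidate,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* On failure, seen is refreshed with the current value; retry. */
    }
}
```

Checking the generated assembly for hot paths like these is one way to confirm whether the ARMv8.1 forms are really being emitted.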

I believe we have already enabled the armv8_crc32 intrinsics.
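
For reference, a rough C sketch of what the CRC32 extension accelerates. This is an illustration under assumptions, not the runtime's implementation: `crc32_ieee` is a hypothetical helper, the hardware path relies on the ACLE `__crc32b` intrinsic from `<arm_acle.h>` and is only compiled when the compiler defines `__ARM_FEATURE_CRC32`, and the fallback is a plain bitwise loop over the same (reflected IEEE) polynomial.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   /* ACLE CRC32 intrinsics */
#endif

/* Hypothetical helper: standard CRC-32 (IEEE) over a byte buffer. */
uint32_t crc32_ieee(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;              /* conventional initial value */
    for (size_t i = 0; i < len; i++) {
#if defined(__ARM_FEATURE_CRC32)
        /* Hardware path: one CRC32B instruction per byte. */
        crc = __crc32b(crc, data[i]);
#else
        /* Software fallback: bitwise update, reflected polynomial. */
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
#endif
    }
    return ~crc;                             /* conventional final inversion */
}
```

A byte-at-a-time loop keeps the sketch short; wider variants of the instruction exist for processing 2, 4, or 8 bytes per step.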

@tannergooding was looking for half-precision floating-point support for AI work. Given that it is supported here, it might be good to at least add the intrinsics, and maybe consider higher-level support too.

category:cq
theme:vector-codegen
skill-level:expert
cost:large
impact:large
