According to the performance test, it is unconditionally faster for vectors, by at least 5x, even with a runtime divisor. We could pattern-match a / cast(..., some_uint8), where a has a vector type, in the find_intrinsics pass.
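For context, here is a minimal sketch of why division by a uint8 divisor can be turned into a multiply and a shift. It is a generic illustration in plain scalar C++, not necessarily how the routine benchmarked here is implemented. The names magic_for_u8, div_u8_by_magic and check_all_u8_divisions are hypothetical helpers, not Halide APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Halide API): precompute a multiplier for a
// runtime uint8 divisor d in [1, 255]. The multiplier is floor(2^16 / d) + 1.
inline uint32_t magic_for_u8(uint8_t d) {
    assert(d != 0);  // division by zero is not handled in this sketch
    return (1u << 16) / d + 1;
}

// Divide n by the divisor that produced `magic` using one multiply and one
// shift. For uint8 n and d the rounding error stays below 1/d, so the result
// equals n / d exactly.
inline uint8_t div_u8_by_magic(uint8_t n, uint32_t magic) {
    return static_cast<uint8_t>((static_cast<uint32_t>(n) * magic) >> 16);
}

// Exhaustively compare the multiply-and-shift result with the built-in divide
// for every uint8 numerator and every non-zero uint8 divisor.
inline bool check_all_u8_divisions() {
    for (unsigned d = 1; d < 256; ++d) {
        const uint32_t magic = magic_for_u8(static_cast<uint8_t>(d));
        for (unsigned n = 0; n < 256; ++n) {
            if (div_u8_by_magic(static_cast<uint8_t>(n), magic) != n / d) {
                return false;
            }
        }
    }
    return true;
}
```

The multiplier only has to be computed once per divisor, so the per-element work is a widening multiply and a shift. Both vectorize well, whereas integer division often has no vector instruction at all.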
division rounding to negative infinity:
type            const-divisor speed-up   runtime-divisor speed-up
 Int(32, 1)      2.662                    1.101
 Int(16, 1)      3.022                    1.663
 Int( 8, 1)      1.408                    1.080
UInt(32, 1)      2.570                    1.706
UInt(16, 1)      3.068                    1.450
UInt( 8, 1)      2.987                    1.456
 Int(32, 8)     10.722                    7.991
 Int(16, 16)    46.577                   30.900
 Int( 8, 32)    25.602                    8.292
UInt(32, 8)      8.115                    5.423
UInt(16, 16)    24.296                   13.680
UInt( 8, 32)    42.669                   19.993
signed division rounding to zero:
type            const-divisor speed-up   runtime-divisor speed-up
 Int(32, 1)      2.402                    1.155
 Int(16, 1)      2.537                    1.453
 Int( 8, 1)      1.774                    0.680
 Int(32, 8)      8.517                    5.975
 Int(16, 16)    52.965                   38.595
 Int( 8, 32)    19.745                    8.318
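The two signed rounding modes benchmarked above differ only when the quotient is negative and inexact. A small scalar sketch of the difference follows. The function names are hypothetical, and the code assumes b != 0 and ignores the INT32_MIN / -1 overflow case.

```cpp
#include <cstdint>

// C++ `/` on signed integers rounds toward zero.
inline int32_t div_round_to_zero(int32_t a, int32_t b) {
    return a / b;
}

// Round toward negative infinity: start from the truncated quotient and
// subtract one when the remainder is non-zero and its sign differs from the
// divisor's sign (i.e. the exact quotient is negative and inexact).
inline int32_t div_round_to_neg_inf(int32_t a, int32_t b) {
    int32_t q = a / b;
    int32_t r = a % b;
    if (r != 0 && ((r < 0) != (b < 0))) {
        --q;
    }
    return q;
}
```

For example, -7 / 2 gives -3 when rounding to zero and -4 when rounding to negative infinity, while 7 / 2 gives 3 in both modes.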
modulus:
type            const-divisor speed-up   runtime-divisor speed-up
 Int(32, 1)      2.394                    1.143
 Int(16, 1)      2.536                    1.503
 Int( 8, 1)      1.755                    0.671
UInt(32, 1)      2.279                    1.690
UInt(16, 1)      2.659                    1.594
UInt( 8, 1)      2.567                    1.212
 Int(32, 8)      8.296                    5.696
 Int(16, 16)    53.311                   32.092
 Int( 8, 32)    19.439                    8.173
UInt(32, 8)      6.009                    5.103
UInt(16, 16)    19.090                   12.386
UInt( 8, 32)    22.973                   15.043
Success!