TensorPrimitive: Consider to optimize integer divisions #105204

Open
@huoyaoyuan

By default, TensorPrimitives delegates simple operators to vector intrinsics. This is fine for most operations, but integer division (IDIV) is an exception.

First, most (if not all) ISAs lack a vector integer division instruction. I've checked AVX512/Avx2 and Sve/AdvSimd but didn't find one. As a result, our vector intrinsics fall back to software emulation. On my CPU with AVX2, this is about 2.5x slower than a naive for loop for int[1024] / int (scalar divisor).

When dividing every element by a common divisor, there is also the widely used preinv (precomputed inverse) algorithm, which turns the division into a cheaper multiplication. Multiplication, unlike division, can be vectorized on various ISAs.
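To make the preinv idea concrete, here is a minimal sketch in C. It is not taken from the .NET runtime. It handles unsigned 32-bit values only and uses the multiply-and-shift scheme described by Granlund and Montgomery; signed `int` division needs extra sign handling that is omitted here. The names `preinv_u32`, `preinv_u32_make`, `preinv_u32_div`, and `divide_all_u32` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants precomputed once for a fixed divisor d (hypothetical helper type). */
typedef struct {
    uint32_t magic; /* floor(2^32 * (2^l - d) / d) + 1, where l = ceil(log2(d)) */
    uint8_t  sh1;   /* min(l, 1) */
    uint8_t  sh2;   /* max(l - 1, 0) */
} preinv_u32;

/* Compute the multiplier and shifts for d. This is the only step that divides. */
static preinv_u32 preinv_u32_make(uint32_t d)
{
    assert(d != 0);

    /* l = ceil(log2(d)): the smallest l with 2^l >= d (0 <= l <= 32). */
    unsigned l = 0;
    while (((uint64_t)1 << l) < d) {
        l++;
    }

    uint64_t pow2 = (uint64_t)1 << l;
    preinv_u32 p;
    /* (2^l - d) < d, so the shifted value fits in 64 bits and the result in 32. */
    p.magic = (uint32_t)((((pow2 - d) << 32) / d) + 1);
    p.sh1 = (uint8_t)(l > 1 ? 1 : l);
    p.sh2 = (uint8_t)(l > 0 ? l - 1 : 0);
    return p;
}

/* n / d using one widening multiply, one subtract, one add, and two shifts. */
static uint32_t preinv_u32_div(uint32_t n, const preinv_u32 *p)
{
    uint32_t t1 = (uint32_t)(((uint64_t)p->magic * n) >> 32); /* high half of product */
    return (t1 + ((n - t1) >> p->sh1)) >> p->sh2;
}

/* Divide every element of src by the same divisor d, writing results to dst. */
static void divide_all_u32(const uint32_t *src, uint32_t *dst, size_t count, uint32_t d)
{
    preinv_u32 p = preinv_u32_make(d); /* hoisted out of the loop */
    for (size_t i = 0; i < count; i++) {
        dst[i] = preinv_u32_div(src[i], &p);
    }
}
```

The loop body contains only a multiply, a subtract, an add, and shifts, all of which have vector counterparts on common ISAs. The single real division happens once in `preinv_u32_make`. Whether this beats a scalar division loop in practice would need to be measured.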

I'm not sure whether integer division is common enough to justify this optimization. But we should at least disable DivideOperator.Vectorizable for integer types, because the vectorized path ends up using software emulation.
