risc-v vector v1.0 support #4050

Open
@sh-zheng

Description

The discussion in #4049 inspired me to create an issue for further discussion.
Unlike commercial ISAs, which have a clear development plan, the total number of products supporting RVV may be large. Optimizing for every individual product may lead to code bloat, and it runs contrary to the purpose of the vector ISA, which is expected to be length-adaptive.
The RVV 1.0 intrinsic spec is now stable enough to develop code against, and RVV 1.0 support has been fully submitted to OpenBLAS, based on the SiFive X280, an in-order CPU with VLEN=512.
Would it be better to build further development on this X280 version? The end goal may be compatibility across different VLENs, instruction execution orders, and tail/mask policies. Of course, pursuing compatibility may lead to suboptimal performance, so a balance has to be struck.
The X280 kernels contain some CPU-specific features that may lead to incorrect results on other CPUs. They are listed below:

  1. Architecture-specific cflags, such as -riscv-v-vector-bits-min=512 and -ffast-math.
  2. Changing vl inside a loop, which leaves the tail cleared because tail-undisturbed is not set, e.g. vl = VSETVL(k); in symv_L_rvv.c, line 96.
  3. Setting vl to an immediate value under the assumption of VLEN=512, e.g. size_t vl = 8; in gemm_tcopy_8_rvv.c, line 84.
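
Items 2 and 3 both come down to how vl is chosen. As a minimal sketch (not taken from the OpenBLAS kernels), a loop can ask the hardware for vl on every iteration instead of assuming a value. The function name axpy_vla is hypothetical, the intrinsic names assume the `__riscv_`-prefixed RVV intrinsic API, and the scalar fallback is there only so the example builds without a vector toolchain.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__riscv_v) && defined(__riscv_v_intrinsic)
#include <riscv_vector.h>
#define HAVE_RVV 1
#else
#define HAVE_RVV 0
#endif

/* y[i] += alpha * x[i] for i in [0, n).
 * Hypothetical helper for illustration, not an OpenBLAS kernel. */
static void axpy_vla(size_t n, double alpha, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
#if HAVE_RVV
    while (n > 0) {
        /* Let the hardware pick vl for the remaining length; no VLEN
         * assumption such as "vl = 8" appears in the source. */
        size_t vl = __riscv_vsetvl_e64m1(n);
        vfloat64m1_t vx = __riscv_vle64_v_f64m1(x, vl);
        vfloat64m1_t vy = __riscv_vle64_v_f64m1(y, vl);
        vy = __riscv_vfmacc_vf_f64m1(vy, alpha, vx, vl); /* vy += alpha * vx */
        __riscv_vse64_v_f64m1(y, vy, vl);
        n -= vl;
        x += vl;
        y += vl;
    }
#else
    /* Scalar fallback with the same semantics. */
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
#endif
}
```

Because no vector register is carried from one iteration to the next, the result here does not depend on the tail policy. A kernel that accumulates into a vector register across iterations while vl shrinks would additionally need a tail-undisturbed variant of the accumulate step, or a fixed vl with a separate remainder loop.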

In addition to the above, the register tiling in GEMM for different VLENs should be considered. We currently set GEMM_UNROLL_N_SHIFT 8, which may leave other vector registers unused. Would 12 or 14 be better?
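
To make the register-budget question concrete, here is a rough counting sketch. It assumes a simplified micro-kernel model that is not described in this issue: each of the N unrolled columns keeps one accumulator register group live, the A panel occupies a_groups register groups, and B values are scalar operands that use no vector registers. The helper names are hypothetical; the only hard fact used is that RVV 1.0 has 32 vector registers.

```c
#include <assert.h>

/* RVV 1.0 provides 32 architectural vector registers (v0..v31). */
enum { RVV_NUM_VREGS = 32 };

/* Vector registers used under the assumed model:
 * (unroll_n accumulator groups + a_groups groups for A) * lmul. */
static int vregs_used(int unroll_n, int lmul, int a_groups)
{
    assert(unroll_n > 0 && a_groups >= 0);
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    return (unroll_n + a_groups) * lmul;
}

/* Registers left over; a negative value means the tile would spill. */
static int vregs_free(int unroll_n, int lmul, int a_groups)
{
    return RVV_NUM_VREGS - vregs_used(unroll_n, lmul, a_groups);
}
```

Under this model with LMUL=1 and one group for A, N=8 uses 9 registers and leaves 23 free, while N=12 and N=14 use 13 and 15. With LMUL=2, N=14 uses 30 of the 32 registers. The real kernels may hold more temporaries, so these numbers are only a starting point for the discussion.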