risc-v vector v1.0 support

The discussions in #4049 inspired me to create this issue for further discussion.
Unlike commercial ISAs, which follow a clear development roadmap, RVV may end up being supported by a large number of different products. Optimizing for each individual product may lead to code bloat, and runs contrary to the purpose of the vector ISA, which is meant to be length-adaptive (vector-length agnostic).
By now the RVV 1.0 intrinsics spec is stable enough to develop code against, and RVV 1.0 support has been fully submitted to OpenBLAS, based on the SiFive X280, an in-order CPU with VLEN=512.
Would it be better to continue development on top of this X280 version? The end goal could be compatibility across different VLEN values, instruction execution orders (in-order vs. out-of-order), and tail/mask policies. Of course, pursuing compatibility may lead to suboptimal performance, so a balance has to be struck.
Some CPU-specific assumptions in the X280 kernels may lead to incorrect results on other CPUs. They are listed below:

1. Architecture-specific cflags, such as `-riscv-v-vector-bits-min=512` and `-ffast-math`.
2. Changing vl inside a loop without setting the tail-undisturbed policy, so tail elements may be clobbered. For example, `vl = VSETVL(k);` in symv_L_rvv.c, line 96.
3. Setting vl to an immediate value under the assumption that VLEN=512. For example, `size_t vl = 8;` in gemm_tcopy_8_rvv.c, line 84.
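To illustrate item 2, here is a scalar C model (not RVV intrinsics, so it runs anywhere) of a strip-mined loop that keeps partial sums in one accumulator register across iterations and reduces all VLMAX lanes at the end. It assumes the tail-agnostic implementation chooses to overwrite tail elements with all 1s, which the spec permits; an implementation that leaves them unchanged (also permitted) would hide the bug. `VLMAX`, `vfmacc_model` and `dot_model` are hypothetical names, and the real symv_L_rvv.c kernel may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one vector register holding VLMAX doubles. */
#define VLMAX 8

typedef enum { TAIL_UNDISTURBED, TAIL_AGNOSTIC_ONES } tail_policy;

/* Models vfmacc.vv (acc += a * b) on the first vl elements.
 * Tail elements [vl, VLMAX) are left unchanged under TU, and
 * overwritten with an all-1s bit pattern in this model of TA. */
static void vfmacc_model(double acc[VLMAX], const double *a, const double *b,
                         size_t vl, tail_policy p) {
    assert(vl <= VLMAX); /* vl may never exceed VLMAX */
    for (size_t i = 0; i < vl; i++)
        acc[i] += a[i] * b[i];
    if (p == TAIL_AGNOSTIC_ONES) {
        uint64_t ones = UINT64_MAX; /* all-1s double is a NaN */
        for (size_t i = vl; i < VLMAX; i++)
            memcpy(&acc[i], &ones, sizeof acc[i]);
    }
}

/* Strip-mined dot product: vl shrinks on the last iteration, but the
 * final reduction still reads all VLMAX lanes (the risky pattern). */
static double dot_model(const double *x, const double *y, size_t n,
                        tail_policy p) {
    double acc[VLMAX] = {0};
    size_t vl;
    for (size_t k = n; k > 0; k -= vl) {
        vl = k < VLMAX ? k : VLMAX; /* models vl = VSETVL(k) */
        vfmacc_model(acc, x, y, vl, p);
        x += vl;
        y += vl;
    }
    double sum = 0.0;
    for (size_t i = 0; i < VLMAX; i++) /* reduction over VLMAX lanes */
        sum += acc[i];
    return sum;
}
```

With x = 1..10 and y all ones, the tail-undisturbed run returns 55, while the all-1s tail-agnostic model returns NaN because lanes 2..7 are clobbered on the second iteration. In the real kernel, one option is to use the tail-undisturbed (`_tu`) intrinsic variant for the accumulating operation.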

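For item 3, the arithmetic behind why a hard-coded `vl = 8` is VLEN-specific: in RVV 1.0, VLMAX = LMUL * VLEN / SEW. Assuming the copy works on 64-bit elements with LMUL=1 (an assumption, not checked against the kernel), VLMAX is 8 on the X280 but only 2 on a VLEN=128 core, so the same immediate is invalid there. The helper names below are hypothetical; the sketch only covers integer LMUL.

```c
#include <stddef.h>

/* VLMAX = LMUL * VLEN / SEW (integer LMUL only in this sketch). */
static size_t vlmax(size_t vlen_bits, size_t sew_bits, size_t lmul) {
    return lmul * vlen_bits / sew_bits;
}

/* Simplified model of vsetvl: vl = min(AVL, VLMAX). The spec also allows
 * ceil(AVL/2) <= vl <= VLMAX when VLMAX < AVL < 2*VLMAX, so real code must
 * always use the vl that vsetvl returns rather than assume a value. */
static size_t vsetvl_model(size_t avl, size_t vlmax_val) {
    return avl < vlmax_val ? avl : vlmax_val;
}

/* Smallest LMUL in {1, 2, 4, 8} that fits `elems` elements of width
 * sew_bits in one register group; returns 0 if none does. */
static size_t min_lmul_for(size_t elems, size_t vlen_bits, size_t sew_bits) {
    for (size_t lmul = 1; lmul <= 8; lmul *= 2)
        if (vlmax(vlen_bits, sew_bits, lmul) >= elems)
            return lmul;
    return 0;
}
```

Under these assumptions, requesting 8 elements via `VSETVL(8)` instead of `size_t vl = 8;` would yield vl = 2 on VLEN=128 with LMUL=1, and fitting all 8 doubles in one group there would need LMUL=4.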
In addition to the above, register tiling in GEMM for different VLEN values should be considered. Currently we set `GEMM_UNROLL_N_SHIFT 8`, which may leave other vector registers unused. Would 12 or 14 be better?
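A rough sketch of the register budget behind the 12/14 question, under assumed kernel structure: the C tile is held in N accumulator register groups of LMUL registers each, A is loaded into `a_bufs` groups, and B is broadcast from scalar FP registers (vfmacc.vf), so B costs no vector registers. RVV 1.0 has 32 vector registers regardless of VLEN, so VLEN changes the M extent of the tile rather than the number of accumulators. `max_unroll_n` and its parameters are hypothetical, and this layout may not match the current kernel.

```c
#include <stddef.h>

/* RVV 1.0 defines 32 architectural vector registers, independent of VLEN. */
#define NUM_VREGS 32

/* Largest N (number of C accumulator groups) that fits, assuming:
 *  - each accumulator and each A buffer occupies one LMUL-register group,
 *  - B comes from scalar FP registers (vfmacc.vf),
 *  - reserve_v0 != 0 means the group containing v0 is kept for masks. */
static size_t max_unroll_n(size_t lmul, size_t a_bufs, int reserve_v0) {
    size_t groups = NUM_VREGS / lmul; /* groups must be LMUL-aligned */
    if (reserve_v0)
        groups -= 1;
    return groups > a_bufs ? groups - a_bufs : 0;
}
```

For example, with LMUL=2 and double-buffered A loads this model allows up to 14 accumulators, or 15 with a single A buffer; with LMUL=4 and one A buffer it allows 7.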

risc-v vector v1.0 support #4050

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

risc-v vector v1.0 support #4050

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions