Skip to content

update riscv intrinsics for latest spec #3919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 15, 2023

Conversation

sergei-lewis
Copy link
Contributor

  • update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, _riscv prefixes for rvv intrinsics)
  • fix multiple numerical stability and corner case issues
  • add a script to generate arbitrary gemm kernel shapes
  • add a generic zvl256b target to demonstrate large gemm kernel unrolls

…non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics)

* fix multiple numerical stability and corner case issues
* add a script to generate arbitrary gemm kernel shapes
* add a generic zvl256b target to demonstrate large gemm kernel unrolls
@sergei-lewis
Copy link
Contributor Author

sergei-lewis commented Feb 27, 2023

(CI failure appears to be a build/environment issue unrelated to this patch - I note current head for this branch is failing with the same issue)

@brada4
Copy link
Contributor

brada4 commented Mar 1, 2023

Can you take dot.c fix into separate PR - that file is duplicated around tree and will need deduplication of sorts. Not obligation to split, but merely better to have reminder to "look at" others.

CI error is with flang dependencies, not related to your code as you duly noticed.

@sergei-lewis
Copy link
Contributor Author

dropped dot.c from this PR as requested :)

@martin-frbg
Copy link
Collaborator

Well, copying the dot.c fix into the other incarnations of dot.c would make sense, but I guess it is not critical as the subsequent while loop never executes in the n=0 case anyway. Only kernel/generic/dot.c does some spurious initializations when universal intrinsics are available.

@sergei-lewis
Copy link
Contributor Author

sergei-lewis commented Mar 1, 2023

Note kernel/riscv64/dot_vector.c has the equivalent fix, and it saves some actual work there.

In general, my thinking is that people begin work on new ports by copy/pasting existing code, so having existing code just always bail early for n=0 will save a bunch of time working out whether that corner case is sane in any new code even when it has no direct benefit to existing code. If people don't reckon it's worth it, though, I'll not bother :)

@brada4
Copy link
Contributor

brada4 commented Mar 1, 2023

Actually should be ../generic/dot.c if there is nothing added. Not your fault exposing problem.

@martin-frbg
Copy link
Collaborator

Would you like to prepare a PR for the dot.c fix as will ? I assume that was what brada4 intended when he suggested to remove it from this PR. (No problem if you do not have the time, but I wanted to ask as the fix was yours)

@sergei-lewis
Copy link
Contributor Author

Wasn't clear if the consensus was to do this or not bother! PR sent, #3928

@brada4
Copy link
Contributor

brada4 commented Mar 2, 2023

Thank you @sergei-lewis !!!

@xianyi
Copy link
Collaborator

xianyi commented Mar 15, 2023

OK. It looks like to merge this PR

@xianyi xianyi merged commit 20511df into OpenMathLib:risc-v Mar 15, 2023
@OMaghiarIMG
Copy link
Contributor

Hello @sergei-lewis, I noticed that the zvl256b kernels you added say they are autogenerated, wondering if there is an available tool for generating these kernels?
Thanks!

@sergei-lewis
Copy link
Contributor Author

@OMaghiarIMG yes, it's committed alongside everything else: https://github.com/xianyi/OpenBLAS/blob/risc-v/kernel/riscv64/generate_kernel.py

@OMaghiarIMG
Copy link
Contributor

@OMaghiarIMG yes, it's committed alongside everything else: https://github.com/xianyi/OpenBLAS/blob/risc-v/kernel/riscv64/generate_kernel.py

Hi, @sergei-lewis, I got around to looking through the script and it's quite great!
One thing I did notice is the way LMUL is used, it seems to increase the number of accumulation registers, not the number of vector elements. Basically this way the maximum vl used is only going to be the VLMAX for LMUL=1, no matter what LMUL is actually used.

Also there is a line that says 'conjugate gemm not yet supported' - are you still planning to work on this?

@martin-frbg
Copy link
Collaborator

It occurs to me that this PR renamed some of the instances of the target name "x280" to "riscv64_zvl256b" where it claimed to add the latter - is everybody OK with that (including @HellerZheng who contributed the initial x280 support) ? (And also there appears to be no attempt at autodetecting the cpu name/model - but from what I've seen so far, lack of CPUID may be a feature of current RISCV ?)

@sergei-lewis
Copy link
Contributor Author

sergei-lewis commented Dec 13, 2023 via email

@martin-frbg
Copy link
Collaborator

Thank you - I think your ZVL256B target would need adding to getarch.c (the FORCE_cpuname block) and TargetList.txt unless I overlooked it there.

@SokolovAndrey1
Copy link

Hi @sergei-lewis!

I noticed that this PR changes kernel files with _vector postfix, which are used to build C910V target.
C910V is RVV 0.7.1 target and, AFAIK, should be build with Xuantie toolchain (as it done in CI for develop branch). But latest Xuantie toolchain version 2.8.0 doenst support 0.11.0 or 0.12.0 RISC-V Vector intrinsics spec.

Did you build C910V target with these changes? For me now it looks impossible, since GEMM kernels for C910V are written in inline ASM, which obliges compiler to support RVV 0.7.1 version. But I don't know compiler that supports RVV 0.7.1 and 0.11.0 or 0.12.0 RISC-V Vector intrinsics spec.

Thanks!

@sergei-lewis
Copy link
Contributor Author

sergei-lewis commented Dec 22, 2023 via email

@sergei-lewis
Copy link
Contributor Author

sergei-lewis commented Dec 22, 2023 via email

@SokolovAndrey1
Copy link

Hi Andrey, This PR was tested for the C910V target with version 2.6.1 of the Xuantie toolchain, as well as for rvv 1.0 targets supported by ToT gcc/clang/qemu/spike, at the time of submission. Please ensure that CORE is set to C910V when compiling for C910V. This selects version 0.7 of the vector spec by passing -march=rv64imafdcv0p7_zfh_xtheadc to the compiler (the v0p7 part is what controls this; cf. Makefile.riscv64 in project root).

Thanks to quick reply!

Yes, I tried to build with 2.6.1 of Xuantie toolchain. The problem is that it doesn't seem to support intrinsics with the __risсv prefix.

Build command:

make all ARCH=riscv64 TARGET=C910V CC=/home/andrey/Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.6.1/bin/riscv64-unknown-linux-gnu-gcc FC=/home/andrey/Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.6.1/bin/riscv64-unknown-linux-gnu-gfortran HOSTCC=gcc

For risc-v branch with __risсv prefix (one of many errors):

/home/andrey/Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.6.1/bin/riscv64-unknown-linux-gnu-gcc -c -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DNO_WARMUP -DMAX_CPU_NUMBER=8 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.21.dev\" -march=rv64imafdcv0p7_zfh_xtheadc -mabi=lp64d -mtune=c920 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=ismin_k -DASMFNAME=ismin_k_ -DNAME=ismin_k_ -DCNAME=ismin_k -DCHAR_NAME=\"ismin_k_\" -DCHAR_CNAME=\"ismin_k\" -DNO_AFFINITY -I.. -UDOUBLE  -UCOMPLEX -UCOMPLEX -UDOUBLE -UUSE_ABS  -DUSE_MIN ../kernel/riscv64/imin_vector.c -o ismin_k.o

...

../kernel/riscv64/imin_vector.c:77:15: warning: implicit declaration of function '__riscv_vmv_x_s_u32m8_u32'; did you mean 'vmv_x_s_u32m8_u32'? [-Wimplicit-function-declaration]
   77 | #define VMV_X __riscv_vmv_x_s_u32m8_u32
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
../kernel/riscv64/imin_vector.c:116:29: note: in expansion of macro 'VMV_X'
  116 |                 min_index = VMV_X(compressed);
      |                             ^~~~~
../kernel/riscv64/imin_vector.c:60:20: error: incompatible types when assigning to type 'vfloat32m8_t' from type 'int'
   60 | #define VLEV_FLOAT __riscv_vle32_v_f32m8
      |                    ^~~~~~~~~~~~~~~~~~~~~
../kernel/riscv64/imin_vector.c:120:33: note: in expansion of macro 'VLEV_FLOAT'
  120 |                         v_min = VLEV_FLOAT(&x[j], gvl);
      |                                 ^~~~~~~~~~
../kernel/riscv64/imin_vector.c:62:26: error: incompatible types when assigning to type 'vfloat32m1_t' from type 'int'
   62 | #define VFREDMINVS_FLOAT __riscv_vfredmin_vs_f32m8_f32m1
      |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../kernel/riscv64/imin_vector.c:122:33: note: in expansion of macro 'VFREDMINVS_FLOAT'
  122 |                         v_res = VFREDMINVS_FLOAT(v_min, v_res, gvl);
      |                                 ^~~~~~~~~~~~~~~~

For develop branch without __risсv prefix its ok, build successful.

I haven't looked with intrinsics version trunk gcc\clang supports for RVV 0.7, but I think С910 target is expected to be builded using Xuantie toolchain.
So you can clarify how you tested this PR (with __riscv) with Xuantie toolchain v2.6.1? Am I missing something?

@sergei-lewis
Copy link
Contributor Author

sergei-lewis commented Dec 22, 2023 via email

@sergei-lewis
Copy link
Contributor Author

Hi all,

I've submitted PR 4439 which fixes both the above issues (building with toolchains that don't support the release candidate vector intrinsics spec; x280 target getting dropped from cpuid_riscv64.c)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants