Skip to content

support OpenBLAS on the SiFive X280 which is RISC-V Vector 1.0 compliant. #3825

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

HellerZheng
Copy link
Contributor

Some of the improvements with this RVV implementation:

  • Support vector ABS
  • Support segment load/stores for complex data types, for better performance.
  • Alternative (in our view better) implementation choices. Rather than loops using a predetermined vector length (VL), and a subsequent tail, we found that determining VL within the loop was actually more efficient (in addition to cleaner/smaller code).
  • Generally faster for most kernels.
  • Significantly faster version of GEMM, modeled after what was done for SVE (8xVLEN).
  • All implementations using vector intrinsics, even the faster GEMM. No assembler required.

@brada4
Copy link
Contributor

brada4 commented Nov 24, 2022

very elegant where incx=1 has one codepath and others have other.

@martin-frbg
Copy link
Collaborator

This appears to require a vendor gcc right now ?

@brada4
Copy link
Contributor

brada4 commented Nov 24, 2022

yup, latest version only

@ken-unger
Copy link

RISC-V vector intrinsic support comes with recent versions of clang/llvm. The vendor toolchain is solely needed for some additional compiler flags within makefile.riscv64 specific to the X280 target, and should not prevent anyone from using the RVV kernel support provided by this PR.

@brada4 the separate code path for incx==1 is needed since the performance of unity load/store is significantly better than for strided access. At least in our experience. It would appear that this was also the experience of the C910V folks although we tried to implement throughout.

Just an added note that this PR is being contributed by Microchip, where both Heller and I am employed.

Hope this helps.

@martin-frbg
Copy link
Collaborator

The reason I'm asking is that I do not see the -mcpu=sifive-x280 documented in either LLVM or GCC - so this (and potentially the other flags as well) would seem to force one to use a particular compiler for the X280 target, even though "generic" RVV1.0 support is more widely available. Perhaps the uncommon options could be made conditional based on the compiler version, like we already have various if GCCVERSIONGTEQ _someversion_ ?
(BTW is it necessary to set all these flags in Makefile.prebuild as well, which "only" builds the getarch tools for cpu detection and parameter retrieval ?)

@brada4
Copy link
Contributor

brada4 commented Nov 25, 2022

@ken-unger no pun intended, likely some generic implementations would compile to more efficient code if such healthy ideas are ported over.

I found no defects compiling with freedom-tools built off github, nor today with clang15 and debian11 rv64 glibc. I trust you that builtin tests completed fine.

@ken-unger
Copy link

Good point @martin-frbg. We will take a look and update. Also I see there is a merge issue in common_riscv64.h which we need to fix.

@brada4
Copy link
Contributor

brada4 commented Nov 26, 2022

I just striked out the mcpu parameter for clang15, probably
if gcc 10???? contains sifive -> all options
clang >=15 -> vector code without tuning (there is manual vector minimal size paramete - any will do for building, you have CPU to measure/know)

other -> generic

@ken-unger
Copy link

@martin-frbg we made your suggested changes to the PR and fixed the merge error I noted. As @brada4 pointed out there was no need, in this case, for the proprietary flags. We hope this is sufficient for you to approve.

@xianyi
Copy link
Collaborator

xianyi commented Dec 3, 2022

I will merge this patch into risc-v branch at first.

@xianyi xianyi closed this Dec 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants