[RVV] rework rvv qs8-gemm/qs8-igemm generators#9639

Open
ken-unger wants to merge 3 commits intogoogle:masterfrom
ken-unger:qd8-f16-gemm-rvv
Conversation

@ken-unger
Contributor

A large PR, but a small number of notable changes. Primarily this is a rework of the rvv qs8-gemm and qs8-igemm generators to clean up past sins and prepare for future updates.

  • fixed the kernel generation to properly reflect the output datatype in the vector length (qs8 and qd8-f16 kernels); a mostly cosmetic (but proper) update that fbarchard flagged on one of my PRs last year
  • added qu8-gemm/igemm
  • added qd8-f16-qc8w-gemm/igemm
  • added qd8-f16-qc4w-gemm

Tested on qemu-riscv64 and bpi-f3. (I'll submit a separate PR for the qemu option used for rvv fp16.)

In the future, qc2w and other variants will be added. Additionally, I hope to add support for vqdot.[vv,vx], although likely via a separate generator.
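To make the first bullet concrete, here is a minimal sketch of the naming fix, with hypothetical function and table names (this is not the actual XNNPACK generator code): when the accumulator or output element is wider than the 8-bit input, the LMUL in the kernel's name should be derived from the output element width, not the input width.

```python
# Hypothetical sketch: derive the LMUL suffix used in a kernel's name
# from the *output* datatype's element width. With 8-bit inputs, an
# f16 output doubles the element width (so m1 -> m2), and f32
# quadruples it (m1 -> m4). Names and tables here are illustrative.
ELEM_BITS = {"qs8": 8, "qu8": 8, "f16": 16, "f32": 32}

def lmul_for_output(output_datatype, base_lmul=1, base_bits=8):
    """Scale the base LMUL by the output/input element-width ratio."""
    return base_lmul * ELEM_BITS[output_datatype] // base_bits

def kernel_lmul_suffix(output_datatype):
    """LMUL suffix as it would appear in a generated kernel filename."""
    return f"m{lmul_for_output(output_datatype)}"
```

Under this scheme a qd8-f16 kernel that previously advertised the input's m1 grouping would instead be named by its f16 output's m2 grouping, which is the cosmetic-but-proper change described above.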

@@ -4,31 +4,40 @@
// This source code is licensed under the BSD-style license found in the
Contributor Author

The primary change in this PR is this file and the qs8-igemm version.

  • use the output datatype to select the LMUL (reflected in the filename)
  • use overloaded intrinsics where possible, which is a lot cleaner
  • add QU8, QC4_F32, QC4_F16
  • other cleanup
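On the overloaded-intrinsics point: the RVV C intrinsics spec offers both typed spellings, which encode the element type and LMUL in the name (e.g. `__riscv_vadd_vv_i32m2`), and overloaded spellings with one fixed name (e.g. `__riscv_vadd`). A small sketch, with hypothetical helper names, of why the overloaded form simplifies a code generator:

```python
# Illustrative sketch (not the actual generator): with typed RVV
# intrinsics the generator must splice the element type and LMUL into
# every intrinsic name it emits; the overloaded spelling is a single
# fixed name, with the types carried by the operands instead.
def typed_intrinsic(op, elem="i32", lmul="m2"):
    # e.g. typed_intrinsic("vadd") -> "__riscv_vadd_vv_i32m2"
    return f"__riscv_{op}_vv_{elem}{lmul}"

def overloaded_intrinsic(op):
    # e.g. overloaded_intrinsic("vadd") -> "__riscv_vadd"
    return f"__riscv_{op}"
```

With the typed form, changing a kernel's LMUL or datatype means re-deriving every intrinsic name in the template; with the overloaded form only the vector type declarations change, which is what makes the reworked generator "a lot cleaner".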

"f32": "float",
}[input_datatype]
)
nr_type = {
Contributor Author

This change (rvv-specific) is paired with the change to the qs8-gemm/igemm generator; it makes more sense now.
