cuBLAS: refactor and optimize f16 mat mul performance (#1259) · LostRuins/koboldcpp@58b367c · GitHub

cuBLAS: refactor and optimize f16 mat mul performance (ggerganov#1259)

* cuBLAS: refactor, convert fp16 to fp32 on device

* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16

* fix build

* cuBLAS: update block_q5_1

slaren authored May 1, 2023

1 parent ea3a0ad commit 58b367c
