cuBLAS: refactor and optimize f16 mat mul performance (ggerganov#1259)
* cuBLAS: refactor, convert fp16 to fp32 on device

* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16

* fix build

* cuBLAS: update block_q5_1
slaren authored May 1, 2023
1 parent ea3a0ad commit 58b367c
Showing 4 changed files with 479 additions and 258 deletions.