cuBLAS: refactor and optimize f16 mat mul performance (ggerganov#1259)
* cuBLAS: refactor, convert fp16 to fp32 on device
* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16
* fix build
* cuBLAS: update block_q5_1
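Converting fp16 to fp32 on the device presumably lets fp16 weights cross the host-to-device bus at 2 bytes per element instead of 4, with the widening done on the GPU. The following is a minimal sketch of that per-element widening, written in plain C so it runs anywhere. It is not the commit's code. The names `half_to_float` and `convert_fp16_to_fp32` are hypothetical. In a real CUDA kernel each loop iteration would be one thread, and the decode would normally be the `__half2float` intrinsic from `cuda_fp16.h` rather than hand-written bit manipulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one IEEE 754 binary16 value into binary32. */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;   /* 5-bit biased exponent */
    uint32_t mant = h & 0x3FFu;          /* 10-bit mantissa */
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                 /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears,
               adjusting the float exponent to compensate. */
            exp = 127 - 15 + 1;
            while ((mant & 0x400u) == 0) {
                mant <<= 1;
                exp--;
            }
            mant &= 0x3FFu;
            bits = sign | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);        /* inf or NaN */
    } else {
        bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);  /* rebias */
    }

    float f;
    memcpy(&f, &bits, sizeof f);         /* reinterpret bits without UB */
    return f;
}

/* Hypothetical host-side stand-in for a device conversion kernel:
   on a GPU, each iteration would be executed by its own thread. */
static void convert_fp16_to_fp32(const uint16_t *src, float *dst, size_t n) {
    assert(n == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < n; i++) {
        dst[i] = half_to_float(src[i]);
    }
}
```

For example, `half_to_float(0x3C00)` yields `1.0f` and `half_to_float(0x7BFF)` yields `65504.0f`, the largest finite fp16 value.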