Commit f5ef5cf
ggml-cuda : perform cublas mat mul of quantized types as f16 (ggml-org#3412)
* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16
* rename CC_TURING to CC_VOLTA
* disable fp16 mat mul completely with multi GPU

1 parent 40e07a6 commit f5ef5cf
1 file changed, 122 insertions(+), 72 deletions(-)