CUDA: refactor ggml_cuda_op + lower GPU latency via quantization on main GPU and tiling #3110
+559
−595