cuBLAS: non-contiguous tensor support #1215
Conversation
OK, now I have to merge with the CL code; technically the same kind of trick could work there with
Perplexity run done on the 2080 Ti: [655]6.2838,
(The output is weird because of the Slurm setup on the machine I have access to.) Without this change: 3.16 seconds per pass, 6.01 ms per token.
These ifdefs are kicking my ass
Should now build with OpenBLAS, CLBlast, cuBLAS, and nothingBLAS.
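For context, a rough sketch of what the backend selection looks like at the preprocessor level — the macro names follow ggml's GGML_USE_* convention, but the exact guards in this PR may differ:

```c
/* Rough sketch: at most one GGML_USE_* macro is defined by the build,
 * and the plain ggml path is used when none of them is ("nothingBLAS"). */
#if defined(GGML_USE_CUBLAS)
#  include <cublas_v2.h>   /* NVIDIA cuBLAS */
#elif defined(GGML_USE_CLBLAST)
#  include <clblast_c.h>   /* CLBlast C API */
#elif defined(GGML_USE_OPENBLAS)
#  include <cblas.h>       /* OpenBLAS / CBLAS */
#endif
```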
This will allow cuBLAS to multiply tensors that are not contiguous at the row or column level (I don't think llama has that situation) by using cudaMemcpy2DAsync.
Testing perplexity right now on a 2080 Ti.
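A minimal sketch of the cudaMemcpy2DAsync trick described above, assuming ggml-style naming (ne0/ne1 for dimensions, nb1 for the byte stride between rows); the helper name and signature are made up for illustration and are not the actual code in this PR:

```c
#include <stdint.h>
#include <cuda_runtime.h>

/* Hypothetical helper: pack a 2D float tensor whose rows are not contiguous
 * (row stride nb1 > ne0 * sizeof(float)) into a tightly packed device buffer.
 * cuBLAS can then treat the packed copy as an ordinary matrix with a single
 * leading dimension. */
static cudaError_t copy_strided_2d_to_device(
        float       * dst_dev,   /* packed destination on the device        */
        const float * src_host,  /* first element of the strided source     */
        size_t        nb1,       /* byte stride between rows (ggml's nb[1]) */
        int64_t       ne0,       /* elements per row                        */
        int64_t       ne1,       /* number of rows                          */
        cudaStream_t  stream) {
    const size_t width = ne0 * sizeof(float); /* bytes actually used per row */
    /* cudaMemcpy2DAsync copies ne1 rows of `width` bytes each, reading with
     * pitch nb1 and writing with pitch `width`, i.e. it packs the rows. */
    return cudaMemcpy2DAsync(dst_dev, width,
                             src_host, nb1,
                             width, ne1,
                             cudaMemcpyHostToDevice, stream);
}
```

The GEMM (e.g. cublasSetStream() followed by cublasSgemm()) can be queued on the same stream, so the packing copy and the multiply are ordered without extra synchronization.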