
"About" text is misleading, should specify that this is an NVidia-only repository #58

Open
NeedsMoar opened this issue Aug 26, 2023 · 0 comments

Comments

@NeedsMoar

The repository's "About" text reads: "Efficient GPU kernels for block-sparse matrix multiplication and convolution."

I clicked on this thinking it was a general library, maybe OpenCL, scrolled down, and got a bit peeved.
The only code here is written in non-portable CUDA and hand-written GPU assembly; NVidia cards are required unless you do a HIP conversion, and even then the converted kernels are no longer necessarily the most efficient.

I might be getting one of their workstation cards later this year to get the best of both worlds, but NVidia isn't the only GPU vendor; for general-purpose compute, their current 170% more expensive model gets beaten by a 7900XTX. I have no brand preference... actually, if the drivers don't clash, I plan on having an Arc A770 and an A6000 in this machine alongside the 7900XTX by the end of the year, to get the best of everything for 3D rendering, or to use the low-power Arc for inference, since it's as fast as the 7900XTX (with the XTX running under Shark, the fastest way of running anything) and both are faster than the A6000 at that. The NVidia card should still render some scenes faster, though, and will probably remain the easiest way to do local training, given how many libraries assume CUDA and how long it will take them to make the slight changes required to fix that. Anyway, my point is: retitling this correctly as "Efficient NVidia CUDA / assembly kernels for..." would be the user-friendly thing to do.
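To illustrate what "HIP conversion" involves: AMD's hipify tools largely do a textual translation of CUDA runtime calls to their HIP equivalents. Here is a minimal Python sketch of that idea, using a tiny illustrative subset of the real identifier mapping (the actual tools handle a far larger table plus headers and compiler intrinsics):

```python
import re

# Small illustrative subset of the CUDA-to-HIP runtime identifier mapping.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
}

def hipify(source: str) -> str:
    """Replace whole-word CUDA runtime identifiers with HIP ones."""
    pattern = re.compile(r"\b(" + "|".join(CUDA_TO_HIP) + r")\b")
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)

cuda_src = "cudaMalloc(&d_a, n); cudaMemcpy(d_a, h_a, n, cudaMemcpyHostToDevice);"
print(hipify(cuda_src))
# -> hipMalloc(&d_a, n); hipMemcpy(d_a, h_a, n, hipMemcpyHostToDevice);
```

This works for plain CUDA C++, which is exactly why the hand-written assembly kernels in this repo are the sticking point: PTX/SASS has no such mechanical translation to AMD hardware, so a HIP port would have to rewrite those kernels from scratch.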
