Efficient GPU kernels for block-sparse matrix multiplication and convolution
I clicked on this thinking it was a general library, maybe OpenCL, scrolled down, and got a bit peeved.
The only code here is written in non-portable CUDA and non-portable GPU assembly; NVidia cards are required unless you do a HIP conversion, and even then the result is no longer necessarily the most efficient kernel.
I might be getting one of their workstation cards later this year to get the best of both worlds, but NVidia isn't the only GPU vendor; for general-purpose compute, their current model, which costs about 170% more, gets beaten by a 7900 XTX. I have no brand preference. In fact, if the drivers don't clash, I plan on having an Arc A770 and an A6000 in this machine alongside the 7900 XTX by the end of the year, to get the best of everything for 3D rendering, or to use the low-power Arc for inference, since it's as fast as the 7900 XTX (with the XTX running under Shark, the fastest way of running anything) and both are faster than the A6000 at that. The NVidia card should render some scenes faster, though, and will probably still be the easiest way to do local training, given how many libraries assume CUDA and how long it will take them to make the slight changes required to fix that.

Anyway, my point is: tagging this correctly as "Efficient NVidia CUDA / assembly kernels for..." would be the user-friendly thing to do.
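For context on the HIP conversion mentioned above: AMD's hipify tools (hipify-perl, hipify-clang) do a source-to-source translation of CUDA runtime calls into their HIP equivalents. The toy sketch below mimics only the simplest part of that, a textual API rename, just to illustrate the idea; the real tools also handle headers, kernel launch syntax, and library calls, and crucially none of this helps with hand-written GPU assembly, which is why the hand-tuned kernels in a repo like this stay NVidia-only.

```python
# Toy illustration of the kind of renaming hipify performs on CUDA
# source. This is NOT the real tool -- hipify-clang does a proper
# AST-based translation; this table covers only a few runtime calls.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    """Naively rewrite CUDA API names to their HIP counterparts."""
    # Longer names first, so e.g. cudaMemcpyHostToDevice is not
    # clobbered by the shorter cudaMemcpy substitution.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

cuda_snippet = (
    "#include <cuda_runtime.h>\n"
    "float *d_a;\n"
    "cudaMalloc(&d_a, n * sizeof(float));\n"
    "cudaMemcpy(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice);\n"
    "cudaFree(d_a);\n"
)
print(hipify(cuda_snippet))
```

Even a clean translation like this only gets you functional portability; kernels tuned for NVidia's warp sizes and memory hierarchy generally need retuning to be fast on RDNA hardware, which is the commenter's point about the converted kernels no longer being the most efficient.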