feature: Add fastlanes bit unpacking cuda kernels #6145
Shall we have this in -cuda for now?
We should not. We should be moving things out of vortex-cuda; there are too many things there already.
Merging this PR will degrade performance by 82.84%
Can we have a test running a kernel?
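For context, one way such a test could be structured is to run the kernel on a packed buffer and compare the result against a scalar reference. Below is a minimal sketch of that reference side only. It is not code from this PR: the `pack` and `unpack` helpers are hypothetical names, and the layout is plain contiguous LSB-first packing into `u32` words, which is an assumption and not the interleaved FastLanes layout the kernels target.

```rust
/// Hypothetical scalar reference: pack `values` into 32-bit words,
/// `width` bits per value, contiguous and least-significant-bit first.
fn pack(values: &[u32], width: u32) -> Vec<u32> {
    assert!((1..=32).contains(&width));
    let mask: u64 = (1u64 << width) - 1;
    let total_bits = values.len() * width as usize;
    let mut packed = vec![0u32; (total_bits + 31) / 32];
    for (i, &v) in values.iter().enumerate() {
        let bit = i * width as usize;
        let (word, shift) = (bit / 32, bit % 32);
        // Widen to u64 so a value straddling two words is not truncated.
        let shifted = ((v as u64) & mask) << shift;
        packed[word] |= shifted as u32;
        if shift + width as usize > 32 {
            packed[word + 1] |= (shifted >> 32) as u32;
        }
    }
    packed
}

/// Hypothetical scalar reference: inverse of `pack` for `len` values.
fn unpack(packed: &[u32], width: u32, len: usize) -> Vec<u32> {
    assert!((1..=32).contains(&width));
    let mask: u64 = (1u64 << width) - 1;
    (0..len)
        .map(|i| {
            let bit = i * width as usize;
            let (word, shift) = (bit / 32, bit % 32);
            let lo = packed[word] as u64;
            // Only read the next word when the value crosses a word boundary.
            let hi = if shift + width as usize > 32 {
                packed[word + 1] as u64
            } else {
                0
            };
            ((((hi << 32) | lo) >> shift) & mask) as u32
        })
        .collect()
}
```

A kernel test would then pack some input on the host, launch the kernel, copy the output back, and assert it equals the original input (or `unpack` of the same buffer, if the kernel used this same layout).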
Force-pushed from 8714525 to 5526c3a.
Signed-off-by: Robert Kruszewski <github@robertk.io>
Force-pushed from 5526c3a to 4e238c3.
I'm happy to big-bang the whole thing in one PR, or we can merge this as an intermediate step that generates the CUDA kernels.
Signed-off-by: Robert Kruszewski <github@robertk.io>