CUDA-dyn dispatch: reduce generated assembly #6656

@0ax1

Because the bitpacking kernel is inlined, the dynamic dispatch kernel generates a huge amount of assembly: more than 100k lines. Besides slowing down compilation, this also makes the kernel slow to load at runtime.
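To illustrate the kind of change that might help, here is a minimal sketch, not the actual kernel. It assumes the blow-up comes from the unpack routine being copied into every branch of the dispatch switch. All names (`unpack_value`, `dyn_dispatch`, `launch_dispatch`, the `op` codes) are hypothetical. Marking the device function `__noinline__` asks the compiler to emit one shared body and call it from each branch; whether that is acceptable for performance in the real kernel would need measuring.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical stand-in for the bitpacking routine: returns the idx-th
// value of `bit_width` bits (1..32) from a packed little-endian bit stream.
// __noinline__ requests a single out-of-line body instead of one copy per
// call site.
__host__ __device__ __noinline__ uint32_t unpack_value(const uint32_t* packed,
                                                       uint32_t idx,
                                                       uint32_t bit_width) {
    uint64_t bit = static_cast<uint64_t>(idx) * bit_width;
    uint32_t word = static_cast<uint32_t>(bit / 32);
    uint32_t shift = static_cast<uint32_t>(bit % 32);
    uint64_t window = packed[word];
    // Only touch the next word when the value straddles a word boundary.
    if (shift + bit_width > 32) {
        window |= static_cast<uint64_t>(packed[word + 1]) << 32;
    }
    uint64_t mask = (1ull << bit_width) - 1;
    return static_cast<uint32_t>((window >> shift) & mask);
}

// Hypothetical dynamic dispatch kernel: `op` selects the decode path at
// runtime. Every branch calls the same out-of-line unpack_value.
__global__ void dyn_dispatch(const uint32_t* packed, uint32_t* out, uint32_t n,
                             uint32_t bit_width, uint32_t reference, int op) {
    uint32_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    switch (op) {
        case 0:  // plain bitpacked values
            out[i] = unpack_value(packed, i, bit_width);
            break;
        case 1:  // bitpacked values plus a reference (frame-of-reference style)
            out[i] = unpack_value(packed, i, bit_width) + reference;
            break;
        default:
            out[i] = 0;
            break;
    }
}

// Host helper: launches the kernel on device pointers and waits for it.
cudaError_t launch_dispatch(const uint32_t* d_packed, uint32_t* d_out,
                            uint32_t n, uint32_t bit_width,
                            uint32_t reference, int op) {
    const uint32_t threads = 256;
    const uint32_t blocks = (n + threads - 1) / threads;
    dyn_dispatch<<<blocks, threads>>>(d_packed, d_out, n, bit_width, reference, op);
    cudaError_t err = cudaGetLastError();
    return err != cudaSuccess ? err : cudaDeviceSynchronize();
}
```

The effect on code size can be checked by comparing the output of `nvcc -ptx` or `cuobjdump -sass` with and without the `__noinline__` qualifier.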
