The conclusion of the tutorial for 2:4 sparsity here claims that 2:4 sparsity gives a 1.3x-2.0x speedup over dense execution. However, the values actually printed in the terminal output of the tutorial's dense and sparse sections give the following table:
| bs | compile | Dense | Sparse | Speedup |
|---|---|---|---|---|
| 4 | n | 9.56 | 16.77 | 0.57x |
| 4 | y | 8.98 | 9.49 | 0.95x |
| 16 | n | 31.86 | 62.27 | 0.51x |
| 16 | y | 30.83 | 34.29 | 0.90x |
| 64 | n | 123.97 | 243.16 | 0.51x |
| 64 | y | 104.98 | 133.49 | 0.79x |
| 256 | n | 476.03 | 1195.23 | 0.40x |
| 256 | y | 397.13 | 542.3 | 0.73x |
As can be seen, the sparse computation is slower than the dense one in every configuration, with or without compilation. I reran these experiments with torch 2.5.1+cu2.4 on a single H100 and observed similar results.
Why are the measured values so much worse than the claimed 1.3x-2.0x speedup?
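For reference, the Speedup column is the dense time divided by the sparse time, so any value below 1.0x means the sparse run was slower. A minimal sketch that recomputes it from the numbers in the table (the `timings` dict and the `speedup` helper are names I made up for this check, not part of the tutorial):

```python
# (batch size, compiled?) -> (dense time, sparse time), copied from the table.
timings = {
    (4, False): (9.56, 16.77),
    (4, True): (8.98, 9.49),
    (16, False): (31.86, 62.27),
    (16, True): (30.83, 34.29),
    (64, False): (123.97, 243.16),
    (64, True): (104.98, 133.49),
    (256, False): (476.03, 1195.23),
    (256, True): (397.13, 542.3),
}


def speedup(dense: float, sparse: float) -> float:
    """Speedup of sparse over dense; values below 1.0 mean sparse is slower."""
    return dense / sparse


for (bs, compiled), (dense, sparse) in timings.items():
    flag = "y" if compiled else "n"
    print(f"bs={bs:<3} compile={flag} speedup={speedup(dense, sparse):.2f}x")

# Best case across all configurations.
best = max(speedup(dense, sparse) for dense, sparse in timings.values())
print(f"best speedup: {best:.2f}x")
```

Even the best configuration stays below 1.0x, well short of the 1.3x lower bound given in the tutorial's conclusion.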