Commit 66691c3

Add ROCm header support for sparse Marlin MMA implementation
Include necessary ROCm-specific headers for HIP runtime and half-precision operations, with comments addressing potential compiler and architecture considerations for AMD GPU platforms.
1 parent 30bd924 commit 66691c3
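
The guarded include described above can be sketched in isolation. This is a minimal illustration, not the code in `mma.h`: `BACKEND_NAME`, `backend_name()` and `built_for_rocm()` are hypothetical names, and the sketch assumes `USE_ROCM` is defined by the build system for AMD targets, as the commit implies. The commit's comment mentions `-march=gfx942`; with hipcc the GPU target is commonly passed as `--offload-arch=gfx942`, so the exact flag depends on the toolchain in use.

```cpp
// Sketch only: the names below are hypothetical and not part of torchao.
#include <cassert>
#include <cstring>

#if defined(USE_ROCM)
// ROCm build: HIP runtime plus half-precision types, as added by the commit.
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>
#define BACKEND_NAME "rocm"
#elif defined(__CUDACC__)
// CUDA build: nvcc defines __CUDACC__ when compiling device code units.
#include <cuda_fp16.h>
#define BACKEND_NAME "cuda"
#else
// Plain host compiler: no GPU headers are pulled in at all.
#define BACKEND_NAME "host"
#endif

// Reports which branch of the guard was taken at compile time.
inline const char* backend_name() { return BACKEND_NAME; }

// True only when the ROCm branch of the guard was selected.
inline bool built_for_rocm() {
  const char* name = backend_name();
  assert(name != nullptr && name[0] != '\0');  // every branch defines a name
  return std::strcmp(name, "rocm") == 0;
}
```

Because every platform-specific header sits behind its own guard, the same translation unit still compiles with an ordinary host compiler, which makes it easy to check that a ROCm-only include has not leaked into the CUDA path.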

1 file changed, 10 additions and 2 deletions

torchao/csrc/cuda/sparse_marlin/mma.h

Lines changed: 10 additions & 2 deletions
@@ -22,13 +22,21 @@
 #include <cudaTypedefs.h>
 #endif
 
+#ifdef USE_ROCM
+#include <hip/hip_runtime.h>
+#include <hip/hip_fp16.h>
+#include <device_functions.h> // For some ROCm versions
+// Some intrinsics might require the compiler to be in the right mode
+// with the correct target architecture flags (-march=gfx942)
+#endif
+
 namespace torchao {
 
 // On CUDA earlier than 12.5, the ordered_metadata version of this instruction
 // is not supported. On later versions of CUDA the version without ordered
 // metadata results in the following warning:
-// | Advisory: Modifier 'sp::ordered_metadata' should be used on instruction
-// | 'mma' instead of modifier 'sp' as it is expected to have substantially
+// | Advisory: Modifier ‘.sp::ordered_metadata should be used on instruction
+// | mma instead of modifier ‘.sp’ as it is expected to have substantially
 // | reduced performance on some future architectures
 
 #if defined(USE_ROCM)
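
The comment in the hunk above describes a version gate: before CUDA 12.5 only the plain `.sp` modifier is accepted, while later toolkits warn unless `.sp::ordered_metadata` is used. A minimal sketch of that decision follows, assuming the usual `CUDART_VERSION` encoding (major * 1000 + minor * 10, so 12.5 is 12050). The names `kOrderedMetadataMinVersion`, `supports_ordered_metadata` and `sparse_mma_modifier` are hypothetical and are not taken from `mma.h`, which may express the same choice purely through preprocessor macros.

```cpp
// Sketch only: illustrates the version gate described in the diff comment.
#include <cassert>
#include <cstring>

// CUDA 12.5 in CUDART_VERSION encoding (major * 1000 + minor * 10).
constexpr int kOrderedMetadataMinVersion = 12050;  // hypothetical constant

// True when the toolkit accepts the ordered_metadata form of sparse mma.
constexpr bool supports_ordered_metadata(int cudart_version) {
  return cudart_version >= kOrderedMetadataMinVersion;
}

// Picks the PTX sparsity modifier that avoids both the "unsupported"
// error on older toolkits and the advisory warning on newer ones.
inline const char* sparse_mma_modifier(int cudart_version) {
  assert(cudart_version > 0);  // a real toolkit version is always positive
  return supports_ordered_metadata(cudart_version) ? ".sp::ordered_metadata"
                                                   : ".sp";
}
```

In a real kernel header the version would come from `CUDART_VERSION` at compile time rather than from a function argument; passing it in here only makes the rule easy to exercise on a host compiler.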