Update ROCm docker container to 7.2 by superm1 · Pull Request #19418 · ggml-org/llama.cpp

superm1 · 2026-02-07T14:35:36Z

Update all the CI artifacts and jobs to use ROCm 7.2.

When testing offline, I found a problem with rocWMMA on the docker container with gfx908, so rocWMMA is disabled for that target.

CISC · 2026-02-07T18:08:53Z

There have been several issues submitted about newer versions of ROCm, including rocWMMA; I'm not sure we are ready to go there. @IMbackK?

superm1 · 2026-02-07T20:33:03Z

How important is gfx908 to CI and to release artifacts?

I believe there is something wrong with rocWMMA specifically with gfx908 and some intrinsic types. Right now my PR disables rocWMMA entirely in that container while building. Another option is to disable gfx908.

IMbackK · 2026-02-07T21:02:43Z

What problem specifically? There is no problem on my side with gfx908 on rocwmma 2.2.0 (which ships with 7.2).
rocwmma 2.0.0 (rocm 7.0 - 7.1) was broken at compile time for fp16 output data types on all gfx9 devices rocwmma supports, i.e. gfx908-gfx942, not gfx908 specifically.

superm1 · 2026-02-07T21:10:23Z

Here's a snippet of what I was seeing that led to what I put in this PR. I guess it's some issue with template instantiation.

[ 56%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/template-instances/mmq-instance-q8_0.cu.o
In file included from /app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:35:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_config.hpp:32:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:32:
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 16>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 16>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 16>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:53:34: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 16>' requested here
   53 |     struct MfmaSelector : public MmaSelector<Mfma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest>{};
      |                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 8>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 8>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 8>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 8>' requested here
   58 |                                         typename MmaSelector<Mma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest / 2u>::SelectedOp>;
      |                                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:53:34: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 16>' requested here
   53 |     struct MfmaSelector : public MmaSelector<Mfma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest>{};
      |                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 4>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 4>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 4>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 4>' requested here
   58 |                                         typename MmaSelector<Mma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest / 2u>::SelectedOp>;
      |                                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 8>' requested here
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:53:34: note: (skipping 1 context in backtrace; use -ftemplate-backtrace-limit=0 to see all)
   53 |     struct MfmaSelector : public MmaSelector<Mfma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest>{};
      |                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 2>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 2>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 2>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 2>' requested here
   58 |                                         typename MmaSelector<Mma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest / 2u>::SelectedOp>;
      |                                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 4>' requested here
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: (skipping 2 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:186:21: error: no matching function for call to 'mma_sync'
  186 |                     wmma::mma_sync(KQ_c[j], K_a, Q_b[k_KQ_0/16][j], KQ_c[j]);
      |                     ^~~~~~~~~~~~~~
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:531:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, __half, true>' requested here
  531 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:627:17: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, __half>' requested here
  627 |                 ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, half>(ctx, dst);
      |                 ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:251:25: note: candidate template ignored: substitution failure [with FragA = frag_a_K, FragB = frag_b, FragAccumIn = frag_c_KQ, FragAccumOut = frag_c_KQ]
  251 |     ROCWMMA_DEVICE void mma_sync(FragAccumOut& d, FragA const& a, FragB const& b, FragAccumIn& c)
      |                         ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:186:21: error: no matching function for call to 'mma_sync'
  186 |                     wmma::mma_sync(KQ_c[j], K_a, Q_b[k_KQ_0/16][j], KQ_c[j]);
      |                     ^~~~~~~~~~~~~~
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<80, 16, 4, 16, __half, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:630:17: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<80, 16, __half>' requested here
  630 |                 ggml_cuda_flash_attn_ext_wmma_f16_case< 80, cols_per_block, half>(ctx, dst);
      |                 ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:251:25: note: candidate template ignored: substitution failure [with FragA = frag_a_K, FragB = frag_b, FragAccumIn = frag_c_KQ, FragAccumOut = frag_c_KQ]
  251 |     ROCWMMA_DEVICE void mma_sync(FragAccumOut& d, FragA const& a, FragB const& b, FragAccumIn& c)
      |                         ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:186:21: error: no matching function for call to 'mma_sync'
  186 |                     wmma::mma_sync(KQ_c[j], K_a, Q_b[k_KQ_0/16][j], KQ_c[j]);
      |                     ^~~~~~~~~~~~~~
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:531:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<80, 16, 4, 16, __half, true>' requested here
  531 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:630:17: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<80, 16, __half>' requested here
  630 |                 ggml_cuda_flash_attn_ext_wmma_f16_case< 80, cols_per_block, half>(ctx, dst);
      |                 ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:251:25: note: candidate template ignored: substitution failure [with FragA = frag_a_K, FragB = frag_b, FragAccumIn = frag_c_KQ, FragAccumOut = frag_c_KQ]
  251 |     ROCWMMA_DEVICE void mma_sync(FragAccumOut& d, FragA const& a, FragB const& b, FragAccumIn& c)
      |                         ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx908.
gmake[2]: *** [ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/build.make:335: ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/fattn-wmma-f16.cu.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....

IMbackK · 2026-02-07T21:12:10Z

Right, this is yet another bug in the fp16 downcast code in rocwmma.
It affects gfx908-gfx942 and is an upstream issue.
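
For readers unfamiliar with the first error in the log, here is a minimal sketch of how an "ambiguous partial specializations" diagnostic can arise in general, and one way to resolve it. This is not rocWMMA code: `half_sketch`, `mma_op_sketch`, and the `chosen` member are hypothetical names invented for illustration, and the actual upstream cause may differ in its details.

```cpp
// Hypothetical stand-in for a 16-bit float type such as __half.
struct half_sketch {};

// Primary template: a hypothetical matrix op keyed by element type and block size.
template <typename T, int BlockK>
struct mma_op_sketch {
    static constexpr int chosen = 0;  // generic fallback
};

// Specialization A: any element type, BlockK fixed to 2.
template <typename T>
struct mma_op_sketch<T, 2> {
    static constexpr int chosen = 1;
};

// Specialization B: element type fixed to half_sketch, any BlockK.
template <int BlockK>
struct mma_op_sketch<half_sketch, BlockK> {
    static constexpr int chosen = 2;
};

// Without the specialization below, mma_op_sketch<half_sketch, 2> matches both
// A and B, and neither is more specialized than the other, so the compiler
// rejects the instantiation as ambiguous.
// Specialization C: names the overlapping case explicitly and breaks the tie.
template <>
struct mma_op_sketch<half_sketch, 2> {
    static constexpr int chosen = 3;
};

// Compile-time checks of which specialization is selected.
static_assert(mma_op_sketch<float, 4>::chosen == 0, "primary template");
static_assert(mma_op_sketch<half_sketch, 2>::chosen == 3, "overlap resolved by C");
```

Removing specialization C from this sketch makes the last `static_assert` fail to compile with an ambiguity error of the same kind as the one in the log.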

IMbackK · 2026-02-07T21:12:48Z

We can't disable this for all of CDNA, so this PR is non-viable until it's fixed upstream or we find a workaround.

superm1 · 2026-02-07T21:13:27Z

Would you be open to some macros to force the data types?

IMbackK · 2026-02-07T21:14:21Z

sure ofc

IMbackK · 2026-02-08T08:35:48Z

this pr is blocked by #19269 (comment)

superm1 · 2026-02-08T15:34:17Z

Thanks for that and pointing out the upstream bug. I think that we can split this into two parts if you agree.

1. Updating the docker container to 7.2.
2. Adding release artifacts based upon 7.2 for unaffected architectures.

I split off the second part to #19433.

superm1 · 2026-02-09T14:54:20Z

> Would you be open to some macros to force the data types?
>
> sure ofc

I came up with a workaround that seems to work for me with 7.2 to explicitly declare the types. I split it off to #19461.
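
The workaround itself lives in #19461 and is not shown in this thread, so the following is only a generic illustration of what "forcing the data types with a macro" can look like in C++, not a description of that change. The macro `SKETCH_ACCUM_TYPE` and the function `dot2_sketch` are hypothetical names made up for this sketch.

```cpp
#include <type_traits>

// Hypothetical compile-time switch; a build system could override it with -D.
#ifndef SKETCH_ACCUM_TYPE
#define SKETCH_ACCUM_TYPE float
#endif

// AccumT appears only in the return type, so it cannot be deduced and must be
// named at every call site; InT is still deduced from the arguments.
template <typename AccumT, typename InT>
constexpr AccumT dot2_sketch(InT a0, InT b0, InT a1, InT b1) {
    return static_cast<AccumT>(a0) * static_cast<AccumT>(b0) +
           static_cast<AccumT>(a1) * static_cast<AccumT>(b1);
}

// The call site takes the accumulator type from the macro instead of relying
// on template argument deduction.
static_assert(std::is_same<decltype(dot2_sketch<SKETCH_ACCUM_TYPE>(1, 2, 3, 4)),
                           SKETCH_ACCUM_TYPE>::value,
              "accumulator type comes from the macro, not from deduction");
```

Pinning the type in one macro keeps every call site consistent, which is the general appeal of this pattern when a library's own type deduction misbehaves.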

IMbackK · 2026-02-09T18:12:31Z

Please remove cc5a595, otherwise this is fine to merge after #19461 lands.

superm1 · 2026-02-09T20:03:30Z

Thanks; dropped that commit.

IMbackK

Otherwise lgtm and should be ready now.

CISC · 2026-02-13T08:15:15Z

@superm1 Please provide a successful build on your own fork to verify.

superm1 · 2026-02-13T12:00:37Z

Hi,

I did a successful build of this locally.

superm1 · 2026-02-13T15:21:26Z

I had to hack up the build targets to get it to run (I don't have a workflow dispatch or cron target in a fork), but here is a successful run on my fork.

https://github.com/superm1/llama.cpp/actions/runs/21989365518/job/63531927483

superm1 · 2026-02-19T19:31:15Z

Can this merge now?

CISC · 2026-02-19T19:33:51Z

> Can this merge now?

I'd like a final approval by @IMbackK first.

CISC · 2026-02-19T19:35:50Z

@superm1 Also, why are you building more arches here than in the release?

superm1 · 2026-02-19T19:48:32Z

> @superm1 Also, why are you building more arches here than in the release?

That's a good point. Let me pare it down.

IMbackK · 2026-02-19T20:53:21Z

You removed <gfx908 as I mentioned in #19418 (comment); that makes sense, as ROCm in these docker images is not built for those. But now you also removed gfx1030, for which the docker image is certainly built, and gfx1010 and gfx1032, which I need to check (pulling the image right now).

IMbackK · 2026-02-19T21:11:54Z

Looks like they don't build for gfx1010 and gfx1032 anymore, so just gfx1030 is missing here and in the other PR.

CISC · 2026-02-20T18:38:14Z

@IMbackK gentle ping, waiting for your approval. :)

IMbackK

Looks good; the nit can be addressed or not, it's not terribly important.

Also update architectures

superm1 requested a review from ngxson as a code owner February 7, 2026 14:35

github-actions bot added the "devops" label (improvements to build systems and github actions) Feb 7, 2026

superm1 requested a review from CISC as a code owner February 7, 2026 15:17

superm1 force-pushed the docker-rocm branch 6 times, most recently from 21b0a0f to 373a1c3 Compare February 7, 2026 17:51

superm1 force-pushed the docker-rocm branch from 373a1c3 to 97738c5 Compare February 9, 2026 15:05

superm1 force-pushed the docker-rocm branch from 97738c5 to 000126b Compare February 9, 2026 20:03

superm1 mentioned this pull request Feb 9, 2026

Add a build target to generate ROCm artifacts using ROCm 7.2 #19433

Merged

IMbackK approved these changes Feb 12, 2026

Comment thread .github/workflows/build.yml Outdated

IMbackK requested changes Feb 12, 2026

Comment thread .devops/rocm.Dockerfile Outdated

superm1 force-pushed the docker-rocm branch from 000126b to 88d5e5c Compare February 13, 2026 05:07

superm1 requested a review from IMbackK February 13, 2026 05:08

superm1 force-pushed the docker-rocm branch from 88d5e5c to 9ac5e4e Compare February 19, 2026 19:50

superm1 force-pushed the docker-rocm branch from 9ac5e4e to 20713e9 Compare February 19, 2026 22:02

IMbackK approved these changes Feb 21, 2026

Comment thread .devops/rocm.Dockerfile Outdated

CISC approved these changes Feb 21, 2026

Update ROCm docker container to 7.2 release

d8fea45

Also update architectures

superm1 force-pushed the docker-rocm branch from 20713e9 to d8fea45 Compare February 21, 2026 19:45

CISC merged commit 3571565 into ggml-org:master Feb 21, 2026
2 checks passed

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026

Update ROCm docker container to 7.2 release (ggml-org#19418)

88e89cc

Also update architectures

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026

Update ROCm docker container to 7.2 release (ggml-org#19418)

7e8fbea

Also update architectures

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026

Update ROCm docker container to 7.2 release (ggml-org#19418)

fce7eb1

Also update architectures

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

Update ROCm docker container to 7.2 release (ggml-org#19418)

0c89ba3

Also update architectures

rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

Update ROCm docker container to 7.2 release (ggml-org#19418)

09459b4

Also update architectures

ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

Update ROCm docker container to 7.2 release (ggml-org#19418)

3b05c72

Also update architectures

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

Update ROCm docker container to 7.2 release (ggml-org#19418)

e785884

Also update architectures

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

Update ROCm docker container to 7.2 release (ggml-org#19418)

2b529f8

Also update architectures

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

Update ROCm docker container to 7.2 release (ggml-org#19418)

c4aa6f1

Also update architectures
