Update ROCm docker container to 7.2 #19418

Merged: CISC merged 1 commit into ggml-org:master from superm1:docker-rocm on Feb 21, 2026

superm1 commented Feb 7, 2026

Contributor

Update all the CI artifacts and jobs to use ROCm 7.2.

While testing offline, I found a problem with rocWMMA on the docker container with GFX908, so rocWMMA is disabled in that container.

superm1 requested a review from ngxson as a code owner February 7, 2026 14:35
github-actions bot added the devops label ("improvements to build systems and github actions") Feb 7, 2026
superm1 requested a review from CISC as a code owner February 7, 2026 15:17
superm1 force-pushed the docker-rocm branch 6 times, most recently from 21b0a0f to 373a1c3 February 7, 2026 17:51

CISC commented Feb 7, 2026

Member

There have been several issues submitted about newer versions of ROCm, including rocWMMA; not sure we are ready to go there. @IMbackK?

superm1 commented Feb 7, 2026

Contributor Author

How important is gfx908 to CI and to release artifacts?

I believe there is something wrong with rocWMMA specifically with gfx908 and some intrinsic types. Right now my PR disables rocWMMA entirely in that container while building. Another option is to disable gfx908.

IMbackK commented Feb 7, 2026

Collaborator

What problem specifically? There is no problem on my side with gfx908 on rocwmma 2.2.0 (which ships with 7.2).
rocwmma 2.0.0 (ROCm 7.0 - 7.1) was broken at compile time for fp16 output datatypes on all gfx9 devices rocwmma supports, i.e. gfx908-gfx942, not gfx908 specifically.

superm1 commented Feb 7, 2026

Contributor Author

Here's a snippet of what I was seeing that led to what I put in this PR. I guess it's some issue with template instantiation.

[ 56%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/template-instances/mmq-instance-q8_0.cu.o
In file included from /app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:35:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_config.hpp:32:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:32:
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 16>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 16>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 16>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:53:34: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 16>' requested here
   53 |     struct MfmaSelector : public MmaSelector<Mfma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest>{};
      |                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 8>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 8>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 8>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 8>' requested here
   58 |                                         typename MmaSelector<Mma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest / 2u>::SelectedOp>;
      |                                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:53:34: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 16>' requested here
   53 |     struct MfmaSelector : public MmaSelector<Mfma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest>{};
      |                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 4>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 4>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 4>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 4>' requested here
   58 |                                         typename MmaSelector<Mma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest / 2u>::SelectedOp>;
      |                                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 8>' requested here
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:53:34: note: (skipping 1 context in backtrace; use -ftemplate-backtrace-limit=0 to see all)
   53 |     struct MfmaSelector : public MmaSelector<Mfma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest>{};
      |                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 2>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 2>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 2>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 2>' requested here
   58 |                                         typename MmaSelector<Mma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest / 2u>::SelectedOp>;
      |                                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 4>' requested here
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: (skipping 2 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:186:21: error: no matching function for call to 'mma_sync'
  186 |                     wmma::mma_sync(KQ_c[j], K_a, Q_b[k_KQ_0/16][j], KQ_c[j]);
      |                     ^~~~~~~~~~~~~~
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:531:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, __half, true>' requested here
  531 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:627:17: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, __half>' requested here
  627 |                 ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, half>(ctx, dst);
      |                 ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:251:25: note: candidate template ignored: substitution failure [with FragA = frag_a_K, FragB = frag_b, FragAccumIn = frag_c_KQ, FragAccumOut = frag_c_KQ]
  251 |     ROCWMMA_DEVICE void mma_sync(FragAccumOut& d, FragA const& a, FragB const& b, FragAccumIn& c)
      |                         ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:186:21: error: no matching function for call to 'mma_sync'
  186 |                     wmma::mma_sync(KQ_c[j], K_a, Q_b[k_KQ_0/16][j], KQ_c[j]);
      |                     ^~~~~~~~~~~~~~
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<80, 16, 4, 16, __half, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:630:17: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<80, 16, __half>' requested here
  630 |                 ggml_cuda_flash_attn_ext_wmma_f16_case< 80, cols_per_block, half>(ctx, dst);
      |                 ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:251:25: note: candidate template ignored: substitution failure [with FragA = frag_a_K, FragB = frag_b, FragAccumIn = frag_c_KQ, FragAccumOut = frag_c_KQ]
  251 |     ROCWMMA_DEVICE void mma_sync(FragAccumOut& d, FragA const& a, FragB const& b, FragAccumIn& c)
      |                         ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:186:21: error: no matching function for call to 'mma_sync'
  186 |                     wmma::mma_sync(KQ_c[j], K_a, Q_b[k_KQ_0/16][j], KQ_c[j]);
      |                     ^~~~~~~~~~~~~~
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:531:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<80, 16, 4, 16, __half, true>' requested here
  531 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:630:17: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<80, 16, __half>' requested here
  630 |                 ggml_cuda_flash_attn_ext_wmma_f16_case< 80, cols_per_block, half>(ctx, dst);
      |                 ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:251:25: note: candidate template ignored: substitution failure [with FragA = frag_a_K, FragB = frag_b, FragAccumIn = frag_c_KQ, FragAccumOut = frag_c_KQ]
  251 |     ROCWMMA_DEVICE void mma_sync(FragAccumOut& d, FragA const& a, FragB const& b, FragAccumIn& c)
      |                         ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx908.
gmake[2]: *** [ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/build.make:335: ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/fattn-wmma-f16.cu.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....

@IMbackK

IMbackK commented Feb 7, 2026

Collaborator

Right, this is yet another bug in the fp16 downcast code in rocWMMA.
It affects gfx908-gfx942 and is an upstream issue.
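
For readers unfamiliar with this class of error: "ambiguous partial specializations" is a general C++ failure mode in which two partial specializations both match a set of template arguments and neither is more specialized than the other. The following is a minimal sketch of that situation and one generic way to break the tie. It is not rocWMMA code and not the eventual fix; `half_t`, `mfma_like`, and `selected` are hypothetical names made up for illustration.

```cpp
// Hypothetical stand-ins for illustration; these are NOT rocWMMA types.
struct half_t {};

// Primary template: chosen when no specialization matches.
template <typename InA, typename InB, typename Compute, int BlockM>
struct mfma_like {
    static constexpr int id = 0;
};

// Partial specialization 1: any types, block size fixed to 16.
template <typename InA, typename InB, typename Compute>
struct mfma_like<InA, InB, Compute, 16> {
    static constexpr int id = 1;
};

// Partial specialization 2: types fixed to half_t, any block size.
template <int BlockM>
struct mfma_like<half_t, half_t, half_t, BlockM> {
    static constexpr int id = 2;
};

// mfma_like<half_t, half_t, half_t, 16> matches both 1 and 2, and neither
// is more specialized than the other. This full specialization acts as a
// tie-breaker so that the instantiation is no longer ambiguous.
template <>
struct mfma_like<half_t, half_t, half_t, 16> {
    static constexpr int id = 3;
};

// Small helper that reports which definition the compiler selected.
template <typename InA, typename InB, typename Compute, int BlockM>
constexpr int selected() {
    return mfma_like<InA, InB, Compute, BlockM>::id;
}

static_assert(selected<float, float, float, 32>() == 0, "primary template");
static_assert(selected<float, float, float, 16>() == 1, "specialization 1");
static_assert(selected<half_t, half_t, half_t, 32>() == 2, "specialization 2");
static_assert(selected<half_t, half_t, half_t, 16>() == 3, "tie-breaker");
```

Removing the full specialization makes the last `static_assert` fail to compile with an ambiguity diagnostic of the same kind as the one in the log above.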

@IMbackK

IMbackK commented Feb 7, 2026

Collaborator

We can't disable this for all of CDNA, so this PR is non-viable until it's fixed upstream or we find a workaround.

@superm1

superm1 commented Feb 7, 2026

Contributor Author

Would you be open to some macros to force the data types?

@IMbackK

IMbackK commented Feb 7, 2026

Collaborator

sure ofc

@IMbackK

IMbackK commented Feb 8, 2026

Collaborator

This PR is blocked by #19269 (comment).

@superm1

superm1 commented Feb 8, 2026

Contributor Author

Thanks for that and pointing out the upstream bug. I think that we can split this into two parts if you agree.

  1. Updating the docker container to 7.2.
  2. Adding release artifacts based upon 7.2 for unaffected architectures.

I split off the second part to #19433.

@superm1

superm1 commented Feb 9, 2026

Contributor Author

> Would you be open to some macros to force the data types?

> sure ofc

I came up with a workaround that explicitly declares the types; it seems to work for me with 7.2. I split it off to #19461.

@IMbackK

IMbackK commented Feb 9, 2026

Collaborator

Please remove cc5a595, otherwise this is fine to merge after #19461 lands.

@superm1

superm1 commented Feb 9, 2026

Contributor Author

Thanks; dropped that commit.

@IMbackK IMbackK left a comment

Collaborator

Otherwise LGTM; this should be ready now.

Comment thread .github/workflows/build.yml Outdated
Comment thread .devops/rocm.Dockerfile Outdated
@CISC

CISC commented Feb 13, 2026

Member

@superm1 Please provide a successful build on your own fork to verify.

@superm1

superm1 commented Feb 13, 2026

Contributor Author

Hi,

I did a successful build of this locally.

@superm1

superm1 commented Feb 13, 2026

Contributor Author

I had to hack up the build targets to get it to run (I don't have a workflow dispatch or cron target in a fork), but here is a successful run on my fork.

https://github.com/superm1/llama.cpp/actions/runs/21989365518/job/63531927483

@superm1

superm1 commented Feb 19, 2026

Contributor Author

Can this merge now?

@CISC

CISC commented Feb 19, 2026

Member

> Can this merge now?

I'd like a final approval by @IMbackK first.

@CISC

CISC commented Feb 19, 2026

Member

@superm1 Also, why are you building more arches here than in the release?

@superm1

superm1 commented Feb 19, 2026

Contributor Author

> @superm1 Also, why are you building more arches here than in the release?

That's a good point. Let me pare it down.

@IMbackK

IMbackK commented Feb 19, 2026

Collaborator

You removed the architectures below gfx908, as I mentioned in #19418 (comment); that makes sense, as ROCm in these Docker images is not built for those. But now you have also removed gfx1030, for which the Docker image is certainly built, and gfx1010 and gfx1032, which I need to check (pulling the image right now).

@IMbackK

IMbackK commented Feb 19, 2026

Collaborator

Looks like they don't build for gfx1010 and gfx1032 anymore, so just gfx1030 is missing here and in the other PR.

@CISC

CISC commented Feb 20, 2026

Member

@IMbackK gentle ping, waiting for your approval. :)

@IMbackK IMbackK left a comment

Collaborator

Looks good; the nit can be addressed or not, it's not terribly important.

Comment thread .devops/rocm.Dockerfile Outdated
@CISC CISC merged commit 3571565 into ggml-org:master Feb 21, 2026
2 checks passed
liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026