Add a build target to generate ROCm artifacts using ROCm 7.2 by superm1 · Pull Request #19433 · ggml-org/llama.cpp

superm1 · 2026-02-08T15:32:38Z

This builds the following targets:

gfx1151
gfx1150
gfx1200
gfx1201
gfx1100
gfx1101
gfx908
gfx90a
gfx942
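
For reference, a target list like this is usually handed to CMake as one semicolon-separated value. A minimal sketch, assuming the `GPU_TARGETS` option name from the llama.cpp HIP build documentation; the variable actually used by this PR's workflow is not shown in the thread, and the command is only printed, not run:

```shell
# Targets as listed in the PR description above.
targets="gfx1151 gfx1150 gfx1200 gfx1201 gfx1100 gfx1101 gfx908 gfx90a gfx942"

# CMake list values are semicolon-separated, so swap spaces for semicolons.
gpu_targets=$(printf '%s' "$targets" | tr ' ' ';')

# Print the option instead of invoking cmake, so this runs without ROCm.
# GPU_TARGETS is an assumed option name, not taken from this PR.
echo "-DGPU_TARGETS=\"$gpu_targets\""
```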

IMbackK · 2026-02-08T20:51:01Z

@superm1 I would lean towards simply generating the release with 7.1 and including all targets. While ROCm unfortunately does not have a stable ABI in practice, the 7.1 compile time + 7.2 runtime combination works fine.

Unless you know of an issue I am not aware of with some non-CDNA target in 7.1.

superm1 · 2026-02-08T22:15:48Z

Yes, there is an incompatibility with mainline kernel on 7.1 for Strix Halo.

It's a long story, but it would be much better to release 7.2 based artifacts.

In this case runtime and compile time are identical, because the ROCm libraries are bundled into the artifact rather than coming from the OS.

IIIIIllllIIIIIlllll · 2026-02-09T01:48:31Z

ROCm/rocm-systems#2865 (comment)

Just a heads-up, ROCm 7.2 currently has some performance issues, and I'm not sure if they've been fixed.

superm1 · 2026-02-09T02:25:29Z

> Just a heads-up, ROCm 7.2 currently has some performance issues, and I'm not sure if they've been fixed.

My feeling is this is a "perfect is the enemy of good" situation. I say that because there are no llama.cpp ROCm artifacts right now and everyone is compiling their own thing. This gets the ball rolling for official ones and we can all keep improving them.

Yes; there is a regression reported here. It's been root-caused to a compiler change and has been reverted in the develop branch, but it will take a bit to make its way to a stable release.

There is a workaround right now that can be applied though that avoids it:

-mllvm --amdgpu-unroll-threshold-local=600

IMbackK · 2026-02-09T14:47:52Z

Since the compiler is the problem in both cases, we could also downgrade just the compiler in the container by fetching the ROCm 7.1 package and installing it.

superm1 · 2026-02-09T17:51:08Z

Is that what you would rather see (set up 7.2 container, add 7.1 repos, and downgrade compiler to the one from 7.1)?

I did come up with a workaround for the fp16 issue if you would rather go that way: #19461

IMbackK · 2026-02-09T18:10:19Z

No, #19461 is preferable.

slojosic-amd · 2026-02-11T20:03:22Z

@superm1 @IMbackK we should add -DCMAKE_HIP_FLAGS="-mllvm --amdgpu-unroll-threshold-local=600" here: https://github.com/ggml-org/llama.cpp/pull/19433/changes#diff-87db21a973eed4fef5f32b267aa60fcee5cbdf03c67fafdc2a9b553bb0b15f34R601 if we are planning to add artifacts based on the legacy ROCm 7.2 release.
This additional CMake flag fixes the ROCm 7.2 perf regression described here: ROCm/rocm-systems#2865
With this workaround we don't need to downgrade the compiler to the one from ROCm 7.1.
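
As a rough illustration of where that flag would sit in a configure step, here is a sketch that assembles and prints the command. The `GGML_HIP` and `GPU_TARGETS` options and the single `gfx1151` target are assumptions borrowed from the llama.cpp HIP build documentation, not from the workflow line linked above:

```shell
# Workaround flag quoted in this thread for the ROCm 7.2 perf regression.
hip_flags="-mllvm --amdgpu-unroll-threshold-local=600"

# Assumed options: GGML_HIP and GPU_TARGETS come from the llama.cpp build docs,
# and gfx1151 is just one example target, not this PR's full list.
configure_cmd="cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1151 -DCMAKE_HIP_FLAGS=\"$hip_flags\""

# Print rather than execute, so the sketch works on a machine without ROCm.
echo "$configure_cmd"
```

The quoting matters: the whole `-mllvm --amdgpu-unroll-threshold-local=600` string has to reach CMake as a single value of `CMAKE_HIP_FLAGS`.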

superm1 · 2026-02-11T20:42:03Z

That's a great suggestion. I've modified the PR accordingly.

superm1 · 2026-02-13T18:09:59Z

Considering the comments in #19594, I have adjusted this PR to do 7.2 in the same way that 7.11 is done. That is, there is a single artifact.

Basically, the ROCm stack is installed for doing the build, but ROCm itself is not bundled in the artifact. The user is responsible for installing ROCm to use the artifact.

Here is a CI build from my fork demonstrating how it works now. The artifact is 464MB.
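
Because the artifact no longer carries ROCm, a user may want a quick pre-flight check before running it. A minimal sketch, assuming the common default prefix `/opt/rocm`, the `ROCM_PATH` environment variable, and the HIP runtime library name `libamdhip64.so`; `has_hip_runtime` is a hypothetical helper, not part of llama.cpp or this PR:

```shell
# Hypothetical helper: succeeds if the given ROCm prefix contains the HIP runtime.
has_hip_runtime() {
    [ -e "$1/lib/libamdhip64.so" ]
}

# Assumption: ROCm lives under $ROCM_PATH, or /opt/rocm if that is unset.
rocm_prefix="${ROCM_PATH:-/opt/rocm}"

if has_hip_runtime "$rocm_prefix"; then
    echo "HIP runtime found under $rocm_prefix"
else
    echo "no HIP runtime under $rocm_prefix; install ROCm before running the artifact"
fi
```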

IMbackK · 2026-02-13T20:39:23Z

So as mentioned in #19594, I think this one is the better option.
I will take a look at the generated artifacts soon - after that i think we are good to proceed with this one.

superm1 · 2026-02-18T13:45:35Z

Hi @IMbackK, can you take a look this week? As I mentioned in #19594, I do think that doing artifacts for both legacy and TheRock builds makes sense. If you agree, I can close the 7.11 one and merge it into this one. Or if you would prefer to only do 7.2, this PR should be sufficient on its own.

IMbackK · 2026-02-18T18:01:32Z

Let's go for just the official release. For one thing, the builds are pretty large; for another, new versions of ROCm have broken things fairly often, and dealing with that at the higher release cadence of TheRock does not feel terribly appealing. Sure, we could just not update which TheRock build we build against, but that would IMO defeat the purpose of building against the preview versions at all.

Tiny nit: it would be better if the release name included the version of Ubuntu built against.

CISC · 2026-02-18T18:23:26Z

> Tiny nit: it would be better if the release name included the version of Ubuntu built against.

None of the other releases have it, so it's fine for now.

IMbackK · 2026-02-18T21:44:34Z

> > Tiny nit: it would be better if the release name included the version of Ubuntu built against.
>
> None of the other releases have it, so it's fine for now.

Yes, and I think it would be relevant for all releases, since it tells you what version of glibc etc. it's built against.

CISC · 2026-02-18T22:07:57Z

> > > Tiny nit: it would be better if the release name included the version of Ubuntu built against.
> >
> > None of the other releases have it, so it's fine for now.
>
> yes and i think it would be relevant for all releases since it tells you what version of glibc etc its built against.

Separate PR if anything, but not too keen on potentially breaking someone's workflow again.

CISC

Merge on @IMbackK approval.

superm1 · 2026-02-18T23:29:09Z

> Merge on @IMbackK approval.

Can you trigger the rest of the CI jobs?

CISC · 2026-02-19T08:48:14Z

> > Merge on @IMbackK approval.
>
> Can you trigger the rest of the CI jobs?

There are no more jobs related to release.yml, the actual release job must be tested on the fork, which I see you sort of did. Did you test the binaries?

superm1 · 2026-02-19T14:13:06Z

Yes, just tested them again this morning. This is an Ubuntu 24.04 toolbox that I installed ROCm 7.2 into before running the binaries.

[supermario@toolbx llama-b8028]$ ./llama-cli -m /home/supermario/.cache/huggingface/hub/models--LiquidAI--LFM2-1.2B-GGUF/snapshots/ad410707125b58bc535ac81e21eeb84d56a5e2ee/LFM2-1.2B-Q4_K_M.gguf --ctx-size 4096  --jinja --context-shift --keep 16 --reasoning-format auto -ngl 99
ggml_cuda_init: found 1 ROCm devices:
  Device 0: Radeon 8050S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
load_backend: loaded ROCm backend from /home/supermario/Downloads/llama-bin-rocm/llama-b8028/libggml-hip.so
load_backend: loaded RPC backend from /home/supermario/Downloads/llama-bin-rocm/llama-b8028/libggml-rpc.so
load_backend: loaded CPU backend from /home/supermario/Downloads/llama-bin-rocm/llama-b8028/libggml-cpu-zen4.so

Loading model...  


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8028-92e9e70a
model      : LFM2-1.2B-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> foo the bar

This phrase is often used as a placeholder or a part of a larger phrase or sentence. It's the beginning of "foo the bar," which could mean:

1. **A function or command name**: It could be the name of a simple program or script that performs some action (like `foo`) and then does something else (like `the bar`).
2. **A title or heading**: It might be the title of an article, a section in a document, or a part of a title.
3. **A fragment of code or instruction**: It could be a starting point for a code snippet or a programming instruction.
4. **A creative or artistic statement**: It could be used in a piece of writing or art to introduce a theme or idea.

Without more context, it's hard to determine the exact meaning, but it generally starts with "foo" to indicate a simple or generic term, and "the bar" to add more specificity or context. If you have a specific context or use case in mind, feel free to share more, and I can provide a more detailed response!

[ Prompt: 324.8 t/s | Generation: 186.9 t/s ]

> 

Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]     | total    free    self   model   context   compute       unaccounted |
llama_memory_breakdown_print: |   - ROCm0 (8050S Graphics) | 12125 = 12632 + ( 878 =   694 +      48 +     136) + 17592186043030 |
llama_memory_breakdown_print: |   - Host                   |                   121 =   105 +       0 +      16                   |
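
The very large "unaccounted" figure on the ROCm0 row looks like an unsigned 64-bit wraparound rather than a real measurement. Reading the row as total = free + self + unaccounted, free plus self exceeds total, so the remainder is negative. The printed value sits just below 2^44, which is 2^64 bytes expressed in MiB. The arithmetic below checks this; it is an interpretation of the log, not something confirmed in the thread:

```shell
# Values in MiB, copied from the ROCm0 row above.
total=12125
free=12632
self=878

# How far free + self overshoots total.
deficit=$(( free + self - total ))

# How far the printed "unaccounted" value sits below 2^44 MiB (= 2^64 bytes).
wrapped=$(( (1 << 44) - 17592186043030 ))

echo "deficit=$deficit wrapped=$wrapped"
# prints: deficit=1385 wrapped=1386
```

The two numbers differ by one MiB, which would be consistent with the columns being rounded from byte counts before printing.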

IMbackK · 2026-02-19T20:55:58Z

I just noticed that gfx1030 went missing here too. We probably want to build this for gfx1030 as well, since that is an architecture AMD both builds for and officially supports. I need to check the release images to see whether AMD is currently also building for gfx1010, gfx1031, and gfx1032, which are not officially supported architectures but were built for in ROCm 7.0.

IMbackK · 2026-02-19T20:56:56Z

I tried the release binaries locally on gfx908 and they worked fine here, no issues on that front.

IMbackK

I have no further objections.

Side note:
Since this is bring-your-own-ROCm, we could in the future also consider building for architectures that are not in the default ROCm release but are known to work well, like gfx900, gfx906, gfx101x, and gfx1031. But just the officially supported architectures is good for this initial PR.

superm1 requested a review from CISC as a code owner February 8, 2026 15:32

github-actions Bot added the devops improvements to build systems and github actions label Feb 8, 2026

superm1 mentioned this pull request Feb 8, 2026

Update ROCm docker container to 7.2 #19418

Merged

superm1 force-pushed the superm1/rocm-github-action branch from 1790896 to 8aeb553 Compare February 8, 2026 15:35

CISC requested a review from IMbackK February 8, 2026 18:09

loci-dev mentioned this pull request Feb 9, 2026

UPSTREAM PR #19433: Add a build target to generate ROCm artifacts using ROCm 7.2 auroralabs-loci/llama.cpp#1160

Open

IMbackK requested changes Feb 9, 2026

Comment thread .github/workflows/release.yml Outdated

superm1 force-pushed the superm1/rocm-github-action branch 7 times, most recently from 050b836 to 2b1e35b Compare February 11, 2026 13:26

superm1 force-pushed the superm1/rocm-github-action branch from 2b1e35b to 170f2f9 Compare February 11, 2026 20:41

superm1 mentioned this pull request Feb 13, 2026

Add a build target to generate ROCm artifacts using ROCm 7.11 #19594

Closed

superm1 force-pushed the superm1/rocm-github-action branch from 170f2f9 to 67a425d Compare February 13, 2026 18:07

superm1 force-pushed the superm1/rocm-github-action branch from 67a425d to 6559a81 Compare February 13, 2026 20:36

superm1 force-pushed the superm1/rocm-github-action branch from 6559a81 to d961293 Compare February 14, 2026 01:51

IMbackK approved these changes Feb 18, 2026

CISC reviewed Feb 18, 2026

Comment thread .github/workflows/release.yml Outdated

superm1 force-pushed the superm1/rocm-github-action branch from fb4909c to 4a1f236 Compare February 18, 2026 22:27

CISC approved these changes Feb 18, 2026

Add a build target to generate ROCm artifacts using ROCm 7.2

3493f6d

This builds the following targets:

gfx1151
gfx1150
gfx1200
gfx1201
gfx1100
gfx1101
gfx1030
gfx908
gfx90a
gfx942

superm1 force-pushed the superm1/rocm-github-action branch from 4a1f236 to 3493f6d Compare February 19, 2026 22:04

IMbackK approved these changes Feb 21, 2026

CISC merged commit f75c4e8 into ggml-org:master Feb 21, 2026
2 checks passed

CISC mentioned this pull request Feb 21, 2026

ci : fix rocm release path #19784

Merged

superm1 mentioned this pull request Feb 22, 2026

Update Windows ROCm build to 26.Q1 #19810

Merged

slojosic-amd mentioned this pull request Feb 25, 2026

[HIP] Updates the ROCm/HIP toolchain versions used in CI pipelines #19891

Merged

5 participants
