Add a build target to generate ROCm artifacts using ROCm 7.2#19433
Conversation
1790896 to
8aeb553
Compare
|
@superm1 i would lean towards simply generating the release with 7.1 and including all targets, while rocm unfortunately dose not have a stable abi in practice - the 7.1 compile time + 7.2 runtime combination works fine. Unless you know of an issue i am not aware with with some non cdna target in 7.1. |
|
Yes, there is an incompatibility with mainline kernel on 7.1 for Strix Halo. It's a long story, but it would be much better to release 7.2 based artifacts. In this case runtime and compile time are identical because the rocm libraries are added into the artifact not coming from OS. |
|
ROCm/rocm-systems#2865 (comment) Just a heads-up, ROCm 7.2 currently has some performance issues, and I'm not sure if they've been fixed. |
My feeling is this is a perfect is the enemy of good situation. I say that because there are no llama.cpp artifacts right now and everyone is compiling their own thing. This gets the ball rolling for official ones and we can all keep improving them. Yes; there is a regression reported here. It's been root caused to a compiler change and has been reverted in the develop branch but will take a bit to make it's way to a stable release. There is a workaround right now that can be applied though that avoids it: -mllvm --amdgpu-unroll-threshold-local=600 |
|
since the compiler is the problem in both cases we could also just downgrade the just compiler in the container by fetching the rocm 7.1 package and installing it. |
|
Is that what you would rather see (set up 7.2 container, add 7.1 repos, and downgrade compiler to the one from 7.1)? I did come up with a workaround for the fp16 issue if you would rather go that way: #19461 |
|
No, #19461 is preferable. |
050b836 to
2b1e35b
Compare
|
@superm1 @IMbackK we should add |
2b1e35b to
170f2f9
Compare
|
That's a great suggestion. I've modified the PR accordingly. |
170f2f9 to
67a425d
Compare
|
Considering the comments in #19594 I have adjusted this PR to do 7.2 in the same way that 7.11 is done. That is have a single artifact. Basically install the ROCm stack for doing the build, but don't bundle ROCm itself in the artifact. The user would be responsible for installing ROCm to use the artifact. Here is a CI build from my fork demonstrating how it works now. The artifact is 464MB. |
67a425d to
6559a81
Compare
|
So as mentioned in #19594 i think this one is the better option. |
6559a81 to
d961293
Compare
|
Lets go for just the official release, for one thing the builds are pretty large and the other reason being that new versions of rocm have broken things fairly often and dealing with that at the higher release cadence of therock feels not terribly appealing. Sure we could just not update what therock build we build against, but that would imo defeat the purpose of building against the preview versions at all. Tiny nit: it would be better if the release name included the version of Ubuntu built against. |
None of the other releases have it, so it's fine for now. |
yes and i think it would be relevant for all releases since it tells you what version of glibc etc its built against. |
Separate PR if anything, but not too keen on potentially breaking someone's workflow again. |
fb4909c to
4a1f236
Compare
Can you trigger the rest of the CI jobs? |
There are no more jobs related to |
|
Yes, just tested them again this morning. This is a Ubuntu 24.04 toolbox, that I installed ROCm 7.2 into and then ran the binaries. |
|
I just noticed that gfx1030 went missing here too, probubly want to build this for gfx1030 also, since that is an architecture amd both builds for and has official support for. I need to check the release images to see if amd is currently also building for gfx1010 and gfx1031, gfx1032 which are not officially supported architectures but where built for in rocm 7.0 |
|
I tried the release binaries locally on gfx908 and they worked fine here, no issues on that front. |
This builds the following targets: * gfx1151 * gfx1150 * gfx1200 * gfx1201 * gfx1100 * gfx1101 * gfx1030 * gfx908 * gfx90a * gfx942
4a1f236 to
3493f6d
Compare
There was a problem hiding this comment.
I have no further objections.
Side note:
Since this is bring-your-own-rocm, we could consider in the future also build for architectures not in the default rocm release that are known to work well, like gfx900, gfx906, gfx101x, gfx1031 but just the officially supported architectures for this initial pr is good.
…g#19433) This builds the following targets: * gfx1151 * gfx1150 * gfx1200 * gfx1201 * gfx1100 * gfx1101 * gfx1030 * gfx908 * gfx90a * gfx942
…g#19433) This builds the following targets: * gfx1151 * gfx1150 * gfx1200 * gfx1201 * gfx1100 * gfx1101 * gfx1030 * gfx908 * gfx90a * gfx942
…g#19433) This builds the following targets: * gfx1151 * gfx1150 * gfx1200 * gfx1201 * gfx1100 * gfx1101 * gfx1030 * gfx908 * gfx90a * gfx942
…g#19433) This builds the following targets: * gfx1151 * gfx1150 * gfx1200 * gfx1201 * gfx1100 * gfx1101 * gfx1030 * gfx908 * gfx90a * gfx942
…g#19433) This builds the following targets: * gfx1151 * gfx1150 * gfx1200 * gfx1201 * gfx1100 * gfx1101 * gfx1030 * gfx908 * gfx90a * gfx942
…g#19433) This builds the following targets: * gfx1151 * gfx1150 * gfx1200 * gfx1201 * gfx1100 * gfx1101 * gfx1030 * gfx908 * gfx90a * gfx942
…g#19433) This builds the following targets: * gfx1151 * gfx1150 * gfx1200 * gfx1201 * gfx1100 * gfx1101 * gfx1030 * gfx908 * gfx90a * gfx942
…g#19433) This builds the following targets: * gfx1151 * gfx1150 * gfx1200 * gfx1201 * gfx1100 * gfx1101 * gfx1030 * gfx908 * gfx90a * gfx942
…g#19433) This builds the following targets: * gfx1151 * gfx1150 * gfx1200 * gfx1201 * gfx1100 * gfx1101 * gfx1030 * gfx908 * gfx90a * gfx942
This builds the following targets: