# ROCm Integration Progress: Core Components Ready for EasyBuild Release

Following our previous posts on [mapping the ROCm ecosystem](https://www.eessi.io/docs/blog/2025/05/26/rocm) and [building ROCm support in EESSI](https://www.eessi.io/docs/blog/2025/06/30/rocm), we're excited to share a significant milestone in our ROCm integration journey.
Our core ROCm components are now hardware-validated and nearly ready for inclusion in EasyBuild, marking substantial progress toward making AMD GPU computing more accessible through EESSI.

<!-- more -->

Our `ROCm-LLVM` toolchain, `ROCmInfo`, and `AMD SMI` components have been tested …
More importantly, these components are now ready to be submitted to the official EasyBuild repository, which means the broader scientific computing community will soon have access to reliable, tested ROCm builds.

Having a working ROCm-LLVM toolchain provides the compiler foundation that (nearly) every other ROCm component depends on.
Combined with `ROCmInfo` for GPU discovery and `AMD SMI` for system monitoring, we now have the basic building blocks for AMD GPU computing in place.
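
To make the role of `ROCmInfo` in GPU discovery a bit more concrete, here is a minimal Python sketch (not part of EESSI or EasyBuild) that lists the GPU targets reported by the `rocminfo` command. It assumes the usual `rocminfo` output layout, in which each GPU agent has a `Name:` line holding its LLVM target such as `gfx90a`; the helper names `gfx_targets` and `detect_gpus` are hypothetical.

```python
import re
import shutil
import subprocess
from typing import List

# Assumption: in rocminfo output, each GPU agent has a line such as "  Name:   gfx90a".
# "Marketing Name:" lines and ISA names ("amdgcn-amd-amdhsa--gfx90a...") do not match.
_GPU_NAME = re.compile(r"^\s*Name:\s+(gfx[0-9a-f]+)[ \t]*$", re.MULTILINE)


def gfx_targets(rocminfo_output: str) -> List[str]:
    """Return the distinct gfx targets in rocminfo output, in order of first appearance."""
    targets: List[str] = []
    for target in _GPU_NAME.findall(rocminfo_output):
        if target not in targets:
            targets.append(target)
    return targets


def detect_gpus() -> List[str]:
    """Run rocminfo if it is on PATH and return the gfx targets it reports."""
    if shutil.which("rocminfo") is None:
        return []
    result = subprocess.run(["rocminfo"], capture_output=True, text=True, check=False)
    return gfx_targets(result.stdout)


if __name__ == "__main__":
    print(detect_gpus())
```

Keeping the parsing separate from the subprocess call means the parsing logic can be exercised on machines without an AMD GPU.
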

## Hardware Validation Success

Our recent hardware testing confirmed that our builds don't just compile correctly; they also function reliably on real AMD GPU hardware.
One particularly encouraging discovery during our hardware validation was that **the AMDGPU driver required no special handling** to be usable from EESSI.
This contrasts favorably with NVIDIA CUDA driver integration, where additional steps are required to expose the host's driver libraries to EESSI (see [our documentation on GPU support](../../../../site_specific_config/gpu.md)).
The AMDGPU driver integration worked seamlessly out of the box, which should significantly simplify deployment for EESSI users.

## EasyBuild Integration: Ready for Community Access

Our work is now formalized in [`easybuild-easyconfigs` pull request #23542](https://github.com/easybuilders/easybuild-easyconfigs/pull/23542), which adds support to EasyBuild for installing ROCm 6.4.1.

This PR includes:

* **ROCm-LLVM toolchain**: The compiler foundation for AMD GPU computing
* **Essential dependencies**: Previously missing components now properly packaged
* **Core utilities**: ROCmInfo, AMD SMI, and the HIP runtime with all necessary dependencies
* **Validation suite groundwork**: Initial components for the [ROCm Validation Suite](https://github.com/ROCm/ROCmValidationSuite)

The PR builds upon significant framework enhancements, including support for AMD GPU compute capabilities (similar to CUDA's compute capabilities), a new LLVM-based toolchain implementation, and specialized EasyBlocks for ROCm components.
Special thanks to [Jan André Reuter](https://github.com/Thyre), [Davide Grassano](https://github.com/crivella), and [Bob Dröge](https://github.com/bedroge) for their contributions to this work.

## Current Challenge: ROCm Validation Suite

While celebrating our core component success, we must acknowledge the challenge that has prevented us from completing our original goal of full ROCm Validation Suite support.
We encountered compilation issues with `hipBLASLt` (documented in [ROCm issue #316](https://github.com/ROCm/rocm-libraries/issues/316)), which is a dependency for the complete validation suite.

We're currently exploring potential workarounds for the `hipBLASLt` compilation issue.
If successful, we may be able to add support in EasyBuild for installing the ROCm Validation Suite soon.

One important piece of functionality we still need to implement is automatic compatibility checking between the host AMDGPU driver version and our ROCm installation.
While our current installations work correctly with compatible drivers, we want to provide clear guidance to users when incompatibilities are detected, similar to what's available in the CUDA ecosystem.
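
The driver check described above does not exist yet, so the following is only a hypothetical Python sketch of what such a check could look like. It assumes the host driver reports its version in `/sys/module/amdgpu/version`, which the out-of-tree (DKMS) `amdgpu` driver typically provides but in-kernel drivers may not, and it reduces compatibility to a simple minimum-version rule; AMD's actual support policy is more nuanced, and the `(6, 8)` minimum in the usage line is a placeholder, not a real ROCm requirement.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumption: the out-of-tree (DKMS) amdgpu driver usually exposes its version
# here; in-kernel drivers may not create this file at all.
DEFAULT_VERSION_FILE = Path("/sys/module/amdgpu/version")


def read_driver_version(version_file: Path = DEFAULT_VERSION_FILE) -> Optional[str]:
    """Return the raw driver version string, or None if the host does not expose it."""
    try:
        return version_file.read_text()
    except OSError:
        return None


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Parse the leading dotted version, e.g. '6.10.5-2084815' -> (6, 10, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def _pad(version: Tuple[int, ...], length: int) -> Tuple[int, ...]:
    """Zero-pad a version tuple so that (6, 8) compares equal to (6, 8, 0)."""
    return version + (0,) * (length - len(version))


def check_driver(minimum: Tuple[int, ...], raw_version: Optional[str]) -> str:
    """Classify the host driver as 'ok', 'incompatible' or 'unknown' (hypothetical policy)."""
    if raw_version is None:
        return "unknown: amdgpu driver version is not exposed on this host"
    found = parse_version(raw_version)
    if found is None:
        return f"unknown: could not parse amdgpu driver version {raw_version.strip()!r}"
    length = max(len(found), len(minimum))
    have = ".".join(map(str, found))
    wanted = ".".join(map(str, minimum))
    if _pad(found, length) < _pad(minimum, length):
        return f"incompatible: amdgpu driver {have} is older than the required {wanted}"
    return f"ok: amdgpu driver {have} meets the minimum {wanted}"


if __name__ == "__main__":
    # (6, 8) is a placeholder minimum for illustration only.
    print(check_driver((6, 8), read_driver_version()))
```

Returning an "unknown" status rather than failing outright reflects that not every host exposes a driver version; a real implementation would still need to decide how to present that case to users.
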

## What's Next: Expanding the Stack

With our core components ready for release, our focus shifts to expanding ROCm support in several key areas:

**Immediate priorities:**

* Resolving the `hipBLASLt` compilation issue to support installing the ROCm Validation Suite
* Implementing driver compatibility detection and user guidance

**Medium-term goals:**

**Long-term goals:**

* Contributing our ROCm ecosystem overview to AMD's official documentation
* Providing feedback to [AMD's TheRock project](https://github.com/ROCm/TheRock) based on our integration experience

## Community Impact and Looking Forward

As EasyBuild gets closer to supporting the installation of recent ROCm versions, we're excited about the possibilities this opens up for the scientific computing community.
AMD GPUs offer compelling alternatives to traditional computing architectures, and our work is helping to remove the barriers that have historically made them challenging to deploy and manage.

We'll continue sharing our progress as we work toward complete ROCm integration in EESSI.
The goal remains unchanged: making high-performance AMD GPU computing as accessible and reliable as possible for researchers and developers worldwide.