Commit c0f7bf1

Merge pull request #1 from boegel/rocm-blog
tweaks for new ROCm blog post
2 parents 281c88a + f91a092 commit c0f7bf1

1 file changed: +17 -18 lines changed

docs/blog/posts/2025/08/eessi-rocm.md

Lines changed: 17 additions & 18 deletions
@@ -1,13 +1,13 @@
 ---
 authors: [toine, boegel, ocaisa]
-date: 2025-08-04
+date: 2025-08-07
 slug: rocm
 ---
 
 # ROCm Integration Progress: Core Components Ready for EasyBuild Release
 
 Following our previous posts on [mapping the ROCm ecosystem](https://www.eessi.io/docs/blog/2025/05/26/rocm) and [building ROCm support in EESSI](https://www.eessi.io/docs/blog/2025/06/30/rocm), we're excited to share a significant milestone in our ROCm integration journey.
-Our core ROCm components are now hardware-validated and nearly ready for inclusion in the next EasyBuild release, marking substantial progress toward making AMD GPU computing more accessible through EESSI.
+Our core ROCm components are now hardware-validated and nearly ready for inclusion in EasyBuild, marking substantial progress toward making AMD GPU computing more accessible through EESSI.
 
 <!-- more -->
 
@@ -18,51 +18,50 @@ Our `ROCm-LLVM` toolchain, `ROCmInfo`, and `AMD SMI` components have been tested
 More importantly, these components are now ready to be submitted to the official EasyBuild repository, which means the broader scientific computing community will soon have access to reliable, tested ROCm builds.
 
 Having a working ROCm-LLVM toolchain provides the compiler foundation that (nearly) every other ROCm component depends on.
-Combined with `ROCmInfo` for GPU discovery and `AMD SMI` for system monitoring, we now have the essential building blocks for AMD GPU computing in place.
+Combined with `ROCmInfo` for GPU discovery and `AMD SMI` for system monitoring, we now have the basic building blocks for AMD GPU computing in place.
 
 ## Hardware Validation Success
 
 Our recent hardware testing confirmed that our builds don't just compile correctly, they function reliably with real AMD GPU hardware.
-One particularly encouraging discovery during our hardware validation was that **the AMDGPU driver required no special handling in our EasyBuild configurations**.
-This contrasts favorably with NVIDIA CUDA driver integration, where additional complexity is often required.
+One particularly encouraging discovery during our hardware validation was that **the AMDGPU driver required no special handling** to leverage it in EESSI.
+This contrasts favorably with NVIDIA CUDA driver integration, where additional steps are required to expose the driver
+(see [here](../../../../site_specific_config/gpu.md)).
 The AMDGPU driver integration worked seamlessly out of the box, which should significantly simplify deployment for EESSI users.
 
 ## EasyBuild Integration: Ready for Community Access
 
-Our work is now formalized in [EasyBuild pull request #23542](https://github.com/easybuilders/easybuild-easyconfigs/pull/23542), which adds comprehensive ROCm support to the EasyBuild ecosystem.
+Our work is now formalized in [`easybuild-easyconfigs` pull request #23542](https://github.com/easybuilders/easybuild-easyconfigs/pull/23542), which adds support for installing ROCm 6.4.1 to EasyBuild.
+
 This PR includes:
 
 * **ROCm-LLVM toolchain**: The compiler foundation for AMD GPU computing
 * **Essential dependencies**: Previously missing components now properly packaged
 * **Core utilities**: ROCmInfo, AMDSMI, and HIP runtime with all necessary dependencies
-* **Validation suite groundwork**: Initial components for the ROCm Validation Suite
+* **Validation suite groundwork**: Initial components for the [ROCm Validation Suite](https://github.com/ROCm/ROCmValidationSuite)
 
 The PR builds upon significant framework enhancements, including support for AMD GPU compute capabilities (similar to CUDA's compute capabilities), a new LLVM-based toolchain implementation, and specialized EasyBlocks for ROCm components.
-Special thanks to [Jan André Reuter](https://github.com/Thyre) and [Bob Dröge](https://github.com/bedroge) for their parts in this.
+Special thanks to [Jan André Reuter](https://github.com/Thyre), [Davide Grassano](https://github.com/crivella), and [Bob Dröge](https://github.com/bedroge) for their parts in this.
 
 ## Current Challenge: ROCm Validation Suite
 
 While celebrating our core component success, we must acknowledge the challenge that has prevented us from completing our original goal of full ROCm Validation Suite support.
 We encountered compilation issues with `hipBLASLt` (documented in [ROCm issue #316](https://github.com/ROCm/rocm-libraries/issues/316)), which is a dependency for the complete validation suite.
 
-**[UPDATE MARKER: Check hipBLASLt workaround status before publication]**
-
 We're currently exploring potential workarounds for the `hipBLASLt` compilation issue.
-If successful, we may be able to include full validation suite support in this EasyBuild release.
-Otherwise, this will remain a priority for the next development cycle.
+If successful, we may be able to add support for installing the ROCm validation suite in EasyBuild soon.
 
 ## Outstanding Work: Driver Compatibility Checking
 
-One important piece of functionality we still need to implement is automatic compatibility checking between the host AMDGPU driver version and our ROCm builds.
-While our current builds work correctly with compatible drivers, we want to provide clear guidance to users when incompatibilities are detected, similar to what's available in the CUDA ecosystem.
+One important piece of functionality we still need to implement is automatic compatibility checking between the host AMDGPU driver version and our ROCm installation.
+While our current installations work correctly with compatible drivers, we want to provide clear guidance to users when incompatibilities are detected, similar to what's available in the CUDA ecosystem.
 
 ## What's Next: Expanding the Stack
 
 With our core components ready for release, our focus shifts to expanding ROCm support in several key areas:
 
 **Immediate priorities:**
 
-* Resolving the `hipBLASLt` compilation issue to enable full validation suite support
+* Resolving the `hipBLASLt` compilation issue to support installing the ROCm validation suite
 * Implementing driver compatibility detection and user guidance
 
 **Medium-term goals:**
@@ -74,12 +73,12 @@ With our core components ready for release, our focus shifts to expanding ROCm s
 **Long-term goals:**
 
 * Contributing our ROCm ecosystem overview to AMD's official documentation
-* Providing feedback to AMD's TheRock project based on our integration experience
+* Providing feedback to [AMD's TheRock project](https://github.com/ROCm/TheRock) based on our integration experience
 
 ## Community Impact and Looking Forward
 
-As we approach the EasyBuild release containing our ROCm components, we're excited about the possibilities this opens for the scientific computing community.
-AMD GPUs offer compelling alternatives to traditional computing architectures, and our work is helping remove the barriers that have historically made them challenging to deploy and manage.
+As we approach adding support to EasyBuild for installing recent ROCm versions, we're excited about the possibilities this opens for the scientific computing community.
+AMD GPUs offer compelling alternatives to traditional computing architectures, and our work is helping to remove the barriers that have historically made them challenging to deploy and manage.
 
 We'll continue sharing our progress as we work toward complete ROCm integration in EESSI.
 The goal remains unchanged: making high-performance AMD GPU computing as accessible and reliable as possible for researchers and developers worldwide.
