DevManpreet5

Motivation

Extend CI matrix to test against ROCm 7.0. Fix #170

Technical Details

  • .github/workflows/iris-tests-apptainer.yml: Extended CI matrix to test both ROCm 6.3.1 and 7.0
  • apptainer/iris-rocm6.3.1.def: Apptainer definition for ROCm 6.3.1
  • apptainer/iris-rocm7.0.def: Apptainer definition for ROCm 7.0

Test Plan

Could not perform local testing since all AMD cloud droplets are currently out of stock.
Submitting as a Draft PR to validate changes via CI.

Test Result

N/A (pending)

Submission Checklist

@DevManpreet5
Author

@neoblizz @maawad Hi, could you please approve and run the workflow for this draft PR? Thanks!

@mawad-amd
Collaborator

Thanks for the PR Manpreet!

@mawad-amd
Collaborator

Looks like the ROCm 7.0 Apptainer is broken. I suggest trying to build that locally on any system.

@DevManpreet5
Author

DevManpreet5 commented Sep 22, 2025

@mawad-amd Can you look into the new checks? The old failing check now passes, but the 1-rank check is failing. That's interesting: 6.3 worked, but 7.0 failed because pip tried writing to a read-only path. Maybe something is different in the 7.0 Docker image? Should I force a user install?

@DevManpreet5
Author

@mawad-amd Hi, I am getting [Errno 30] Read-only file system: '/opt/venv/lib/python3.10/site-packages/urllib3'
I was thinking of using pip install --user -e . (https://luminousmen.medium.com/why-use-pip-install-user-2df0259c8fb7), while Copilot suggests: "Modify your workflow step to create and activate a new virtual environment before installing packages."

Which method would you suggest?
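
For reference, the two options behave differently depending on the environment. The /opt/venv path in the error suggests the interpreter is already running inside a virtual environment, and pip rejects `--user` installs in a virtual environment that was created without access to system site-packages. The following is a minimal sketch of a pre-check that makes the choice explicit, not the project's actual workflow. The helper names `describe_environment` and `choose_install_command` and the directory `.ci-venv` are hypothetical.

```python
import os
import sys
import sysconfig


def describe_environment():
    """Report whether we run inside a venv and whether site-packages is writable."""
    site_packages = sysconfig.get_paths()["purelib"]
    return {
        "in_venv": sys.prefix != sys.base_prefix,
        "site_packages": site_packages,
        "writable": os.access(site_packages, os.W_OK),
    }


def choose_install_command(in_venv, writable, fallback_venv=".ci-venv"):
    """Return a list of commands (argv lists) for an editable install.

    fallback_venv is a hypothetical directory name used only for illustration.
    """
    if writable:
        # Normal case: install straight into the active environment.
        return [[sys.executable, "-m", "pip", "install", "-e", "."]]
    if not in_venv:
        # Read-only system Python: a per-user install is allowed.
        return [[sys.executable, "-m", "pip", "install", "--user", "-e", "."]]
    # Read-only venv: --user is typically rejected here, so create a fresh,
    # writable venv. Note that the new venv does not see packages installed
    # in the read-only one, so dependencies would be installed again.
    venv_python = os.path.join(fallback_venv, "bin", "python")
    return [
        [sys.executable, "-m", "venv", fallback_venv],
        [venv_python, "-m", "pip", "install", "-e", "."],
    ]


if __name__ == "__main__":
    env = describe_environment()
    for command in choose_install_command(env["in_venv"], env["writable"]):
        print(" ".join(command))
```

Running the script inside the container prints the commands it would pick without executing them, which may help confirm whether the 7.0 image really differs from the 6.3.1 image in this respect.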

@mawad-amd
Collaborator

There seems to be something wrong with the apptainer image itself. See log here.

I think we will have to fix that first, then see if the other problem you are pointing out still persists. Do you still have problems getting AMD GPU access (a single GPU is fine)?

@DevManpreet5
Author

@mawad-amd Thanks. Oh, the build-apptainer-7.0 check passed, so I didn't check the logs; I'll look into it. Yes, I do have access to an AMD GPU now, but just building the Apptainer image fails with:

INFO:    Extracting OCI image...
INFO:    Inserting Apptainer configuration...
INFO:    Running post scriptlet
ERROR  : Failed to set mount propagation: Permission denied
FATAL:   While performing build: while running engine: while running %post section: exit status 1

This happens for all the Apptainer images, so I will have to figure some things out. Thanks for your time!
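
One thing that may be worth trying: this kind of mount error can appear when the build runs without enough privilege to set up mounts for the %post step, for example as an unprivileged user or inside a container that restricts mount operations. That cause is an assumption; the log above does not confirm it. Below is a minimal sketch that retries the build with Apptainer's `--fakeroot` option. The helper names `build_command` and `build_image` and the output file name are hypothetical; the definition file path comes from the PR description.

```python
import shutil
import subprocess


def build_command(def_file, sif_file, fakeroot=True):
    """Assemble an `apptainer build` command line as an argv list."""
    cmd = ["apptainer", "build"]
    if fakeroot:
        # Build inside a user namespace instead of requiring real root.
        cmd.append("--fakeroot")
    # Apptainer expects the output image first, then the build spec.
    cmd += [sif_file, def_file]
    return cmd


def build_image(def_file, sif_file, fakeroot=True):
    """Run the build and return the exit code of the apptainer process."""
    if shutil.which("apptainer") is None:
        raise RuntimeError("apptainer not found on PATH")
    completed = subprocess.run(build_command(def_file, sif_file, fakeroot))
    return completed.returncode


if __name__ == "__main__":
    # The .sif name is a placeholder chosen for this example.
    exit_code = build_image("apptainer/iris-rocm7.0.def", "iris-rocm7.0.sif")
    print("apptainer build exited with", exit_code)
```

If the build still fails with `--fakeroot`, running it as root on a host (rather than inside another container) would help separate a privilege problem from a problem in the definition file itself.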

Development

Successfully merging this pull request may close these issues.

[Feature]: CI for ROCm 7.0