{2025.06}[SYSTEM] Cuda 12.6.0, 12.8.0, cuDNN 9.5.0.50 #1278
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80 |
|
New job on instance
|
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80 |
|
New job on instance
|
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80 |
|
New job on instance
|
|
The cuDNN host_injections installation failed because cuDNN doesn't contain PTX code (fixed in EESSI/software-layer-scripts@e25b625 and bf2fc9c). Also, another failure: Not sure what's wrong here. We may be missing a |
|
Added some extra verbosity (EESSI/software-layer-scripts@54bd9ad), let's see.
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80 |
|
New job on instance
|
|
Making it verbose seems to have solved the issue. That is, of course, impossible, but... things are working now: So maybe this was just one more of |
|
Let's get all of those host_injections installed...
All bots that run native builds (one architecture per bot is sufficient):
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/intel/cascadelake,accel=nvidia/cc70
x86_64 and Arm archs on AWS:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc70 |
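For readers unfamiliar with the command syntax used throughout this thread, here is a minimal sketch of how one of these `bot: build` lines breaks down into its parts. This is not the bot's actual parser; the function name `parse_bot_command` and the resulting dictionary layout are assumptions made purely for illustration.

```python
def parse_bot_command(line):
    """Split a 'bot: build ...' comment line into its key/value arguments.

    Hypothetical helper: it only mirrors the syntax visible in this thread.
    """
    prefix = "bot: build"
    text = line.strip()
    if not text.startswith(prefix):
        raise ValueError("not a bot build command")
    args = {}
    for token in text[len(prefix):].split():
        # Each argument looks like key:value, e.g. repo:eessi.io-2025.06-software
        key, _, value = token.partition(":")
        if key in ("on", "for"):
            # on:/for: carry comma-separated name=value filters
            value = dict(item.split("=", 1) for item in value.split(","))
        args[key] = value
    return args


cmd = parse_bot_command(
    "bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws "
    "on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc70"
)
print(cmd["repo"])          # eessi.io-2025.06-software
print(cmd["on"])            # {'arch': 'zen2'}
print(cmd["for"]["accel"])  # nvidia/cc70
```

Read this way, `on:` names the node type the job runs on and `for:` names the CPU and accelerator target the software is built for, which is how the commands in this thread use them.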
|
New job on instance
|
|
Edit: not sure why the previous build failed. The installations in the host_injections failed with a message that the lock file was already present. That's very strange, there should not be a lock file in the host_injections...
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc70 |
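As background on the "lock file was already present" message: an installer that guards a prefix with a lock will refuse to start if the lock already exists, which is what a second job would see while an earlier one is still running or after one exited without cleaning up. Whether that is what happened here is not established at this point in the thread. The sketch below is a generic illustration of such a lock using an atomic directory creation; it is not EasyBuild's implementation, and the lock name `cuDNN.lock` is hypothetical.

```python
import os
import tempfile


def acquire_lock(lock_dir):
    """Try to take a lock by creating a directory; return True on success."""
    try:
        os.mkdir(lock_dir)  # atomic: raises if the directory already exists
        return True
    except FileExistsError:
        return False


def release_lock(lock_dir):
    """Release the lock by removing the (empty) lock directory."""
    os.rmdir(lock_dir)


with tempfile.TemporaryDirectory() as tmp:
    lock = os.path.join(tmp, "cuDNN.lock")  # hypothetical lock name
    first = acquire_lock(lock)    # first installer gets the lock
    second = acquire_lock(lock)   # a concurrent installer is refused
    release_lock(lock)
    third = acquire_lock(lock)    # after release, the lock is free again
    release_lock(lock)

print(first, second, third)  # True False True
```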
|
New job on instance
|
|
Oh crap, I see the issue, the other build was
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws for:arch=aarch64/generic |
|
New job on instance
|
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws for:arch=aarch64/generic |
|
New job on instance
|
|
Wrong version...
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=aarch64/generic |
|
New job on instance
|
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc120 |
|
New job on instance
|
|
Strange... CC 70/80/90 work, but 100 and 120 give me: And in the build log: I don't get why. I don't think the CUDA compute capability is even used by EasyBuild when installing CUDA itself, until the sanity check. Why would it be different between compute capabilities? |
|
Oh, I also see: I guess we need to update that to match any number of digits (0-9), and potentially the suffixes a and f as well. |
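To make the suggestion concrete, here is a sketch of a pattern that accepts any number of digits plus an optional `a` or `f` suffix. The pattern currently in use is not quoted in this thread, so the "two digits only" pattern shown for comparison is an assumption, as are the names `CC_PATTERN` and `parse_cc`.

```python
import re

# Assumed shape of a too-strict pattern: exactly two digits (not quoted in the thread).
OLD_PATTERN = re.compile(r"^cc[0-9]{2}$")

# Relaxed pattern: one or more digits, optionally followed by 'a' or 'f'.
CC_PATTERN = re.compile(r"^cc([0-9]+)([af]?)$")


def parse_cc(name):
    """Return (digits, suffix) for names like 'cc120' or 'cc90a', else None."""
    match = CC_PATTERN.match(name)
    if match is None:
        return None
    return match.group(1), match.group(2)


for name in ["cc70", "cc80", "cc90", "cc100", "cc120", "cc90a", "cc100f", "cc"]:
    print(name, bool(OLD_PATTERN.match(name)), parse_cc(name))
```

With the two-digit pattern, `cc100` and `cc120` do not match, which would be consistent with CC 70/80/90 working while 100 and 120 fail.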
|
|
Ah, it seems that because of this no |
I think we should deploy the script from EESSI/software-layer-scripts#120 through this current PR, then change build.sh back to its original form. The issue is that EESSI/software-layer-scripts#120 can't be deployed there, because no software is built, and thus no "no missing installations" message is printed. This causes the bot to consider the build step a 'failure'.
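The failure mode described here can be pictured with a small sketch: if success is decided by looking for a marker message in the job output, a run that builds nothing never prints the marker and is reported as failed even though nothing went wrong. This is not the bot's actual check; the function name `build_succeeded`, the case-insensitive match, and the sample log lines are assumptions for illustration only.

```python
SUCCESS_MARKER = "no missing installations"  # wording taken from the comment above


def build_succeeded(log_text, marker=SUCCESS_MARKER):
    """Hypothetical check: success means the marker appears in the job output."""
    return marker.lower() in log_text.lower()


# Made-up log excerpts, only to show the two outcomes.
log_with_build = "installing software...\nNo missing installations\n"
log_script_only = "deploying scripts...\ndone\n"

print(build_succeeded(log_with_build))   # True
print(build_succeeded(log_script_only))  # False: nothing was built, so no marker
```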