Add support for a CUDA 12.9.1 image with PyTorch 2.8.0 #7
This PR adds a Makefile recipe to build a Determined base image with CUDA 12.9.1 and PyTorch 2.8.0.
I ran into some issues trying to build the image with PyTorch 2.9.0. I believe we will be overriding the PyTorch version anyway when we build the actual augment images, though.
I was also unable to add `10.0` to the `TORCH_CUDA_ARCH_LIST`. I think `10.0` should be the compute capability for Blackwell.

TESTED:
Ran `make build-gpt-neox-deepspeed-gpu-torch-280` and then `docker run -it 77824367d1e6 /bin/bash` to check that the `nvcc` version is 12.9.1.
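The manual check above could also be scripted so it can be rerun against each new image. Below is a minimal sketch, assuming Python 3.9+ is available inside the container; the helper names `parse_nvcc_release` and `check_image` and the expected-version constants are hypothetical, not part of the recipe. One caveat worth knowing: `nvcc --version` reports only the major.minor release (for example `release 12.9, V12.9.86`, where the `V...` field is a compiler build number), so it cannot by itself distinguish 12.9.0 from 12.9.1; the patch version has to be confirmed from the base image instead.

```python
import re
import subprocess

# Hypothetical expected values for this image, taken from the PR title.
EXPECTED_CUDA = (12, 9)
EXPECTED_TORCH_PREFIX = "2.8.0"

# `nvcc --version` prints a line such as
# "Cuda compilation tools, release 12.9, V12.9.86";
# the "release" field carries only the major.minor toolkit version.
_RELEASE_RE = re.compile(r"release (\d+)\.(\d+)")


def parse_nvcc_release(output: str) -> tuple[int, int]:
    """Return (major, minor) parsed from `nvcc --version` output."""
    match = _RELEASE_RE.search(output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def check_image() -> list[str]:
    """Collect human-readable check results; never raises."""
    results: list[str] = []
    try:
        out = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        ).stdout
        release = parse_nvcc_release(out)
        status = "OK" if release == EXPECTED_CUDA else "MISMATCH"
        results.append(f"nvcc release {release[0]}.{release[1]}: {status}")
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        # OSError covers FileNotFoundError when nvcc is not on PATH.
        results.append(f"nvcc check failed: {exc}")

    try:
        import torch
    except (ImportError, OSError) as exc:
        results.append(f"torch could not be imported: {exc}")
        return results

    status = "OK" if torch.__version__.startswith(EXPECTED_TORCH_PREFIX) else "MISMATCH"
    results.append(f"torch {torch.__version__}: {status}")
    # CUDA version PyTorch was built against, e.g. "12.9" (None for CPU-only builds).
    results.append(f"torch built with CUDA {torch.version.cuda}")
    # Compiled SM targets; returns [] when no GPU is visible to the container.
    results.append(f"torch arch list: {torch.cuda.get_arch_list()}")
    return results


if __name__ == "__main__":
    for line in check_image():
        print(line)
```

Run inside the container (for example after copying the file in), this prints one line per check. On a GPU host, the arch list is also a quick way to see whether an `sm_100` target ended up in the PyTorch build.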