Dockerfile - add sm_103 support for cuda12.9 docker image #737
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main     #737   +/-   ##
=======================================
  Coverage   85.97%   85.97%
=======================================
  Files         102      102
  Lines        7579     7579
=======================================
  Hits         6516     6516
  Misses       1063     1063
RUN echo PATH="$PATH" > /etc/environment && \
    echo LD_LIBRARY_PATH="$LD_LIBRARY_PATH" >> /etc/environment && \
    echo SB_MICRO_PATH="$SB_MICRO_PATH" >> /etc/environment && \
    echo "source /opt/hpcx/hpcx-init.sh && hpcx_load" | tee -a /etc/bash.bashrc >> /etc/profile.d/10-hpcx.sh
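To make the effect of this RUN step concrete, here is a minimal sketch that replays it against a scratch directory instead of `/`. The `write_env_files` helper and its root-directory argument are hypothetical, added only so the sketch can run without touching the real `/etc`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replay the Dockerfile RUN step under a scratch root
# so it can be inspected without writing to the real /etc.
write_env_files() {
  local root="$1"
  mkdir -p "$root/etc/profile.d"
  # The shell expands $PATH etc. and consumes the double quotes, so the
  # file receives literal KEY=value lines such as PATH=/usr/local/bin:...
  echo PATH="$PATH" > "$root/etc/environment"
  echo LD_LIBRARY_PATH="$LD_LIBRARY_PATH" >> "$root/etc/environment"
  echo SB_MICRO_PATH="$SB_MICRO_PATH" >> "$root/etc/environment"
  # tee -a appends the line to bash.bashrc and also writes it to stdout,
  # which >> then appends to 10-hpcx.sh: the same line lands in both files.
  echo "source /opt/hpcx/hpcx-init.sh && hpcx_load" \
    | tee -a "$root/etc/bash.bashrc" >> "$root/etc/profile.d/10-hpcx.sh"
}
```

On Debian/Ubuntu-style images this means the HPC-X line is picked up both by shells that read `/etc/bash.bashrc` and by login shells that source `/etc/profile.d`.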
Isn't it unnecessary to call hpcx_load proactively here?
If no custom HPC-X is installed, we can simply remove hpcx_load. However, for CUDA 13.0 I see Dilip installed a custom HPC-X version, so we may need to wait and decide later how to handle both situations.
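One way to cope with both situations, sketched under assumptions: gate `hpcx_load` on an init script whose location can be overridden. The `load_hpcx_if_present` helper and the use of `HPCX_HOME` (defaulting to the `/opt/hpcx` path from the Dockerfile) to point at a custom HPC-X install are assumptions for illustration, not what this PR or [#739] does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load HPC-X only when its init script exists.
# HPCX_HOME defaults to /opt/hpcx; a custom install could override it.
load_hpcx_if_present() {
  local init="${HPCX_HOME:-/opt/hpcx}/hpcx-init.sh"
  if [ -f "$init" ]; then
    # hpcx-init.sh defines hpcx_load, which sets up the HPC-X environment.
    . "$init" && hpcx_load
  else
    echo "HPC-X init script not found at $init; skipping hpcx_load" >&2
  fi
}
```

With this shape, an image with a custom HPC-X would only need to set `HPCX_HOME` before the shell startup files run, and an image without HPC-X would skip the load instead of failing.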
ifeq ($(shell echo $(CUDA_VER)">=12.8" | bc -l), 1)
ifeq ($(shell echo $(CUDA_VER)">=12.9" | bc -l), 1)
# Get commit 87048bd from msscl to support updated nccl and sm_100
$(eval ARCHS := 100 103)
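The version gate works because make runs `echo 12.9">=12.9" | bc -l` through the shell: the quotes keep `>` from being treated as a redirection, and GNU bc evaluates the relational expression and prints 1 or 0. bc compares the versions as decimal numbers, though, so a two-digit minor version such as a hypothetical 12.10 would compare below 12.9. Below is a sketch of a component-wise check using GNU `sort -V`; this is a swapped-in technique, and `version_ge` is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds (exit status 0) when version A >= version B.
# sort -V orders dotted versions component by component, so 12.9 < 12.10.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage mirroring the Makefile branch for CUDA >= 12.9.
CUDA_VER="${CUDA_VER:-12.9}"
if version_ge "$CUDA_VER" 12.9; then
  ARCHS="100 103"
fi
```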
Looks like there are some duplicated changes with [#739].
Yes. I'm assuming we will merge the CUDA 13.0 PR [#739] first; I will take care of merging the code here afterwards.
polarG left a comment
LGTM
Added the sm_103 architecture to executables that depend on the CUDA architecture.
Removed the duplicate installation of HPC-X and NCCL in cuda12.9.dockerfile and cuda12.8.dockerfile.
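For context on what adding sm_103 typically means at build time: nvcc selects target GPUs through `-gencode=arch=compute_XX,code=sm_XX` pairs. The sketch below turns an ARCHS-style list into such flags; the `gencode_flags` helper is hypothetical, and how this repository's build scripts actually consume ARCHS (directly as nvcc flags, or through something like `CMAKE_CUDA_ARCHITECTURES`) is an assumption not shown in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gencode_flags "100 103" emits one nvcc -gencode
# pair per architecture, space-separated.
gencode_flags() {
  local arch out=""
  for arch in $1; do   # intentional word splitting over the arch list
    out="${out:+$out }-gencode=arch=compute_${arch},code=sm_${arch}"
  done
  printf '%s\n' "$out"
}
```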