Enable the use of libcufile #293
Conversation
…nda-forge-pinning 2024.11.22.09.17.35
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR. I do have some suggestions for making it better though... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12050123841. Examine the logs at this URL for more detail.
Thanks Michał! 🙏 Had a suggestion above. Also happy to help with reviewing here.
Updated. That said, I've no clue why smithy suddenly doubled the number of configurations (but it happens without my changes too).
We did a pretty major change to how the docker images and the stdlib interact. This means that the work-arounds in CBC are now likely causing extra configurations to appear. |
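As a hypothetical illustration of how such extra configurations tend to appear (this is not this feedstock's actual file): any key in `conda_build_config.yaml` that ends up with more than one value makes conda-smithy render one variant per value, so a local work-around that overlaps with what the global pinning and docker images now provide can silently multiply the build matrix.

```yaml
# Hypothetical work-around in recipe/conda_build_config.yaml:
# listing two values for a key makes conda-smithy render one
# configuration per value, doubling the number of builds.
c_stdlib_version:   # [linux]
  - "2.17"          # [linux]
  - "2.28"          # [linux]
```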
As discussed in: conda-forge#177 (comment) Co-Authored-By: H. Vetinari <h.vetinari@gmx.com>
…nda-forge-pinning 2024.11.23.20.32.37
This reverts commit 65ffa43.
…nda-forge-pinning 2024.11.23.20.32.37
@conda-forge/pytorch-cpu, after a bunch of restarts this now finally got through conda-forge/conda-smithy#2163 - the rest works without issue (debugging that continues in #294). The last commit is not strictly necessary (that was a false lead), but I didn't want to restart the CI yet again. Note that it drops CUDA 11.8 as discussed in #177. Finally, I'd also be happy to join as a maintainer here - I've been in the "shadows" here long enough I think. ;-)
The goal might have been to build Linux packages locally to conserve CI resources temporarily. Since this doesn't seem intended as a long-term solution, re-enabling CI makes sense to me. I tested this PR locally, and it appears that aarch64 with CUDA 12.6 fails:
This is not a failure per se, it just means you're on a too-old system (you need at least glibc 2.28). My understanding had been that only the CUDA binaries themselves (and not necessarily pytorch) required newer glibc, but it'd be very easy to add

```yaml
c_stdlib_version:   # [aarch64]
  - "2.28"          # [aarch64]
```

to enforce that constraint.
Job cancellation on Azure and on GitHub are two completely different things...? If you want to be able to cancel jobs on Azure, you'll need to ask Matt to add you to the admins of the conda-forge org (on the Azure Pipelines side). In contrast, cancelling and restarting jobs in the GitHub Actions for cirun should work as soon as you have the rights to cirun.
thanks!
OK, I think I found the solution. The test environment contains an old sysroot (presumably as a transitive dependency), which is probably triggering this. So the immediate solution is to add

```yaml
run_constrained:
  - sysroot_{{ target_platform }} >={{ c_stdlib_version }}
```

to the recipe.
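A minimal sketch of where such an entry could sit in the recipe; the placement is my assumption and the real meta.yaml is considerably more involved:

```yaml
# Hypothetical excerpt from recipe/meta.yaml
requirements:
  run_constrained:
    # keep the solver from pulling a sysroot older than the one
    # used at build time into the test/run environment
    - sysroot_{{ target_platform }} >={{ c_stdlib_version }}
```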
OK, let's try an even shorter-term solution! See latest commit!
If I have more changes, should I add them here or start another PR to let this one finish building (presumably, after you restart the builds)?
So the CI failures are quite annoying, as h-vetinari alluded to, and they make it hard to merge into main.
Babysitting the builds once they get merged (or triggering CFEP-03) is tiring... so "fewer merges" is good while the CIs are flaky.
```yaml
# The medium term solution is to add such a constraint to libcufile
# The long term solution is to add such a constraint to all packages
# that depend on a specific sysroot at building.
- sysroot_{{ target_platform }} >={{ c_stdlib_version }}
```
If you're doing this here rather than in libcufile, you either need to write `>=2.28` explicitly, or set

```yaml
c_stdlib_version:   # [aarch64]
  - "2.28"          # [aarch64]
```

and rerender.
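For completeness, the first option (writing the bound explicitly instead of bumping `c_stdlib_version`) would presumably look something like the following; the aarch64 selector is my assumption:

```yaml
run_constrained:
  # explicit glibc floor, only applied on aarch64
  - sysroot_{{ target_platform }} >=2.28   # [aarch64]
```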
thanks!
@conda-forge-admin please rerender
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR. Here's what I've got... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12055186149. Examine the logs at this URL for more detail.
…onda-forge-pinning 2024.11.27.13.47.37
They're touching cross, so it's perhaps best to leave them out for now and make another PR later. Maybe I'll have more changes by then.
Sounds good. Well, then we can see if restarting these jobs enough times will get them to green.
Apparently the issues should be fixed now (thanks @aktech! 🙏) -- or at least mitigated. I'll push a fix for the linter error, which will also test the hypothesis of whether turning off the affected GPU restores stability.
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR. I do have some suggestions for making it better though... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12059425335. Examine the logs at this URL for more detail.
All 6 CI runs on the opengpu server now started successfully, so I'm going to claim that the fix was sufficient. Thanks again @aktech! :)
And yay, it's all green!
👏
Nice to see some parallel builds in these final stages!
Summary: This PR aims to reduce maintenance effort now that the latest `conda-forge::pytorch` package [has dropped CUDA 11.8 support](conda-forge/pytorch-cpu-feedstock#293 (comment)) and we no longer have any use cases for CUDA 11.8. Additionally, it removes the unnecessary use of the 'pytorch' and 'nvidia' channels to simplify the number of conda channels used.

## Checklist:
- [x] Adheres to the [style guidelines](https://facebookincubator.github.io/momentum/docs/developer_guide/style_guide)
- [x] Codebase formatted by running `pixi run lint`

Pull Request resolved: #148
Test Plan: CI
Reviewed By: nickyhe-gemini
Differential Revision: D66679272
Pulled By: jeongseok-meta
fbshipit-source-id: a3fb7ef53e2b62a501601d871c76b99315b3ae96
Checklist
- [x] Reset the build number to `0` (if the version changed)
- [x] Re-rendered with the latest `conda-smithy` (Use the phrase `@conda-forge-admin, please rerender` in a comment in this PR for automated rerendering)

Fixes #257