Enable the use of libcufile #293

Merged: 21 commits merged into conda-forge:main on Nov 28, 2024

Conversation

@mgorny (Contributor) commented Nov 22, 2024

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Fixes #257

@conda-forge-admin (Contributor) commented Nov 22, 2024

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12050123841. Examine the logs at this URL for more detail.

@jakirkham (Member)

Thanks Michał! 🙏

Had a suggestion above. Also happy to help with reviewing here

@mgorny (Contributor, Author) commented Nov 23, 2024

Updated. That said, I've no clue why smithy suddenly doubled the number of configurations (but it happens without my changes too).

@h-vetinari (Member)

That said, I've no clue why smithy suddenly doubled the number of configurations

We made a pretty major change to how the docker images and the stdlib interact. This means that the work-arounds in the CBC (conda_build_config.yaml) are now likely causing extra configurations to appear.

@h-vetinari (Member) left a comment

I cleaned up the now-obsolete config, picked your commit d912eab (cf. the discussion in #177), and leveraged the switch to CUDA 12.6, so this should now be substantially simplified. :)

@h-vetinari (Member) commented Nov 23, 2024

@hmaarrfk, I'm not sure that 65ffa43 was supposed to end up being merged? As in: why are we not using the GHA server anymore?

@h-vetinari (Member)

@conda-forge/pytorch-cpu, after a bunch of restarts this now finally got through conda-forge/conda-smithy#2163 - the rest works without issue (debugging that continues in #294). The last commit is not strictly necessary (that was a false lead), but I didn't want to restart the CI yet again.

Note that it drops CUDA 11.8 as discussed in #177. Finally, I'd also be happy to join as a maintainer here - I've been in the "shadows" here long enough I think. ;-)

@jslee02 commented Nov 25, 2024

@hmaarrfk, I'm not sure that 65ffa43 was supposed to end up being merged? As in: why are we not using the GHA server anymore?

The goal might have been to build Linux packages locally to conserve CI resources temporarily. Since this doesn't seem intended as a long-term solution, re-enabling CI makes sense to me.

I tested this PR locally, and it appears that aarch64 with CUDA 12.6 fails:

+ pip check
No broken requirements found.
+ python -c 'import torch; torch.tensor(1).to('\''cpu'\'').numpy(); print('\''numpy support enabled!!!'\'')'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/conda/feedstock_root/build_artifacts/libtorch_1732480330557/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.12/site-packages/torch/__init__.py", line 367, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /home/conda/feedstock_root/build_artifacts/libtorch_1732480330557/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.12/site-packages/torch/../../../././libcufile.so.0)
WARNING: Tests failed for pytorch-2.5.1-cuda126_py312h8a24fa9_204.conda - moving package to /home/conda/feedstock_root/build_artifacts/broken
Traceback (most recent call last):
  File "/opt/conda/lib/python3.12/site-packages/conda_build/build.py", line 3483, in test
    utils.check_call_env(
  File "/opt/conda/lib/python3.12/site-packages/conda_build/utils.py", line 404, in check_call_env
    return _func_defaulting_env_to_os_environ("call", *popenargs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/conda_build/utils.py", line 380, in _func_defaulting_env_to_os_environ
    raise subprocess.CalledProcessError(proc.returncode, _args)
subprocess.CalledProcessError: Command '['/bin/bash', '-o', 'errexit', '/home/conda/feedstock_root/build_artifacts/libtorch_1732480330557/test_tmp/conda_test_runner.sh']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/conda-build", line 11, in <module>
    sys.exit(execute())
             ^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/conda_build/cli/main_build.py", line 589, in execute
    api.build(
  File "/opt/conda/lib/python3.12/site-packages/conda_build/api.py", line 209, in build
    return build_tree(
           ^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/conda_build/build.py", line 3670, in build_tree
    test(pkg, config=metadata.config.copy(), stats=stats)
  File "/opt/conda/lib/python3.12/site-packages/conda_build/build.py", line 3497, in test
    tests_failed(
  File "/opt/conda/lib/python3.12/site-packages/conda_build/build.py", line 3544, in tests_failed
    raise CondaBuildUserError("TESTS FAILED: " + os.path.basename(pkg))
conda_build.exceptions.CondaBuildUserError: TESTS FAILED: pytorch-2.5.1-cuda126_py312h8a24fa9_204.conda
valid configs are {'osx_arm64_numpy2.0python3.9.____cpython', 'osx_64_blas_implgenericnumpy2.0python3.9.____cpython', 'osx_arm64_numpy2.0python3.12.____cpython', 'osx_arm64_numpy2.0python3.10.____cpython', 'osx_arm64_numpy2python3.13.____cp313', 'osx_64_blas_implmklnumpy2.0python3.9.____cpython', 'osx_64_blas_implmklnumpy2.0python3.10.____cpython', 'osx_64_blas_implmklnumpy2python3.13.____cp313', 'linux_aarch64_c_compiler_version13cuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13', 'linux_64_blas_implmklc_compiler_version13cuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13', 'osx_64_blas_implmklnumpy2.0python3.11.____cpython', 'linux_64_blas_implgenericc_compiler_version12cuda_compilercuda-nvcccuda_compiler_version12.6cxx_compiler_version12', 'osx_64_blas_implgenericnumpy2.0python3.10.____cpython', 'osx_64_blas_implgenericnumpy2python3.13.____cp313', 'osx_64_blas_implmklnumpy2.0python3.12.____cpython', 'osx_64_blas_implgenericnumpy2.0python3.11.____cpython', 'osx_64_blas_implgenericnumpy2.0python3.12.____cpython', 'linux_64_blas_implmklc_compiler_version12cuda_compilercuda-nvcccuda_compiler_version12.6cxx_compiler_version12', 'linux_aarch64_c_compiler_version12cuda_compilercuda-nvcccuda_compiler_version12.6cxx_compiler_version12', 'osx_arm64_numpy2.0python3.11.____cpython', 'linux_64_blas_implgenericc_compiler_version13cuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13'}
Using linux_aarch64_c_compiler_version12cuda_compilercuda-nvcccuda_compiler_version12.6cxx_compiler_version12 configuration

@h-vetinari (Member)

ImportError: /lib64/libm.so.6: version `GLIBC_2.27' not found

This is not a failure per se; it just means you're on a too-old system (you need at least glibc 2.28). My understanding had been that only the CUDA binaries themselves (and not necessarily pytorch) required the newer glibc, but it'd be very easy to add

c_stdlib_version:  # [aarch64]
  - "2.28"         # [aarch64]

to enforce that constraint.

@h-vetinari (Member)

I purposefully didn't start the CIRUN but here is a demo of me failing to cancel jobs and

Job cancellation on azure and on github are two completely different things...? If you want to be able to cancel jobs on azure, you'll need to ask Matt to add you to the admins of the conda-forge org (on the AzurePipelines-side).

In contrast, cancelling and restarting jobs in the github actions for cirun should work as soon as you have the rights to cirun.

[two screenshots attached]

@hmaarrfk (Contributor)

thanks!

@h-vetinari (Member)

Under normal circumstances this shouldn't be possible - the job is running inside an alma8 image with glibc 2.28. It might be that libcufile has been compiled without picking up the right sysroot.

OK, I think I found the solution. The test environment contains an old sysroot (presumably as a transitive dependency), which is probably triggering this.

So the immediate solution is to add

run_constrained:
  - sysroot_{{ target_platform }} >={{ c_stdlib_version }}

to libcufile. The large-scale solution would be conda-forge/linux-sysroot-feedstock#63.
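To make that concrete, here is a minimal sketch (an editor's illustration, not the actual libcufile recipe) of how such a constraint could sit in the libcufile feedstock's meta.yaml, assuming the usual conda-forge Jinja variables are in scope there:

requirements:
  run_constrained:
    # keep environments that install libcufile from resolving a sysroot older
    # than the glibc baseline the library was built against
    - sysroot_{{ target_platform }} >={{ c_stdlib_version }}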

@hmaarrfk (Contributor) commented Nov 27, 2024

OK, let's try an even shorter-term solution! See the latest commit!

@mgorny (Contributor, Author) commented Nov 27, 2024

If I have more changes, should I add them here or start another PR to let this one finish building (presumably, after you restart the builds)?

@hmaarrfk (Contributor)

So the CI failures are quite annoying, as h-vetinari alluded to, and they make it hard to merge into main.

  • If you are experimenting, probably best to leave this working.
  • If you know they will work, you can add them here.

Babysitting the builds once they get merged (or triggering CFEP-03) is tiring, so "fewer merges" is good while the CIs are flaky.

# The medium-term solution is to add such a constraint to libcufile.
# The long-term solution is to add such a constraint to all packages
# that depend on a specific sysroot at build time.
- sysroot_{{ target_platform }} >={{ c_stdlib_version }}
Review comment from a Member:

If you're doing this here rather than in libcufile, you either need to write >=2.28 explicitly, or set

c_stdlib_version:  # [aarch64]
  - "2.28"         # [aarch64]

and rerender
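For comparison, a sketch of the first option (hard-coding the glibc baseline instead of relying on a rerendered c_stdlib_version). The placement under requirements/run_constrained and the aarch64 selector are assumptions that mirror the snippets above:

requirements:
  run_constrained:
    # explicit glibc-2.28 baseline; aarch64 is where the too-old sysroot was observed
    - sysroot_{{ target_platform }} >=2.28  # [aarch64]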

Reply from a Contributor:

thanks!

@hmaarrfk (Contributor)

@conda-forge-admin please rerender

@conda-forge-admin (Contributor) commented Nov 27, 2024

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • ❌ You are setting c_stdlib_version below the current global baseline in conda-forge (10.13). If this is your intention, you also need to override MACOSX_DEPLOYMENT_TARGET (with the same value) locally.

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12055186149. Examine the logs at this URL for more detail.

@mgorny (Contributor, Author) commented Nov 27, 2024

If you are experimenting, probably best to leave this working.

They're touching cross-compilation, so it's perhaps best to leave them out for now and make another PR later. Maybe I'll have more changes by then.

@hmaarrfk (Contributor)

Sounds good. Well then, we can see if restarting these jobs enough will get them to be green.

@h-vetinari (Member) commented Nov 27, 2024

Sounds good. Well then, we can see if restarting these jobs enough will get them to be green.

Apparently the issues should be fixed now (thanks @aktech! 🙏) -- or at least mitigated. I'll push a fix for the linter error that will also test the hypothesis that turning off the affected GPU restores stability.

@conda-forge-admin (Contributor)

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. Your recipe may not receive automatic updates and/or may not be compatible with conda-forge's infrastructure. Please check the logs for more information and ensure your recipe can be parsed.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12059425335. Examine the logs at this URL for more detail.

@h-vetinari (Member) commented Nov 28, 2024

All 6 CI runs on the opengpu server now started successfully, so I'm going to claim that the fix was sufficient. Thanks again @aktech! :)

@mgorny (Contributor, Author) commented Nov 28, 2024

And yay, it's all green!

@h-vetinari (Member) left a comment

👏

@hmaarrfk merged commit 78ea0ce into conda-forge:main on Nov 28, 2024 (25 checks passed)
@hmaarrfk (Contributor)

Nice to see some parallel builds in these final stages!

@mgorny deleted the cufile branch on November 28, 2024 at 19:39
facebook-github-bot pushed a commit to facebookresearch/momentum that referenced this pull request Dec 3, 2024
Summary:
This PR aims to reduce maintenance effort now that the latest `conda-forge::pytorch` package [has dropped the CUDA 11.8 support](conda-forge/pytorch-cpu-feedstock#293 (comment)), and we no longer have any use cases for CUDA 11.8.

Additionally, this removes the unnecessary use of the 'pytorch' and 'nvidia' channels to reduce the number of conda channels used.

## Checklist:

- [x] Adheres to the [style guidelines](https://facebookincubator.github.io/momentum/docs/developer_guide/style_guide)
- [x] Codebase formatted by running `pixi run lint`

Pull Request resolved: #148

Test Plan: CI

Reviewed By: nickyhe-gemini

Differential Revision: D66679272

Pulled By: jeongseok-meta

fbshipit-source-id: a3fb7ef53e2b62a501601d871c76b99315b3ae96
Successfully merging this pull request may close these issues: (Future release) building with GDS

7 participants