DBG parallelizing single and multigpu python jobs using GHA [skip gpuci] #5096

Closed
wants to merge 72 commits into from

dantegd
Member

@dantegd dantegd commented Dec 18, 2022

No description provided.

csadorf and others added 30 commits December 9, 2022 08:30
Consistent labels for labeler.yaml.
For consistency with other package versions, this commit moves the `ucx_py_version` variable to `conda_build_config.yaml` and adds an associated update command in `update-version.sh` so that it's automatically updated when we create new development branches.
…ersion

this commit ensures that the `CUDA` version is obtained from the `RAPIDS_CUDA_VERSION` environment variable that exists in our CI images (see link below). Additionally, it replaces the `environ.get()` syntax w/ `environ['']` syntax so that the build errors if `RAPIDS_CUDA_VERSION` is not found.

https://github.com/rapidsai/ci-imgs/blob/d8ada6950f26440d4d63fdaabc8bf8af9efb325d/Dockerfile#L14
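The difference between the two lookup styles mentioned in the commit message above can be sketched in plain Python. This is only an illustration: it assumes the recipe's `environ` mapping behaves like Python's `os.environ`, and the two function names are hypothetical, not part of the cuml recipes.

```python
import os


def cuda_version_lenient(env=os.environ):
    # .get() returns None when the variable is missing, so a
    # misconfigured environment goes unnoticed until much later.
    return env.get("RAPIDS_CUDA_VERSION")


def cuda_version_strict(env=os.environ):
    # Indexing raises KeyError when the variable is missing, so the
    # build stops immediately with an explicit error.
    return env["RAPIDS_CUDA_VERSION"]
```

With the strict form, a CI image that does not define `RAPIDS_CUDA_VERSION` fails at the lookup itself rather than producing a package built against an unintended CUDA version.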
this commit ensures that the `SCCACHE_*` variables are sourced from our CI images (see link below) to keep things DRY.

https://github.com/rapidsai/ci-imgs/blob/d8ada6950f26440d4d63fdaabc8bf8af9efb325d/Dockerfile#L17-L23
this variable is not used anywhere
this commit ensures that `cuml` will use `sccache` during its build
`ninja` was previously included in the Jenkins CI images, but it was removed from the GH Action CI images. This commit ensures that it's added explicitly as a build dependency for our recipes. Additionally, it alphabetizes the build dependency list.
This file was placed in the wrong directory, but it's superseded by the changes in the GH Actions PR anyway.
Removing this dependency will let us iterate faster for now.
this commit adds some missing `cudatoolkit` dependencies. these are necessary because the `cudatoolkit` package from `conda-forge` is incomplete. In our previous Jenkins images, this issue was masked by the fact that these libraries were available in the system-installed `cudatoolkit`. Our GH Action images are much slimmer than our Jenkins images and don't include all of these libraries pre-installed at the system level.
we need the `*-dev` packages so we get the header files
this ensures that AWS credentials are available to `sccache`. prior to this commit, `sccache` was not writing to the S3 bucket and therefore not being utilized
this package is required to be installed so that the `gtest` binaries are available
after some discussion w/ Dante and Bradley Dice, it was determined that some of these `libcu*` packages should probably be obtained transitively by `raft`. However, the `raft` recipes need to first be updated before these packages can be safely removed. Therefore I added a TODO to address it in the future.
csadorf and others added 18 commits December 15, 2022 05:59
these lines are no longer necessary since we moved the `ucx-py` version to `conda_build_config.yaml`
this aligns with the CUDA Enhanced Compatibility convention where we build with `11.5`, but can run with `11.x`
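As a rough illustration of the "build with `11.5`, run with `11.x`" convention in the commit message above, the check below treats two CUDA versions as compatible when they share a major version. The helper name is hypothetical and the rule is a deliberate simplification; real compatibility can also depend on the installed driver and on which toolkit features the package uses.

```python
def same_cuda_major(build_version: str, runtime_version: str) -> bool:
    """Hypothetical helper: True when both versions share a major version."""
    build_major = int(build_version.split(".")[0])
    runtime_major = int(runtime_version.split(".")[0])
    return build_major == runtime_major
```

Under this simplified rule, a package built with `11.5` would be accepted on an `11.2` or `11.8` runtime and rejected on `12.0`.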
this commit updates the `dependencies.yaml` file to include the non-`*-dev` packages for the missing CTK packages. Additionally, it uses version ranges instead of exact versions in the version specifiers
in lieu of an entirely new dependency list, a YAML anchor can be used to keep these versions in sync
In the `pytest` logs for the previous commit, it seemed that `cuml` was getting installed from the local file system, but `libcuml` was getting installed from `rapidsai-nightly`.

I was able to reproduce this issue locally and explicitly adding `libcuml` to the install line seemed to resolve it.
these probably should've been removed in rapidsai#5038
@dantegd dantegd added the "DO NOT MERGE" label (Hold off on merging; see PR for details) Dec 18, 2022
@github-actions github-actions bot added the "conda" (conda issue), "Cython / Python" (Cython or Python issue), and "gpuCI" (gpuCI issue) labels Dec 18, 2022
@codecov-commenter
Codecov Report

❗ No coverage uploaded for pull request base (branch-23.02@6e94e5d).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-23.02    #5096   +/-   ##
===============================================
  Coverage                ?   69.37%           
===============================================
  Files                   ?      192           
  Lines                   ?    12367           
  Branches                ?        0           
===============================================
  Hits                    ?     8579           
  Misses                  ?     3788           
  Partials                ?        0           

@dantegd dantegd changed the title from "DBG parallelizing sinle and multigpu python jobs using GHA [skip gpuci]" to "DBG parallelizing single and multigpu python jobs using GHA [skip gpuci]" Dec 19, 2022
@dantegd dantegd closed this Dec 20, 2022
Labels: conda (conda issue), Cython / Python (Cython or Python issue), DO NOT MERGE (Hold off on merging; see PR for details), gpuCI (gpuCI issue)
4 participants