
[RELEASE] cugraph v25.04 #5016

Merged: 61 commits merged into main on Apr 9, 2025


raydouglass (Member)

❄️ Code freeze for branch-25.04 and v25.04 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-25.04 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-25.04 into main for the release

raydouglass and others added 30 commits January 23, 2025 15:09
Forward-merge branch-25.02 into branch-25.04
Forward-merge branch-25.02 into branch-25.04
This corrects an error in the devcontainer names that might have been missed in #4907 or had some other conflict.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Don Acosta (https://github.com/acostadon)
  - James Lamb (https://github.com/jameslamb)

URL: #4908
Forward-merge branch-25.02 into branch-25.04
This migrates amd64 CI jobs (PRs and nightlies) to use L4 GPUs from the NVKS cluster.

xref: rapidsai/build-infra#184

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #4905
Forward-merge branch-25.02 into branch-25.04
Forward-merge branch-25.02 into branch-25.04
Uses a retry wrapper for `pip` commands to help alleviate CI failures caused by hash mismatches that result from network hiccups.

xref rapidsai/gha-tools#132

This will retry failures that show up in CI like:

```
   Collecting nvidia-cublas-cu12 (from libraft-cu12==25.2.*,>=0.0.0a0)
    Downloading https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl (604.9 MB)
       ━━━━━━━━━━━━━━━━━━━━━                 350.2/604.9 MB 229.2 MB/s eta 0:00:02
  ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
      nvidia-cublas-cu12 from https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl#sha256=93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3 (from libraft-cu12==25.2.*,>=0.0.0a0):
          Expected sha256 93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3
               Got        849c88d155cb4b4a3fdfebff9270fb367c58370b4243a2bdbcb1b9e7e940b7be
```

ref: https://github.com/rapidsai/cugraph/actions/runs/13132982479/job/36648262815?pr=4904#step:10:147
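A minimal Python sketch of the retry idea follows. It is not the gha-tools wrapper referenced above: the helper name `run_with_retries`, the default attempt count, the fixed delay, and the injectable `runner` argument (used so the logic can be exercised without network access) are all hypothetical choices. It assumes that any non-zero exit status, such as pip's exit on a hash mismatch, is worth retrying.

```python
import subprocess
import sys
import time
from typing import Callable, Optional, Sequence


def run_with_retries(
    cmd: Sequence[str],
    max_attempts: int = 3,
    delay_seconds: float = 5.0,
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> int:
    """Run `cmd` until it exits 0 or `max_attempts` is reached (hypothetical helper).

    Returns the exit status of the last attempt.
    """
    if runner is None:
        # Default runner: execute the command and report its exit status.
        def runner(c: Sequence[str]) -> int:
            return subprocess.run(list(c)).returncode

    returncode = 1
    for attempt in range(1, max_attempts + 1):
        returncode = runner(cmd)
        if returncode == 0:
            return 0
        if attempt < max_attempts:
            # Report the failure and wait before the next attempt.
            print(
                f"attempt {attempt}/{max_attempts} failed (exit {returncode}); retrying",
                file=sys.stderr,
            )
            time.sleep(delay_seconds)
    return returncode
```

For example, `run_with_retries([sys.executable, "-m", "pip", "install", "some-package"])` would re-run the install up to three times. Retrying on every non-zero status is a deliberately simple choice: it also re-runs genuine failures (such as an unsatisfiable requirement), which a production wrapper might want to distinguish from transient download errors.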

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: #4913
Forward-merge branch-25.02 into branch-25.04
Forward-merge branch-25.02 into branch-25.04
`thrust::null_type` is deprecated and no longer used, so there is no need to look for it.

It will be removed in a future CCCL release.

Authors:
  - Michael Schellenberger Costa (https://github.com/miscco)
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #4904
Exposes `build_type` as an input in `test.yaml` so that `test.yaml` can be
manually run against a specific branch/commit as needed.

The default value is still `nightly`, and without maintainer intervention, that
is what will run each night.

xref rapidsai/build-planning#147

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #4918
This completes the migration to NVKS runners now that all libraries have been tested and rapidsai/shared-workflows#273 has been merged.

xref: rapidsai/build-infra#184

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #4927
It is deprecated and will be removed in an upcoming release

Authors:
  - Michael Schellenberger Costa (https://github.com/miscco)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #4925
We are seeing some compiler warnings in the CUDA 12.8 build.

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)

Approvers:
  - Joseph Nke (https://github.com/jnke2016)
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #4924
Contributes to rapidsai/build-planning#104.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - James Lamb (https://github.com/jameslamb)
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4911
…4920)

In order to better debug issues with batch distribution, this PR updates the warning message to show how many batches each rank received.  Partially resolves rapidsai/cugraph-gnn#130.
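The kind of per-rank diagnostic described above can be sketched in Python as follows. The function name `warn_if_uneven_batches`, its input (a list holding one batch count per rank), and the exact message text are hypothetical; this is not the cugraph sampler code.

```python
import warnings
from typing import Sequence


def warn_if_uneven_batches(batches_per_rank: Sequence[int]) -> bool:
    """Warn when ranks received different numbers of batches (hypothetical helper).

    The message lists the count for every rank so an imbalance can be traced
    to specific ranks. Returns True if a warning was emitted.
    """
    if len(set(batches_per_rank)) <= 1:
        # Every rank got the same number of batches (or there are no ranks).
        return False
    counts = ", ".join(f"rank {rank}: {n}" for rank, n in enumerate(batches_per_rank))
    warnings.warn(f"Batches are unevenly distributed across ranks ({counts}).")
    return True
```

Listing every rank's count, rather than only reporting that an imbalance exists, is what makes the warning useful for debugging the distribution itself.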

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Brad Rees (https://github.com/BradReesWork)

URL: #4920
Removes obsolete GNN benchmarks that used the bulk sampler and dask APIs.  These have been replaced by the new examples in `cugraph-gnn`.

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Brad Rees (https://github.com/BradReesWork)

URL: #4929
…ed heterogeneous sampling primitive. (#4922)

Update heterogeneous sampling application code to use the new heterogeneous sampling primitive.

This is a breaking change because the `edge_type_view` input parameter is no longer optional (it is required for heterogeneous sampling).

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Joseph Nke (https://github.com/jnke2016)

URL: #4922
Forward-merge branch-25.02 into branch-25.04
We inadvertently broke cugraph by merging rapidsai/raft#2541.

rapidsai/raft#2581 was raised in RAFT to fix the issues.

Authors:
  - Divye Gala (https://github.com/divyegala)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4934
Update the minimum required CMake version to 3.30.4 across all of RAPIDS.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)
  - https://github.com/jakirkham

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #4938
This PR updates functions to consistently take `raft::host_span` instead of `std::vector const&` (we have been mixing the two), except for public functions in `graph_functions.hpp`.

Marked as breaking because this PR updates functions in `include/cugraph/utilities/device_comm.hpp` and `shuffle_comm.cuh`, but we don't expect public users to call these utility functions directly.

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)
  - https://github.com/jakirkham

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Joseph Nke (https://github.com/jnke2016)

URL: #4931
Follow up to #4935

There, in #4935 (comment), we discussed that GNN packages shouldn't need to be installed in docs builds anymore, since no docs in this repo (including notebooks) require them.

This PR limits dependencies on the GNN packages to only the places they're needed.

### `libwholegraph` / `pylibwholegraph`

```shell
git grep -i -E 'wholegraph'
```

Optional runtime dependency of `cugraph`:

https://github.com/rapidsai/cugraph/blob/2873ff91945c4944568ffd1aa035f6bba17746a0/python/cugraph/cugraph/gnn/feature_storage/feat_storage.py#L23

And optional test-time dependency of `cugraph`:

https://github.com/rapidsai/cugraph/blob/2873ff91945c4944568ffd1aa035f6bba17746a0/python/cugraph/cugraph/tests/data_store/test_gnn_feat_storage_wholegraph.py#L24-L25

But not used in any docs.

**Changes:** Removed from `docs` environment.

### `cugraph-dgl`

```shell
git grep -i -E 'cugraph.*dgl'
```

Not used anywhere in this repo.

**Changes:** Removed all remaining references.

### `cugraph-pyg`

```shell
git grep -i -E 'cugraph.*pyg'
```

Only used as an optional import in `cugraph-service-client`'s tests (which are not run in CI for wheels).

https://github.com/rapidsai/cugraph/blob/2873ff91945c4944568ffd1aa035f6bba17746a0/python/cugraph-service/tests/test_remote_graph.py#L662

**Changes:** Removed from the `docs` environment, added a `[test]` extra to `cugraph-service-client` including this.

## Notes for Reviewers

Related to these issues about moving more GNN stuff out of this repo:

* #4822
* #4407

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Mike Sarahan (https://github.com/msarahan)
  - Bradley Dice (https://github.com/bdice)

URL: #4947
…fle utility functions. (#4936)

This PR was initially created to add forest pruning code, but was repurposed to avoid creating a PR that adds/updates two features (shuffling and forest pruning; forest pruning requires the edge source/destination shuffling functions). A separate forest pruning PR will be created, pulling in the updates from this PR.

This PR
* Added utility functions to shuffle local edge source/destination vertices (which already reside on the local GPU based on edge partitioning) to the owning GPUs (based on vertex partitioning). This can be achieved using only a sub-communicator (the major or minor communicator, each involving only a subset of the GPUs), which is more efficient than the existing `shuffle_vertices`, which uses the global communicator.
* Renamed public shuffle utility functions. Several public shuffle functions have `external` in their names (e.g. `shuffle_external_vertices`), but elsewhere (e.g. in the renumbering/unrenumbering functions) we use the abbreviations `ext` and `int` to refer to external (unrenumbered) and internal (renumbered) vertices. This PR renames `external` in those functions to `ext` for consistency.
* Split `graph_functions.hpp` into `graph_functions.hpp` and `shuffle_functions.hpp`, and moved the shuffle utility functions to the new `shuffle_functions.hpp`.
* We have been calling detail-namespace functions in many places even when a public function does the same job. This PR updates the codebase to invoke public functions whenever possible.

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Divye Gala (https://github.com/divyegala)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #4936
miscco and others added 14 commits March 14, 2025 22:33
…4971)

NVCC also seems to deduce this incorrectly in 12.5.

Authors:
  - Michael Schellenberger Costa (https://github.com/miscco)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Seunghwa Kang (https://github.com/seunghwak)
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #4971
…4975)

This PR adds a filter to skip CUDA 11.4 jobs on PRs as a precursor to enabling them in shared-workflows.
Once the 11.4 issues are fixed, this matrix filter should be removed so 11.4 gets tested on PRs.

xref: rapidsai/build-planning#164

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #4975
…4970)

We can create a graph from an edge list in a single chunk or in multiple chunks (e.g. when reading from multiple files; this can lower the peak memory requirement).

To support the latter case, this PR updates the `remove_multi_edges` function to take an edge list in multiple chunks.

Tests were previously missing, so this PR adds them as well.
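A minimal sketch of multi-edge removal over a chunked edge list, in plain Python: edges are `(src, dst)` integer pairs, the first copy of each edge is kept, and the output preserves one list per input chunk. The function name `remove_multi_edges_chunked` is hypothetical, and this CPU hash-set approach only illustrates the input/output contract, not the GPU algorithm behind cugraph's `remove_multi_edges`.

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def remove_multi_edges_chunked(chunks: Sequence[Sequence[Edge]]) -> List[List[Edge]]:
    """Drop duplicate (src, dst) edges across all chunks, keeping the first copy.

    The result has one list per input chunk, in the same order as the input.
    """
    seen: Set[Edge] = set()
    result: List[List[Edge]] = []
    for chunk in chunks:
        kept: List[Edge] = []
        for edge in chunk:
            # A duplicate may appear in the same chunk or in any later chunk.
            if edge not in seen:
                seen.add(edge)
                kept.append(edge)
        result.append(kept)
    return result
```

Note that the `seen` set still holds one entry per unique edge, so this sketch does not by itself reduce peak memory; it only shows that duplicates spanning chunk boundaries must be detected across chunks, not just within each one.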

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Joseph Nke (https://github.com/jnke2016)

URL: #4970
Short PR to remove an extra sort of the values and the redundant need to drop indices when converting to cupy arrays.

Authors:
  - Ralph Liu (https://github.com/nv-rliu)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4899
25.04 nightly tests are now failing in CI with an OOM in the property graph tests. We are now using GPUs with less memory, so this PR shrinks the memory usage.

Authors:
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4981
…4957) (#4979)

This PR adds the nightly CI check back and removes the temporarily increased length, since nightlies were blocked by failing changes in RAFT that are now resolved as of https://github.com/rapidsai/raft/actions/runs/13913530930

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #4979
Switch to using CuPy for array creation when needed. Can still pass these to Numba or use `__cuda_array_interface__` for low-level operations.

Also switch to CUDA Python for low-level CUDA function calls in Python.

Authors:
  - https://github.com/jakirkham

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4958
I noticed that `dependencies.yaml` had a dependency on an `x86_64` compiler on `aarch64`. This was a typo, not an attempt at cross-compilation.

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Ray Douglass (https://github.com/raydouglass)

URL: #4980
We traced the issue to an obscure compile problem within libcu++'s `not_fn`, so we can revert all of those hacks:

Revert "Silence compiler warnings about host device destructor (#4960)"
Revert "[CTK 12.5]: Avoid another compiler issue with host device detection (#4971)"

Authors:
  - Michael Schellenberger Costa (https://github.com/miscco)

Approvers:
  - Seunghwa Kang (https://github.com/seunghwak)

URL: #4985
If a high-degree vertex exists, run BFS from the highest-degree vertex first, in the hope of finding the largest connected component in the graph. BFS stores a visited flag (1 bit) instead of a component ID (sizeof(vertex_t)) as the edge endpoint (src or dst) property, and thus uses less memory.

Then, we can create a smaller graph by extracting only the edges with unvisited endpoints.
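The two steps above can be sketched on the CPU in plain Python. This is a hedged illustration, not cugraph's implementation: edges are treated as undirected `(src, dst)` pairs, the function name `bfs_from_highest_degree` is hypothetical, a Python `bool` list stands in for the 1-bit visited flags, and the "if a high-degree vertex exists" threshold check is omitted (the sketch always seeds from the maximum-degree vertex).

```python
from collections import defaultdict, deque
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def bfs_from_highest_degree(
    num_vertices: int, edges: Sequence[Edge]
) -> Tuple[List[bool], List[Edge]]:
    """BFS from the highest-degree vertex, then keep only edges it did not reach.

    Returns (visited flags per vertex, edges whose endpoints were not visited).
    """
    adjacency: Dict[int, List[int]] = defaultdict(list)
    for src, dst in edges:
        # Undirected: record the edge in both directions.
        adjacency[src].append(dst)
        adjacency[dst].append(src)

    visited = [False] * num_vertices
    if num_vertices == 0:
        return visited, list(edges)

    # Seed BFS from the vertex with the most incident edges.
    seed = max(range(num_vertices), key=lambda v: len(adjacency[v]))
    visited[seed] = True
    frontier = deque([seed])
    while frontier:
        v = frontier.popleft()
        for u in adjacency[v]:
            if not visited[u]:
                visited[u] = True
                frontier.append(u)

    # Both endpoints of an edge lie in the same component, so checking src suffices.
    remaining = [(s, d) for s, d in edges if not visited[s]]
    return visited, remaining
```

Because both endpoints of an undirected edge belong to the same component, checking one endpoint is enough to decide whether an edge survives. The remaining, smaller edge list would then go through the regular connected-components computation (not shown).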

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #4990
…rkflows (#4975)" (#4995)

Now that nightlies are passing, we should be able to test these jobs in PRs.

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #4995
Betweenness Centrality normalization is not quite right if you specify not including endpoints and use approximate betweenness.

This PR temporarily disables some of the Python tests that compare results with networkx, since the networkx PR that updates the normalization scores is not yet merged. Once networkx/networkx#7908 is merged, we should be able to create another PR to re-enable those tests. Each disabled test is skipped with a link to that PR as the reason.

Closes #4941

Authors:
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Erik Welch (https://github.com/eriknw)
  - Joseph Nke (https://github.com/jnke2016)
  - Seunghwa Kang (https://github.com/seunghwak)

URL: #4974
This PR exposes Force Atlas 2 (FA2) in the PLC (pylibcugraph) API.

closes #4881

Authors:
  - Joseph Nke (https://github.com/jnke2016)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4969
This PR leverages the CAPI to optimize the check for vertex existence, which can yield a speedup of more than 100x compared to the cudf-based version, as shown in the performance figure below.

<img width="667" alt="Screenshot 2025-03-17 at 9 47 37 AM" src="https://github.com/user-attachments/assets/4a90cb59-c4cb-40d8-a6b5-9447b1a6cc42" />


closes #4956

Authors:
  - Joseph Nke (https://github.com/jnke2016)

Approvers:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4966
@raydouglass raydouglass requested review from a team as code owners April 3, 2025 19:57
@raydouglass raydouglass requested a review from jameslamb April 3, 2025 19:57

@AyodeAwe AyodeAwe merged commit 289ef1b into main Apr 9, 2025
5 of 6 checks passed