Merge pull request #386 from cjnolet/branch-21.12-merge-22.02
Branch 21.12 merge 22.02
ajschmidt8 authored Nov 17, 2021
2 parents 3a7e3ed + 5dd75eb commit 891ff74
Showing 87 changed files with 1,939 additions and 986 deletions.
87 changes: 83 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,94 @@
# <div align="left"><img src="https://rapids.ai/assets/images/rapids_logo.png" width="90px"/>&nbsp;RAFT: RAPIDS Analytics Frameworks Toolset</div>
# <div align="left"><img src="https://rapids.ai/assets/images/rapids_logo.png" width="90px"/>&nbsp;RAFT: RAPIDS Analytics Framework Toolkit</div>

RAFT is a repository containining shared utilities, mathematical operations and common functions for the analytics components of RAPIDS. Both the C++ and Python components can be included in consuming libraries.
RAFT is a library containing building-blocks for rapid composition of RAPIDS Analytics. These building-blocks include shared representations, mathematical computational primitives, and utilities that accelerate building analytics and data science algorithms in the RAPIDS ecosystem. Both the C++ and Python components can be included in consuming libraries, providing building-blocks for both dense and sparse matrix formats in the following general categories:
#####
| Category | Description / Examples |
| --- | --- |
| **Data Formats** | tensor representations and conversions for both sparse and dense formats |
| **Data Generation** | graph, spatial, and machine learning dataset generation |
| **Dense Operations** | linear algebra, statistics |
| **Spatial** | pairwise distances, nearest neighbors, neighborhood / proximity graph construction |
| **Sparse/Graph Operations** | linear algebra, statistics, slicing, msf, spectral embedding/clustering, slhc, vertex degree |
| **Solvers** | eigenvalue decomposition, least squares, lanczos |
| **Tools** | multi-node multi-gpu communicator, utilities |

By taking a primitives-based approach to algorithm development, RAFT accelerates algorithm construction time and reduces
the maintenance burden by maximizing reuse across projects. RAFT relies on the [RAPIDS memory manager (RMM)](https://github.com/rapidsai/rmm) which,
like other projects in the RAPIDS ecosystem, eases the burden of configuring different allocation strategies globally
across the libraries that use it. RMM also provides RAII wrappers around device arrays that handle the allocation and cleanup.
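As a host-only illustration of the RAII idea (not RMM's implementation), the hypothetical `host_buffer` below acquires storage in its constructor and releases it in its destructor. The `live_allocations` counter is an assumption of this sketch that stands in for device memory so the cleanup can be observed. The class is move-only, so exactly one owner ever frees a given allocation.

```c++
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Assumption of this sketch: a global counter stands in for device memory bookkeeping.
inline int live_allocations = 0;

// Hypothetical RAII buffer: allocates in the constructor, frees in the destructor.
template <typename T>
class host_buffer {
 public:
  explicit host_buffer(std::size_t n) : size_(n), data_(new T[n]()) { ++live_allocations; }

  ~host_buffer()
  {
    if (data_) { --live_allocations; }  // unique_ptr releases the memory itself
  }

  // Move-only: a copy would leave two owners of the same allocation.
  host_buffer(const host_buffer&)            = delete;
  host_buffer& operator=(const host_buffer&) = delete;

  host_buffer(host_buffer&& other) noexcept
    : size_(std::exchange(other.size_, 0)), data_(std::move(other.data_))
  {
  }

  host_buffer& operator=(host_buffer&& other) noexcept
  {
    if (this != &other) {
      if (data_) { --live_allocations; }  // our current allocation is about to be freed
      size_ = std::exchange(other.size_, 0);
      data_ = std::move(other.data_);
    }
    return *this;
  }

  T* data() noexcept { return data_.get(); }
  std::size_t size() const noexcept { return size_; }

 private:
  std::size_t size_;
  std::unique_ptr<T[]> data_;
};
```

Because cleanup is tied to scope, an early return or an exception cannot leak the buffer, which is the property the device-side wrappers provide for GPU memory.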

## Getting started

Refer to the [Build and Development Guide](BUILD.md) for details on RAFT's design, building, testing and development guidelines.

Most of the primitives in RAFT accept a `raft::handle_t` object for the management of resources which are expensive to create, such CUDA streams, stream pools, and handles to other CUDA libraries like `cublas` and `cusolver`.
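A rough host-only sketch of why a shared handle helps: the costly resource is created once and then reused by every primitive that receives the handle. `expensive_resource`, `resource_handle`, and `run_primitive` are hypothetical names (the first stands in for something like a cuBLAS handle), and lazy creation guarded by a `std::mutex` is a choice of this sketch, not a description of how `raft::handle_t` is implemented.

```c++
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a costly library handle (e.g. a cuBLAS handle).
struct expensive_resource {
  static inline int constructed = 0;  // how many instances were ever created
  expensive_resource() { ++constructed; }
};

// Hypothetical handle: creates the resource on first request, then reuses it.
class resource_handle {
 public:
  expensive_resource& get_resource() const
  {
    std::lock_guard<std::mutex> lock(mutex_);  // guard the lazy creation
    if (!resource_) { resource_ = std::make_unique<expensive_resource>(); }
    return *resource_;
  }

 private:
  mutable std::mutex mutex_;
  mutable std::unique_ptr<expensive_resource> resource_;
};

// Hypothetical primitive: borrows the resource from the handle instead of creating one.
const expensive_resource* run_primitive(const resource_handle& handle)
{
  return &handle.get_resource();
}
```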


### C++ Example

The example below demonstrates creating a RAFT handle and using it with RMM's `device_uvector` to allocate memory on device and compute
pairwise Euclidean distances:
```c++
#include <raft/handle.hpp>
#include <raft/distance/distance.hpp>

#include <rmm/device_uvector.hpp>
raft::handle_t handle;

int n_samples = ...;
int n_features = ...;

rmm::device_uvector<float> input(n_samples * n_features, handle.get_stream());
rmm::device_uvector<float> output(n_samples * n_samples, handle.get_stream());

// ... Populate feature matrix ...

auto metric = raft::distance::DistanceType::L2SqrtExpanded;
rmm::device_uvector<char> workspace(0, handle.get_stream());
raft::distance::pairwise_distance(handle, input.data(), input.data(),
output.data(),
n_samples, n_samples, n_features,
workspace.data(), metric);
```
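To check `output` after copying it back to the host, a CPU reference can help. The sketch below assumes row-major inputs and evaluates the Euclidean distance in the expanded form suggested by the metric name `L2SqrtExpanded`, sqrt(||x||^2 + ||y||^2 - 2 * dot(x, y)). The function name is hypothetical. Clamping small negative values (from floating-point cancellation) to zero before the square root is a choice of this sketch.

```c++
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host reference for L2SqrtExpanded on row-major matrices.
// x is m x k, y is n x k; returns the m x n distance matrix (row-major).
std::vector<float> l2_sqrt_expanded_ref(const std::vector<float>& x,
                                        const std::vector<float>& y,
                                        int m, int n, int k)
{
  // Precompute squared row norms ||x_i||^2 and ||y_j||^2.
  std::vector<float> x_norm(m, 0.0f), y_norm(n, 0.0f);
  for (int i = 0; i < m; ++i)
    for (int f = 0; f < k; ++f) x_norm[i] += x[i * k + f] * x[i * k + f];
  for (int j = 0; j < n; ++j)
    for (int f = 0; f < k; ++f) y_norm[j] += y[j * k + f] * y[j * k + f];

  std::vector<float> out(static_cast<std::size_t>(m) * n);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      float dot = 0.0f;
      for (int f = 0; f < k; ++f) dot += x[i * k + f] * y[j * k + f];
      // The expanded form can dip slightly below zero from rounding; clamp before sqrt.
      const float sq = std::max(0.0f, x_norm[i] + y_norm[j] - 2.0f * dot);
      out[i * n + j] = std::sqrt(sq);
    }
  }
  return out;
}
```

For the example above, `x` and `y` would both be the host copy of `input`, with `m = n = n_samples` and `k = n_features`.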
## Folder Structure and Contents
The folder structure mirrors the main RAPIDS repos (cuDF, cuML, cuGraph...), with the following folders:
The folder structure mirrors other RAPIDS repos (cuDF, cuML, cuGraph...), with the following folders:
- `cpp`: Source code for all C++ code. The code is header only, therefore it is in the `include` folder (with no `src`).
- `cpp`: Source code for all C++ code. The code is currently header-only, therefore it is in the `include` folder (with no `src`).
- `python`: Source code for all Python source code.
- `ci`: Scripts for running CI in PRs
[comment]: <> (TODO: This needs to be updated after the public API is established)
[comment]: <> (The library layout contains the following structure:)
[comment]: <> (```bash)
[comment]: <> (cpp/include/raft)
[comment]: <> ( |------------ comms [communication abstraction layer])
[comment]: <> ( |------------ distance [dense pairwise distances])
[comment]: <> ( |------------ linalg [dense linear algebra])
[comment]: <> ( |------------ matrix [dense matrix format])
[comment]: <> ( |------------ random [random matrix generation])
[comment]: <> ( |------------ sparse [sparse matrix and graph algorithms])
[comment]: <> ( |------------ spatial [spatial algorithms])
[comment]: <> ( |------------ spectral [spectral clustering])
[comment]: <> ( |------------ stats [statistics primitives])
[comment]: <> ( |------------ handle.hpp [raft handle])
[comment]: <> (```)
2 changes: 1 addition & 1 deletion ci/gpu/build.sh
@@ -15,7 +15,7 @@ function hasArg {

# Set path and build parallel level
export PATH=/opt/conda/bin:/usr/local/cuda/bin:$PATH
export PARALLEL_LEVEL=${PARALLEL_LEVEL:-4}
export PARALLEL_LEVEL=${PARALLEL_LEVEL:-8}
export CUDA_REL=${CUDA_VERSION%.*}

# Set home to the job's workspace
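The changed line relies on shell parameter expansion. The hypothetical C++ helpers below only mirror the semantics of the two expansions in this hunk. `${PARALLEL_LEVEL:-8}` falls back to `8` when the variable is unset or empty, so the new default never overrides a value exported by the caller. `${CUDA_VERSION%.*}` removes the shortest suffix matching `.*`, turning a value such as `11.5.0` into `11.5` (the exact format of `CUDA_VERSION` in the CI image is an assumption here).

```c++
#include <cassert>
#include <optional>
#include <string>

// Mirrors ${VAR:-default}: the default applies when VAR is unset OR set to "".
std::string value_or_default(const std::optional<std::string>& var, const std::string& fallback)
{
  return (var && !var->empty()) ? *var : fallback;
}

// Mirrors ${VAR%.*}: removes the shortest suffix matching ".*",
// i.e. everything from the last '.' onward (unchanged if there is no '.').
std::string strip_last_component(const std::string& value)
{
  const auto pos = value.rfind('.');
  return pos == std::string::npos ? value : value.substr(0, pos);
}
```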
2 changes: 2 additions & 0 deletions cpp/CMakeLists.txt
@@ -100,8 +100,10 @@ endif()
# add third party dependencies using CPM
rapids_cpm_init()

# thrust and libcudacxx need to be before cuco!
include(cmake/thirdparty/get_thrust.cmake)
include(cmake/thirdparty/get_rmm.cmake)
include(cmake/thirdparty/get_libcudacxx.cmake)
include(cmake/thirdparty/get_cuco.cmake)

if(BUILD_TESTS)
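The new comment states an ordering constraint: thrust and libcudacxx must be configured before cuco. The hypothetical checker below encodes only that constraint (the dependency map is taken from the comment, not from the packages' own metadata) and verifies that an include order lists every prerequisite before the package that needs it.

```c++
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: true if every package in `order` appears after all of its prerequisites.
bool order_respects_deps(const std::vector<std::string>& order,
                         const std::map<std::string, std::vector<std::string>>& deps)
{
  for (const auto& [pkg, prereqs] : deps) {
    const auto pkg_it = std::find(order.begin(), order.end(), pkg);
    if (pkg_it == order.end()) { continue; }  // package not included; nothing to check
    for (const auto& dep : prereqs) {
      const auto dep_it = std::find(order.begin(), order.end(), dep);
      // A prerequisite that is missing or included later breaks the ordering.
      if (dep_it == order.end() || dep_it > pkg_it) { return false; }
    }
  }
  return true;
}
```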
18 changes: 5 additions & 13 deletions cpp/Doxyfile.in
@@ -771,10 +771,7 @@ WARN_LOGFILE =
# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
# Note: If this tag is empty the current directory is searched.

INPUT = @CMAKE_CURRENT_SOURCE_DIR@/comms \
@CMAKE_CURRENT_SOURCE_DIR@/include \
@CMAKE_CURRENT_SOURCE_DIR@/src \
@CMAKE_CURRENT_SOURCE_DIR@/src_prims
INPUT = @CMAKE_CURRENT_SOURCE_DIR@/include \

# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
@@ -799,12 +796,7 @@ INPUT_ENCODING = UTF-8
# *.m, *.markdown, *.md, *.mm, *.dox, *.py, *.pyw, *.f90, *.f, *.for, *.tcl,
# *.vhd, *.vhdl, *.ucf, *.qsf, *.as and *.js.

FILE_PATTERNS = *.cpp \
*.h \
*.hpp \
*.hxx \
*.cu \
*.cuh
FILE_PATTERNS = *.hpp

# The RECURSIVE tag can be used to specify whether or not subdirectories should
# be searched for input files as well.
@@ -835,8 +827,8 @@ EXCLUDE_SYMLINKS = NO
# Note that the wildcards are matched against the file with absolute path, so to
# exclude all test directories for example use the pattern */test/*

EXCLUDE_PATTERNS = columnWiseSort.h \
smoblocksolve.h
EXCLUDE_PATTERNS = **/detail/** \
**/spectral/**

# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names
# (namespaces, classes, functions, etc.) that should be excluded from the
@@ -873,7 +865,7 @@ EXAMPLE_RECURSIVE = NO
# that contain images that are to be included in the documentation (see the
# \image command).

IMAGE_PATH = @CMAKE_CURRENT_SOURCE_DIR@/doxygen/images
IMAGE_PATH =

# The INPUT_FILTER tag can be used to specify a program that doxygen should
# invoke to filter for each input file. Doxygen will invoke the filter program
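Together, these settings narrow the generated docs to public headers: only `*.hpp` files from the `include` input directory are considered, and any path with a `detail` or `spectral` directory component is excluded. The sketch below approximates that filter with plain substring checks. It is not Doxygen's wildcard matcher, and the example paths in the checks are hypothetical.

```c++
#include <cassert>
#include <string>

// Approximates FILE_PATTERNS = *.hpp combined with
// EXCLUDE_PATTERNS = **/detail/** and **/spectral/** (not Doxygen's real matcher).
bool would_be_documented(const std::string& path)
{
  auto ends_with = [&](const std::string& suffix) {
    return path.size() >= suffix.size() &&
           path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
  };
  if (!ends_with(".hpp")) { return false; }  // only headers matching *.hpp
  // "**/detail/**" and "**/spectral/**" drop any path containing those directories.
  if (path.find("/detail/") != std::string::npos) { return false; }
  if (path.find("/spectral/") != std::string::npos) { return false; }
  return true;
}
```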
2 changes: 1 addition & 1 deletion cpp/cmake/doxygen.cmake
@@ -22,7 +22,7 @@ function(add_doxygen_target)
set(multiValueArgs "")
cmake_parse_arguments(dox "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
configure_file(${dox_IN_DOXYFILE} ${dox_OUT_DOXYFILE} @ONLY)
add_custom_target(doc
add_custom_target(docs_raft
${DOXYGEN_EXECUTABLE} ${dox_OUT_DOXYFILE}
WORKING_DIRECTORY ${dox_CWD}
VERBATIM
21 changes: 21 additions & 0 deletions cpp/cmake/libcudacxx.patch
@@ -0,0 +1,21 @@
diff --git a/include/cuda/std/detail/__config b/include/cuda/std/detail/__config
index d55a43688..654142d7e 100644
--- a/include/cuda/std/detail/__config
+++ b/include/cuda/std/detail/__config
@@ -23,7 +23,7 @@
#define _LIBCUDACXX_CUDACC_VER_MINOR __CUDACC_VER_MINOR__
#define _LIBCUDACXX_CUDACC_VER_BUILD __CUDACC_VER_BUILD__
#define _LIBCUDACXX_CUDACC_VER \
- _LIBCUDACXX_CUDACC_VER_MAJOR * 10000 + _LIBCUDACXX_CUDACC_VER_MINOR * 100 + \
+ _LIBCUDACXX_CUDACC_VER_MAJOR * 100000 + _LIBCUDACXX_CUDACC_VER_MINOR * 1000 + \
_LIBCUDACXX_CUDACC_VER_BUILD

#define _LIBCUDACXX_HAS_NO_LONG_DOUBLE
@@ -64,7 +64,7 @@
# endif
#endif

-#if defined(_LIBCUDACXX_COMPILER_MSVC) || (defined(_LIBCUDACXX_CUDACC_VER) && (_LIBCUDACXX_CUDACC_VER < 110500))
+#if defined(_LIBCUDACXX_COMPILER_MSVC) || (defined(_LIBCUDACXX_CUDACC_VER) && (_LIBCUDACXX_CUDACC_VER < 1105000))
# define _LIBCUDACXX_HAS_NO_INT128
#endif
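The patch widens the fields of `_LIBCUDACXX_CUDACC_VER`. With the original weights (10000, 100, 1), a build number of 100 or more spills into the minor-version digits: a compiler reporting, say, version 11.4.100 encodes to 110500, the same value as 11.5.0, so the `< 110500` test for `_LIBCUDACXX_HAS_NO_INT128` can no longer tell them apart. The weights 100000 and 1000 leave room for three-digit build numbers, and the threshold is rescaled to 1105000 to match. The constexpr functions below sketch that arithmetic; their names are hypothetical.

```c++
#include <cassert>

// Encoding used by the unpatched __config.
constexpr long old_encoding(long major, long minor, long build)
{
  return major * 10000 + minor * 100 + build;
}

// Encoding after the patch: wider minor and build fields.
constexpr long new_encoding(long major, long minor, long build)
{
  return major * 100000 + minor * 1000 + build;
}

// Mirrors the patched _LIBCUDACXX_HAS_NO_INT128 condition (ignoring the MSVC branch).
constexpr bool has_no_int128(long major, long minor, long build)
{
  return new_encoding(major, minor, build) < 1105000;
}

// With the old weights, 11.4.100 collides with 11.5.0.
static_assert(old_encoding(11, 4, 100) == old_encoding(11, 5, 0));
// With the new weights, the two stay distinct and correctly ordered.
static_assert(new_encoding(11, 4, 100) < new_encoding(11, 5, 0));
```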
2 changes: 1 addition & 1 deletion cpp/cmake/thirdparty/get_cuco.cmake
@@ -22,7 +22,7 @@ function(find_and_configure_cuco VERSION)
INSTALL_EXPORT_SET raft-exports
CPM_ARGS
GIT_REPOSITORY https://github.com/NVIDIA/cuCollections.git
GIT_TAG 729857a5698a0e8d8f812e0464f65f37854ae17b
GIT_TAG f0eecb203590f1f4ac4a9f1700229f4434ac64dc
OPTIONS "BUILD_TESTS OFF"
"BUILD_BENCHMARKS OFF"
"BUILD_EXAMPLES OFF"
26 changes: 26 additions & 0 deletions cpp/cmake/thirdparty/get_libcudacxx.cmake
@@ -0,0 +1,26 @@
# =============================================================================
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing permissions and limitations under
# the License.
# =============================================================================

# This function finds libcudacxx and sets any additional necessary environment variables.
function(find_and_configure_libcudacxx)
include(${rapids-cmake-dir}/cpm/libcudacxx.cmake)

rapids_cpm_libcudacxx(
BUILD_EXPORT_SET raft-exports INSTALL_EXPORT_SET raft-exports PATCH_COMMAND patch
--reject-file=- -p1 -N < ${RAFT_SOURCE_DIR}/cmake/libcudacxx.patch || true
)

endfunction()

find_and_configure_libcudacxx()
15 changes: 8 additions & 7 deletions cpp/include/raft/comms/comms.hpp
@@ -339,9 +339,9 @@ class comms_t {

/**
* Gathers data from all ranks and delivers to combined data to all ranks
* @param value_t datatype of underlying buffers
* @param sendbuff buffer containing data to send
* @param recvbuff buffer containing data to receive
* @tparam value_t datatype of underlying buffers
* @param sendbuf buffer containing data to send
* @param recvbuf buffer containing data to receive
* @param recvcounts pointer to an array (of length num_ranks size) containing the number of
* elements that are to be received from each rank
* @param displs pointer to an array (of length num_ranks size) to specify the displacement
@@ -376,9 +376,9 @@ class comms_t {

/**
* Gathers data from all ranks and delivers to combined data to all ranks
* @param value_t datatype of underlying buffers
* @param sendbuff buffer containing data to send
* @param recvbuff buffer containing data to receive
* @tparam value_t datatype of underlying buffers
* @param sendbuf buffer containing data to send
* @param recvbuf buffer containing data to receive
* @param sendcount number of elements in send buffer
* @param recvcounts pointer to an array (of length num_ranks size) containing the number of
* elements that are to be received from each rank
@@ -401,6 +401,7 @@ class comms_t {
* @tparam value_t datatype of underlying buffers
* @param sendbuff buffer containing data to send (size recvcount * num_ranks)
* @param recvbuff buffer containing received data
* @param recvcount number of items to receive
* @param op reduction operation to perform
* @param stream CUDA stream to synchronize operation
*/
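The parameters documented here (a send buffer of `recvcount * num_ranks` elements, a receive count, and a reduction `op`) match reduce-scatter semantics: block `r` of every rank's send buffer is reduced element-wise, and the result lands in rank `r`'s receive buffer. The host-only simulation below assumes a sum reduction and uses a hypothetical function name; no NCCL calls are involved.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only simulation of reduce-scatter with a sum reduction.
// sendbufs[r] is rank r's send buffer of recvcount * num_ranks elements.
// Returns recvbufs, where recvbufs[r] is the element-wise sum over ranks of block r.
std::vector<std::vector<int>> reduce_scatter_sum(const std::vector<std::vector<int>>& sendbufs,
                                                 std::size_t recvcount)
{
  const std::size_t num_ranks = sendbufs.size();
  std::vector<std::vector<int>> recvbufs(num_ranks, std::vector<int>(recvcount, 0));
  for (std::size_t r = 0; r < num_ranks; ++r) {          // receiving rank
    for (std::size_t src = 0; src < num_ranks; ++src) {  // contributing rank
      for (std::size_t i = 0; i < recvcount; ++i) {
        recvbufs[r][i] += sendbufs[src][r * recvcount + i];
      }
    }
  }
  return recvbufs;
}
```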
@@ -476,7 +477,7 @@ class comms_t {
* @param sendbuf pointer to array of data to send
* @param sendsizes numbers of elements to send
* @param sendoffsets offsets in a number of elements from sendbuf
* @param dest destination ranks
* @param dests destination ranks
* @param recvbuf pointer to (initialized) array that will hold received data
* @param recvsizes numbers of elements to recv
* @param recvoffsets offsets in a number of elements from recvbuf
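The `recvcounts`/`displs` parameters in the variable-count gathers documented above describe the receive layout: rank `r` contributes `recvcounts[r]` elements, which land at offset `displs[r]` of `recvbuf`. A host-only simulation of the all-gather case (hypothetical helper, one vector per simulated rank, no NCCL involved) makes that layout concrete.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only simulation of a variable-count all-gather.
// sendbufs[r] holds rank r's data (recvcounts[r] elements); the result is
// the recvbuf that every rank would end up with.
std::vector<float> allgatherv_sim(const std::vector<std::vector<float>>& sendbufs,
                                  const std::vector<std::size_t>& recvcounts,
                                  const std::vector<std::size_t>& displs)
{
  std::size_t total = 0;
  for (std::size_t r = 0; r < sendbufs.size(); ++r) {
    total = std::max(total, displs[r] + recvcounts[r]);  // extent of recvbuf
  }
  std::vector<float> recvbuf(total, 0.0f);
  for (std::size_t r = 0; r < sendbufs.size(); ++r) {
    for (std::size_t i = 0; i < recvcounts[r]; ++i) {
      recvbuf[displs[r] + i] = sendbufs[r][i];  // rank r's block at its displacement
    }
  }
  return recvbuf;
}
```

In the common packed case, `displs` is the exclusive prefix sum of `recvcounts`, for example `{0, 1, 3}` for counts `{1, 2, 3}`.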
11 changes: 7 additions & 4 deletions cpp/include/raft/comms/std_comms.hpp
@@ -56,11 +56,13 @@ class std_comms : public comms_iface {

/**
* @brief Constructor for collective + point-to-point operation.
* @param comm initialized nccl comm
* @param nccl_comm initialized nccl comm
* @param ucp_worker initialized ucp_worker instance
* @param eps shared pointer to array of ucp endpoints
* @param size size of the cluster
* @param num_ranks number of ranks in the cluster
* @param rank rank of the current worker
* @param stream cuda stream for synchronizing and ordering collective operations
* @param subcomms_ucp use ucp for subcommunicators
*/
std_comms(ncclComm_t nccl_comm, ucp_worker_h ucp_worker,
std::shared_ptr<ucp_ep_h *> eps, int num_ranks, int rank,
@@ -79,9 +81,10 @@ class std_comms : public comms_iface {

/**
* @brief constructor for collective-only operation
* @param comm initilized nccl communicator
* @param size size of the cluster
* @param nccl_comm initilized nccl communicator
* @param num_ranks size of the cluster
* @param rank rank of the current worker
* @param stream stream for ordering collective operations
*/
std_comms(const ncclComm_t nccl_comm, int num_ranks, int rank,
cudaStream_t stream)