CAGRA C example and DlPack docs #51

Merged · 55 commits · Apr 2, 2024
Changes from 22 commits
e851903
Getting a start on the docs
cjnolet Mar 7, 2024
73fa0c8
Progress
cjnolet Mar 7, 2024
8e48382
Getting CAGRA C++ docs to build
cjnolet Mar 7, 2024
ca465e6
Updating
cjnolet Mar 8, 2024
58743b5
Checking in
cjnolet Mar 8, 2024
d2dd4cc
New docs
cjnolet Mar 8, 2024
19319d7
Updating build.sh to build the examples
cjnolet Mar 8, 2024
f8c1015
Merge branch 'branch-24.04' into docs_2404-api_docs
cjnolet Mar 8, 2024
358df75
Fixing docs per review
cjnolet Mar 8, 2024
e6bff2b
Merge branch 'docs_2404-api_docs' of github.com:rapidsai/cuvs into do…
cjnolet Mar 8, 2024
5ac765f
Merge branch 'branch-24.04' into docs_2404-api_docs
cjnolet Mar 8, 2024
64d594b
Removing developer guide. That's unneeded int the docs
cjnolet Mar 8, 2024
620e080
Stubbing out quick start guide
cjnolet Mar 8, 2024
f1f671f
Adding quick start
cjnolet Mar 11, 2024
31d014c
Checking in
cjnolet Mar 12, 2024
f69e86e
Renaming quick_start to getting started
cjnolet Mar 12, 2024
0afe86b
Removing osexamples
cjnolet Mar 12, 2024
929e9b4
Fixing package issuey
cjnolet Mar 12, 2024
77e1cbe
Breaking apart getting started guide
cjnolet Mar 12, 2024
3a79601
MOre updates
cjnolet Mar 12, 2024
c9a50fe
Merge remote-tracking branch 'origin/branch-24.04' into docs_2404-api…
cjnolet Mar 12, 2024
86ad684
Dlpack docs
divyegala Mar 12, 2024
9e33a72
fix typo
divyegala Mar 12, 2024
935e0fc
Merge remote-tracking branch 'origin/branch-24.04' into docs_2404-api…
cjnolet Mar 12, 2024
ba7c1a7
Adding integrations page
cjnolet Mar 12, 2024
e2b083e
Merge branch 'docs_2404-api_docs' into dlpack-docs
cjnolet Mar 12, 2024
d2f7c2a
more info
divyegala Mar 13, 2024
e33d666
merging upstream
divyegala Mar 13, 2024
aef7e1d
merge origin
divyegala Mar 13, 2024
921421f
fix bad merge
divyegala Mar 13, 2024
1336fab
more bad merge fix
divyegala Mar 13, 2024
e7a4e18
bad merge
divyegala Mar 13, 2024
6ef65c9
bad merge
divyegala Mar 13, 2024
3fa2840
bad merge
divyegala Mar 13, 2024
7b396ca
bad merge
divyegala Mar 13, 2024
3dcfd21
bad merge
divyegala Mar 13, 2024
b106d3c
bad merge
divyegala Mar 13, 2024
f98839c
hopefully final bad merge
divyegala Mar 13, 2024
2006650
add c example
divyegala Mar 13, 2024
a7d52f9
restructure examples
divyegala Mar 14, 2024
4ca00b6
Update docs/source/interoperability.rst
divyegala Mar 14, 2024
504ce4a
Merge branch 'branch-24.04' into dlpack-docs
cjnolet Mar 14, 2024
36d8520
add get_dlpack.cmake
divyegala Mar 15, 2024
b50ffbe
dlpack headers include
divyegala Mar 15, 2024
de155b9
Replace local copyright check with pre-commit-hooks verify-copyright …
KyleFromNVIDIA Mar 18, 2024
be206d8
Remove unused get_*.cmake files. (#57)
bdice Mar 19, 2024
2b18f49
Add ability to allocate with RMM to the c-api and rust api (#56)
benfred Mar 19, 2024
eb7c496
use cuvsRMMAlloc/cuvsRMMFree
divyegala Mar 20, 2024
3d52f91
Merge branch 'branch-24.04' into dlpack-docs
divyegala Mar 20, 2024
d0c5287
Merge branch 'branch-24.04' into dlpack-docs
divyegala Mar 21, 2024
f1d1527
Merge branch 'branch-24.04' into dlpack-docs
divyegala Mar 21, 2024
531c667
Merge branch 'branch-24.04' into dlpack-docs
cjnolet Apr 2, 2024
12362b1
Apply suggestions from code review
divyegala Apr 2, 2024
8626c15
cuvs docs
divyegala Apr 2, 2024
80b50f5
address review
divyegala Apr 2, 2024
4 changes: 2 additions & 2 deletions docs/source/api_docs.rst
@@ -1,5 +1,5 @@
API Documentation
=================
API Reference
=============

.. toctree::
:maxdepth: 1
91 changes: 91 additions & 0 deletions docs/source/basics.rst
@@ -0,0 +1,91 @@
cuVS API Basics
===============

.. toctree::
:maxdepth: 1
:caption: Contents:

`Memory management`_
`Resource management`_

Memory management
-----------------

Centralized memory management allows flexible configuration of allocation strategies, such as sharing the same CUDA memory pool across library boundaries. cuVS uses the `RMM <https://github.com/rapidsai/rmm>`_ library, which eases the burden of configuring different allocation strategies globally across GPU-accelerated libraries.

RMM currently has APIs for C++ and Python.

C++
^^^

Here's an example of configuring RMM to use a pool allocator in C++ (derived from the RMM example `here <https://github.com/rapidsai/rmm?tab=readme-ov-file#example>`_):

.. code-block:: c++

rmm::mr::cuda_memory_resource cuda_mr;
// Construct a resource that uses a coalescing best-fit pool allocator
// With the pool initially half of available device memory
auto initial_size = rmm::percent_of_free_device_memory(50);
rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{&cuda_mr, initial_size};
rmm::mr::set_current_device_resource(&pool_mr); // Updates the current device resource pointer to `pool_mr`
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource(); // Points to `pool_mr`

Python
^^^^^^

And the corresponding code in Python (derived from the RMM example `here <https://github.com/rapidsai/rmm?tab=readme-ov-file#memoryresource-objects>`_):

.. code-block:: python

import rmm
pool = rmm.mr.PoolMemoryResource(
rmm.mr.CudaMemoryResource(),
initial_pool_size=2**30,
maximum_pool_size=2**32)
rmm.mr.set_current_device_resource(pool)


Resource management
-------------------

cuVS uses an API from the `RAFT <https://github.com/rapidsai/raft>`_ library of ML and data mining primitives to centralize and reuse expensive resources, such as memory management. The code examples below demonstrate how to create these resources for use throughout this guide.

See RAFT's `resource API documentation <https://docs.rapids.ai/api/raft/nightly/cpp_api/core_resources/>`_ for more information.

C
^

.. code-block:: c

#include <cuda_runtime.h>
#include <cuvs/core/c_api.h>

cuvsResources_t res;
cuvsResourcesCreate(&res);

// ... do some processing ...

cuvsResourcesDestroy(res);

C++
^^^

.. code-block:: c++

#include <raft/core/device_resources.hpp>

raft::device_resources res;

Python
^^^^^^

.. code-block:: python

import pylibraft

res = pylibraft.common.DeviceResources()


Rust
^^^^

10 changes: 5 additions & 5 deletions docs/source/build.md
@@ -1,4 +1,4 @@
**# Installation
# Installation

The cuVS software development kit provides APIs for the C, C++, Python, and Rust languages. This guide outlines how to install the pre-compiled packages, build cuVS from source, and use it in downstream applications.

@@ -43,7 +43,7 @@ mamba install -c rapidsai -c conda-forge -c nvidia libcuvs_c cuda-version=12.0

#### Python Package
```bash
mamba install -c rapidsai -c conda-forge -c nvidia pycuvs cuda-version=12.0
mamba install -c rapidsai -c conda-forge -c nvidia cuvs cuda-version=12.0
```

### Python through Pip
@@ -52,12 +52,12 @@ The cuVS Python package can also be [installed through pip](https://rapids.ai/pi

For CUDA 11 packages:
```bash
pip install pycuvs-cu11 --extra-index-url=https://pypi.nvidia.com
pip install cuvs-cu11 --extra-index-url=https://pypi.nvidia.com
```

And CUDA 12 packages:
```bash
pip install pycuvs-cu12 --extra-index-url=https://pypi.nvidia.com
pip install cuvs-cu12 --extra-index-url=https://pypi.nvidia.com
```

Note: these packages statically link the C and C++ libraries so the `libcuvs` and `libcuvs_c` shared libraries won't be readily available to use in your code.
@@ -175,4 +175,4 @@ The documentation requires that the C, C++ and Python libraries have been built

## Use cuVS in your application

The [examples/](https://github.com/rapidsai/raft/tree/HEAD/examples) directory at the root of the cuVS repository contains self-contained drop-in projects to build and use the cuVS SDK in your applications.
12 changes: 12 additions & 0 deletions docs/source/getting_started.rst
@@ -0,0 +1,12 @@
Getting Started
===============

This guide provides an initial starting point for learning the basic concepts and using the various APIs in the cuVS software development kit.

.. toctree::
:maxdepth: 1
:caption: Contents:

basics.rst
interoperability.rst
working_with_ann_indexes.rst
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -21,7 +21,8 @@ cuVS is a library for vector search and clustering on the GPU.
:maxdepth: 1
:caption: Contents:

quick_start.md
getting_started.rst
integrations.rst
build.md
api_docs.rst
contributing.md
13 changes: 13 additions & 0 deletions docs/source/integrations.rst
@@ -0,0 +1,13 @@
Integrations
============

In addition to being used through any one of its different language APIs, cuVS also integrates with a number of third-party libraries and vector databases:

FAISS
-----

Milvus
------

Kinetica
--------
104 changes: 104 additions & 0 deletions docs/source/interoperability.rst
@@ -0,0 +1,104 @@
Interoperability
================

DLPack (C)
^^^^^^^^^^

Approximate nearest neighbor indexes provide an interface to build and search an index via a C API. `DLPack <https://github.com/dmlc/dlpack/blob/main/README.md>`_, a tensor interface framework, is used as the standard for interacting with our C API.

Representing a tensor with DLPack is simple, as it is a POD struct that stores information about the tensor at runtime.

.. code-block:: c

#include <cuda_runtime.h>
#include <dlpack/dlpack.h>

// Create data representation in host memory
float dataset[2][1] = {{0.2}, {0.1}};
// Copy the data to device memory
float *dataset_dev;
cudaMalloc(&dataset_dev, sizeof(float) * 2 * 1);
cudaMemcpy(dataset_dev, dataset, sizeof(float) * 2 * 1, cudaMemcpyDefault);

// Use DLPack for representing the data as a tensor
DLManagedTensor dataset_tensor;
dataset_tensor.dl_tensor.data = dataset_dev;
dataset_tensor.dl_tensor.device.device_type = kDLCUDA;
dataset_tensor.dl_tensor.ndim = 2;
dataset_tensor.dl_tensor.dtype.code = kDLFloat;
dataset_tensor.dl_tensor.dtype.bits = 32;
dataset_tensor.dl_tensor.dtype.lanes = 1;
int64_t dataset_shape[2] = {2, 1};
dataset_tensor.dl_tensor.shape = dataset_shape;
dataset_tensor.dl_tensor.strides = NULL;

// Free the device memory after use
cudaFree(dataset_dev);

Please refer to the cuVS C API `documentation <c_api.rst>`_ to learn more.
Review comment (Member Author):

@dantegd do you know if this is the correct way to reference a doc source or we must use the stable hosted URL here?

dantegd (Member):

Since this file is an rst file, you'll need to look up how rst does hyperlinks.

dantegd (Member, Mar 13, 2024):

Essentially this:

`documentation <c_api.rst>`_


Multi-dimensional span (C++)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

cuVS is built on top of the GPU-accelerated machine learning and data mining primitives in the `RAFT <https://github.com/rapidsai/raft>`_ library. Most of the C++ APIs in cuVS accept `mdspan <https://arxiv.org/abs/2010.06474>`_ multi-dimensional array views for representing data in higher dimensions, similar to the `ndarray` in the NumPy Python library. RAFT also contains the corresponding owning `mdarray` structure, which simplifies the allocation and management of multi-dimensional data in both host and device (GPU) memory.

The `mdarray` is an owning object that forms a convenience layer over RMM and can be constructed in RAFT using a number of different helper functions:

.. code-block:: c++

#include <raft/core/device_mdarray.hpp>

int n_rows = 10;
int n_cols = 10;

auto scalar = raft::make_device_scalar<float>(handle, 1.0);
auto vector = raft::make_device_vector<float>(handle, n_cols);
auto matrix = raft::make_device_matrix<float>(handle, n_rows, n_cols);

The `mdspan` is a lightweight non-owning view that can wrap around any pointer, maintaining shape, layout, and indexing information for accessing elements.

We can construct `mdspan` instances directly from the above `mdarray` instances:

.. code-block:: c++

// Scalar mdspan on device
auto scalar_view = scalar.view();

// Vector mdspan on device
auto vector_view = vector.view();

// Matrix mdspan on device
auto matrix_view = matrix.view();

Since the `mdspan` is just a lightweight wrapper, we can also construct it from the underlying data handles of the `mdarray` instances above, using `extent` to query their shapes:

.. code-block:: c++

#include <raft/core/device_mdspan.hpp>

auto scalar_view = raft::make_device_scalar_view(scalar.data_handle());
auto vector_view = raft::make_device_vector_view(vector.data_handle(), vector.extent(0));
auto matrix_view = raft::make_device_matrix_view(matrix.data_handle(), matrix.extent(0), matrix.extent(1));

Of course, RAFT's `mdspan`/`mdarray` APIs aren't just limited to the `device`. You can also create `host` variants:

.. code-block:: c++

#include <raft/core/host_mdarray.hpp>
#include <raft/core/host_mdspan.hpp>

int n_rows = 10;
int n_cols = 10;

auto scalar = raft::make_host_scalar<float>(handle, 1.0);
auto vector = raft::make_host_vector<float>(handle, n_cols);
auto matrix = raft::make_host_matrix<float>(handle, n_rows, n_cols);

auto scalar_view = raft::make_host_scalar_view(scalar.data_handle());
auto vector_view = raft::make_host_vector_view(vector.data_handle(), vector.extent(0));
auto matrix_view = raft::make_host_matrix_view(matrix.data_handle(), matrix.extent(0), matrix.extent(1));

Please refer to RAFT's `mdspan documentation <https://docs.rapids.ai/api/raft/stable/cpp_api/mdspan/>`_ to learn more.


CUDA array interface (Python)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^