Install NVSHMEM

Important notices

This project is neither sponsored nor supported by NVIDIA.

Use of NVIDIA NVSHMEM is governed by the terms of the NVSHMEM Software License Agreement.

Prerequisites

  1. GDRCopy (v2.4 or later recommended): a low-latency GPU memory copy library based on NVIDIA GPUDirect RDMA technology. Installing its kernel module requires root privileges.

  2. Hardware requirements

Installation procedure

1. Install GDRCopy

GDRCopy requires kernel module installation on the host system. Complete these steps on the bare-metal host before container deployment:

Build and installation

git clone https://github.com/NVIDIA/gdrcopy
cd gdrcopy
make -j$(nproc)
sudo make prefix=/opt/gdrcopy install

Kernel module installation

cd packages
CUDA=/path/to/cuda ./build-deb-packages.sh
sudo dpkg -i gdrdrv-dkms_2.4-4_amd64.deb \
             libgdrapi_2.4-4_amd64.deb \
             gdrcopy-tests_2.4-4_amd64.deb \
             gdrcopy_2.4-4_amd64.deb
cd ..              # insmod.sh is in the gdrcopy repository root, not in packages/
sudo ./insmod.sh   # Load kernel modules on the bare-metal system

Container environment notes

For containerized environments:

  1. Host: keep kernel modules loaded (gdrdrv)
  2. Container: install DEB packages without rebuilding modules:
    sudo dpkg -i gdrcopy_2.4-4_amd64.deb \
                 libgdrapi_2.4-4_amd64.deb \
                 gdrcopy-tests_2.4-4_amd64.deb
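Because the container relies on the module loaded by the host, it can help to confirm that gdrdrv is present before running anything else. The following is a minimal sketch, not part of GDRCopy: the function name `module_loaded` and its optional file argument are hypothetical, and it assumes `/proc/modules` is readable in your environment.

```shell
#!/usr/bin/env bash
# Sketch: report whether a kernel module appears in a /proc/modules-style listing.
# module_loaded and its optional second argument are illustrative names only.
module_loaded() {
    local name="$1"
    local listing="${2:-/proc/modules}"
    # Each line of /proc/modules begins with the module name followed by a space.
    grep -q "^${name} " "$listing"
}

# Example use:
#   module_loaded gdrdrv && echo "gdrdrv is loaded" || echo "gdrdrv is NOT loaded"
```

If the check fails inside a container, run it again on the host: the module has to be loaded there first.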

Verification

gdrcopy_copybw  # Should show bandwidth test results

2. Acquire the NVSHMEM source code

Download NVSHMEM v3.1.7 from the NVIDIA NVSHMEM Archive.

3. Apply our custom patch

Navigate to your NVSHMEM source directory and apply our provided patch:

git apply /path/to/deep_ep/dir/third-party/nvshmem.patch
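If the NVSHMEM sources differ from the version the patch was written for, `git apply` may stop partway through with an error. One way to guard against that is a dry run with `git apply --check` first. The sketch below assumes `git` is installed; the helper name `apply_patch_checked` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: apply a patch only if a dry run says it applies cleanly.
# apply_patch_checked is an illustrative helper, not part of DeepEP or NVSHMEM.
apply_patch_checked() {
    local patch_file="$1"
    # --check tests whether the patch applies without modifying any files.
    if git apply --check "$patch_file"; then
        git apply "$patch_file"
    else
        echo "patch does not apply cleanly: $patch_file" >&2
        return 1
    fi
}

# Example use, from the NVSHMEM source directory:
#   apply_patch_checked /path/to/deep_ep/dir/third-party/nvshmem.patch
```

A failed check usually means the source tree is not the expected NVSHMEM version or the patch has already been applied.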

4. Configure NVIDIA driver

Enable IBGDA by modifying /etc/modprobe.d/nvidia.conf:

options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"

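Rebuild the initramfs so the new module options are picked up at boot, then reboot (update-initramfs is the Debian/Ubuntu tool):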

sudo update-initramfs -u
sudo reboot

For more detailed configurations, please refer to the NVSHMEM Installation Guide.
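After the reboot, you may want to confirm that the driver actually picked up both options. The sketch below assumes the loaded NVIDIA driver lists its parameters in `/proc/driver/nvidia/params` using `Name: value` lines; that path and format are assumptions to verify on your system, and the function name `check_ibgda_driver_params` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify the two module options from /etc/modprobe.d/nvidia.conf are active.
# Assumption: the driver exposes its parameters at /proc/driver/nvidia/params
# as "Name: value" lines. check_ibgda_driver_params is an illustrative name.
check_ibgda_driver_params() {
    local params="${1:-/proc/driver/nvidia/params}"
    if ! grep -Eq '^EnableStreamMemOPs:[[:space:]]*1[[:space:]]*$' "$params"; then
        echo "EnableStreamMemOPs is not set to 1" >&2
        return 1
    fi
    if ! grep -Eq '^RegistryDwords:.*PeerMappingOverride=1' "$params"; then
        echo "RegistryDwords does not contain PeerMappingOverride=1" >&2
        return 1
    fi
}

# Example use:
#   check_ibgda_driver_params && echo "driver options look active"
```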

5. Build and installation

The following example demonstrates building NVSHMEM with IBGDA support:

CUDA_HOME=/path/to/cuda \
GDRCOPY_HOME=/path/to/gdrcopy \
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/path/to/your/dir/to/install

cd build
make -j$(nproc)
make install

Post-installation configuration

Set environment variables in your shell configuration:

export NVSHMEM_DIR=/path/to/your/dir/to/install  # Use for DeepEP installation
export LD_LIBRARY_PATH="${NVSHMEM_DIR}/lib:$LD_LIBRARY_PATH"
export PATH="${NVSHMEM_DIR}/bin:$PATH"
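Before building DeepEP against this installation, a quick check that `NVSHMEM_DIR` points at a directory containing the `lib` and `bin` subdirectories referenced by the exports above can catch a mistyped prefix early. This is a minimal sketch; the function name `check_nvshmem_dir` is hypothetical, and it checks only that the directories exist, not their contents.

```shell
#!/usr/bin/env bash
# Sketch: confirm the install prefix has the lib/ and bin/ directories that
# LD_LIBRARY_PATH and PATH are pointed at. check_nvshmem_dir is an illustrative name.
check_nvshmem_dir() {
    # Use the first argument if given, otherwise fall back to $NVSHMEM_DIR.
    local dir="${1:-${NVSHMEM_DIR:-}}"
    local sub
    if [ -z "$dir" ]; then
        echo "NVSHMEM_DIR is not set" >&2
        return 1
    fi
    for sub in lib bin; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing directory: $dir/$sub" >&2
            return 1
        fi
    done
}

# Example use:
#   check_nvshmem_dir && echo "NVSHMEM_DIR layout looks fine"
```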

Verification

nvshmem-info -a  # Should display details of the NVSHMEM installation