remove the dependency of gdrcopy #201

Merged · 2 commits · Jun 10, 2025
98 changes: 28 additions & 70 deletions third-party/README.md
@@ -8,74 +8,27 @@

## Prerequisites

1. [GDRCopy](https://github.com/NVIDIA/gdrcopy) (v2.4 and above recommended) is a low-latency GPU memory copy library based on NVIDIA GPUDirect RDMA technology, and *it requires kernel module installation with root privileges.*

2. Hardware requirements
- GPUDirect RDMA capable devices, see [GPUDirect RDMA Documentation](https://docs.nvidia.com/cuda/gpudirect-rdma/)
Hardware requirements:
- GPUs inside one node need to be connected by NVLink (a quick check is sketched after this list)
- GPUs across different nodes need to be connected by RDMA devices, see [GPUDirect RDMA Documentation](https://docs.nvidia.com/cuda/gpudirect-rdma/)
- InfiniBand GPUDirect Async (IBGDA) support, see [IBGDA Overview](https://developer.nvidia.com/blog/improving-network-performance-of-hpc-systems-using-nvidia-magnum-io-nvshmem-and-gpudirect-async/)
- For more detailed requirements, see [NVSHMEM Hardware Specifications](https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html#hardware-requirements)
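
A quick, non-authoritative way to check both of the connectivity requirements above, using standard NVIDIA and rdma-core tools (not part of the original list):

```bash
# Intra-node: NVLink-connected GPU pairs show up as NV# entries in the topology matrix
nvidia-smi topo -m

# Inter-node: list the RDMA-capable HCAs visible to libibverbs
ibv_devinfo
```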

## Installation procedure

### 1. Install GDRCopy

GDRCopy requires kernel module installation on the host system. Complete these steps on the bare-metal host before container deployment:

#### Build and installation

```bash
wget https://github.com/NVIDIA/gdrcopy/archive/refs/tags/v2.4.4.tar.gz
tar -xzf v2.4.4.tar.gz
cd gdrcopy-2.4.4/
make -j$(nproc)
sudo make prefix=/opt/gdrcopy install
```

#### Kernel module installation

After compiling the software, you need to install the appropriate packages based on your Linux distribution.
For example, on Ubuntu 22.04 with CUDA 12.3:

```bash
pushd packages
CUDA=/path/to/cuda ./build-deb-packages.sh
sudo dpkg -i gdrdrv-dkms_2.4.4_amd64.Ubuntu22_04.deb \
libgdrapi_2.4.4_amd64.Ubuntu22_04.deb \
gdrcopy-tests_2.4.4_amd64.Ubuntu22_04+cuda12.3.deb \
gdrcopy_2.4.4_amd64.Ubuntu22_04.deb
popd
sudo ./insmod.sh # Load kernel modules on the bare-metal system
```

#### Container environment notes

For containerized environments:
1. Host: keep kernel modules loaded (`gdrdrv`)
2. Container: install DEB packages *without* rebuilding modules:
```bash
sudo dpkg -i gdrcopy_2.4.4_amd64.Ubuntu22_04.deb \
libgdrapi_2.4.4_amd64.Ubuntu22_04.deb \
gdrcopy-tests_2.4.4_amd64.Ubuntu22_04+cuda12.3.deb
```
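
As an illustration (the image name and exact flags are placeholders, not taken from this README), a container launched with the host's `gdrdrv` device node exposed would look roughly like:

```bash
docker run --gpus all \
    --device /dev/gdrdrv:/dev/gdrdrv \
    -it nvidia/cuda:12.3.2-devel-ubuntu22.04 bash
```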

#### Verification

```bash
gdrcopy_copybw # Should show bandwidth test results
```
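
If the bandwidth test cannot open the device, a first sanity check (not part of the original instructions) is whether the kernel module is loaded and its device node exists:

```bash
lsmod | grep gdrdrv   # the gdrdrv module should be listed
ls -l /dev/gdrdrv     # the device node created by the module
```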

### 2. Acquiring NVSHMEM source code
### 1. Acquiring NVSHMEM source code

Download NVSHMEM v3.2.5 from the [NVIDIA NVSHMEM OPEN SOURCE PACKAGES](https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz).
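
For example, downloading and unpacking the archive might look like the following (the extracted directory name is an assumption and may differ):

```bash
wget https://developer.nvidia.com/downloads/assets/secure/nvshmem/nvshmem_src_3.2.5-1.txz
tar -xf nvshmem_src_3.2.5-1.txz
cd nvshmem_src
```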

### 3. Apply our custom patch
### 2. Apply our custom patch

Navigate to your NVSHMEM source directory and apply our provided patch:

```bash
git apply /path/to/deep_ep/dir/third-party/nvshmem.patch
```
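
Optionally, a dry run can confirm the patch applies cleanly before touching the tree:

```bash
git apply --check /path/to/deep_ep/dir/third-party/nvshmem.patch
```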

### 4. Configure NVIDIA driver
### 3. Configure NVIDIA driver (required for inter-node communication)

Enable IBGDA by modifying `/etc/modprobe.d/nvidia.conf`:

@@ -92,26 +45,31 @@ sudo reboot

For more detailed configurations, please refer to the [NVSHMEM Installation Guide](https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html).
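
As a sketch only (the authoritative values are in the collapsed part of this diff and in the guide above), the IBGDA-related driver options typically look like:

```bash
# /etc/modprobe.d/nvidia.conf
options nvidia NVreg_EnableStreamMemOPs=1 NVreg_RegistryDwords="PeerMappingOverride=1;"

# then regenerate the initramfs and reboot (Ubuntu example)
sudo update-initramfs -u
sudo reboot
```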

### 5. Build and installation
### 4. Build and installation

The following example demonstrates building NVSHMEM with IBGDA support:
DeepEP uses NVLink for intra-node communication and IBGDA for inter-node communication. All other features are disabled to reduce dependencies.

```bash
CUDA_HOME=/path/to/cuda \
GDRCOPY_HOME=/path/to/gdrcopy \
NVSHMEM_SHMEM_SUPPORT=0 \
NVSHMEM_UCX_SUPPORT=0 \
NVSHMEM_USE_NCCL=0 \
NVSHMEM_MPI_SUPPORT=0 \
NVSHMEM_IBGDA_SUPPORT=1 \
NVSHMEM_PMIX_SUPPORT=0 \
NVSHMEM_TIMEOUT_DEVICE_POLLING=0 \
NVSHMEM_USE_GDRCOPY=1 \
cmake -S . -B build/ -DCMAKE_INSTALL_PREFIX=/path/to/your/dir/to/install

cd build
make -j$(nproc)
make install
export CUDA_HOME=/path/to/cuda
# disable all features except IBGDA
export NVSHMEM_IBGDA_SUPPORT=1

export NVSHMEM_SHMEM_SUPPORT=0
export NVSHMEM_UCX_SUPPORT=0
export NVSHMEM_USE_NCCL=0
export NVSHMEM_PMIX_SUPPORT=0
export NVSHMEM_TIMEOUT_DEVICE_POLLING=0
export NVSHMEM_USE_GDRCOPY=0
export NVSHMEM_IBRC_SUPPORT=0
export NVSHMEM_BUILD_TESTS=0
export NVSHMEM_BUILD_EXAMPLES=0
export NVSHMEM_MPI_SUPPORT=0
export NVSHMEM_BUILD_HYDRA_LAUNCHER=0
export NVSHMEM_BUILD_TXZ_PACKAGE=0

cmake -G Ninja -S . -B build -DCMAKE_INSTALL_PREFIX=/path/to/your/dir/to/install
cmake --build build/ --target install
```
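
Afterwards, DeepEP's own build needs to locate this installation; the DeepEP README uses an `NVSHMEM_DIR` environment variable for this (treat the exact variable as an assumption if your setup differs):

```bash
export NVSHMEM_DIR=/path/to/your/dir/to/install   # the prefix used above
```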

## Post-installation configuration
37 changes: 37 additions & 0 deletions third-party/nvshmem.patch
@@ -435,3 +435,40 @@ index c89f408..f99018a 100644

return NVSHMEMX_ERROR_INTERNAL;
}


From 099f608fcd9a1d34c866ad75d0af5d02d2020374 Mon Sep 17 00:00:00 2001
From: Kaichao You <youkaichao@gmail.com>
Date: Tue, 10 Jun 2025 00:35:03 -0700
Subject: [PATCH] remove gdrcopy dependency

---
src/modules/transport/ibgda/ibgda.cpp | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/src/modules/transport/ibgda/ibgda.cpp b/src/modules/transport/ibgda/ibgda.cpp
index ef325cd..16ee09c 100644
--- a/src/modules/transport/ibgda/ibgda.cpp
+++ b/src/modules/transport/ibgda/ibgda.cpp
@@ -406,6 +406,7 @@ static size_t ibgda_get_host_page_size() {
return host_page_size;
}

+#ifdef NVSHMEM_USE_GDRCOPY
int nvshmemt_ibgda_progress(nvshmem_transport_t t) {
nvshmemt_ibgda_state_t *ibgda_state = (nvshmemt_ibgda_state_t *)t->state;
int n_devs_selected = ibgda_state->n_devs_selected;
@@ -459,6 +460,11 @@ int nvshmemt_ibgda_progress(nvshmem_transport_t t) {
}
return 0;
}
+#else
+int nvshmemt_ibgda_progress(nvshmem_transport_t t) {
+ return NVSHMEMX_ERROR_NOT_SUPPORTED;
+}
+#endif

int nvshmemt_ibgda_show_info(struct nvshmem_transport *transport, int style) {
NVSHMEMI_ERROR_PRINT("ibgda show info not implemented");
--
2.34.1