
EFA provider: CUDA memory registration fails with EFAULT — missing dmabuf path for FI_HMEM_CUDA #12019

@sara4dev

Description


Summary

The EFA provider's efa_mr_reg_ibv_mr() in prov/efa/src/efa_mr.c does not attempt ibv_reg_dmabuf_mr() for CUDA device memory, unlike the Neuron and ROCr paths. Instead it falls through to plain ibv_reg_mr() with the GPU virtual address, which fails with EFAULT because the kernel EFA driver cannot resolve GPU addresses without dmabuf.

There is already a TODO in the code acknowledging this gap (in the aws/libfabric fork, tag v2.4.0amzn1.0):

/*
 * TODO: need such fallback for cuda as well when
 * FI_CUDA_API_PERMITTED is true
 */
if (efa_mr_is_neuron(efa_mr) || efa_mr_is_rocr(efa_mr)) {

Environment

  • Platform: AWS GB200 (p6e-gb200.36xlarge), aarch64 Grace Blackwell
  • EFA SDK: v1.47.0, libfabric v2.4.0amzn1.0
  • EFA kernel module: 2.15.3g
  • NVIDIA driver: 580.95.05 (GPU Operator)
  • CUDA: 13.x (dmabuf support confirmed; libfabric init reports cuda dmabuf support status: 1)
  • Workload: NIXL disaggregated KV cache transfer using the LIBFABRIC backend with FI_HMEM_CUDA

Reproduction

Register a large CUDA buffer (~11 GB KV cache) via fi_mr_regattr() with attr->iface = FI_HMEM_CUDA on the EFA provider.

Error output:

libfabric::efa:mr:efa_mr_reg_impl():893<warn> Unable to register MR of 11279546368 bytes: Bad address, flags 0
libfabric::efa:mr:efa_mr_regattr():1060<warn> Unable to register MR: Bad address

Root Cause

In efa_mr_reg_ibv_mr() (line ~549 of prov/efa/src/efa_mr.c), the dmabuf path via ofi_hmem_get_dmabuf_fd() + ibv_reg_dmabuf_mr() is only attempted for Neuron and ROCr interfaces. For CUDA, execution falls through to the default ibv_reg_mr() at the end of the function, which passes the GPU virtual address directly. The kernel returns EFAULT because GPU memory cannot be pinned via standard get_user_pages().

The efa_nv_peermem kernel module does not intercept this path: it is not an ib_core peer memory client in the upstream kernel sense.

Modern CUDA drivers (12.x+) support cuMemGetHandleForAddressRange() for dmabuf export, and libfabric's cuda_get_dmabuf_fd() already works (confirmed by cuda_hmem_detect_dmabuf_support() returning status 1 during init). The infrastructure is all in place; the condition just needs to include CUDA.

Fix

Add efa_mr_is_cuda(efa_mr) to the existing dmabuf condition:

// Before (line ~549):
if (efa_mr_is_neuron(efa_mr) || efa_mr_is_rocr(efa_mr)) {

// After:
if (efa_mr_is_neuron(efa_mr) || efa_mr_is_rocr(efa_mr) ||
    efa_mr_is_cuda(efa_mr)) {

This makes the EFA provider call ofi_hmem_get_dmabuf_fd(FI_HMEM_CUDA, ...) to obtain a dmabuf fd, then use ibv_reg_dmabuf_mr() for the registration. If dmabuf is not supported, it falls back to ibv_reg_mr() (same as the existing Neuron/ROCr behavior).

We have validated this fix on the GB200 platform. With the one-line change, 11 GB VRAM buffers register successfully and NIXL disaggregated inference runs end-to-end:

libfabric_rail_manager.cpp:480] Registered memory on rail 2 (mr=0x28653320, key=7340312)
libfabric_backend.cpp:811] Rail Manager successfully registered VRAM memory on 1 rails with GPU Direct RDMA support

Note: This issue is specific to the EFA provider in the aws/libfabric fork (issues are disabled on that repo). The EFA provider code is maintained by AWS. CC @shijin-aws @shuozhang-amzn
