Conversation

@wukaixingxp

Implements a new transport layer using CUDA IPC (Inter-Process Communication) for direct GPU-to-GPU memory sharing within a single node.

This enables zero-copy transfers between processes by sharing GPU memory handles instead of copying data through CPU memory.

Key features:

  • Direct GPU memory access using CUDA IPC handles
  • Eliminates GPU->CPU->GPU copies for intra-node transfers
  • Leverages NVLink/PCIe P2P when available
  • Automatic fallback for non-CUDA tensors

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
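
For readers unfamiliar with the mechanism, here is a minimal sketch of intra-node zero-copy sharing using `torch.multiprocessing`, which transmits CUDA tensors between processes as CUDA IPC handles rather than copies. This only illustrates the primitive such a transport builds on, not this PR's actual API; the tensor shape, queue-based handoff, and process layout are assumptions for illustration.

```python
import torch
import torch.multiprocessing as mp

def consumer(q):
    # The received tensor is rebuilt from a CUDA IPC handle: it aliases
    # the producer's GPU allocation, so no GPU->CPU->GPU copy occurs and
    # in-place writes are visible to the producer.
    t = q.get()
    t.add_(1)

if __name__ == "__main__":
    mp.set_start_method("spawn")         # CUDA tensors can't be shared across a fork
    src = torch.zeros(4, device="cuda")  # producer-owned GPU memory
    q = mp.Queue()
    p = mp.Process(target=consumer, args=(q,))
    p.start()
    q.put(src)  # shares the storage handle (cudaIpcGetMemHandle under the hood), not a copy
    p.join()    # producer must keep src alive while the consumer uses it
    print(src)  # tensor([1., 1., 1., 1.], device='cuda:0') -- the consumer's write
```

This also suggests why the fallback bullet exists: CUDA IPC handles only exist for device allocations within a single node, so CPU tensors (or cross-node peers) presumably have to take a copy-based path instead. Whether the mapped transfer actually rides NVLink or PCIe P2P depends on the GPU topology; the IPC mapping itself works the same either way.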
@meta-cla bot added the CLA Signed label Feb 6, 2026
@wukaixingxp marked this pull request as ready for review February 10, 2026 19:19
@amirafzali
Member

A general question: when do we find CUDA IPC useful? My understanding is it would only apply to single-host, synchronous weight-sync cases. Do you see this as a common use case?
