Description
Hi,
I have two nodes, each with 1 A100 + 1 CX6 NIC. I just want to run tests/test_internode.py between those two nodes:
python3 -m torch.distributed.run --nproc_per_node=1 --nnodes=2 --node_rank=0 --master_addr=192.168.3.40 --master_port=12345 tests/test_internode.py
python3 -m torch.distributed.run --nproc_per_node=1 --nnodes=2 --node_rank=1 --master_addr=192.168.3.40 --master_port=12345 tests/test_internode.py
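For reference, my understanding of what these two commands set up (a minimal sketch, not DeepEP's actual test code) is a 2-rank NCCL process group with one rank per node:

```python
import torch.distributed as dist

# torch.distributed.run derives RANK, WORLD_SIZE, MASTER_ADDR and
# MASTER_PORT from the flags above (--nnodes=2, --nproc_per_node=1),
# so each node contributes exactly one rank.
dist.init_process_group(backend="nccl")
print(f"rank {dist.get_rank()} of {dist.get_world_size()}")  # world size is 2
```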
But I hit the issue below. It seems DeepEP can only run on setups with 8 GPUs per node; can you please confirm? And if I want to run DeepEP on my setup, is changing the macro NUM_MAX_NVL_PEERS to 1 enough?
self.runtime = deep_ep_cpp.Buffer(self.rank, self.group_size, num_nvl_bytes, num_rdma_bytes, low_latency_mode)
RuntimeError: Failed: Assertion error /home1/DeepEP/csrc/deep_ep.cpp:32 'num_ranks > NUM_MAX_NVL_PEERS or low_latency_mode'
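If I read csrc/deep_ep.cpp:32 correctly, the failing check is equivalent to the following (my paraphrase in Python; the NUM_MAX_NVL_PEERS value of 8 and the low_latency_mode default are my assumptions from reading the source):

```python
NUM_MAX_NVL_PEERS = 8      # compile-time default, if I read the source right

num_ranks = 2              # my setup: 2 nodes x 1 GPU each
low_latency_mode = False   # test_internode.py's normal mode, I believe

# With 2 ranks, 2 > 8 is False and low_latency_mode is False,
# so the Buffer constructor raises the RuntimeError above.
assert num_ranks > NUM_MAX_NVL_PEERS or low_latency_mode
```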
Looking forward to your feedback, and thanks a lot.
Qingsong