
DeepEP run on two nodes, each node equipped with 1 A100 + 1 CX6, hits assert. #104

Open
@Knight-Cai

Description


Hi,

I have two nodes, each with 1 A100 + 1 CX6.
I just want to try tests/test_internode.py between those two nodes:
python3 -m torch.distributed.run --nproc_per_node=1 --nnodes=2 --node_rank=0 --master_addr=192.168.3.40 --master_port=12345 tests/test_internode.py
python3 -m torch.distributed.run --nproc_per_node=1 --nnodes=2 --node_rank=1 --master_addr=192.168.3.40 --master_port=12345 tests/test_internode.py

But I hit the issue below. It seems DeepEP can only run on 8-GPU setups; can you please confirm? And if I need to run DeepEP on my setup, is modifying the macro to NUM_MAX_NVL_PEERS=1 enough?
self.runtime = deep_ep_cpp.Buffer(self.rank, self.group_size, num_nvl_bytes, num_rdma_bytes, low_latency_mode)
RuntimeError: Failed: Assertion error /home1/DeepEP/csrc/deep_ep.cpp:32 'num_ranks > NUM_MAX_NVL_PEERS or low_latency_mode'
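For reference, a minimal sketch of why this assertion fires on the setup above, assuming the default NUM_MAX_NVL_PEERS is 8 (an inference from the 8-GPU observation, not verified against the DeepEP source) and that test_internode.py does not enable low-latency mode:

```python
# Sketch of the check at csrc/deep_ep.cpp:32, translated to Python.
# All values are assumptions describing this two-node, one-GPU-per-node run.
NUM_MAX_NVL_PEERS = 8      # assumed default number of intranode (NVLink) peers
num_ranks = 2              # 2 nodes x 1 GPU per node
low_latency_mode = False   # assumed default for test_internode.py

# The Buffer constructor asserts: num_ranks > NUM_MAX_NVL_PEERS or low_latency_mode
passes = num_ranks > NUM_MAX_NVL_PEERS or low_latency_mode
print(passes)  # False -> the assertion fails and raises the RuntimeError above
```

With only 2 ranks total, `num_ranks > NUM_MAX_NVL_PEERS` is false, so the assertion can only be satisfied by either having more ranks than NVLink peers per node or enabling low-latency mode.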

Looking forward to your feedback, and thanks a lot.
Qingsong
