Description
I'm running two systems that share the same DRAM on a single chip, each with its own dedicated network interface card for wireup. I'm modifying a POSIX template to explore the possibility of shared memory across nodes. First, I changed it to INTERNODE to enable cross-node functionality, and then I implemented my own memory allocation functions. Currently, pt2pt works perfectly. However, when I tested allreduce, I found that all RDMA operations were being emulated with AM. This appears to be done by UCC, since UCX itself reports the lane as available. Why is this happening? Here is the information I can provide.
The UCX log (including debug prints I added):
[1764656842.337472] [a:31893:0] mpool.c:281 UCX DEBUG mpool tl_ucp_req_mp: allocated chunk 0x2ae862cb44 of 6228 bytes with 8 elements
[1764656842.337930] [a:31893:0] ucp_ep.c:408 UCX DEBUG created ep 0x3f93ede000 to <no debug data> from api call
[1764656842.338350] [a:31893:0] ucp_ep.c:2954 UCX WARN Lane 0 iface flags: PUT_SHORT=YES PUT_BCOPY=YES GET_SHORT=YES GET_BCOPY=YES
[1764656842.338376] [a:31893:0] ucp_ep.c:2963 UCX WARN *************put_short: 4294967295, iface->put_max_short: 4294967295*********
[1764656842.338401] [a:31893:0] ucp_ep.c:3011 UCX WARN RMA lane 0: max_put_short=4294967295, max_get_short=4294967295
[1764656842.338421] [a:31893:0] ucp_worker.c:1892 UCX WARN !!!!!!!!!!!!!!!!!!!!!!!!rma_emul: 0!!!!!!!!!!!!!!!!!!!, rma_lanes_map = 1
[1764656842.338442] [a:31893:0] ucp_worker.c:1905 UCX INFO UCC_UCP_CONTEXT intra-node cfg#0 tag(mytest/memory) rma(mytest/memory) amo(mytest/memory) am(mytest/memory)
And I got:
[1764656842.339034] [a:31893:0] +----------------------------------+-------------------------------------------------------------+
[1764656842.339054] [a:31893:0] | UCC_UCP_CONTEXT intra-node cfg#0 | remote memory write by ucp_put* from host memory to host |
[1764656842.339063] [a:31893:0] +----------------------------------+-----------------------------------------------+-------------+
[1764656842.339072] [a:31893:0] | 0..inf | software emulation | mytest/memory |
[1764656842.339079] [a:31893:0] +----------------------------------+-----------------------------------------------+-------------+
I suspect this is related to the intra-node label printed here. However, I have in fact set up the transport as inter-node, and the two hosts can communicate normally (if it were really treated as intra-node, MPI+UCC+UCX would not be able to organize the two hosts).
Can anyone offer some advice?