Consider the MWE:
import pytato as pt
size = 4
rank = 2
send_rank = 3
recv_rank = 1
x = pt.make_placeholder("x", 10, "float64")
recv = pt.make_distributed_recv(
    src_rank=recv_rank, comm_tag=42,
    shape=x.shape, dtype=x.dtype)
y = x + recv
send1 = pt.staple_distributed_send(
    x, dest_rank=send_rank, comm_tag=43,
    stapled_to=y)
send2 = pt.staple_distributed_send(
    send1 + recv, dest_rank=send_rank, comm_tag=44,
    stapled_to=send1)
out = pt.make_dict_of_named_arrays({"out": send1 + send2})
parts = pt.find_distributed_partition(out)
pt.show_dot_graph(parts)
Notice that the MWE creates only one receive node, but the partition emitted by pt.find_distributed_partition (as rendered by pt.show_dot_graph) contains an additional one.

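This is definitely a bug in pt.find_distributed_partition's _PartIdTagAssigner, which introduces another receive node.
I'm not too sure about the implementation in execute_partition, but I suspect this could lead to deadlocks: if each copy of the receive node posts its own receive for comm_tag=42, only one of them can be matched by the single send from rank 1.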
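
To make the deadlock concern concrete, here is a toy model in plain Python. It does not use pytato or MPI, and it rests on an assumption I have not verified: that execute_partition posts one receive per receive node found in the partition. The helper name unmatched_receives is made up for this illustration. Messages are keyed by (source rank, comm tag), and each send can satisfy at most one receive.

```python
from collections import Counter


def unmatched_receives(sends, recvs):
    """Return the receives that are left without a matching send.

    Each entry is a (source rank, comm tag) pair. A send can satisfy
    at most one receive with the same pair.
    """
    available = Counter(sends)
    pending = []
    for key in recvs:
        if available[key] > 0:
            # Consume one matching send for this receive.
            available[key] -= 1
        else:
            # Nothing left to match: a blocking receive would hang here.
            pending.append(key)
    return pending


# Rank 1 sends a single message with tag 42, as the MWE expects.
sends = [(1, 42)]

# One receive node, as written in the MWE: everything is matched.
print(unmatched_receives(sends, [(1, 42)]))

# Assumed failure mode: the duplicated receive node posts a second
# receive for the same (rank, tag), which is never satisfied.
print(unmatched_receives(sends, [(1, 42), (1, 42)]))
```

If the assumption holds, the second case is the one to worry about: the extra receive waits on a message that rank 1 never sends.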