UCX shutdown sequence is crashing #11087

@maor-lb

We are trying to implement shutdown logic for a listener, but we are hitting a segfault with UCX 1.19.

Our Shutdown Sequence:
// 1. Set application-level shutdown flag
callback_shutdown = true;

// 2. Drain all pending callbacks from progress queue
while (ucp_worker_progress(worker) > 0) { }

// 3. Destroy the listener
ucp_listener_destroy(listener);

But this does not prevent a new connection from arriving between steps 2 and 3, which results in a crash.
What we saw is that ucp_cm_server_conn_request_cb can still be called in that window.

In its error flow it calls:

err_reject:
      ucp_listener->conn_reqs--;
      status = uct_listener_reject(listener, conn_request);  // CRASH!
      // ^^^^^^^^ UCT listener was already destroyed by uct_listener_destroy()!

It is not clear what the correct sequence is to avoid this race.
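
To make the window concrete, here is a minimal toy model of the sequence above. It is only a sketch: `toy_listener_t`, `toy_reject`, `toy_listener_destroy` and `toy_shutdown` are hypothetical stand-ins and not UCX types or functions, and returning -1 stands in for the segfault. It assumes the only difference between the two cases is whether the request arrives before or after the drain loop finishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a listener; not a UCX type. */
typedef struct {
    bool alive;     /* false once the listener has been destroyed */
    int  conn_reqs; /* connection requests currently in flight */
} toy_listener_t;

/* Mirrors the order of the err_reject path: decrement first, then reject.
 * Returns -1 where the real code would touch a destroyed listener. */
static int toy_reject(toy_listener_t *l)
{
    assert(l != NULL);
    l->conn_reqs--;
    return l->alive ? 0 : -1;
}

static void toy_listener_destroy(toy_listener_t *l)
{
    l->alive = false;
}

/* Runs the three-step shutdown with one incoming connection request.
 * Returns the status of that request's reject call. */
static int toy_shutdown(bool arrives_after_drain)
{
    toy_listener_t listener = { .alive = true, .conn_reqs = 0 };
    int pending = arrives_after_drain ? 0 : 1;
    int status  = 0;

    /* Step 2: drain. A request that is already queued is handled here,
     * while the listener is still alive. */
    while (pending > 0) {
        listener.conn_reqs++;
        status = toy_reject(&listener);
        pending--;
    }

    /* The window: this request was never seen by the drain loop. */
    if (arrives_after_drain) {
        listener.conn_reqs++;
    }

    /* Step 3: destroy the listener. */
    toy_listener_destroy(&listener);

    /* The late request's error path now runs against a dead listener. */
    if (arrives_after_drain) {
        status = toy_reject(&listener);
    }
    return status;
}
```

In this model `toy_shutdown(false)` rejects the request cleanly, while `toy_shutdown(true)` reaches the reject after the destroy. The drain loop returning 0 only says that nothing was queued at that moment; it does not stop later arrivals.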
