Skip to content

Conversation

@hoopoepg
Copy link
Contributor

  • added UCX progress priority

Signed-off-by: Sergey Oblomov sergeyo@mellanox.com

@hoopoepg
Copy link
Contributor Author

@yosefe could you review?

if (status != UCS_INPROGRESS) {
mca_pml_ucx_set_recv_status_safe(mpi_status, status, &info);
return OMPI_SUCCESS;
while (1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we reuse MCA_COMMON_UCX_WAIT_LOOP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added macro for progress loop

if (ucp_msg != NULL) {
mca_pml_ucx_set_recv_status_safe(mpi_status, UCS_OK, &info);
return OMPI_SUCCESS;
while (1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also update iprobe/improbe/mprobe
the non-blocking versions should trigger opal_progress 1/progress_iterations of the times [if not found a match]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't get - add loop for UCX/opal progress calling into non-blocking calls?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

today non-blocking probe calls opal_progress
we want it to call (g_count % progress_iterations) ? ucx_progress : opal_progress

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will require additional static counter inside of routine, is it ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the counter does not even have to be thread-safe for correctness

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@hoopoepg
Copy link
Contributor Author

bot:mellanox:retest

@hoopoepg
Copy link
Contributor Author

@yosefe could you look?

mca_pml_ucx_set_recv_status_safe(mpi_status, UCS_OK, &info);
} else {
opal_progress();
ucx_progress_cntr = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make it unsigned and let it wrap around? then no need to reset to 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset is used to count "progress iterations" after every completed request to avoid unpredictable calls of opal_progress.

do you think it is bad idea?

int mca_pml_ucx_iprobe(int src, int tag, struct ompi_communicator_t* comm,
int *matched, ompi_status_public_t* mpi_status)
{
static int ucx_progress_cntr = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it's just inside this function, we rename it to "progress_count"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to progress_count

int *matched, struct ompi_message_t **message,
ompi_status_public_t* mpi_status)
{
static int ucx_progress_cntr = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static unsigned progress count = 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

/* progress loop to allow call UCX/opal progress */
/* used C99 for-statement variable initialization */
#define MCA_COMMON_UCX_PROGRESS_LOOP(_worker) \
for (int iter = 0;; (++iter % opal_common_ucx.progress_iterations) ? \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsigned iter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

ucs_status_t status; \
/* call UCX progress */ \
MCA_COMMON_UCX_PROGRESS_LOOP(_worker) { \
if (UCS_INPROGRESS != (status = opal_common_ucx_request_status(_request))) { \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's split this to 2 lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

*matched = 1;
mca_pml_ucx_set_recv_status_safe(mpi_status, UCS_OK, &info);
ucx_progress_cntr = 0;
progress_count = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

*matched = 1;
mca_pml_ucx_set_recv_status_safe(mpi_status, UCS_OK, &info);
ucx_progress_cntr = 0;
progress_count = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Sergey Oblomov added 4 commits August 27, 2018 09:50
- added UCX progress priority

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
- refactoring of opal/UCX progress calls

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
- renamed of internal variable names
- used unsigned datatypes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
@hoopoepg hoopoepg force-pushed the topic/optimize-blocked-calls branch from 42bf169 to c201c0a Compare August 27, 2018 07:08
@yosefe yosefe merged commit 68206a5 into open-mpi:master Aug 29, 2018
@yosefe
Copy link
Contributor

yosefe commented Aug 29, 2018

@hoopoepg need this one also in v4.0.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants