Skip to content

Conversation

@AboorvaDevarajan
Copy link
Member

@AboorvaDevarajan AboorvaDevarajan commented Mar 3, 2021

ompi/group: fix proc pointer comparison in groups

To avoid checking sentinel process pointers to the original ompi_proc_t pointers compare the processes in the groups using process names.

Signed-off-by: Aboorva Devarajan abodevar@in.ibm.com

@awlauria
Copy link
Contributor

awlauria commented Mar 3, 2021

bot:aws:retest

* to NULL.
*/
static inline struct ompi_proc_t *ompi_group_dense_lookup (ompi_group_t *group, const int peer_id, const bool allocate)
static inline struct ompi_proc_t *ompi_group_dense_lookup (ompi_group_t *group, const int peer_id, unsigned char flags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can still be a const?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the passed value is not a pointer, I guess const can safely be ignored,

* to reach the process pointer
*/
static inline ompi_proc_t *ompi_group_get_proc_ptr (ompi_group_t *group, int rank, const bool allocate)
static inline ompi_proc_t *ompi_group_get_proc_ptr (ompi_group_t *group, int rank, unsigned char flags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here - wonder if the const can still be kept.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters, these arguments are const by default as they cannot be returned to the caller.

Copy link
Contributor

@awlauria awlauria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - just a few questions.

@bosilca
Copy link
Member

bosilca commented Mar 3, 2021

We threaded very carefully to avoid global locks, a design at the opposite to what this PR does. Moreover, creating and adding a new proc_t (from the sentinel), is an expensive operation (calling into other libraries), and always taking the global lock when manipulating the group will prevent any other proc_t operations from succeeding (even on totally unrelated processes).

I understand that due to the use of the sentinel, we need to be careful how we create and add the new proc_t to the group, but at the end we only need to protect the update operations in the hash table while allowing all reads to go unhindered.

@AboorvaDevarajan AboorvaDevarajan force-pushed the fix_group_op branch 2 times, most recently from 92ac56b to c98f4a6 Compare March 4, 2021 13:33
@AboorvaDevarajan
Copy link
Member Author

AboorvaDevarajan commented Mar 4, 2021

We threaded very carefully to avoid global locks, a design at the opposite to what this PR does. Moreover, creating and adding a new proc_t (from the sentinel), is an expensive operation (calling into other libraries), and always taking the global lock when manipulating the group will prevent any other proc_t operations from succeeding (even on totally unrelated processes).

I understand that due to the use of the sentinel, we need to be careful how we create and add the new proc_t to the group, but at the end we only need to protect the update operations in the hash table while allowing all reads to go unhindered.

Thanks for the review George,

I guess having a lock only during the updates in the proc hash-table might not be sufficient here, because during an MPI group operation (eg, MPI_Group_union), we have to iterate through the proc pointers in each group to get the overlapping set.

The snippet pasted below does the comparison of two groups (group1 and group2) during MPI_Group_union, where each proc pointers in one group is compared with other to check if it is equal, but during the execution of the inner loop if a thread from the same process updates the proc pointer (because of sentinels) this will result in inconsistent / erroneous comparison result.

https://github.com/open-mpi/ompi/blob/master/ompi/group/group_plist.c#L43

    for (int proc1 = 0 ; proc1 < group1->grp_proc_count ; ++proc1) {
        proc1_pointer = ompi_group_get_proc_ptr_raw (group1, proc1);  

        /* check to see if this proc is in group2 */
        for (int proc2 = 0 ; proc2 < group2->grp_proc_count ; ++proc2) {
            proc2_pointer = ompi_group_get_proc_ptr_raw (group2, proc2);
            if( proc1_pointer == proc2_pointer ) {
                rc = opal_bitmap_set_bit (bitmap, proc2);
                if (OPAL_SUCCESS != rc) {
                    return rc;
                }
                ++overlap_count;

                break;
            }
        }  /* end proc1 loop */
    }  /* end proc loop */

Here is a snippet from the test case used to identify the issue,
https://raw.githubusercontent.com/AboorvaDevarajan/mpi-tests/main/group_mt.c

void *
thread_exec_fn(void *args)
{
    int res;
    MPI_Comm the_comm;
    the_comm = MPI_COMM_WORLD;
    MPI_Group wgroup, group;
    MPI_Comm_group(the_comm, &wgroup);
    MPI_Group_union(wgroup, wgroup, &group);
    MPI_Group_compare(wgroup, group, &res);
    if (res != MPI_IDENT) {
        printf("not ident : %d\n", res);
        exit(1);
    }
	return 0;
}

When the above test is launched with 80 procs, even when both the groups passed to MPI_Group_union are the same, the overlap_count is 78 instead of 80 which is due to above race condition.

And similar group comparison code block is present in multiple group operations which will yield the same inconsistent result. If a lock is taken just while updating the proc pointers in the proc hash-table the issue will still persist.

Please let me know your suggestions.

Thanks.

@bosilca
Copy link
Member

bosilca commented Mar 4, 2021

I saw your issue only after reviewing the code, I would have had a slightly different review based on the code you provided. The real issue with the code is that we are resolving the ompi_proc_t when we should not, because at the end the only information we need is the process name. And we have this in the sentinel, without any need to lock global structures.

Thus, we could simply change the way we compare groups, and this would make your example work. I guess we can build a case where multiple threads need to resolve the same ompi_proc_t from two different groups/communicators at the same moment, which will bring us back to the need for some locking mechanism, but I need to revisit the PML code again in order to validate this assumption.

@AboorvaDevarajan
Copy link
Member Author

@bosilca Thanks for the suggestion, yes just by changing the way we do the comparison of group by using the process name as here AboorvaDevarajan@3399031 instead of process pointers seems to be fixing this specific issue. I will do some review and push the updates.

And also as you pointed out, will try to see if there will be any other race conditions during group allocation.

@bosilca
Copy link
Member

bosilca commented Mar 9, 2021

I looked at the new version and I think the code is cleaner. There might be an impact on the performance, because of the calls to opal_compare_proc, but that could be the topic for another time.

@AboorvaDevarajan
Copy link
Member Author

@bosilca I have pushed the changes, since we are just concerned about the equality while comparing the processes here, I'm just checking if it is pids and vpids are equal directly without using the opal_compare_proc function. This avoids some conditional branches and also the indirect function pointer jump.

@bosilca
Copy link
Member

bosilca commented Mar 10, 2021

Sorry, I did not expected you to do this last change, because I don't think we should do it. Long story short, the naming is something that we are supposed to inherit from the RTE, and we should always manipulate it via the accessors provided. I let the other chime in, but I think everything is good in this PR, with the exception of the removal of the use of opal_compare_proc.

To avoid checking sentinel process pointers to the original `ompi_proc_t`
pointers compare the processes in the groups using process names.

Signed-off-by: Aboorva Devarajan <abodevar@in.ibm.com>
@AboorvaDevarajan
Copy link
Member Author

sure George, Thanks. reverted back. I guess comparison function can be separately optimized.

@awlauria
Copy link
Contributor

bot:aws:retest

@awlauria
Copy link
Contributor

bot:aws:retest

let's try again...

@awlauria awlauria merged commit 2edb461 into open-mpi:master Mar 11, 2021
@gpaulsen
Copy link
Member

gpaulsen commented Mar 15, 2021

v4.0.x: #8610
v4.1.x: #8609
v5.0.x: #8592

@awlauria
Copy link
Contributor

@AboorvaDevarajan can you please cherry-pick this to v5.0.x?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants