v3.0.x vader on PPC (wmb moved to end of set_header) #4937

Closed
@markalle

This is for the v3.0.x branch.

I have some tests that fail with vader on PPC (they pass on x86 because its memory ordering rules are stronger, so the reordering never shows up there). It looks to me like one of the wmb calls has been moved. I don't know much about what vader is doing, but I'm guessing that using mca_btl_vader_fbox_set_header() should boil down to

    set data
    wmb
    set header that says the data is there

but mca_btl_vader_fbox_set_header() has its wmb() call at the bottom, so I think it's probably ending up as

    set data
    set header that says the data is there
    wmb

which wouldn't ensure the data is visible to the reader by the time the reader sees the header.
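
To make the ordering I have in mind concrete, here is a minimal sketch in C11 atomics. It is not vader's code: `demo_slot_t`, `demo_publish()` and `demo_consume()` are hypothetical names, the slot layout is invented, and a release fence stands in for wmb (it is at least as strong as a write barrier). The point is only that the barrier sits between the data write and the header write, and that the reader has a matching barrier between reading the header and reading the data.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mailbox slot; NOT vader's actual fast-box layout. */
typedef struct {
    _Atomic uint32_t header;      /* 0 = empty, otherwise payload size */
    unsigned char    data[64];
} demo_slot_t;

/* Writer side: set data, barrier, then set the header that says
 * the data is there. Returns 0 on success, -1 on a bad size. */
static int demo_publish(demo_slot_t *slot, const void *payload, uint32_t size)
{
    if (0 == size || size > sizeof(slot->data)) {
        return -1;
    }
    memcpy(slot->data, payload, size);                 /* 1. set data   */
    atomic_thread_fence(memory_order_release);         /* 2. "wmb"      */
    atomic_store_explicit(&slot->header, size,
                          memory_order_relaxed);       /* 3. set header */
    return 0;
}

/* Reader side: check the header, barrier, then read the data.
 * Returns the payload size, or 0 if the slot is empty. */
static uint32_t demo_consume(demo_slot_t *slot, void *out)
{
    uint32_t size = atomic_load_explicit(&slot->header, memory_order_relaxed);
    if (0 == size) {
        return 0;
    }
    atomic_thread_fence(memory_order_acquire);         /* "rmb"         */
    memcpy(out, slot->data, size);
    atomic_store_explicit(&slot->header, 0,
                          memory_order_relaxed);       /* mark empty    */
    return size;
}
```

If the fence in `demo_publish()` is moved below the header store, a weakly ordered CPU such as PPC is allowed to make the header visible before the data, which is the kind of corruption described here.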

I can hit the problem with the "maxsoak.c" testcase linked below, built and run as

    mpicc -o x maxsoak.c
    mpirun -np 6 -mca pml ob1 -mca btl vader,self ./x

and the testcase will detect corruption.

For me the failure message from the testcase ends up something like

    4: Invalid data: Act:525138 Exp:850 Peer:2 Datasize:32 Mult:50

I don't know the maxsoak.c testcase well. It's just something I know we didn't write, so I don't have to go through any special approval process to share the code:
https://gist.github.com/markalle/a1c203297cb6af22a3fb5c24e62b2ba3
