-
Notifications
You must be signed in to change notification settings - Fork 868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct some operations modifying status.MPI_ERROR when it is disallowed #9128
Correct some operations modifying status.MPI_ERROR when it is disallowed #9128
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we fix ucx pml in a separate PR?
I'll add it, I also found some similar issues in Portals, I can fix but not test as I don't have access to hardware. |
by the standard. Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu> Cleanup Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
bffb449
to
eba0892
Compare
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
eba0892
to
3797220
Compare
bot:retest |
@@ -67,7 +67,7 @@ int MPI_Iprobe(int source, int tag, MPI_Comm comm, int *flag, MPI_Status *status | |||
if (MPI_PROC_NULL == source) { | |||
*flag = 1; | |||
if (MPI_STATUS_IGNORE != status) { | |||
*status = ompi_request_empty.req_status; | |||
OMPI_COPY_STATUS(status, ompi_request_empty.req_status, false); | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the memcheck below ? Why are we checking that the user passes a pointer to a struct of a size at least equal to MPI_Status ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I figured out what is going on. According to section 3.2.5 the MPI_ERROR might not be set for calls working on a single request. However, the MPI standard does not prohibit this, and on current architectures is faster to flush entire line of cache than having a branch with multiple instructions on the critical path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bosilca MPI-4.0 3.2.5 says:
In general, message-passing calls do not modify the value of the error code field of status variables. This field may be updated only by the functions in Section 3.7.5 which return multiple statuses. The field is updated if and only if such function returns with an error code of MPI_ERR_IN_STATUS.
This, to me, unambiguously says that MPI only updates MPI_ERROR from MPI_[TEST|WAIT][SOME|ALL|ANY].
I agree with @bosilca that this might mean losing a potentially useful cache line optimization.
That being said, MPI-4.0 3.7.3 says:
An empty status is a status which is set to return tag = MPI_ANY_TAG, source = MPI_ANY_SOURCE, error = MPI_SUCCESS...
So it seems like there's at least some cases of contradiction here: if we're supposed to set the empty status but it's a single completion function (e.g., MPI_TEST / MPI_WAIT), do we set MPI_ERROR or not?
Bottom line(s):
- I think that in the cases where we're not setting the empty status, it's unambiguous: we should only be setting status.MPI_ERROR in the multi-completion functions MPI_[TEST|WAIT][SOME|ALL|ANY].
- In the cases where we're setting the empty status from a single completion function, it's unclear what we're supposed to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It says "may" which gives us some latitude of updating it or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are parsing the text wrong. It has two operative sentences:
This field may be updated only by the functions in Section 3.7.5 which return multiple statuses.
This says that MPI_[TEST|WAIT][ALL|ANY|SOME] may update the field. And only those functions.
The field is updated if and only if such function returns with an error code of MPI_ERR_IN_STATUS.
This says that status.MPI_ERROR will be set only when MPI_[TEST|WAIT][ALL|ANY|SOME] returns MPI_ERROR_IN_STATUS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, he first part excludes everything except what is allowed by the sentences that follows. Still strange for the standard to be that precise on how to handle the error code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, he first part excludes everything except what is allowed by the sentences that follows. Still strange for the standard to be that precise on how to handle the error code.
Totally agree. Unfortunately, we're stuck with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we bring this to the forum, apparently the only rationale is to 'help implementations go faster'. which it definitely doesn't do, here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably wouldn't hurt.
This is in response to issue #9032
Signed-off-by: Aurelien Bouteiller bouteill@icl.utk.edu