-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Be clear about when MPI_ERROR is set, from the start #814
Comments
This also came up recently in pmodels/mpich#6751. We would support this change. cc @hzhou. |
@devreal It is incorrect that I think the rationale -
makes sense. Users in general don't need to check the status However, I am strongly against the interpretation that implementations must not touch the
The if and only if seems to force this interpretation. In reality, it is often simpler for implementations to blindly update the entire statuses struct. To untouch certain field, e.g. Thus, I am against requiring implementations to always set the |
I assumed the error case:
Maybe this was true 20 years ago, it certainly isn't today. The potential saving (one load, one store) from allowing it is irrelevant on today's architectures so there is no rationale for allowing such "optimizations" in a public API.
That is at the heart of this issue: we should never leaves values undefined in our return values. They can easily cause undefined behavior and I hope we can agree that that is a bad idea. The trade-off between avoiding undefined behavior and saving two instructions should be clear. "Premature optimization is the root of all evil." |
Also #522 |
Is it only two instructions when MPI_{Wait,Test}all return MPI_SUCCESS and you require the implementation to set the MPI_ERROR field of every single status object to MPI_SUCCESS? If we are so concerned about uninitialized data, shouldn't we make underflow illegal? Or should we require all buffers passed to receive functions to be initialized to ensure that users can never read uninitialized data? No, because we have MPI_Get_count and the user is expected to determine how much of the buffer has been written, and not read more than that. In the same manner, it is clear when the fields of status objects can/should be read. There is no serious threat from UB unless users write obviously incorrect code. |
OK, one instruction per status, in the worst case. In the best case, the compiler is able to use wider move instructions if we don't have to poke holes into the status object. See OMPI for example: https://github.com/open-mpi/ompi/blob/main/ompi/request/request.h#L219
There is an important difference here: the buffer is updated the same way no matter how you send or receive your data. It doesn't matter whether you use MPI_Recv or MPI_Irecv, MPI_Send or MPI_Isend. The problem is that the resulting value of
I went through the standard to try and collect all the places where we define the semantics of Section 3.2.5, at the top:
Similar verbiage follows for Fortran. So the Not so fast! Later in this section we say:
So this contradicts the first paragraph: Section 3.7.3:
So now we say that the value of the Section 3.7.5 (
Now we know. There is more text explaining these rules in different colors for What I am proposing is that we cut out all the special cases littered throughout the standard and stick to the semantics at the very top of 3.2.5: the Save some trees! |
Just so we're talking about what counts in terms of overhead, it isn't instructions - its memory references, especially writes. Getting a single byte out to memory, while it might happen in the background, can impact subsequent operations. |
The status structure in OMPI is 24B (12B public, 12B private), The error field is 4B, or 16% of the total structure size. In each function call, we're writing several kB of data to memory. Most of that (and likely all of the status) will never make its way out of a cache and then its part of a cache line anyway. The overhead of storing 4 additional B is negligible. From where I stand, the discussion on overhead in this context is moot and we should focus on usability here. |
If an application cares about ub, it should just initialize the error field before passing the status to MPI. We only need to make sure then that MPI will never copy uninitiated memory to this field. |
That's not the problem. See my initial post. The problem is that some procedure calls set the field and others don't. Of course, to avoid UB you just don't make the mistake of reading that field after a call to |
It is a strawman arguing for the instruction cost. The key points are:
The cost argument is ultimately an implementation issue. For some implementations, it may be more efficient or easier always to set the status fields. But for some implementations that may be a cost or concern, say, due to some weird internal design. Anyway, until we determine it is necessary, why should the standard restrict the implementation to one way or another? |
How is this different from "don't read the receive buffer until the operation is complete"? The specification states if and when one can read various parts of memory. We say when the user can and cannot read The best solution for users who are confused is to just use Given that very few users allow errors to return in the first place, the hand-wringing over this issue seems undeserved. Writing error-aware code is nontrivial and it is not too much to ask to require that users only read |
If we change the specification to allow but not require implementations to set the Maybe I missed it, but any change here is backwards-incompatible, although I doubt any user code depends on the current behavior. |
Regarding performance, I think some folks need a refresher on CPU microarchitecture. When one stores any number of bytes to memory with an x86 store instruction, it causes RFO (request for ownership), which fetches the 64B lines (usually 1) containing those bytes. This means that the 4B store generates 64B of memory traffic from memory into cache and consumes a cache line, regardless of when the line is evicted. It also consumes an entry in one or more microarchitectural state buffers, such as the store buffer and - if the memory page was not already translated - a TLB entry. In the worst case, when the line is cold, the store will take 50+ nanoseconds (100-200 instructions), so we must hope that our CPU can have a sufficient number of instructions, stores and TLB walkers in-flight to hide this latency. Note that in the case of MPI RMA request-based operations, Do implementations mitigate this by checking whether errors return and only initializing the If we add the branch on can-errors-return to hide the aforementioned 4B store overheads, modern CPUs will still speculatively execute both sides of the branch, so we can still end up taking a TLB and cache miss anyways. |
@jeffhammond In OMPI we had a lax interpretation of the Such a trivial matter resulting in such a waste of time. We can argue on how users shall be doing their coding, but if we put ourselves in their place for a little bit we would realize how ridiculous all this debate is. Writing correct distributed code is already hard enough without forcing the users to learn all the convoluted cases in which they should or should not test a field in a structure the user had explicitly requested to be returned. |
My position is that we should relax the specification to allow both the OMPI 2020 and MPICH implementations. I'm sorry that you had to fix that, but it was a choice. You could have treated that trivial noncompliance the same as MPI_THREAD_MULTIPLE and RMA were treated for years. Is it your position that we should now change the spec to force MPICH to change their implementation (and break backwards-compatibility of MPI)? What is your proposal to fix the undefined situation with RMA and NBC requests, so that users do not have to learn that .MPI_SOURCE and .MPI_TAG can't be read in those cases? |
It was a indeed a choice OMPI devs made to abide by the standard. Based on the fact that nobody complained about this change in a little over 2 years, I don't think we affected any codes because as you mentionned above 99.*% of users rely on the default error handler for MPI applications. My position is "make it simple and consistent for the users", and if some implementations have to change their ways so be it. What @devreal proposed above seems very sensible, initialize the MPI_ERROR field all the time, and let the users decide how to use it. |
A reasonable choice might be to explicitly state that the value of the field is undefined in these cases. That would allow both implementations and at the same time shout out that accessing the field is ub, even if the value was initialized before the call. The argument of not returning uninitialized memory to the application would reach beyond the receive buffer, just think of the index array in waitsome. |
I will continue to object to this, because it forces implementations to do completely unnecessary operations in order to accommodate user illiteracy. The majority of users don't handle errors at all. You are foisting backwards-compatibility, implementation changes, and runtime overhead onto every usage of status objects, in order to support illiterate users trying to write error-aware code. How many MPI users do you know who write error-aware MPI code that is correct except for its use of the .MPI_ERROR field when the return code is not MPI_ERROR_IN_STATUS?
This is what I want. We should clean up the text to say very clearly that .MPI_ERROR is only relevant when the return code is MPI_ERROR_IN_STATUS. This is not a hard rule to learn. |
This is what I would like to see as well. |
If your proposal is to replace
with
then we are in complete agreement. |
I would love to remove those tests from MPICH testsuite. I don't understand why we went through so much trouble engineering those tests and fixing the code back then instead of protesting. |
FWIW, RMA requests are the only requests where we do not have to touch the the status object if one is provided. For all other requests, the status object must at least contain some information on whether the request was cancelled (even in cases where cancellation is not allowed, like collective operations). That's a rather narrow justification for leaving the error field unset... |
Here is a nugget from Section 3.7.2:
Based on the highlighted text, a user can reasonably assume that for all but receive requests the returned status is empty as defined here, i.e., has a well-defined content. After all, for all but receive requests the value of the status object is not significant if no error occurred. |
@devreal This has been read and can still make it for MPI 4.2 on our current timeline. Did you want to bring it up for a vote at the December meeting? |
This passed a no-no vote.
|
This passed a 1st vote.
|
Problem
Section 3.2.5 states:
I understand that for single-operation message passing calls (
MPI_RECV
for example) the error code is returned to the user, if the appropriate error handler is set. However, I would argue that the difference in behavior betweenMPI_Testany
(does not set theMPI_ERROR
field),MPI_Testsome
(does set theMPI_ERROR
field), andMPI_Test
(does not set theMPI_ERROR
field) is confusing for users and breaks with the principle of least surprise. Intentionally leaving a value undefined in some cases in a structure we return to the user can cause all sorts of issues if users are not careful. That's bad API design.Case in point: open-mpi/ompi#12049
I should note that not all fields of the status are set in all cases. There are operations where
MPI_TAG
orMPI_SOURCE
have no meaning (e.g., RMA request-based operations). Leaving them undefined worries me less since this is a property of the operation and consistent across all test/wait functions. Every relevant MPI operation produces a return value so we should set the error field in every associated status object.Proposal
The
MPI_ERROR
field of the status should always be set, even if the error value is returned to the user directly (MPI_Test
, MPI_Recv`, ...).Changes to the Text
Strike the statement in Section 3.2.5 and add verbiage that the error field is always set.
Impact on Implementations
This could simplify implementations since they don't have to deal with the difference between single- and multi-request functions.
Impact on Users
Less surprises and exceptions to think about.
There may be applications out there that rely on the status not being set (i.e., carry information through the
MPI_ERROR
field across MPI function calls) but I expect the chances of that are very slim and it's bad design anyway. Such applications can easily be fixed by storing that information in a local variable instead.References and Pull Requests
Brought up in an Open MPI ticket: open-mpi/ompi#12049
The text was updated successfully, but these errors were encountered: