ULFM: fault model is not completely defined #816

@abouteiller

Problem

The fault model in the Fault Tolerance chapter is only alluded to. We intentionally kept it loosely specified to give implementors freedom in which fault types manifest as MPI errors, but as a result the fault model is insufficiently defined.
Bill Gropp proposed that we firmly define the fault model as experienced by the user, and add an advice to implementors clarifying that they are free to choose which fault types they tolerate, but not how those faults are exposed to the user.

Proposal

Specify the fault model strictly in terms of user-visible behavior.
Add an advice to implementors explaining what to do if they want to tolerate fault types other than process failures.

Changes to the Text

Impact on Implementations

No impact on implementations (beyond clarifying what they are expected to do).

Impact on Users

Clarifies expectations for both users and implementors.

References and Pull Requests

https://github.com/mpi-forum/mpi-standard/pull/947

Metadata

Labels

had reading: Completed the formal proposal reading
mpi-next: For inclusion in the MPI 5.1 or 6.0 standard
wg-ft: Fault Tolerance Working Group

Projects

Status

In Progress