Comparing changes

base repository: kleveross/ftlib
base: c59bed7527238a8a50e9e25d48a3aaa0381b5905
head repository: kleveross/ftlib
compare: f8d8381d2924f03ccf54545371c564ba4bca95b8
Showing with 5 additions and 5 deletions.
  1. +5 −5 docs/design/consensus.md
10 changes: 5 additions & 5 deletions docs/design/consensus.md
@@ -1,16 +1,16 @@
## Consensus

-The consensus protocol in FTLib acts as a shadow precondition for any collective communication operations. Any changes from the consensus protocol will reset the initialization flag of FTLib to `False`, deferring any collective communication operations, and lead to the rebuild procedure.
+The consensus protocol in FTLib acts as a shadow precondition for any collective communication operations. Any changes from the consensus protocol will reset the initialization flag of FTLib to `False`, deterring any communication operations after the rebuild procedure returns success.

A member list is maintained by the implementation of consensus protocol.

-The rank-assign scheme in FTLib can extract worker identification from the member list, such like address of each workers. Such unique identification helps the rank-assign scheme to designate unique rank number to each worker, which most collective communication libraries require during initialization.
+The rank-assign scheme in FTLib can extract worker identification from the member list, such like address of each worker. This unique identification helps the rank-assign scheme to designate individual rank number to each worker, which most communication libraries require when initializing.

-When FTLib start to `rebuild`, it uses the `confirm` API of consensus protocol to check the consensus of member list is agreed by all existing workers.
+When FTLib start to `rebuild`, it uses the `confirm` API of consensus protocol to check the member list is agreed by all existing workers.

-Not exposed to FTLib though, a `report_join` function in consensus protocol will be called inside the `confirm` function if the worker is just launched and has not reported before. For a process in a worker's lifetime, the `report_join` will be called only once.
+Not exposed to FTLib though, a `report_join` API in consensus protocol will be called inside the `confirm` function if the worker is freshly launched and has not successfully reported before. During the whole lifetime of a worker, the `report_join` API will not be called for a second time after a successfully trial.

-Every time FTLib succeeds or fails to perform collective operations, it will call the corresponding consensus function to inform the consensus protocol of whether any actions need to be taken. However, these two functions do not have to act meaningfully.
+Every time FTLib succeeds or fails to perform collective operations, it will call the corresponding functions to inform the consensus protocol whether any actions need to be taken. However, these two functions can be ignored if no actions needed by the specific consensus protocol.

## Consensus API Introduction
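
As a reading aid for the behaviour described in the hunk above, here is a minimal single-process sketch. It is not FTLib's actual code. Only the names `confirm` and `report_join` come from the text. The class `InMemoryConsensus`, the methods `get_member_list`, `on_success` and `on_failure`, the helper `assign_ranks`, and the sorted-address ranking rule are all hypothetical, chosen purely for illustration.

```python
from typing import Dict, List


class InMemoryConsensus:
    """Toy stand-in for a consensus backend (hypothetical, not FTLib's class)."""

    def __init__(self, worker_addr: str, shared_members: List[str]) -> None:
        self._addr = worker_addr
        # A shared list stands in for the member list the protocol maintains.
        self._members = shared_members
        self._joined = False
        self.join_reports = 0  # counts report_join calls, for illustration

    def _report_join(self) -> bool:
        # Not exposed to the caller; only reached through confirm().
        self.join_reports += 1
        if self._addr not in self._members:
            self._members.append(self._addr)
        return True

    def confirm(self) -> bool:
        # A freshly launched worker reports its join first; once a report
        # has succeeded, later confirm() calls skip it.
        if not self._joined:
            self._joined = self._report_join()
        return self._joined and self._addr in self._members

    def get_member_list(self) -> List[str]:
        return list(self._members)

    def on_success(self) -> None:
        # Assumed hook name; a protocol that needs no action leaves it empty.
        pass

    def on_failure(self) -> None:
        # Assumed hook name; a protocol that needs no action leaves it empty.
        pass


def assign_ranks(members: List[str]) -> Dict[str, int]:
    # Assumed rule: sort the unique addresses and use each position as rank.
    return {addr: rank for rank, addr in enumerate(sorted(set(members)))}


shared: List[str] = []
worker_a = InMemoryConsensus("10.0.0.2:7000", shared)
worker_b = InMemoryConsensus("10.0.0.1:7000", shared)
worker_a.confirm()
worker_b.confirm()
worker_a.confirm()  # second call does not report the join again
ranks = assign_ranks(worker_a.get_member_list())
```

In this sketch every worker derives the same rank mapping from the same member list without extra coordination, which is the property the rank-assign paragraph relies on. A real consensus backend would replace the shared list with an agreed, distributed view.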