Persistence
Because Zebus is a peer-to-peer bus, it cannot rely on a central broker to deliver messages to a peer that was down once that peer comes back up. We worked around this problem by creating a Persistence Service: a peer that stores the transmitted messages so it can replay them to a peer when it comes back up.
During normal operation, a peer sends a message directly to the destination peer, and it also sends a copy of that message to the persistence peer.
When the destination peer has processed a message, it sends a message to the Persistence Service to acknowledge that it was processed.
This means that if a message is not processed, for example because the destination peer is down, it remains stored in the Persistence Service until it is.
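The store-and-acknowledge cycle can be sketched as follows. This is a minimal, single-process sketch, not Zebus's actual API: `Message`, `PersistenceStore`, `DestinationPeer` and `send` are hypothetical names. Each message is stored per destination PeerId and removed when the destination acknowledges it, so whatever remains is exactly what a down peer missed.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    message_id: str
    destination: str  # PeerId of the destination peer
    payload: str


class PersistenceStore:
    """Hypothetical in-memory stand-in for the Persistence Service."""

    def __init__(self):
        # PeerId -> unacknowledged messages, kept in the order they were stored
        self._pending = {}

    def store(self, msg):
        self._pending.setdefault(msg.destination, OrderedDict())[msg.message_id] = msg

    def ack(self, peer_id, message_id):
        # A processed message no longer needs to be kept for replay
        self._pending.get(peer_id, OrderedDict()).pop(message_id, None)

    def pending_for(self, peer_id):
        return list(self._pending.get(peer_id, OrderedDict()).values())


class DestinationPeer:
    def __init__(self, peer_id, persistence):
        self.peer_id = peer_id
        self.persistence = persistence
        self.is_up = True
        self.processed = []

    def receive(self, msg):
        self.processed.append(msg.payload)
        # Acknowledge the processing to the Persistence Service
        self.persistence.ack(self.peer_id, msg.message_id)


def send(msg, destination, persistence):
    persistence.store(msg)        # a copy always goes to the persistence peer
    if destination.is_up:
        destination.receive(msg)  # direct peer-to-peer delivery


persistence = PersistenceStore()
b = DestinationPeer("B", persistence)
send(Message("m1", "B", "hello"), b, persistence)
b.is_up = False
send(Message("m2", "B", "world"), b, persistence)
print([m.message_id for m in persistence.pending_for("B")])  # ['m2']
```

The sketch stores the copy before delivering so that, in this synchronous setting, the acknowledgement never precedes the stored copy. A real distributed implementation has to cope with an acknowledgement reaching the persistence before the copy it refers to.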
When a Peer restarts, it needs to process the messages that were sent to it during its downtime, which are kept by the Persistence Service.
Upon restart, the Peer connects to the Persistence and asks for the messages sent to its PeerId during its downtime. The Persistence then sends all the missed messages to the starting Peer. Because messages are tracked by PeerId, migrating a Peer from one machine to another is seamless as long as it keeps the same PeerId.
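The replay step can be sketched like this. Again, this is a simplified, hypothetical model rather than the real Zebus protocol: `PersistenceStore.replay` and `Peer.start` are illustrative names. The starting peer asks for everything still pending for its PeerId, processes it, and acknowledges each message. Since the lookup uses only the PeerId, a `Peer` created on another machine with the same PeerId would receive the same messages.

```python
from collections import OrderedDict


class PersistenceStore:
    """Hypothetical in-memory stand-in for the Persistence Service."""

    def __init__(self):
        self._pending = {}  # PeerId -> OrderedDict[message_id -> payload]

    def store(self, peer_id, message_id, payload):
        self._pending.setdefault(peer_id, OrderedDict())[message_id] = payload

    def ack(self, peer_id, message_id):
        self._pending.get(peer_id, OrderedDict()).pop(message_id, None)

    def replay(self, peer_id):
        # Return a snapshot so the caller can ack while iterating
        return list(self._pending.get(peer_id, OrderedDict()).items())


class Peer:
    def __init__(self, peer_id, persistence):
        self.peer_id = peer_id
        self.persistence = persistence
        self.processed = []

    def start(self):
        # Ask the persistence for the messages missed under this PeerId
        for message_id, payload in self.persistence.replay(self.peer_id):
            self.processed.append(payload)
            self.persistence.ack(self.peer_id, message_id)


persistence = PersistenceStore()
persistence.store("B", "m1", "order created")
persistence.store("B", "m2", "order shipped")

# Peer B restarts, possibly on a different machine: only its PeerId matters.
b = Peer("B", persistence)
b.start()
print(b.processed)              # ['order created', 'order shipped']
print(persistence.replay("B"))  # []
```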
Once the Replay is over, a service should ideally switch to its normal mode of operation right away. However, since some network links can be slower than others, a Peer A that is not yet aware that the starting Peer B is up may send a message for B to the Persistence instead of to B directly. If Peer B had already switched to Normal mode, it would never receive that message.
This is why there is a temporary Safety phase during which the Peer receives messages both normally AND through the Persistence, deduplicating them upon reception.
After a Safety phase of 30 seconds (an arbitrary duration), the link to the Persistence is stopped and the Peer operates normally.
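The Safety phase can be pictured as a small, time-based state machine. This is a hypothetical sketch, not Zebus's implementation. `StartingPeer`, its two `on_*_message` handlers and the injectable `clock` are illustrative assumptions, and message identity is assumed to be a unique message id. During the first 30 seconds after the replay, messages from both channels are handled and duplicates are dropped by id. Afterwards, messages arriving through the Persistence are ignored.

```python
import time
from enum import Enum

SAFETY_PHASE_SECONDS = 30.0  # the arbitrary duration described above


class Phase(Enum):
    SAFETY = "safety"
    NORMAL = "normal"


class StartingPeer:
    """Hypothetical peer that has just finished its replay."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._safety_started_at = clock()
        self._seen_ids = set()  # ids handled during the Safety phase
        self.handled = []

    @property
    def phase(self):
        if self._clock() - self._safety_started_at < SAFETY_PHASE_SECONDS:
            return Phase.SAFETY
        return Phase.NORMAL

    def on_direct_message(self, message_id, payload):
        self._handle(message_id, payload)

    def on_persistence_message(self, message_id, payload):
        if self.phase is Phase.NORMAL:
            return  # the link to the Persistence is stopped
        self._handle(message_id, payload)

    def _handle(self, message_id, payload):
        if self.phase is Phase.SAFETY:
            # The same message may arrive directly AND via the Persistence
            if message_id in self._seen_ids:
                return
            self._seen_ids.add(message_id)
        self.handled.append(payload)


now = [0.0]  # fake clock, in seconds
peer = StartingPeer(clock=lambda: now[0])
peer.on_persistence_message("m1", "from A, sent while A thought B was down")
peer.on_direct_message("m2", "direct")
peer.on_persistence_message("m2", "direct")  # persistence copy: dropped
now[0] = 31.0
peer.on_persistence_message("m3", "ignored")
peer.on_direct_message("m4", "normal")
print(peer.handled)
```

Injecting the clock keeps the 30-second switch testable without sleeping. A production version would also need to handle messages that arrive during the replay itself, which this sketch leaves out.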