Failover when an originator becomes unresponsive, and return to originator once responsive #269

Open
Tracked by #118
mkysel opened this issue Oct 28, 2024 · 4 comments
mkysel commented Oct 28, 2024

No description provided.

mkysel commented Oct 28, 2024

This might already be done. To be investigated.

mkysel self-assigned this Oct 28, 2024

mkysel commented Oct 28, 2024

The syncworker already re-establishes its connection when an originator crashes. I have seen this work many times, but I don't think we have any explicit tests for it.

Is there anything else that needs to happen here @richardhuaaa?

richardhuaaa commented

I think there are two more cases we need to handle:

  1. If the originator is in the registry but we cannot connect to it even after retrying, we should pull that originator's payloads from other nodes on the network (perhaps from multiple nodes).

  2. If the originator was recently removed from the registry (e.g. within 1 day), or was removed long ago but we are a brand new node with no history, we should pull that originator's payloads from other nodes on the network (perhaps from multiple nodes).

The XIP could definitely do a better job of going through these cases.

mkysel commented Oct 29, 2024

Gotcha. Thanks for the details
