Blockchain backup RFC #3885
Comments
I think we can use a combination of the "offline" and "online" approaches: we can choose a subset of peers in the network and make an "offline" backup of their storage. If the whole network fails, we can then recover it (probably with the loss of some blocks) by loading the "offline" backup and gossiping.
My personal vision of the issue:

Native backup mechanism shouldn't be considered before the major features release

Based on experience with other services and applications, in most cases an infrastructure engineer is responsible for persisting the data and therefore for maintaining a backup/recovery mechanism. Commonly they would implement what you call the "offline" approach here, i.e. redundantly storing or mirroring the data directories somewhere else. I would strongly recommend inviting DevOps specialists to the discussion to make sure we don't allocate resources to something completely unnecessary.

Upgradeability should be addressed as a priority

Currently, a live project that uses Iroha and wants to upgrade to a newer version may encounter major inconvenience because Iroha cannot perform native upgrades. For now there is only one way to achieve this, which I'd say violates decentralization principles and is even worse than a hard fork. I'd recommend addressing this aspect as a priority instead.
At some point we discussed archive nodes in #3527. These nodes do not participate in consensus, but they receive blocks from the network. Could these nodes be used as a backup?
What I'm worried about is the data being safely backed up. It is not new, but it wasn't discussed enough, given the current situation.
I will ask the DevOps team for recommendations; at the same time, how aware is the DevOps team of the Iroha architecture? So far, backups in Iroha look like an open question to me.
I would like to know whether it needs to be stopped at a certain point so the backup can proceed. I don't know Kura deeply enough to claim it's safe or isn't. Randomly copying the data of a database with a journal may lead to damage, for example.
While I believe this should be an architecture-related consideration, the decision is yours to make.
Certainly, we've discussed upgradeability as a part of the workflow.
I believe they could be. I am not sure when to stop the node and proceed with the backup.
I'm not sure whether this issue should be solved at the network level or whether the blockchain itself should handle it. I mean it could be an off-chain service/tool/utility that is responsible for backing up all blocks using off-chain technologies (SQL databases or something similar).
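To make the off-chain idea concrete, here is a minimal sketch of such a utility that archives blocks into an SQL database. Everything in it is an assumption for illustration: it does not use any Iroha API, the `blocks` table layout and the function names `open_archive`, `archive_block`, and `latest_height` are hypothetical, and a real tool would receive blocks from a peer rather than from the caller.

```python
import hashlib
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) the archive database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        "height INTEGER PRIMARY KEY, "
        "hash TEXT NOT NULL, "
        "payload BLOB NOT NULL)"
    )
    return conn


def archive_block(conn, height, payload):
    """Store one serialized block and return the digest recorded for it.

    INSERT OR IGNORE makes the call idempotent, so a block that is
    delivered twice is stored only once.
    """
    digest = hashlib.sha256(payload).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO blocks (height, hash, payload) "
            "VALUES (?, ?, ?)",
            (height, digest, payload),
        )
    return digest


def latest_height(conn):
    """Return the highest archived block height, or 0 for an empty archive."""
    row = conn.execute("SELECT MAX(height) FROM blocks").fetchone()
    return row[0] if row[0] is not None else 0
```

On restart, a tool like this could call `latest_height` to decide from which block to resume. Whether the digest should be SHA-256 or whatever hash the chain itself uses is left open here.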
Please elaborate on the concern. What exactly determines 'safety'?
If you think architecture knowledge is required for the discussion, please provide as many useful references as possible for context.
Please elaborate on the examples/concerns related to that. Why should it be stopped at all? We could consider snapshotting a previous state in parallel with ongoing consensus participation. Researching a technical solution is the next step, and we should not limit ourselves at this point.
I wouldn't say we should complicate this process with another layer of consensus around snapshots/backups, or perhaps I just got the wrong impression of the statement. Please don't portray me as the only decision maker; I just want to have as much context as possible to communicate conveniently with all stakeholders and make further roadmap adjustments.
Certainly.
Iroha is a system with many parts that can influence the process and the amount of restored data. There's a BFT consensus in addition to the blockchain itself. The frequency of updates on the disk wasn't discussed often and I'm not sure if a scenario of Kura being damaged due to an unexpected stopping is possible. In my opinion, the maximum amount of data peers agree on that can be restored should be determined, and we should proceed from then on.
I believe I'm not the best person to do so, so I started discussing the architectural side with both @mversic and @Erigara, who added a lot of code to the recent codebase, as well as you since you're involved in Iroha 1 and other related projects. I'm not the best person to point out architectural aspects of Iroha, but I can imagine a realistic data loss scenario.
As I said before, "I don't know Kura deeply enough to claim it's safe or isn't". If there's something like a journal, part of the Kura data is held in RAM, and the stored data depends on both, I see a risk in simply copying the data: after the copy completes, the copied data may or may not be readable. I am not sure which risks exist, which is why I'm asking the people who have more information to decide.
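One way to reason about the risk of a copy taken while the node is running is to check the copy afterwards. The sketch below assumes a hypothetical block record in which each block stores the hash of its predecessor; it is not Kura's actual on-disk format, and `block_hash`, `make_chain`, and `valid_prefix_length` are illustrative names only. It reports how many leading blocks of a copied store still form an intact hash chain, so a torn or partially written tail can be detected and discarded.

```python
import hashlib

# Hypothetical "previous hash" value used by the first block.
GENESIS_PREV = b"\x00" * 32


def block_hash(prev_hash, payload):
    """Hash a block from its predecessor's hash and its payload."""
    return hashlib.sha256(prev_hash + payload).digest()


def make_chain(payloads):
    """Build a toy chain of block records for demonstration purposes."""
    chain, prev = [], GENESIS_PREV
    for payload in payloads:
        digest = block_hash(prev, payload)
        chain.append({"prev_hash": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def valid_prefix_length(blocks):
    """Return the number of leading blocks that form an intact chain."""
    prev = GENESIS_PREV
    for index, block in enumerate(blocks):
        if block["prev_hash"] != prev:
            return index  # link to the previous block is broken
        if block_hash(block["prev_hash"], block["payload"]) != block["hash"]:
            return index  # payload does not match the recorded hash
        prev = block["hash"]
    return len(blocks)
```

A check like this only shows that the copied blocks are internally consistent. It says nothing about state that lived only in RAM or in a journal at the moment of the copy, which is the open question raised above.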
From the DevOps side, we need a more detailed explanation of which part of the application requires a backup. For now, we can support fault tolerance of Iroha services through pod replication. If it is required to save, keep, and restore application state data (perhaps this concerns the Kura subsystem), we can use external backups. We also have a successful case of backing up and restoring Iroha volumes and block storage after a peer failure.
There are two groups of approaches for the backups with different advantages and pitfalls: "online" (using other peers) and "offline" (relying on data stored on a separate medium).
The online approaches are prone to catastrophic events. If the peers run on physically nearby machines, something like an electrical failure is a risk for all of them.
The offline approach raises many questions about BFT and about which data we should trust. We should discuss this in detail because, despite all the issues, it guarantees that a significant part of the data is safe, which would help our users. Block streaming may be a way to perform such backups, but this raises another question: "Which peer to stream from?".
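As a rough illustration of the "which data should we trust" question, the sketch below compares offline backups taken from several peers and keeps only the chain prefix that a quorum of them agree on. It assumes the textbook BFT bound (at most f faulty peers out of n >= 3f + 1, quorum of 2f + 1); Iroha's actual rules may differ, and `quorum_size` and `agreed_prefix` are hypothetical helper names, not part of any Iroha API.

```python
from collections import Counter


def quorum_size(peer_count):
    """Quorum of 2f + 1, where f is the largest value with n >= 3f + 1."""
    faulty = (peer_count - 1) // 3
    return 2 * faulty + 1


def agreed_prefix(backups):
    """Return the longest run of block hashes that a quorum of backups share.

    `backups` maps a peer id to that peer's block hashes ordered by height.
    """
    if not backups:
        return []
    needed = quorum_size(len(backups))
    candidates = list(backups.values())
    agreed = []
    height = 0
    while True:
        # Count votes for the block at this height among consistent backups.
        votes = Counter(c[height] for c in candidates if len(c) > height)
        if not votes:
            break
        best, count = votes.most_common(1)[0]
        if count < needed:
            break
        agreed.append(best)
        # Keep only backups that still agree with the chosen prefix.
        candidates = [
            c for c in candidates if len(c) > height and c[height] == best
        ]
        height += 1
    return agreed
```

With four backups where three share the first two blocks and only two have a third, the function returns the two-block prefix. Blocks beyond that prefix are the ones that might be lost on recovery.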
According to our discussion today, we need to stabilize the API to have the ability to restore the previous version of the chain.