Blockchain backup RFC #3885

Open
6r1d opened this issue Sep 13, 2023 · 8 comments
Labels
Documentation Documentation changes iroha2-dev The re-implementation of a BFT hyperledger in RUST

Comments

@6r1d
Contributor

6r1d commented Sep 13, 2023

There are two groups of approaches for the backups with different advantages and pitfalls: "online" (using other peers) and "offline" (relying on data stored on a separate medium).


The online approaches are vulnerable to catastrophic events: if the peers run on physically nearby machines, something like a power failure puts all of them at risk.

The offline approach raises many questions about BFT and which data we should trust. We should discuss this in detail because, despite all the issues, it guarantees that a significant part of the data is safe and would help our users. Block streaming may be a way to perform such backups, but this raises another question: "Which peer to stream from?".


According to our discussion today, we need to stabilize the API to have the ability to restore the previous version of the chain.

@6r1d 6r1d added the Documentation Documentation changes label Sep 13, 2023
@6r1d 6r1d changed the title Blockchain RFC Blockchain backup RFC Sep 13, 2023
@Erigara
Contributor

Erigara commented Sep 13, 2023

I think we can use a combination of the "offline" and "online" approaches: we can choose some subset of peers in the network and make an "offline" backup of their storage.

In case of a whole-network failure, we can then recover the network (probably with the loss of some blocks) by loading the "offline" backup and gossiping.

@Mingela
Contributor

Mingela commented Sep 13, 2023

My personal vision of the issue:

Native backup mechanism shouldn't be considered before the major features release

Referring to the experience in working with other services/applications, in most cases an infrastructure engineer is responsible for persisting the data thus maintaining a backup/recovery mechanism. Commonly they'd implement something you call the "offline" approach here, i.e. redundantly storing/mirroring the data directories somewhere else. I would strongly recommend inviting DevOps specialists to the discussion to make sure we don't allocate resources to something completely unnecessary.
@6r1d please elaborate on the motivation and the business goal you're addressing with this suggestion, and give an overview of how other blockchains usually implement this.
Imo, what is called "online" is not really related to a backup; it's an essential property of a blockchain that a node can catch up on the relevant blocks using p2p connections and perform block (re-)validation & verification.
Additionally, we might consider an improvement towards instantiating an empty node, currently it will download all blocks from the genesis. Some kind of network-trusted snapshots/checkpoints would reduce the synchronization time for such a node.
P.S. A node of another type, i.e. a syncing/archival node, may also be helpful as a 'backup'.

Upgradeability should be addressed as a priority

Currently, a live project that uses Iroha and wants to upgrade to a newer version may face major inconvenience because Iroha cannot perform native upgrades. For now there is only one option to achieve this, which I'd say violates decentralization principles and is even worse than a hard fork. I'd recommend addressing this aspect as a priority instead.

@mversic
Contributor

mversic commented Sep 14, 2023

At some point we discussed archive nodes in #3527. These nodes do not participate in consensus, but they receive blocks from the network. Could these nodes be used as a backup?

@6r1d
Contributor Author

6r1d commented Sep 14, 2023

> Commonly they'd implement something you call the "offline" approach here, i.e. redundantly storing/mirroring the data directories somewhere else

What I'm worried about is the data being safely backed up. It is not new, but it wasn't discussed enough, given the current situation.

> I would strongly recommend inviting DevOps specialists to the discussion to make sure we don't allocate resources to something completely unnecessary.

I will ask for recommendations from the DevOps; at the same time, how aware is the DevOps of Iroha architecture? So far, backups in Iroha look like an open question to me.

> Additionally, we might consider an improvement towards instantiating an empty node, currently it will download all blocks from the genesis. Some kind of network-trusted snapshots/checkpoints would reduce the synchronization time for such a node.

I would like to know whether it needs to be stopped at a certain point so the backup can proceed. I don't know Kura deeply enough to claim it's safe or isn't. Randomly copying the data of a database with a journal may lead to damage, for example.

> Native backup mechanism shouldn't be considered before the major features release

While I believe this should be an architecture-related consideration, the decision is yours to make

> Upgradeability should be addressed as a priority

Certainly, we've discussed upgradeability as a part of the workflow.


> at some point we discussed archive nodes. These nodes do not participate in a consensus but they receive blocks from the network. Could these nodes be used as a backup?

I believe they could be. I am not sure when to stop the node and proceed with the backup.

@pesterev
Contributor

I'm not sure if this issue should be solved at the network level or if the blockchain should handle it. I mean it should look like an off-chain service/tool/utility that is responsible for backing up all blocks using off-chain technologies (SQL databases or something).

@Mingela
Contributor

Mingela commented Sep 14, 2023

> What I'm worried about is the data being safely backed up. It is not new, but it wasn't discussed enough, given the current situation.

Please elaborate on the concern. What exactly determines 'safety'?

> I will ask for recommendations from the DevOps; at the same time, how aware is the DevOps of Iroha architecture? So far, backups in Iroha look like an open question to me.

If you think architecture knowledge is required for the discussion, please provide as many useful references as possible for context.

> I would like to know whether it needs to be stopped at a certain point so the backup can proceed. I don't know Kura deeply enough to claim it's safe or isn't. Randomly copying the data of a database with a journal may lead to damage, for example.

Please elaborate on the examples/concerns related to that. Why should it be stopped at all? We could consider snapshotting a previous state in parallel with ongoing consensus participation. Researching a technical solution is the next step, and we should not limit ourselves at this point.

> While I believe this should be an architecture-related consideration, the decision is yours to make

I wouldn't say we should complicate this process with another layer of consensus around snapshots/backups, or maybe I just got the wrong impression of the statement. Please don't portray me as the only decision maker; I just want to have as much context as possible to conveniently communicate with all stakeholders and perform further roadmap adjustments.

> at some point we discussed archive nodes. These nodes do not participate in a consensus but they receive blocks from the network. Could these nodes be used as a backup?

Certainly.

@6r1d
Contributor Author

6r1d commented Sep 14, 2023

> Please elaborate on the concern. What exactly determines 'safety'?

Iroha is a system with many parts that can influence the restore process and the amount of restored data. There's a BFT consensus in addition to the blockchain itself. How often updates are written to disk hasn't been discussed much, and I'm not sure whether Kura can be damaged by an unexpected stop. In my opinion, we should first determine the maximum amount of data that peers agree on and that can be restored, and proceed from there.
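
To make "the maximum amount of data peers agree on" concrete, here is a minimal sketch, not anything taken from Iroha itself. It assumes each backup can be reduced to an ordered list of block hashes (genesis first) and that a block counts as agreed when at least `quorum` backups contain it at the same height. The function name, the `quorum` parameter, and the string hashes are all hypothetical.

```python
from collections import Counter


def agreed_prefix(chains, quorum):
    """Return the longest prefix of block hashes shared by at least `quorum` chains.

    `chains` is a list of lists of block hashes (genesis first), one per backup.
    This is an illustrative model, not Iroha's actual recovery logic.
    """
    prefix = []
    candidates = list(chains)
    height = 0
    while True:
        # Count which hash each still-consistent backup holds at this height.
        votes = Counter(c[height] for c in candidates if len(c) > height)
        if not votes:
            break
        block_hash, count = votes.most_common(1)[0]
        if count < quorum:
            break
        prefix.append(block_hash)
        # Keep only the backups that agree with the chosen block.
        candidates = [
            c for c in candidates if len(c) > height and c[height] == block_hash
        ]
        height += 1
    return prefix


backups = [
    ["g", "a", "b", "c"],
    ["g", "a", "b"],
    ["g", "a", "x"],
    ["g"],
]
print(agreed_prefix(backups, quorum=2))  # ['g', 'a', 'b']
print(agreed_prefix(backups, quorum=3))  # ['g', 'a']
```

The right quorum size depends on the BFT assumptions of the network. If `quorum` is at most half the number of backups, two different blocks could both reach it at the same height, so a real tool would need an explicit rule for that case.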

> If you think architecture knowledge is required for the discussion, please provide as many useful references as possible for context.

I believe I'm not the best person to do so, so I started discussing the architectural side with both @mversic and @Erigara, who added a lot of code to the recent codebase, as well as you since you're involved in Iroha 1 and other related projects. I'm not the best person to point out architectural aspects of Iroha, but I can imagine a realistic data loss scenario.

> Please elaborate on the examples/concerns related to that. Why should it be stopped at all?

As I said before, "I don't know Kura deeply enough to claim it's safe or isn't".

If there's something like a journal, a part of Kura is stored in RAM, and the Kura data depends on both, I see a risk in simply copying the data: if the data changes while it is being copied, the resulting copy may or may not be readable. I am not sure which risks are there; this is why I'm asking the people who have more information to decide.
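
The copying risk described above can be illustrated with a generic sketch that makes no assumptions about Kura's on-disk format: it treats the block store as a plain directory, hashes every file before and after the copy, and accepts the backup only if the source stayed unchanged and the copy matches it. The function names and the retry count are hypothetical, and the check cannot see state that lives only in RAM.

```python
import hashlib
import shutil
from pathlib import Path


def digest_tree(root):
    """Map each file's relative path to the SHA-256 of its contents."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup_if_stable(src, dst, attempts=3):
    """Copy `src` to `dst`; accept the copy only if `src` did not change meanwhile."""
    for _ in range(attempts):
        before = digest_tree(src)
        # copytree requires a fresh destination, so drop any earlier attempt.
        if Path(dst).exists():
            shutil.rmtree(dst)
        shutil.copytree(src, dst)
        # Trust the copy only if the source is unchanged and the copy matches it.
        if digest_tree(src) == before == digest_tree(dst):
            return True
    return False
```

If the function returns `False`, the store was being written to during every attempt. That would suggest either pausing the writer or backing up from a node that can be stopped, such as the archive node discussed earlier in the thread.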

@BAStos525
Member

From the DevOps side, we need a more detailed explanation of which part of the application requires a backup. For now, we can provide fault tolerance for Iroha services through pod replication. Or, if it's required to save, keep, and restore application state data (perhaps this concerns the Kura subsystem), we can use external backups. We also have a successful case of backing up and restoring Iroha volumes and block storage after a peer failure.

@nxsaken nxsaken added the iroha2-dev The re-implementation of a BFT hyperledger in RUST label Apr 29, 2024
8 participants