
Garbage-collection in op-based CRDTs #407

Open · pgte opened this issue Apr 13, 2018 · 10 comments
@pgte

pgte commented Apr 13, 2018

A naive implementation of operation-based CRDTs requires the nodes to keep the entire history of operations so that new nodes may join the CRDT. This brings some issues: local storage and convergence time for new nodes (shipping the entire set of operations may take a long time).
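A minimal sketch (not from this thread, all names mine) of why the history grows: an op-based counter has to retain every operation so that a new replica can bootstrap by replaying them all.

```rust
// Sketch: an op-based grow-only counter that keeps its full operation log
// so that a new replica can join by replaying it. The unbounded `log` is
// exactly the storage/convergence problem described above.

#[derive(Clone, Debug)]
struct Op {
    replica: u64, // id of the replica that emitted the op
    amount: u64,  // increment payload
}

#[derive(Default)]
struct NaiveCounter {
    value: u64,
    log: Vec<Op>, // grows forever without garbage collection
}

impl NaiveCounter {
    /// Apply an op locally and remember it for future joiners.
    fn apply(&mut self, op: Op) {
        self.value += op.amount;
        self.log.push(op);
    }

    /// A new replica bootstraps by replaying the entire history.
    fn join(history: &[Op]) -> Self {
        let mut replica = NaiveCounter::default();
        for op in history {
            replica.apply(op.clone());
        }
        replica
    }
}
```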

The seminal CRDT paper briefly touches on the subject, claiming that garbage collection requires consensus amongst the participating nodes.

If so, what strategies are available? If not, what are the alternatives?

@dvc94ch

dvc94ch commented Sep 12, 2019

Here are some half-baked ideas:

You only care about the most recent entries in the db. This can allow for garbage collection without consensus, by checking the timestamps. This could be used as a shared bitswap ledger.

Based on this idea, we can extend it to allow republishing old entries to keep them fresh; this may be useful for indexing blocks.
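A sketch of both ideas, with hypothetical names: each replica prunes by its own clock, so no consensus round is needed, and republishing an entry resets its timestamp (clock skew is ignored here).

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

// Sketch: entries older than a cutoff are dropped locally; since every
// replica applies the same rule, replicas converge on "recent" entries
// without any voting. Republishing keeps an entry alive.

struct FreshStore {
    ttl: Duration,
    entries: HashMap<String, (Vec<u8>, SystemTime)>,
}

impl FreshStore {
    /// Insert or republish an entry; either way its timestamp is reset.
    fn publish(&mut self, key: String, value: Vec<u8>) {
        self.entries.insert(key, (value, SystemTime::now()));
    }

    /// Local garbage collection: no coordination with other replicas,
    /// just a cutoff against the local clock.
    fn gc(&mut self) {
        let cutoff = SystemTime::now() - self.ttl;
        self.entries.retain(|_, (_, ts)| *ts >= cutoff);
    }
}
```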

@dvc94ch

dvc94ch commented Oct 2, 2019

Hashgraph isn't mentioned in the merkle-crdt paper as a reference, but it looks like an option for performing compaction. It reaches consensus without explicit voting by turning the gossip tree that spreads messages into a merkle dag, and then performs virtual voting to get linearizability. I'm not sure how it handles nodes dynamically joining and leaving yet, but it looks like an application could have commutative and non-commutative transactions (for example, log compaction), and during a network partition only support the commutative ones.
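A sketch of that last point, with hypothetical types: transactions are tagged as commutative or not, and non-commutative ones (such as compaction) are refused while the replica is partitioned from the group.

```rust
// Sketch: commutative ops can always be applied locally and merged later,
// while non-commutative ops need the consensus path, so they are refused
// while the replica is partitioned.

enum Tx {
    Increment(u64),        // commutative: order does not matter
    Compact { upto: u64 }, // non-commutative: needs a total order
}

impl Tx {
    fn is_commutative(&self) -> bool {
        matches!(self, Tx::Increment(_))
    }
}

struct Replica {
    partitioned: bool,
}

impl Replica {
    fn submit(&mut self, tx: Tx) -> Result<(), &'static str> {
        if self.partitioned && !tx.is_commutative() {
            // Non-commutative transactions wait for the partition to heal
            // and for the consensus layer to order them.
            return Err("non-commutative tx refused during partition");
        }
        // apply locally / hand off to consensus (elided)
        Ok(())
    }
}
```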

@aschmahmann

From what I recall, hashgraph has trouble dealing with nodes dynamically leaving and joining. However, if each node has some voting power, it can choose to give some of that power to another node.

If you look at #379 you'll notice that life is a whole lot easier in closed groups (e.g. all parties known in advance) than in open groups. Using some type of implicit voting in a closed group can certainly make compaction easier. Open groups are harder, though.

@dvc94ch

dvc94ch commented Oct 3, 2019

What you are describing is essentially proof-of-authority without a leader-selection mechanism. I agree that it is much harder, and at that point you are building a full-fledged blockchain. Currently I'm interested in the single-writer and closed multi-writer use cases, which I think have tons of applications, don't require a blockchain, and are underrepresented. A blockchain is poorly suited for those use cases.

@pgte
Author

pgte commented Oct 3, 2019

Here I briefly describe causal stability: ipfs-inactive/dynamic-data-and-capabilities#2 (comment)

Unlike the solution I point out there (using vector clocks), theoretically it boils down to knowing the involved peers and finding the lowest common ancestor in a causal tree.
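A sketch of causal stability with vector clocks, under the stated assumption that the involved peers are known (all names here are illustrative): an op is causally stable once every peer's acknowledged clock dominates it, and everything at or below that frontier can be compacted.

```rust
use std::collections::HashMap;

type PeerId = u64;
type VClock = HashMap<PeerId, u64>;

/// Pointwise minimum over all peers' last-acknowledged clocks: the
/// greatest frontier that *everyone* has seen. Assumes `acks` holds an
/// entry for every known peer.
fn stable_frontier(acks: &HashMap<PeerId, VClock>, peers: &[PeerId]) -> VClock {
    let mut frontier = VClock::new();
    for &p in peers {
        let mut min = u64::MAX;
        for clock in acks.values() {
            min = min.min(*clock.get(&p).unwrap_or(&0));
        }
        frontier.insert(p, min);
    }
    frontier
}

/// An op timestamped `ts` is causally stable iff the frontier dominates
/// its clock; stable ops can be folded into a snapshot and dropped.
fn is_stable(ts: &VClock, frontier: &VClock) -> bool {
    ts.iter().all(|(p, n)| frontier.get(p).unwrap_or(&0) >= n)
}
```

Note that a single silent peer pins the frontier at its last acknowledgement, which is exactly the low-participation problem raised in the next comment.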

@aschmahmann

> Currently I'm interested in the single-writer and closed multi-writer use cases

👍 I agree that closed multi-writer has plenty of use.

> it boils down to knowing the involved peers and finding the lowest common ancestor in a causal tree.

👍. This should hopefully work in high traffic + participation CRDTs; however, if participation is low (and users do not or cannot leave the CRDT) then we have trouble. One escape hatch here is to do a "vote with your feet" approach by basically forking the CRDT and advertising your fork in (or related to) the original CRDT. This allows the CRDT to compact as long as enough relevant users move to the fork.
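A rough sketch of that escape hatch (all types hypothetical): the fork starts from a compacted snapshot, and a pointer to it is published inside the original CRDT so remaining users can discover and follow it.

```rust
// Sketch: "vote with your feet" compaction via forking.

struct Snapshot {
    state_hash: [u8; 32], // hash of the compacted state the fork starts from
}

enum Entry {
    Op(Vec<u8>),                       // ordinary CRDT operation
    ForkNotice { new_root: [u8; 32] }, // advertisement of the fork
}

fn fork(old_log: &mut Vec<Entry>, snapshot: &Snapshot) -> Vec<Entry> {
    // Advertise the fork inside the original CRDT so users can follow it.
    old_log.push(Entry::ForkNotice { new_root: snapshot.state_hash });
    // The fork's log starts empty; its genesis is the compacted snapshot,
    // so none of the pre-snapshot history is shipped to new joiners.
    Vec::new()
}
```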

@dvc94ch

dvc94ch commented Oct 3, 2019

@pgte can you explain how delta-crdts avoid the issue? Also, to be byzantine fault tolerant you can't just compact without consensus on the hash of the compacted log, since that is what you are sending to someone who joins the network after compaction. EDIT: I guess you could request the log from multiple nodes and check that they are equal.
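A sketch of the EDIT (the fetch function is a stand-in): ask several peers for the hash of the compacted log and accept it only when the answers are unanimous. This raises the bar but is not truly BFT, since a colluding sample of peers can still lie consistently.

```rust
/// Sketch: a joiner cross-checks the compacted log against several peers.
/// `fetch_hash` stands in for a network call returning one peer's hash of
/// the compacted log.
fn verify_compaction<F>(peers: &[u64], fetch_hash: F) -> Option<[u8; 32]>
where
    F: Fn(u64) -> [u8; 32],
{
    let mut answers = peers.iter().map(|&p| fetch_hash(p));
    let first = answers.next()?;
    if answers.all(|h| h == first) {
        Some(first) // unanimous: accept the compacted log
    } else {
        None // disagreement: fall back to fetching the full history
    }
}
```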

@pgte
Author

pgte commented Oct 3, 2019

@dvc94ch this solution is not BFT.

@hsanjuan
Contributor

hsanjuan commented Oct 3, 2019

> you can't just compact without a consensus on the hash of the compacted log

The whole gist of this thing is whether there is a way to actually do it without consensus.

Or otherwise, a way to do it given some small constraints that do not amount to full consensus (and what those constraints are, and for what scenarios they would work).

@dvc94ch

dvc94ch commented Oct 3, 2019

That severely constrains its utility in a p2p context imo. I started implementing a consensus algorithm; I think I should have a naive implementation in 1-2 weeks. My plan is to use that to create a p2p replicated sled. (If you don't know sled, it's an embedded db written in rust; I'm pretty sure it's going to become known outside the rust community soon.)

daviddias transferred this issue from ipfs-inactive/research-CRDT on Feb 4, 2020
daviddias added the CRDTs label on Feb 4, 2020