Garbage-collection in op-based CRDTs #407

pgte · 2018-04-13T07:44:47Z

A naive implementation of operation-based CRDTs require the nodes to keep the entire history of operations so that new nodes may join the CRDT. This brings some issues: local storage and convergence time for new nodes (shopping the entire set of operations may take a long time).

The seminal CRDT paper briefly touches the subject, claiming that garbage-collection requires consensus amongst participating nodes.

If so, what strategies are available? If not, what are the alternatives?

dvc94ch · 2019-09-12T08:26:39Z

Here are some half-baked ideas:

You only care about most recent entries in the db. This can allow for garbage collection without consensus, by checking the timestamps. This could be used as a shared bitswap ledger.

Based on this idea we can extend it to allow republishing old entries to keep them fresh, maybe useful for indexing blocks.

dvc94ch · 2019-10-02T19:33:55Z

It's not mentioned in the merkle-crdt paper as a reference, but it looks like an option to perform compaction. It reaches consensus without voting by turning the gossip tree that spreads messages into a merkle dag. Then it performs virtual voting to get linearizability. I'm not sure how it handles nodes dynamically joining and leaving yet, but it looks like an application could have commutative and non-commutative transactions (like for example log compaction), and during a network partition only support commutative transactions.

aschmahmann · 2019-10-02T20:13:05Z

From what I recall hashgraph either has trouble dealing with dealing with nodes dynamically leaving and joining. However, if each node has some voting power, they can choose to give that some of that power to another node.

If you look at #379 you'll notice that life is a whole lot easier in closed groups (e.g. all parties known in advance) than open groups. Using some type of implicit voting in a closed group can certainly make compaction easier. Open groups is a harder though.

dvc94ch · 2019-10-03T07:24:36Z

What you are describing is essentially proof-of-authority without a leader selection mechanism. I agree that it is much harder and at that point you are building a full-fledged blockchain. Currently I'm interested in the single-writer and closed multi-writer use cases, that I think have tons of applications, don't require a blockchain and are underrepresented. A blockchain is poorly suited for those use cases.

pgte · 2019-10-03T08:33:34Z

Here I briefly describe causal stability: ipfs-inactive/dynamic-data-and-capabilities#2 (comment)

Unlike the solution I point out (using vector clocks), theoretically it boils down to knowing the envolved peers and finding the lowest common ancestor in a causal tree.

aschmahmann · 2019-10-03T09:13:29Z

Currently I'm interested in the single-writer and closed multi-writer use cases

👍 I agree that closed multi-writer has plenty of use.

it boils down to knowing the envolved peers and finding the lowest common ancestor in a causal tree.

👍. This should hopefully work in high traffic + participation CRDTs, however if participation is low (and users do not or cannot leave the CRDT) then we have trouble. One escape hatch here is to do a "vote with your feet" approach by basically forking the CRDT and advertising your fork in (or related to) the original CRDT. This allows the CRDT to compact as long as enough relevant users move to the fork,

dvc94ch · 2019-10-03T09:15:40Z

@pgte can you explain how delta-crdts avoid the issue? also to be byzantine fault tolerant you can't just compact without a consensus on the hash of the compacted log, since that is what you are sending to someone who joins the network after compaction. EDIT: I guess you could request the log from multiple nodes and check that they are equal.

pgte · 2019-10-03T09:19:47Z

@dvc94ch this solution is not BFT.

hsanjuan · 2019-10-03T11:24:43Z

you can't just compact without a consensus on the hash of the compacted log

The whole gist of this thing is whether there is a way to actually do it without consensus.

Or otherwise a way to do it given some small constraints that do not amount to having full consensus (and what are those, and for what scenarios would work).

dvc94ch · 2019-10-03T15:52:58Z

That severly constrains it's utility in a p2p context imo. I started implementing a consensus algorithm. I think I should have a naive implementation in 1-2 weeks. My plan is to use that to create a p2p replicated sled. (If you don't know sled it's an embedded db written in rust, I'm pretty sure it's going to get known outside the rust community soon).

daviddias transferred this issue from ipfs-inactive/research-CRDT Feb 4, 2020

daviddias added the CRDTs label Feb 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Garbage-collection in op-based CRDTs #407

Garbage-collection in op-based CRDTs #407

pgte commented Apr 13, 2018

dvc94ch commented Sep 12, 2019

dvc94ch commented Oct 2, 2019 •

edited

Loading

aschmahmann commented Oct 2, 2019

dvc94ch commented Oct 3, 2019

pgte commented Oct 3, 2019

aschmahmann commented Oct 3, 2019

dvc94ch commented Oct 3, 2019 •

edited

Loading

pgte commented Oct 3, 2019

hsanjuan commented Oct 3, 2019

dvc94ch commented Oct 3, 2019 •

edited

Loading

Garbage-collection in op-based CRDTs #407

Garbage-collection in op-based CRDTs #407

Comments

pgte commented Apr 13, 2018

dvc94ch commented Sep 12, 2019

dvc94ch commented Oct 2, 2019 • edited Loading

aschmahmann commented Oct 2, 2019

dvc94ch commented Oct 3, 2019

pgte commented Oct 3, 2019

aschmahmann commented Oct 3, 2019

dvc94ch commented Oct 3, 2019 • edited Loading

pgte commented Oct 3, 2019

hsanjuan commented Oct 3, 2019

dvc94ch commented Oct 3, 2019 • edited Loading

dvc94ch commented Oct 2, 2019 •

edited

Loading

dvc94ch commented Oct 3, 2019 •

edited

Loading

dvc94ch commented Oct 3, 2019 •

edited

Loading