Skip to content
This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

Merkle-CRDTs paper (draft) #403

Open
hsanjuan opened this issue Feb 12, 2019 · 40 comments
Open

Merkle-CRDTs paper (draft) #403

hsanjuan opened this issue Feb 12, 2019 · 40 comments
Labels

Comments

@hsanjuan
Copy link
Contributor

Hi all,

In November/December I allocated time to dive into CRDTs because of upcoming features in IPFS Cluster[https://github.com/ipfs/ipfs-cluster]. I read a lot of the literature and this eventually led me to the current and past CRDTs efforts within our ecosystem. Unfortunately I could not find anything officially published or formalized, and I had some some ideas of my own.

The result is a paper on Merkle-CRDTs, that is, CRDTs backed by Merkle-DAGs. The paper formalizes what we called a Merkle-Clock (a logical clock) and goes on to describe Merkle-CRDTs as CRDTs on Merkle-Clocks. It describes implementation approaches, advantages and disadvantages, along with some of the easier optimization methods. It gives due mention to all the ecosystem projects that have worked on this direction and also includes a rather long background section so that interested readers can get up to speed in the notions relevant to digest the rest of the paper if they need to (but otherwise can be skipped).

This paper is meant to give an official, formal description of techniques used by Peerpad (in the past), OrbitDB and now IPFS Cluster. It is also meant to be a base for future -hopefully shorter- papers and research on the topic, for example on optimizations (I've seen we even have Research RFCs open for it).

Samuli (@haadcode) and Pedro (@pgte) were looped in very early in the process for their amazing work on Merkle-CRDTs (OrbitDB, Peerpad/peer-crdt) and provided very valuable feedback, comments and proofreading, thus they appear as additional authors and I'm extremely grateful to them.

We are now, finally, opening up the paper draft to a wider audience (Protocol Labs/IPFS Community). The repository for any contributions is at:

https://github.com/protocol/merkle-crdts-paper/
(if you don't have access, send me your @github handle)

An updated version of the PDF is at: http://hector.link/presentations/merkle-crdts/merkle-crdts.pdf

Feel free to forward this to any interested parties in our community.

Cheers!

@gritzko
Copy link

gritzko commented Feb 12, 2019

Very interesting.
Right now, I am working on blending Causal (CRDT) and Merkle structures (cause they are essentially the same).
Some preliminary texts are online at http://replicated.cc
I would definitely like to read your paper.

@gritzko
Copy link

gritzko commented Feb 12, 2019

I think, we may even add RON 2.1 Merkle structure to the paper.
It has some special features. RON 2.1 has a formally defined op format, including the so-called "nominal" (internal) fixed-width form. That allows to define a Merkle structure on the op graph itself. Hence, RON2.1 hashes are completely independent of any actual serialization.
http://replicated.cc/specs/hash/
http://replicated.cc/specs/nominal/

@pgte
Copy link

pgte commented Feb 13, 2019

@vitorenesduarte @CBaquero @gyounes This may interest you.

@ianopolous
Copy link
Member

In Peergos we use a merkle CHAMP as a state based CRDT.

@vitorenesduarte
Copy link

Thanks @pgte.
@hsanjuan Can you add us to the repository?

@gritzko
Copy link

gritzko commented Feb 13, 2019

I am on page 15 so far. The Merkle clock idea has its logic. Albeit, I would advise against using that term.
You'll need a full event database to use those. Sequence numbers and Lamport clocks are directly comparable. Hashes are not.
If any timestamp comparison requires multiple round-trips to the database, that's brutal.
Hence, this type of clock is hardly distinguishable from the dataset itself. In big-O terms, it is the same thing.
If you don't call it "clocks" or "timestamps" then it is perfectly OK.

@gritzko
Copy link

gritzko commented Feb 13, 2019

One way of mixing Merkle and Lamport identifiers is to de-collision logical time with hash bits. If we define such a "Merkport timestamp": 64 bits for a logical time counter, 64 bits for the truncated data hash. That is not cryptographically strong, of course. It is merely collision/corruption resistant. It is also directly comparable.
This scheme has its issues too: hash bits are not compressible and you have to carry them around if you need order.

@gritzko
Copy link

gritzko commented Feb 13, 2019

It is possible to reduce the total size of the Merkle-CRDT DAG by including only a CID as payload, which can then be fetched separately.

With the op based approach, the op hash is often bigger than the op itself.

@hsanjuan
Copy link
Contributor Author

@gritzko please let's discuss in issues the actual paper repo. Short answer is Merkle-Clocks are a full event datastore (the DAG). The storage space and comparison trade-off is a well known and highlighted limitation. But it's still a clock because it can provide ordering for events.

@vitorenesduarte done!

@hsanjuan
Copy link
Contributor Author

@ianopolous where can I read more about it?

@gyounes
Copy link

gyounes commented Feb 13, 2019

@gritzko please let's discuss in issues the actual paper repo. Short answer is Merkle-Clocks are a full event datastore (the DAG). The storage space and comparison trade-off is a well known and highlighted limitation. But it's still a clock because it can provide ordering for events.

@vitorenesduarte done!

@hsanjuan thanks

@duynht
Copy link

duynht commented Feb 16, 2019

@hsanjuan I am starting my research as an undergrad and would like to learn more about your paper. Could you please add me to the repo?
Thank you very much

@ianopolous
Copy link
Member

@hsanjuan We haven't written up any prose explanations of it, but here's a summary:

Merkle-champs have a property that makes them perfect for building CRDTs with - convergence. Unlike, say, a merkle-btree whose final root hash depends on the order you add the mappings, a CHAMP is insertion order independent. So if different nodes for example are adding different mappings/elements to a champ, then there is a well defined merge operation (commutative, associative and idempotent) on the champs (this assumes the mappings/values in the champ are themselves a CRDT). E.g. it is trivial to make a 2P-Set using a champ.

@julianpistorius
Copy link

@hsanjuan could you please add my handle to the repo?

@geoah
Copy link

geoah commented Feb 24, 2019

@hsanjuan woa this sounds amazing!! :D would it be possible to either get added to the repo? I'd be really interested in this.

@duynht
Copy link

duynht commented Feb 26, 2019

@hsanjuan Too bad I have been kicked out by github. Can you please add me in the repo again? Thank you very much.

@doubtingben
Copy link

could you please add my handle to the repo? 🙏

@5310
Copy link

5310 commented Feb 27, 2019

If possible I would like to read the draft as well.

@olebedev
Copy link

If possible I would like to read the draft too.

@npfoss
Copy link

npfoss commented Feb 27, 2019

Could you please add me as well? This sounds super cool

@DePizzottri
Copy link

DePizzottri commented Feb 28, 2019

@hsanjuan I'm doing some research on decreasing the dimension of logical clocks regarding CRDTs. It would be very interesting to take a look on your paper.

@songjiayang
Copy link

songjiayang commented Mar 1, 2019

Sounds good, could you please add me? I really want to read and learn more about it, thank you very much.

@ldub
Copy link

ldub commented Mar 1, 2019

Could you also add me? Would love to read this paper over the weekend.

@mastercyb
Copy link

mastercyb commented Mar 2, 2019

+1
But I am novice and just started with CRDT

@zhujo01
Copy link

zhujo01 commented Mar 2, 2019

could you please add my handle to the repo? Thanks.

@rgrover
Copy link

rgrover commented Mar 3, 2019

I'd like access as well, please.

@kjzz
Copy link

kjzz commented Mar 6, 2019

could you please add my handle to the repo? Thx a lot @hsanjuan

@tg-x
Copy link

tg-x commented Mar 28, 2019

@hsanjuan I'd like to have a look at the paper repo as well

@alinz
Copy link

alinz commented Apr 15, 2019

@hsanjuan could you add me to the paper as well?

@atomgardner
Copy link

@hsanjuan could I also have a look at your preprint?

@glesserd
Copy link

Me too :) @hsanjuan

@jsimnz
Copy link

jsimnz commented Apr 23, 2019

Would love to get access to the paper. Thanks!

@llelf
Copy link

llelf commented Apr 23, 2019

@hsanjuan I’d love to look at it too

@axefrog
Copy link

axefrog commented Apr 23, 2019

@hsanjuan I'd love access to this too, please!

@bonedaddy
Copy link

Would love access as well :D

@cyborgshead
Copy link
Member

@hsanjuan please, give me access to the paper. Appreciate

@schrepfler
Copy link

Hi @hsanjuan, can you share the access to this paper with me please?

@hsanjuan
Copy link
Contributor Author

Hi all, after the last batch of revisions, I have put the latest version of the paper draft here: http://hector.link/presentations/merkle-crdts/merkle-crdts.pdf and I will update it regularly.

@daviddias daviddias transferred this issue from ipfs-inactive/research-CRDT Feb 4, 2020
@daviddias daviddias added the CRDTs label Feb 4, 2020
@itsmeadi
Copy link

Hi @hsanjuan, unable to access http://hector.link/presentations/merkle-crdts/merkle-crdts.pdf- timeout
could you add me to the paper as well?

@hsanjuan
Copy link
Contributor Author

hey @itsmeadi , the final paper was uploaded to https://arxiv.org/abs/2004.00107. Working on fixing my site too.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests