IPFS database: pubsub, consistency and persistence #244
Thank you for writing these notes @pgte! Really happy to have you working on these parts of the stack. All the concerns are valid. The way I would like to approach this is to identify the requirements for the use cases we want to support, then understand which parts need to be built to create such a system and what we can use today (or quickly ship) to mitigate some of the problems (i.e. unreliable messages), always with the distributed nature in mind (no patching by putting a centralised DB somewhere). Another question is: what can be built without attaching new protocols to IPFS directly (i.e. […])? Note, this also goes over the work happening at […]. I believe what we need is: […]
By "IPFS database" I mean: a universal key-value store (supporting the LevelDown interface), (where each value is opaque and atomic), homogenous, fully distributed and eventually consistent, supporting multiple concurrent writers, where the storage is backed by IPFS.
I don't know much about IPNS, but I've seen performance concerns about it not being usable for supporting head propagation (source) - please correct me if I'm wrong. (If this is the case, I assume we'd have to fall back to pubsub or a custom protocol on top of libp2p.) Perhaps IPNS could be used for a complete reboot, but I have this concern (not sure if it's valid): how could concurrency be handled in IPNS or the DHT? Could we build some way of managing multiple writers in a reliable and deterministic way? (Perhaps by using it to store a vector clock instead of just the head? A rough sketch is below.)
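For illustration, a minimal sketch of the vector clock idea, assuming each writer has a stable id. The record published under a single name would then hold a clock (alongside head hashes) rather than a single head, and two concurrent clocks are exactly the case a deterministic conflict-resolution rule has to handle:

```typescript
// A vector clock maps writer ids to per-writer operation counters.
type VectorClock = Record<string, number>

// Merge two clocks by taking the per-writer maximum.
function merge(a: VectorClock, b: VectorClock): VectorClock {
  const out: VectorClock = { ...a }
  for (const [id, n] of Object.entries(b)) {
    out[id] = Math.max(out[id] ?? 0, n)
  }
  return out
}

// a happened before b iff every entry of a is <= b and at least one is <.
function happenedBefore(a: VectorClock, b: VectorClock): boolean {
  const ids = new Set([...Object.keys(a), ...Object.keys(b)])
  let strictlyLess = false
  for (const id of ids) {
    const x = a[id] ?? 0
    const y = b[id] ?? 0
    if (x > y) return false
    if (x < y) strictlyLess = true
  }
  return strictlyLess
}

// Two clocks are concurrent when neither happened before the other; this
// is the situation that needs deterministic tie-breaking between writers.
function concurrent(a: VectorClock, b: VectorClock): boolean {
  return !happenedBefore(a, b) && !happenedBefore(b, a)
}
```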
A pubsub broadcast is surely simpler than a custom protocol, but I believe a custom protocol could be much better optimised for traffic, latency and convergence. I would like to explore this a bit more; I'm almost sure there is already some protocol out there that supports clusters with high churn. (Perhaps this is where it overlaps with some of the concerns of ipfs-cluster? Or not..) Anyway, as you said, I'm more interested in whether and how we could overlay this on top of IPFS as it is today (at least as a starting point), which would bring us back to pubsub.
Partitioning and scaling
Instead of a global store, approaches like OrbitDB split a database into "tables" or "partitions" (I'll just call them "partitions" from now on, as "table" reminds me of relational DBs too much). This model allows a database to scale and keeps all the messages related to a partition confined to the nodes that are interested in it.
Assume that each operation is appended to a log and that this log is saved onto IPFS. The hash of the latest head of the log is then broadcast on a pubsub channel specific to that partition.
Each node that is interested in a given partition subscribes to updates on that partition and keeps receiving updates about the latest known head. When a node gets an update, it uses IPFS to retrieve the content of the head (and its parents) until it has all the operation log data it needs to apply to the database. (Conflicts may arise, and each node must be able to resolve them in a deterministic way, but I think that discussion belongs in the realm of CRDTs and related topics.)
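Here is a rough sketch of that replication loop, under assumed interfaces: `Pubsub` and `Dag` are hypothetical stand-ins for the pubsub and block APIs, the `db/<partition>` topic naming is made up, and entries are applied as soon as they are fetched, ignoring causal ordering for brevity (fine if the apply step is a commutative CRDT merge, otherwise a real implementation would sort first):

```typescript
// Each log entry carries an operation plus the hashes of its parents.
interface LogEntry {
  op: Uint8Array
  parents: string[]
}

interface Dag {
  get(hash: string): Promise<LogEntry> // hypothetical block-fetch API
}

interface Pubsub {
  publish(topic: string, data: string): void // hypothetical
  subscribe(topic: string, handler: (data: string) => void): void
}

// Writer side: after appending an operation, broadcast the new head hash
// on the partition-specific topic.
function announceHead(pubsub: Pubsub, partition: string, headHash: string) {
  pubsub.publish('db/' + partition, headHash)
}

// Reader side: on every announced head, walk parents until reaching
// entries we already have, handing each new operation to the database.
function replicate(
  pubsub: Pubsub,
  dag: Dag,
  partition: string,
  have: Set<string>, // hashes of entries already applied
  apply: (entry: LogEntry) => void
) {
  pubsub.subscribe('db/' + partition, async (headHash) => {
    const queue = [headHash]
    while (queue.length > 0) {
      const hash = queue.pop()!
      if (have.has(hash)) continue // already have this sub-graph
      const entry = await dag.get(hash)
      have.add(hash)
      queue.push(...entry.parents)
      apply(entry) // deterministic conflict resolution happens in here
    }
  })
}
```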
Unreliable message delivery
But, as we know, pubsub does not have reliable delivery, which means that messages can be lost. Poorly connected nodes and new nodes need a way to query the latest head of a given partition.
This can be a problem, but it can be circumvented in several ways; I'll enumerate a few:
1. If there are enough operations on a partition, the pubsub messages pertaining to it will be frequent enough that all interested nodes eventually get a message.
2. Instead of one broadcast per new operation, nodes that participate in a partition keep broadcasting the head (at least while they think the network has interest in that partition), so all interested nodes eventually get the latest head.
I think that, for some use cases, scenario 1 may be too weak, since it's bound to database activity.
Scenario 2 provides stronger consistency and persistence guarantees, but at the expense of network activity, which, if not carefully designed, increases linearly with the number of nodes (even though it is limited to the nodes that participate in that partition). Some mechanisms to soften this (one is sketched after this list) would be to:
a) broadcast when a new node is added to the topic
b) back off the frequency of broadcasts once activity stops
c) broadcast less frequently as the number of interested nodes increases
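A minimal sketch of mechanism (b), assuming the same hypothetical `publish` call as in the earlier sketch: keep re-broadcasting the current head, but back off exponentially while no new operations arrive, resetting to the fast rate whenever the head changes:

```typescript
function startHeadGossip(
  publish: (topic: string, data: string) => void, // hypothetical pubsub call
  topic: string,
  getHead: () => string, // returns the current local head hash
  minIntervalMs = 1_000,
  maxIntervalMs = 60_000
) {
  let interval = minIntervalMs
  let lastHead = ''

  const tick = () => {
    const head = getHead()
    if (head !== lastHead) {
      // new activity: reset to the fast gossip rate
      interval = minIntervalMs
      lastHead = head
    } else {
      // no activity: back off up to the maximum interval
      interval = Math.min(interval * 2, maxIntervalMs)
    }
    publish(topic, head)
    setTimeout(tick, interval)
  }
  setTimeout(tick, interval)
}
```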
Is this a valid concern? If so, what are your thoughts or experience on these?