fast and simple IPFS pubsub on the application layer #108
Comments
Tons of this SGTM! 👍 -- agree on pushing for getting something that just works now, and moving to generalizing / scalability / doing it right later. it would be nice to be able to expose this service through ipfs itself. we had a proposal for pluggable transports and protocols earlier, where you can register a protocol handler with an ipfs node, and get a pipe of all the data coming through. this may be a good way to leverage existing connections and so on. but maybe it's a P2 need.
👍
@jbenet thanks for the comments! (are you referring to this example?)
Great proposal to get things moving with pubsub! It's a highly anticipated feature and I'm always for getting something working as soon as possible. Some thoughts and questions: I agree with @jbenet that the command should be exposed in ipfs, i.e.
In a chat (orbit), everyone is a publisher AND a subscriber within a chat room (topic). With the proposed solution, would it work so that only some members (of a chat room) are publishers while the rest are subscribers? How, at the application level, should a developer decide who becomes a publisher and who remains a subscriber? Or would it work so that there are dedicated nodes in the network that form the "publishing" service and all chat clients are subscribers? Either way, there needs to be a way for a large number of peers to publish to a topic, regardless of the network topology and roles. Perhaps a subscriber (in terms of its role in the network) can send a message to one of the publishers, who then publishes the message on behalf of the subscriber?
How would the "list of publisher addresses" be handled if it's an immutable ipfs hash? What happens to the list when a new publisher joins or leaves the network? If it's saved in IPNS, in addition to perf problems, we're limited to one node handling updates to the list (per-node pubkey publishing in ipns). Can we use the
We can do this already:
{
  "Links": [
    { "Name": "previousHead", "Hash": "QmPreviousHead" }
  ],
  "Data": "QmFoo"
}
As an additional feature, and as a performance optimization, we can add "skip lists" to the Links, pointing to the heads 10, 20, 50, 100, etc. heads back, so that each topic intrinsically contains the full history of that topic. This feature is not usually present in traditional pubsub systems, but it would allow us to take the pubsub feature closer to a message queue / event log (e.g. Kafka).
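A minimal sketch of the skip-list idea, assuming each head is a dict shaped like the object above and that the publisher keeps a local list of previous head hashes (oldest first). The `previousHead` link name comes from the example; the `skip-10`, `skip-20`, ... link names and the placeholder hashes are hypothetical.

```python
import json

# Offsets (in heads back) for the extra skip-list links suggested above.
SKIP_OFFSETS = (10, 20, 50, 100)


def make_head(history, data_hash):
    """Build a new topic head object.

    `history` is a hypothetical local list of head hashes, oldest first.
    The immediate predecessor is always linked as "previousHead"; the skip
    links are additional pointers that let a reader jump further back.
    """
    links = []
    if history:
        links.append({"Name": "previousHead", "Hash": history[-1]})
    for offset in SKIP_OFFSETS:
        if len(history) >= offset:
            # history[-1] is 1 head back, so history[-offset] is `offset` heads back.
            links.append({"Name": f"skip-{offset}", "Hash": history[-offset]})
    return {"Links": links, "Data": data_hash}


if __name__ == "__main__":
    history = [f"QmHead{i}" for i in range(25)]  # placeholder hashes
    print(json.dumps(make_head(history, "QmFoo"), indent=2))
```

Since each head carries several independent links, a reader warming its cache can request them in parallel instead of walking one `previousHead` at a time.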
I don't understand this part. Do you intend to have multiple heads per topic? Shouldn't there be just one?
cc @whyrusleeping you mentioned you had some ideas for pubsub?
Earlier discussion on pubsub for reference: #64
Thanks for the comments @haadcode!
I disagree. The
This approach isn't going to help you much, then. 😿 The use case I had in mind was a small set of privileged publishers providing to a larger set of unprivileged subscribers. The sort of case where individuals or small orgs control their own publish channel for authoritative data over the head of their data dag on the IPFS network.
That could work! Individual publishers could choose peers they trust, and permit them to use their own privileged publisher node as a relay. This could actually work as an orthogonal layer on top of this model. However, it may be worth reconsidering this problem again for your use case specifically. Making a list of all the pubsub use cases we can think of could help us model the solution better, too.
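The relay idea can be sketched as follows, assuming a privileged publisher keeps a hypothetical allowlist of peer IDs it trusts. The `publish` callback stands in for whatever actually appends a message to the topic dag; it is not an IPFS API, and the peer IDs are placeholders.

```python
from typing import Callable, Iterable


class RelayingPublisher:
    """A privileged publisher that republishes messages from trusted peers.

    Hypothetical sketch: `publish` is a placeholder for appending a message
    to the topic dag, not a real IPFS call.
    """

    def __init__(self, trusted_peers: Iterable[str], publish: Callable[[bytes], None]):
        self.trusted_peers = set(trusted_peers)
        self.publish = publish

    def handle_relay_request(self, sender_peer_id: str, message: bytes) -> bool:
        """Publish `message` on behalf of `sender_peer_id` if it is trusted."""
        if sender_peer_id not in self.trusted_peers:
            return False  # untrusted peers cannot use this node as a relay
        self.publish(message)
        return True


if __name__ == "__main__":
    published = []
    relay = RelayingPublisher({"QmAlice"}, published.append)
    relay.handle_relay_request("QmAlice", b"hello")
    relay.handle_relay_request("QmMallory", b"spam")
    print(published)  # [b'hello']
```

Because the allowlist lives entirely in the publisher node, this stays orthogonal to the topic format: the topic still lists only the privileged publishers.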
You're right; scratch that.
Humour me: what does this optimize vs each message storing its previous heads? Traversal?
In an append-only log CRDT you can always have forks, where e.g. two publishers publish a new head, but both point at the same old head. These forks are acceptable, but new publishers should link to all of the previous heads to indicate causality (i.e. that THIS head came temporally after ALL of these other heads).
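To make the fork case concrete, here is a sketch assuming the topic dag is held locally as a dict mapping each node's hash to the hashes it links to (the hashes are placeholder strings). The heads are the nodes nothing else points to, and a new node links to all of them, recording that it came after every head its publisher knows about.

```python
from typing import Dict, List


def current_heads(dag: Dict[str, List[str]]) -> List[str]:
    """Return the nodes that no other node links to, in sorted order."""
    linked = {parent for parents in dag.values() for parent in parents}
    return sorted(node for node in dag if node not in linked)


def append(dag: Dict[str, List[str]], new_hash: str) -> None:
    """Add a node that links to every current head, merging any forks."""
    dag[new_hash] = current_heads(dag)


if __name__ == "__main__":
    # C is an old head; A and B were both published on top of it (a fork).
    dag = {"C": [], "A": ["C"], "B": ["C"]}
    print(current_heads(dag))  # ['A', 'B']
    append(dag, "D")
    print(dag["D"], current_heads(dag))  # ['A', 'B'] ['D']
```

A publisher that has only seen A would produce a node linking to A alone; the fork then persists until some later publisher sees both branches and merges them.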
Fair points. My strongest reason to put pubsub into
Could your solution still work if it was indeed a set of "privileged" nodes acting as publishers in the network, and subscribers could ask them to publish on their behalf? A service within a network. This would still be an acceptable solution to get started, imo; it's decentralized at least. With the upcoming private networks, small orgs can set it up within their network by providing a publisher node, and in the public ipfs network anyone can provide a publisher node. What do you think?
Yes, purely a traversal optimization. But a helpful one where you need to, say, "warm up the cache" with the full history of a topic. With skip lists you can parallelize the traversal. Each new head would still point to the immediately preceding head; these would be additional pointers.
I see. So in case there are two new heads A and B, both pointing to an old head C, what does the next new head, D, look like? Does D point to A and B? How can the publisher know that it needs to link to both of them instead of just "the most recent" (say A was posted before B)? Can A and B ever point to different old heads?
The list structure with only one link to the previous node is very inefficient when it comes to fetching the past. With only one link back, we need to wait for the first node to resolve, read it, wait for the second to resolve, read it, and so on. It would be much better for a node that knows the history (not every node needs to know all of the past) to include exponential links to the past (2, 4, 8, 16, ... back). Looking up the Nth node in the past is then no longer O(N) but O(log N) in the worst case and O(1) in the best case. This also allows fetching a big chunk of history concurrently, rather than sequentially as when storing only a link to the previous node.
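A sketch of the exponential-link scheme, assuming messages can be numbered by position in a single linear history (position 0 is the oldest), so that node i links back to i-1, i-2, i-4, i-8, and so on. This numbering is a simplification for illustration; in a real topic dag the links would be hashes and forks complicate positions. Taking the largest jump that does not overshoot reaches the Nth ancestor in one hop per set bit of N.

```python
from typing import List


def back_links(seq: int) -> List[int]:
    """Positions that node `seq` links to: seq-1, seq-2, seq-4, seq-8, ..."""
    links, step = [], 1
    while step <= seq:
        links.append(seq - step)
        step *= 2
    return links


def path_to_ancestor(seq: int, distance: int) -> List[int]:
    """Nodes visited when walking `distance` steps back from `seq`.

    Each hop follows the largest power-of-two link not exceeding the
    remaining distance, so the walk takes O(log N) hops in the worst case
    and a single hop when `distance` is a power of two.
    """
    if not 0 <= distance <= seq:
        raise ValueError("distance must be between 0 and seq")
    path = [seq]
    while distance > 0:
        # step <= distance <= seq, so node `seq` really links to seq - step.
        step = 1 << (distance.bit_length() - 1)
        seq -= step
        distance -= step
        path.append(seq)
    return path


if __name__ == "__main__":
    print(back_links(8))              # [7, 6, 4, 0]
    print(path_to_ancestor(100, 13))  # [100, 92, 88, 87]
```

Each node only stores about log2(i) links, and a reader fetching history can follow several of them concurrently.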
If it helps, and the pubsub (publisher?) implementation does not depend on message order, we could try to think of it without ordering (leaving ordering to the application) and see what the simplest way to "write/read messages to/from a topic" would be.
@haadcode @Kubuxu: Point well received re: skip lists. This makes a lot of
I think that's a good idea. Like you said, it still relies on a central
I'm still not sure whether this is something that needs to be a part of core
A node doesn't need to know or make sure it has all of the heads before
E.g. from your example, D points to A and B if the peer publishing knows about
Agreed. Don't we get this by just dropping the
I am very interested in the #pubsub area within IPFS. Does anyone have a reference to any of the time-tested research papers on the topic? Is there one I could start with and then follow the end-notes to other important papers?
@jmsmcfrlnd you might want to talk to @nicola, who is working on a gossip implementation (JS) based on a paper, which could be a great cornerstone for a gossip-based many-to-many pubsub implementation. Also check out @whyrusleeping's work on one-to-many pubsub (Go), which is pretty far along! -- found here.
thanks for the mention @noffle. I would underline that a general-purpose pubsub would probably not be based on gossip, but should instead try to reuse the DHT itself (will update you on this).
If you simulated a quorum, one might choose how many peers are needed to quantify a write.
Maybe this bit about Redis is relevant.
To everyone following this thread: check out the latest tutorial published by @pgte on how to use PubSub to create real-time apps over IPFS.
Also, in case you missed it, PubSub has been available for a while; see the spec at https://github.com/ipfs/interface-ipfs-core/tree/master/API/pubsub
The spec is now at https://github.com/ipfs/interface-ipfs-core/blob/master/SPEC/PUBSUB.md
proposal
integrating pubsub into IPFS core is going to take time and effort to do properly. this proposes a small CLI tool (or, trivially, a streaming module) that enables decentralized (but not distributed) pubsub on the application layer.
goals
this puts ease of integration via a simple api and speed as its highest priorities.
any IPFS application that wishes /ipns were fast enough for real-time use (that is: getting the HEAD of a frequently updating merkle dag). I wrote this with applications like orbit vaguely in mind.
while this isn't a fully distributed solution, it has a number of benefits over an ad-hoc central server approach.
CLI: publisher usage
$ my-periodic-data-generator | pubsub -p [topic] [publisher-addr [publisher-addr [...]]]
/ipfs/Qmfoobar/topic
the output is a single multiaddress that can be considered a subscription token.
the publisher shares this topic token as widely as they wish
the command expects an input stream of messages to be published to subscribers
CLI: subscriber usage
the subscriber receives published data as /ipfs addresses, which are output as a
stream of newline-delimited addresses. subscribers can then resolve each one to see
the data that was published
an optional optimization for higher-trust environments could be to permit the
publisher to include not only the above /ipfs address but also the inlined content.
subscribers could then verify the data asynchronously
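A sketch of the subscriber side, assuming a hypothetical line format in which each line is an /ipfs address optionally followed by a space and the inlined content, base64-encoded. Real /ipfs addresses are multihashes of DAG nodes; here a plain SHA-256 hex digest stands in for the hash so that the asynchronous verification step can be shown with the standard library alone.

```python
import asyncio
import base64
import hashlib
from typing import List, Optional, Tuple


def parse_line(line: str) -> Tuple[str, Optional[bytes]]:
    """Split one stream line into (address, inlined content or None)."""
    parts = line.strip().split(" ", 1)
    address = parts[0]
    if not address.startswith("/ipfs/"):
        raise ValueError(f"not an /ipfs address: {address!r}")
    inline = base64.b64decode(parts[1]) if len(parts) == 2 else None
    return address, inline


async def verify(address: str, content: bytes) -> bool:
    """Check inlined content against its address (SHA-256 stand-in for a CID)."""
    await asyncio.sleep(0)  # yield to the event loop; real code would do I/O here
    return address == "/ipfs/" + hashlib.sha256(content).hexdigest()


async def consume(lines: List[str]) -> List[Tuple[str, Optional[bool]]]:
    """Parse a batch of lines and verify any inlined content concurrently.

    Returns (address, True/False) for inlined messages, and (address, None)
    for plain addresses that still have to be resolved.
    """
    parsed = [parse_line(line) for line in lines if line.strip()]

    async def check(addr: str, data: Optional[bytes]) -> Tuple[str, Optional[bool]]:
        if data is None:
            return addr, None  # no inline content: resolve the address later
        return addr, await verify(addr, data)

    return list(await asyncio.gather(*(check(a, d) for a, d in parsed)))


if __name__ == "__main__":
    payload = b"hello"
    good = "/ipfs/" + hashlib.sha256(payload).hexdigest()
    encoded = base64.b64encode(payload).decode()
    stream = [good + " " + encoded, "/ipfs/deadbeef " + encoded, "/ipfs/QmOther"]
    print(asyncio.run(consume(stream)))
```

Subscribers can act on inlined content immediately and discard it later if verification fails, which is the trade-off that makes this suitable only for higher-trust environments.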
network topology
publishers form a fully connected graph with each other
subscribers are each connected to one of the publishers (uniform dist)
this topology is not strictly necessary, since no connections are "privileged"
-- they all just replicate the topic dag quickly to 'subscribers'
^ said differently: anyone can be a publisher in the sense that they disseminate
the topic dag to others, but only publishers listed in the topic metadata can
add to the topic dag
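A sketch of the topology above, assuming publishers and subscribers are identified by placeholder peer ID strings. Publishers form a full mesh, and each subscriber is attached to one publisher chosen uniformly at random; a seeded `random.Random` keeps the example reproducible.

```python
import itertools
import random
from typing import Dict, List, Set, Tuple


def build_topology(
    publishers: List[str], subscribers: List[str], seed: int = 0
) -> Tuple[Set[Tuple[str, str]], Dict[str, str]]:
    """Return (publisher mesh edges, subscriber -> publisher assignment)."""
    if not publishers:
        raise ValueError("a topic needs at least one publisher")
    # Fully connected graph: one edge per unordered pair of publishers.
    mesh = set(itertools.combinations(sorted(publishers), 2))
    rng = random.Random(seed)
    # Each subscriber connects to a single publisher, chosen uniformly.
    assignment = {sub: rng.choice(publishers) for sub in subscribers}
    return mesh, assignment


if __name__ == "__main__":
    mesh, assignment = build_topology(["QmP1", "QmP2", "QmP3"], ["QmS1", "QmS2"])
    print(sorted(mesh))
    print(assignment)
```

As the proposal notes, the assignment only spreads replication load: a subscriber forwarding the topic dag to others gains no right to add to it.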
subscription token/key/identifier format
a topic is defined by a multiaddress (probably /ipfs or /ipns). it resolves to a
list of newline-delimited multiaddresses of peers
if stored on /ipfs, it forms an immutable contract of publishers for that topic
if stored on /ipns, it forms a mutable contract of publishers
since each publisher's public key is listed, it means all published messages
can be verified by subscribers
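A sketch of reading a topic definition, assuming the resolved /ipfs or /ipns content is the newline-delimited list of publisher multiaddresses described above, with each one ending in /ipfs/<peer-id>. Extracting the peer IDs is what lets a subscriber check who may add to the topic dag; the signature check itself is left out, and the addresses below are placeholders.

```python
from typing import List


def publisher_ids(topic_doc: str) -> List[str]:
    """Extract publisher peer IDs from a newline-delimited multiaddress list.

    Assumes each non-empty line is a multiaddress ending in /ipfs/<peer-id>,
    e.g. /ip4/203.0.113.7/tcp/4001/ipfs/QmPublisherA (placeholder values).
    """
    ids = []
    for line in topic_doc.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.strip("/").split("/")
        if len(parts) < 2 or parts[-2] != "ipfs":
            raise ValueError(f"multiaddress lacks a trailing /ipfs/<peer-id>: {line!r}")
        ids.append(parts[-1])
    return ids


def may_publish(topic_doc: str, signer_id: str) -> bool:
    """Only peers listed in the topic metadata may add to the topic dag."""
    return signer_id in publisher_ids(topic_doc)
```

With an /ipfs topic this list is fixed forever; with /ipns, the same check runs against whatever list the name currently resolves to.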
message publishing
format
this should be fully free-form (anything the publisher wants): newline-delimited
text, json, protocol buffers, et cetera. let's not force e.g. IPLD on people who
just want p2p pubsub, where IPFS just happens to be the underlying transport
this document proposes a JSON structure that enables a "topic merkle
dag" -- it permits linking to the previous heads of the topic dag, giving
integrity benefits:
json structure
something IPLD-esque, e.g.
where
prevHeads
is a list of the heads of the current topic dag ('head' refers to each node in the dag that has no links pointing to it).
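A sketch of building such a message, assuming a hypothetical shape with a `data` field for the free-form payload alongside `prevHeads` (only `prevHeads` is named in the proposal). Deduplicating and sorting the heads and serialising with sorted keys makes the bytes deterministic, which matters if the message is later hashed into the topic dag.

```python
import json
from typing import Iterable


def build_message(payload: str, prev_heads: Iterable[str]) -> str:
    """Serialise one topic message that links to every current head."""
    message = {
        "data": payload,                        # free-form: text, JSON, etc.
        "prevHeads": sorted(set(prev_heads)),   # all heads, deduplicated
    }
    # Deterministic encoding so identical messages hash identically.
    return json.dumps(message, sort_keys=True, separators=(",", ":"))


if __name__ == "__main__":
    print(build_message("hi", ["QmB", "QmA", "QmB"]))
    # {"data":"hi","prevHeads":["QmA","QmB"]}
```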
summary: nice properties
fallback, message inlining)
messages can be verified
store-and-forward subscribers, and pure subscribers
questions
existing peer connections if we're already connected to a publisher?
the) public ipfs gateway for resolving messages