Optimized selector traversal over graphsync #253
Labels
effort/weeks
Estimated to take multiple weeks
exp/intermediate
Prior experience is likely helpful
need/analysis
Needs further analysis before proceeding
need/maintainer-input
Needs input from the current maintainer(s)
P2
Medium: Good to have, but can wait until someone steps up
status/blocked
Unable to be worked further until needs are met
What
What we'd ultimately like to do is prevent duplicate selector traversals of the same block for certainly types of IPLD graph traversals. While the logic to detect "duplicates" is quite complicated and will evolve over time (our very first implementation is to special case just the traverse-all selector), we need a way to communicate about it over the network.
Rather than try to communicate about how to perform optimizations between peers, we're going to allow each side to optimize according to their OWN best interest and let the other know what they did. Most likely responders will almost always turn this on cause it minimizes their processing time. For client requests, we'll probably make it configurable when you initialize graphsync or perhaps at the request level as well. As we evolve our logic for detecting duplicate traversals, this will enable us to upgrade only one side and still gracefully finish a request.
How
currently, the schema for metadata about links traversed is as follows, encoded in the extension `graphsync/response-metadata:
This ticket proposes a
graphsync/metadata/1.1
extension of this format:When a responder sends DuplicateTraversal, it is telling the requestor that they their own traversal skipped this link, even though they have this block, because they believe it to be a duplicate. The requestor can expect no blocks or links to be sent for this part of the tree, whether or not the responder is telling the truth that it's a duplicate.
When a requestor receives this message, how it proceeds is up to the local configuration around duplicate traversals, and whether the traversal is indeed a duplicate according to local logic for detecting duplicate traversals. If it does proceed down the path, it can be assured that till it returns to that part of the traversal, all blocks loads will be local based on data previously received. If there is no local block, the requestor should not wait for a remote block.
Backwards Compatibility
It's safe to assume many requestors will support only the legacy metadata format for some time, but responders may still want to limit duplicate traversals for requestors supporting the new format.
With this in mind, I think we had better add a requestor side extension
graphsync/response-metadata-version
:And those that send the extension with a 1.1 are telling the responder ahead of time they support metadata 1.1
Alternatively, we know we don't want metadata to be an extension at all but rather part of the core protoocol, so maybe we should fold this into graphsync 1.1 encoded as CBOR (#88, ipld/specs#354, ipld/specs#355)
This is a meta-issue that will lead to implementation tickets once I have sign-off from @warpfork and others
The text was updated successfully, but these errors were encountered: