Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized selector traversal over graphsync #253

Open
hannahhoward opened this issue Oct 23, 2021 · 2 comments
Open

Optimized selector traversal over graphsync #253

hannahhoward opened this issue Oct 23, 2021 · 2 comments
Labels
effort/weeks Estimated to take multiple weeks exp/intermediate Prior experience is likely helpful need/analysis Needs further analysis before proceeding need/maintainer-input Needs input from the current maintainer(s) P2 Medium: Good to have, but can wait until someone steps up status/blocked Unable to be worked further until needs are met

Comments

@hannahhoward
Copy link
Collaborator

hannahhoward commented Oct 23, 2021

What

What we'd ultimately like to do is prevent duplicate selector traversals of the same block for certainly types of IPLD graph traversals. While the logic to detect "duplicates" is quite complicated and will evolve over time (our very first implementation is to special case just the traverse-all selector), we need a way to communicate about it over the network.

Rather than try to communicate about how to perform optimizations between peers, we're going to allow each side to optimize according to their OWN best interest and let the other know what they did. Most likely responders will almost always turn this on cause it minimizes their processing time. For client requests, we'll probably make it configurable when you initialize graphsync or perhaps at the request level as well. As we evolve our logic for detecting duplicate traversals, this will enable us to upgrade only one side and still gracefully finish a request.

How

currently, the schema for metadata about links traversed is as follows, encoded in the extension `graphsync/response-metadata:

type LinkMetadata struct {
  link Cid
  blockPresent Bool
}

type ResponseMetadata [LinkMetadata]

This ticket proposes a graphsync/metadata/1.1 extension of this format:

type BlockStatus enum {
  | Present  ("0")
  | Absent   ("1")
  | DuplicateTraversal ("2")
} representation int

type LinkMetadata struct {
  link Cid
  blockStatus BlockStatus
}

type ResponseMetadata [LinkMetadata]

When a responder sends DuplicateTraversal, it is telling the requestor that they their own traversal skipped this link, even though they have this block, because they believe it to be a duplicate. The requestor can expect no blocks or links to be sent for this part of the tree, whether or not the responder is telling the truth that it's a duplicate.

When a requestor receives this message, how it proceeds is up to the local configuration around duplicate traversals, and whether the traversal is indeed a duplicate according to local logic for detecting duplicate traversals. If it does proceed down the path, it can be assured that till it returns to that part of the traversal, all blocks loads will be local based on data previously received. If there is no local block, the requestor should not wait for a remote block.

Backwards Compatibility

It's safe to assume many requestors will support only the legacy metadata format for some time, but responders may still want to limit duplicate traversals for requestors supporting the new format.

With this in mind, I think we had better add a requestor side extension graphsync/response-metadata-version:

type Version int

And those that send the extension with a 1.1 are telling the responder ahead of time they support metadata 1.1

Alternatively, we know we don't want metadata to be an extension at all but rather part of the core protoocol, so maybe we should fold this into graphsync 1.1 encoded as CBOR (#88, ipld/specs#354, ipld/specs#355)

This is a meta-issue that will lead to implementation tickets once I have sign-off from @warpfork and others

@hannahhoward hannahhoward added need/triage Needs initial labeling and prioritization effort/days Estimated to take multiple days, but less than a week exp/intermediate Prior experience is likely helpful need/analysis Needs further analysis before proceeding need/maintainer-input Needs input from the current maintainer(s) status/blocked Unable to be worked further until needs are met and removed need/triage Needs initial labeling and prioritization labels Oct 23, 2021
@hannahhoward
Copy link
Collaborator Author

Note re labels -- I believe this will be intermediate once an implementation path is well defined, but for now it's blocked pending approval of the overall strategy

@hannahhoward hannahhoward added effort/weeks Estimated to take multiple weeks P1 High: Likely tackled by core team if no one steps up P2 Medium: Good to have, but can wait until someone steps up and removed effort/days Estimated to take multiple days, but less than a week P1 High: Likely tackled by core team if no one steps up labels Oct 23, 2021
@rvagg
Copy link
Member

rvagg commented Oct 26, 2021

This approach seems fine to me, although "DuplicateTraversal" may be too specific, unless you're considering adding further specific reasons to the enum. Some examples (pending work on selectors upstream):

  • Exhaustive selector where we don't want to touch duplicate blocks - "DuplicateTraversal" makes sense for this
  • We've seen this block 6 times already, but this time I've decided that it doesn't need to be traversed again (example suggests that traversals can decide to switch modes when they switch from "selector is not exhaustive" to "all selection below me is exhaustive"). "DuplicateTraversal" is probably still fine for this, but it might not send a specific enough signal?
  • Exceeded traversal budget on this branch, but let's not consider it a failure and agree that the DAG stops there because we've imposed that limit (on one or both sides).

These could all be separate reasons, and maybe that's OK and we're anticipating expanding the enum? But if we just want to send a "I have good reasons not to send you this block and it's not a failure", then the name should probably be changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/weeks Estimated to take multiple weeks exp/intermediate Prior experience is likely helpful need/analysis Needs further analysis before proceeding need/maintainer-input Needs input from the current maintainer(s) P2 Medium: Good to have, but can wait until someone steps up status/blocked Unable to be worked further until needs are met
Projects
None yet
Development

No branches or pull requests

2 participants