-
Notifications
You must be signed in to change notification settings - Fork 108
Graphsync v1.1.0, encoded as CBOR #354
base: master
Are you sure you want to change the base?
Conversation
block-layer/graphsync/graphsync.md
Outdated
Graphsync network messages are encoded in DAG-CBOR. They have the following schema | ||
|
||
```ipldsch | ||
type GraphSyncExtensions {string:Any} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string
should be String
in this position. I think Any
should be fine for now if it can't be enumerated or properly pinned down, Any
just hasn't fully filtered through our specs or stack because it's quite tricky. The original pb has bytes
so might that be a safe choice for now? It could be turned into a union later or enumerated in some other way if needed. What types of things go into this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, just re-read your comment about why you didn't use bytes! ignore that bit then, but I'm still interested in what kind of data this might contain and if it could be enumerated? could it be limited to scalar values to rule out maps and lists perhaps? we have AnyScalar
for that case if it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirm Any
should be fine.
(We still have some things to wade through in #318, it seems, but I think it's something we'll have to reach a conclusion on sooner or later, and I don't think there's any way we won't have an Any
by the end.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely isn't AnyScalar. We already use arrays and maps.
There are some other challenges for actually writing this in code-- the string represents the extension name, and the type is based on that, but we support user defined extensions, so it's not possible to know everything ahead of time.
There is a whole concept in cbor-gen
world of Deferred
-- where you don't deserialize till later, leaving in byte representation till you know what you want to deserialize to. I'm not sure this is the right approach for IPLD. But this will probably come up as well with https://github.com/ipld/go-ipld-adl-hamt -- I see there you've defined an Any type, but many HAMTs have a specific type at the bottom -- people may want to paramaterize their HAMTs and use code gen'd nodes for the day in the leaves. We should probably figure out how this might work.
Anyway though, just putting Any here at least allows me to define the schema
Very nice. Is this going to be a breaking change, or an optional upgrade? I can't figure out if/how graphsync might do versioning but would it be a "can you talk v1.1.0? great, here's my DAG-CBOR messages"? The map representation question might come down to how important shaving bytes off these messages is. It's nice having self-describing data, but it does introduce quite a bit of overhead to have the keys in there too. If I take The problems with
So, not strong reasons to avoid |
block-layer/graphsync/graphsync.md
Outdated
|
||
type GraphSyncMetadatum struct { | ||
Link &Any | ||
BlockPresent bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/bool/Bool/
The rule is based on whether you're refering to a kind (which is a keyword) or a type (which are by convention TitleCase). (We're not quite the same as golang here, and don't have support them interchangably.)
type Foo int
is a declaration using a kind keyword, so it's lowercase.
type Foo struct { Field Bool }
is refering to a type, so it's using a name.
(There's an implicit prelude in a schema which defines type Bool bool
, type String string
, etc, so those are all available as a type name to make this less arduous.)
block-layer/graphsync/graphsync.md
Outdated
Cancel bool # whether this cancels a request | ||
Update bool # whether this is an update to an in progress request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instance of type name should be TitleCase
block-layer/graphsync/graphsync.md
Outdated
Prefix bytes # CID prefix (cid version, multicodec and multihash prefix (type + length) | ||
Data bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instance of type name should be TitleCase
block-layer/graphsync/graphsync.md
Outdated
|
||
type GraphSyncRequest struct { | ||
Id GraphSyncRequestID # unique id set on the requester side | ||
Root &Any # a CID for the root node in the query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kinda an interesting one. It's a link, alright, so this line as stated is certainly valid. But I'd like to think out loud about it for a moment.
While this is a link, it's not a link we'd ever really mean to traverse when processing the graphsync messages. It's something we expect the agent processing this message to pull out -- as a link, not immediately traversing it -- and then do some logic before it dives in to operations that load that link.
It was just "bytes" in the old protobuf, and that worked fine because of the above. And it seems we could readily still have it be Bytes
in this revamp, too. In the old implementation with protobuf, the application logic did a step to reify those bytes into a CID. We could have that step be explicit in a new implementation too, if we wanted to.
This doesn't seem to make a huge difference either way. I guess the main discernible difference is that if someone ran a selector with a bunch of wildcards in it over the GraphSyncRequest itself, if this field is &Any
, they might get a mouthful back, and if it's Bytes
, the results wouldn't recurse as far. But does this matter? Not really, as far as I can imagine. (I don't know why someone would do that; and even if they did, selectors are full of mechanisms that can be used to control this.)
Alright, that's it for thinking out loud. This seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hah, that is an interesting point! probably doesn't matter here but worth keeping in mind that viewing this schema from a certain lens might lead to a misunderstanding about intent. &Any
does provide some constraints on what the bytes can be, mainly the leading varints must make for a valid CID, but beyond that if you're not traversing then it really is bytes.
We also typically give fields (When you feed this into codegen in golang, it's going to generate accessor methods with appropriate export case.) |
Syntax nittery aside, this logically LGTM :D |
Re: lowercase names -- FYI, this is the opposite of cbor-gen, though I guess I could use the rename facility to handle this. Actually, usually cbor-gen uses tuple representation so naming is irrelevant. Unfortunately, when using map representation it does not allow renames. I have a PR to do this but it remains unmerged. (whyrusleeping/cbor-gen#46) Sorry to keep referencing cbor-gen -- but if go-ipld-prime is to get merged through the filecoin ecosystem (which is my hope eventually) , these two will need to move toward compatibility. |
@rvagg yes the intent is to support both v1.0.0 and v1.1.0 for a time -- protocol negotiation is done at the libp2p level - the protocol name is already versioned, so we simply add a fallback if 1.1.0 doesn't work. We do this in Bitswap and several other protocols. |
f00894a
to
23eea5a
Compare
@rvagg & @warpfork I believe I've addressed the comments:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty darn reasonable to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the schema works really well as a documentation tool here 👍
What's still needed to advance this? At last read, both this and #355 (I assume that's the other follow-up PR you mentioned) look reasonable to me. If this is pending an implementation, and we don't expect that to happen presently, should we... perhaps merge both these PRs, but then also relocate the file and give it a header to mark it explicitly as a future-oriented draft? (I'm happy to do the git-fu and commenting on that, if that's the path forward.) |
Ping for if we can merge this :) |
Goals
Move graphsync to a CBOR based protocol instead of one based on protobuf
Implementation
Graphsync messages should be valid IPLD data structures encoded in DAG-CBOR. I have constructed an IPLD schema to represent what they should look like.
I've also promoted Metadata out of an extension to a core part of the response message as opposed to an extension -- this is a frequent design issue with v1.0.0.
For Discussion
This is a first stab and I have many questions: