Represent storetheindex v0 finder protocol in schema #7
Thank you for submitting this PR!
Getting other community members to do a review would be a great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
@warpfork: what is the recommended ipldsch representation of an 'open union'? We could then further specify the metadata union bytes when the keyed number 'ProtocolId' is one of the known / agreed-upon variants.
IPLD Schemas have no innate support for that, if I understand what you mean. If one needs to do business logic to parse things, or defer parsing entirely because the structure could be completely unknown but still considered valid, then there's not much useful that can be done other than describing the data as Any, as far as I can figure. I imagine one could use a secondary schema to pattern match further on the data inside, in multiple passes, after it's been parsed into the data model while being only described as Any. (It's sorta like having embedded messages in protobuf messages.)
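To make the "multiple passes" idea concrete, here is a minimal sketch in Python rather than in any IPLD library. It assumes the envelope arrives as JSON and that a "secondary schema" can be approximated by a set of required field names. The shape names and field names (`deal-reference`, `DealID`, `PieceCID`, `RecordKey`) are hypothetical and are not taken from the PR.

```python
import json
from typing import Any, Optional

# Hypothetical secondary shapes: shape name -> field names that must be present.
SECONDARY_SHAPES = {
    "deal-reference": {"DealID", "PieceCID"},
    "database-key": {"RecordKey"},
}


def first_pass(raw: str) -> dict:
    """Pass 1: decode into plain data; 'Metadata' is accepted as Any, whatever it holds."""
    node = json.loads(raw)
    if not isinstance(node, dict) or "Metadata" not in node:
        raise ValueError("envelope must be a map with a 'Metadata' field")
    return node


def second_pass(metadata: Any) -> Optional[str]:
    """Pass 2: return the name of the first secondary shape that matches, else None."""
    if not isinstance(metadata, dict):
        return None  # still valid data, just not something we can describe further
    for name, required in SECONDARY_SHAPES.items():
        if required.issubset(metadata):
            return name
    return None
```

The first pass never rejects unknown metadata; only the second pass tries to say more about it, and a `None` result simply means the data stays described as Any.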
type ProviderResult struct {
  Metadata Metadata
  Provider Provider
}
@willscott I wanted to clarify + document some things I understood from our conversation today.
- This schema is representing how the indexers are currently storing their data internally
- The system is trying to be agnostic about the types of records it supports and as a result has defined the things it cares about (e.g. who published the claim about who has the data, rather than who has the data) while relegating everything else to the metadata field
- The metadata field then becomes the primary point of interest for us. Could you or @gammazero give a (even if rough) schema for what the current metadata things look like?
Additionally, following up from discussions today, the metadata schema seems like it'll be important within the indexer as well: once you separate "who published these records" from "who has these records", you'll want some way to evaluate whether the records are good for your own reputation system purposes, which means understanding some (even if not all) of the record types.
- The schema is representing an answer to a client asking, "who has the data". Who published this record is not part of this, and that is a separate concern from delegated routing and responses to clients.
- Only the metadata is trying to be agnostic about what records it supports since that is consumed mostly by the provider (who has the data) to determine where/how to retrieve that data.
- The metadata contains a "protocol" field which determines the protocol used to retrieve data (graphsync, bitswap). This protocol field is used by the client that wants the data. The remainder of the metadata is a payload that is used only by the provider. It is likely something like a deal ID, but could be anything (e.g. record key for their internal database) that tells the provider how to find and retrieve the data.
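The split between a client-readable protocol field and a provider-only payload can be sketched as follows. The wire layout here is an assumption made for illustration and is not stated in this thread: the metadata bytes are taken to be an unsigned varint protocol ID followed by the opaque payload. The protocol number used in the usage note is hypothetical.

```python
from typing import Tuple


def write_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("varint value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_uvarint(buf: bytes) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, number of bytes consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


def split_metadata(metadata: bytes) -> Tuple[int, bytes]:
    """Assumed layout: varint protocol ID, then a payload the client never interprets."""
    protocol_id, consumed = read_uvarint(metadata)
    return protocol_id, metadata[consumed:]
```

For example, `split_metadata(write_uvarint(300) + b"deal-42")` yields the protocol ID for the client to dispatch on and leaves `b"deal-42"` untouched for the provider.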
  | GetP2PProvideResponse "get-p2p-provide-response"
} representation keyed
@warpfork continuing from #7 (comment) (GitHub's lack of threads is brutal and I'd like to isolate this topic a bit).
then there's not much useful that can be done other than describing the data as Any, as far as I can figure.
Can Any currently be parsed by codegen?
but that's an accurate representation of what would be mechanically needed, even if there were syntactic facades over it; I'm not actually sure we want such a thing to be easy to describe: it's sort of a negative-feedback form of mechanical sympathy
Is this necessarily true? Say I have a keyed union for the keys "foo" and "bar", but my parser sees a "baz". Why would a keyed union like the one below be so rough to process? As soon as we notice the key isn't "foo" or "bar", we just treat the value like an Any, which offhand doesn't seem like it'd be more work than processing a regular field that happens to be an Any.
type OpenKeyedUnion union {
| String "foo"
| String "bar"
| Any
} representation keyed
Also, some union types are more efficient than others for parsing so adding one more that eases consumer pain here doesn't seem too bad, but I could be missing things so lmk 😄.
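The fallback being proposed can be sketched in a few lines of Python. This is only an illustration of the dispatch logic over already-decoded data, not how any IPLD codegen works; `KNOWN_MEMBERS` and `decode_open_keyed_union` are hypothetical names.

```python
from typing import Any, Tuple

# Known members of the union: key -> expected Python type of the value.
KNOWN_MEMBERS = {"foo": str, "bar": str}


def decode_open_keyed_union(node: Any) -> Tuple[str, Any, bool]:
    """Return (key, value, known). Unknown keys are kept as-is, i.e. treated as Any."""
    if not isinstance(node, dict) or len(node) != 1:
        raise ValueError("a keyed union must be a map with exactly one entry")
    ((key, value),) = node.items()
    expected = KNOWN_MEMBERS.get(key)
    if expected is None:
        return key, value, False  # open case: no further validation
    if not isinstance(value, expected):
        raise TypeError(f"member {key!r} must be {expected.__name__}")
    return key, value, True
```

With this, `{"foo": "x"}` is validated as a known member, while `{"baz": [1, 2]}` passes through unvalidated with `known` set to `False`, which is the extra branch the question is asking about.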
Resolving because #8 has been closed. We have the schema defined.
Here's the difference between the current findProvidersAsync and the StoreTheIndex v0 finder structures.