This repository was archived by the owner on Mar 28, 2023. It is now read-only.

Conversation

@willscott
Contributor

Here's the difference between the current findProvidersAsync and the StoreTheIndex v0 finder structures.

@welcome

welcome bot commented Dec 5, 2021

Thank you for submitting this PR!
A maintainer will be here shortly to review it.
We are super grateful, but we are also overloaded! Help us by making sure that:

  • The context for this PR is clear, with relevant discussion, decisions
    and stakeholders linked/mentioned.

  • Your contribution itself is clear (code comments, self-review for the
    rest) and in its best form. Follow the code contribution
    guidelines if they apply.

Getting other community members to do a review would be great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
Next steps:

  • A maintainer will triage and assign priority to this PR, commenting on
    any missing things and potentially assigning a reviewer for high
    priority items.

  • The PR gets reviewed, discussed, and approved as needed.

  • The PR is merged by maintainers when it has been approved and comments addressed.

We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
We are very grateful for your contribution!

@willscott
Contributor Author

cc @gammazero, @aschmahmann

@willscott
Contributor Author

@warpfork: what is the recommended ipldsch representation of an 'open union'? We can further specify the metadata union bytes when the keyed number 'ProtocolId' is one of the known / agreed-upon variants.

@warpfork
Member

warpfork commented Dec 6, 2021

IPLD Schemas have no innate support for that, if I understand what you mean. If one needs to do business logic to parse things, or defer parsing entirely because the structure could be completely unknown but still considered valid, then there's not much useful that can be done other than describing the data as Any, as far as I can figure. (Or &Any, which also means that the codec parse gets pushed off until another step later too.)

I imagine one could use a secondary schema to pattern match further on the data inside, in multiple passes, after it's been parsed into the data model while being only described as Any. (That would be complex, and not single pass, yes -- but that's an accurate representation of what would be mechanically needed, even if there were syntactic facades over it; I'm not actually sure we want such a thing to be easy to describe: it's sort of a negative-feedback form of mechanical sympathy.)

(It's sorta like having embedded messages in protobuf messages.)
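
The multi-pass idea above can be sketched in Go. This is only an illustration under stated assumptions, not the repository's code: JSON and `json.RawMessage` stand in for the IPLD data model and an `Any` field, and the names `Envelope`, `GraphsyncMeta`, `protoGraphsync`, and `Decode` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope is the first-pass shape. Data stays undecoded, playing the
// role of a field that the schema only describes as Any.
type Envelope struct {
	ProtocolID int             `json:"ProtocolID"`
	Data       json.RawMessage `json:"Data"`
}

// GraphsyncMeta is a hypothetical second-pass shape for one known protocol.
type GraphsyncMeta struct {
	DealID string `json:"DealID"`
}

// protoGraphsync is an assumed protocol number, not an agreed-upon code.
const protoGraphsync = 1

// Decode runs both passes. An unknown protocol is still valid input:
// its payload is handed back raw instead of failing the parse.
func Decode(raw []byte) (*GraphsyncMeta, json.RawMessage, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, nil, fmt.Errorf("first pass: %w", err)
	}
	if env.ProtocolID != protoGraphsync {
		return nil, env.Data, nil
	}
	var m GraphsyncMeta
	if err := json.Unmarshal(env.Data, &m); err != nil {
		return nil, nil, fmt.Errorf("second pass: %w", err)
	}
	return &m, nil, nil
}

func main() {
	known, _, err := Decode([]byte(`{"ProtocolID":1,"Data":{"DealID":"d-42"}}`))
	fmt.Println(known, err)
	_, opaque, err := Decode([]byte(`{"ProtocolID":99,"Data":[1,2,3]}`))
	fmt.Println(string(opaque), err)
}
```

The second pass is ordinary business logic keyed on the protocol number, which is the part a single schema cannot express.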

Comment on lines +22 to +25
type ProviderResult struct {
    Metadata Metadata
    Provider Provider
}
Contributor

@willscott I wanted to clarify + document some things I understood from our conversation today.

  1. This schema is representing how the indexers are currently storing their data internally
  2. The system is trying to be agnostic about the types of records it supports. As a result, it defines only the things it cares about (e.g. who published the claim about who has the data, rather than who has the data) and relegates everything else to the metadata field
  3. The metadata field then becomes the primary point of interest for us. Could you or @gammazero give a (even if rough) schema for what the current metadata things look like?

Contributor

Additionally, following up from discussions today: the metadata schema seems like it'll be important within the indexer as well. Once you separate "who published these records" from "who has these records", you'll want some way to evaluate whether the records are good for your own reputation system, which means understanding some (even if not all) of the record types.


@gammazero gammazero Jan 4, 2022

@aschmahmann

  1. The schema is representing an answer to a client asking, "who has the data". Who published this record is not part of this, and that is a separate concern from delegated routing and responses to clients.
  2. Only the metadata is trying to be agnostic about what records it supports since that is consumed mostly by the provider (who has the data) to determine where/how to retrieve that data.
  3. The metadata contains a "protocol" field which determines the protocol used to retrieve data (graphsync, bitswap). This protocol field is used by the client that wants the data. The remainder of the metadata is a payload that is used only by the provider. It is likely something like a deal ID, but could be anything (e.g. record key for their internal database) that tells the provider how to find and retrieve the data.
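
A rough Go sketch of the split described in point 3, where the client reads only the protocol field and never interprets the payload. The field names, the numeric protocol codes, the simplified `Provider` string, and the `Retrievable` helper are all assumptions made for illustration; they are not taken from the indexer or from this PR.

```go
package main

import "fmt"

// Metadata holds a protocol identifier that the client reads, plus a
// payload that only the provider interprets.
type Metadata struct {
	ProtocolID uint64 // hypothetical numeric code for graphsync, bitswap, ...
	Payload    []byte // opaque to the client, e.g. a deal ID or a database key
}

// ProviderResult is simplified here: Provider is just a string.
type ProviderResult struct {
	Metadata Metadata
	Provider string
}

// Retrievable keeps the results whose protocol the client can speak.
// It never looks inside Payload.
func Retrievable(results []ProviderResult, supported map[uint64]bool) []ProviderResult {
	var out []ProviderResult
	for _, r := range results {
		if supported[r.Metadata.ProtocolID] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	results := []ProviderResult{
		{Provider: "peer-a", Metadata: Metadata{ProtocolID: 1, Payload: []byte("deal-7")}},
		{Provider: "peer-b", Metadata: Metadata{ProtocolID: 2, Payload: []byte("row-19")}},
	}
	// Assume this client only speaks protocol 2.
	for _, r := range Retrievable(results, map[uint64]bool{2: true}) {
		fmt.Println(r.Provider)
	}
}
```

When the client contacts a provider, it would pass the payload back unchanged so the provider can locate the data.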

Comment on lines 4 to 5
| GetP2PProvideResponse "get-p2p-provide-response"
} representation keyed
Contributor

@warpfork continuing from #7 (comment) (GitHub's lack of threads is brutal and I'd like to isolate this topic a bit).

then there's not much useful that can be done other than describing the data as Any, as far as I can figure.

Can Any currently be parsed by codegen?

but that's an accurate representation of what would be mechanically needed, even if there were syntactic facades over it; I'm not actually sure we want such a thing to be easy to describe: it's sort of a negative-feedback form of mechanical sympathy

Is this necessarily true? Say I have a keyed union for the keys "foo" and "bar", but my parser sees a "baz". Why would a keyed union like the one below be so rough to process? As soon as we notice the key isn't "foo" or "bar", we just treat the value like an Any, which offhand doesn't seem like more work than processing a regular field that happens to be an Any.

type OpenKeyedUnion union {
    | String "foo"
    | String "bar"
    | Any
} representation keyed

Also, some union types are more efficient than others for parsing so adding one more that eases consumer pain here doesn't seem too bad, but I could be missing things so lmk 😄.
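
To make the proposal concrete, here is a minimal Go sketch of single-pass decoding for such an open keyed union, with JSON standing in for the wire format. The `OpenKeyedUnion` struct layout and the `DecodeUnion` function are hypothetical; this is not how go-ipld-prime or its codegen handles unions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// OpenKeyedUnion holds exactly one member after decoding.
type OpenKeyedUnion struct {
	Foo *string
	Bar *string
	// Unrecognized members are kept undecoded (the "Any" arm).
	OtherKey string
	Other    json.RawMessage
}

// DecodeUnion reads a single-entry map. Known keys are decoded as
// strings; any other key falls through to the raw arm.
func DecodeUnion(raw []byte) (OpenKeyedUnion, error) {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return OpenKeyedUnion{}, err
	}
	if len(m) != 1 {
		return OpenKeyedUnion{}, errors.New("keyed union must have exactly one key")
	}
	var u OpenKeyedUnion
	for k, v := range m {
		switch k {
		case "foo", "bar":
			var s string
			if err := json.Unmarshal(v, &s); err != nil {
				return OpenKeyedUnion{}, fmt.Errorf("member %q: %w", k, err)
			}
			if k == "foo" {
				u.Foo = &s
			} else {
				u.Bar = &s
			}
		default:
			u.OtherKey, u.Other = k, v
		}
	}
	return u, nil
}

func main() {
	u, err := DecodeUnion([]byte(`{"baz":{"n":1}}`))
	fmt.Println(u.OtherKey, string(u.Other), err)
}
```

In this sketch the fallback arm costs one extra `default` case in the key switch, which is the intuition behind the question above. Whether the same holds for a generated IPLD decoder is left open.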

@BigLep BigLep linked an issue Mar 9, 2022 that may be closed by this pull request
@BigLep BigLep mentioned this pull request Mar 9, 2022
@BigLep
Copy link

BigLep commented Apr 22, 2022

Resolving because #8 has been closed. We have the schema defined.

@BigLep BigLep closed this Apr 22, 2022
Development

Successfully merging this pull request may close these issues.

Determine v1 schema

5 participants