Skip to content

Retrieval query ask v2 #671

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

Conversation

hannahhoward
Copy link
Collaborator

Goals

fix #666 #667 #668

Implementation

  1. Generate query ask v2 types & ipld schema based on the one documented in QueryAsk v2 Protocol #666
  2. Create a libp2p implementation to send and request v2 Query's & QueryResponses -- retrievalmarket/lp2pimpl, based on storagemarket/lp2pimpl
  3. Build a retrieval provider which implements query handling. Most of the code here is copied from the go-fil-markets retrieval provider query handling, with additional logic to add http info when neccesary, and to handle piece only queries
  4. I also removed the RetrievalProviderNode abstraction since the storagemarket side in boost removes this abstraction
  5. Setup a test of the retrieval query system, based on tests in go-fil-markets retrievalmarket impl, plus additional tests for http logic
  6. Add config for HTTPRetrievalURL to DealMaking config
  7. Add builder logic to construct the retrieval provider for boost
  8. Add a cli command to send a query and output a response, specifically for piece only retrievals for now (tailored for Evergreen usage)

For Discussion

  • to do: complete an integration test, manual test the retrieval command

Comment on lines +138 to +140
{(*QueryProtocols)(nil), "QueryProtocols"},
{(*GraphsyncFilecoinV1Response)(nil), "GraphsyncFilecoinV1Response"},
{(*HTTPFilecoinV1Response)(nil), "HTTPFilecoinV1Response"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these aren't used directly via the registry but they are wired up through QueryResponse so they shouldn't need to be registered here.

# Query is a query to a given provider to determine information about a piece
# they may have available for retrieval
type Query struct {
# If kind is Payload, response will error if not included
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's some space vs tabs inconsistencies in this file

if resp.Error != "" {
return fmt.Errorf("query response error: %s", resp.Error)
}
fmt.Println(resp.Protocols.HTTPFilecoinV1.URL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resp.Protocols.HTTPFilecoinV1 could be nil couldn't it? Maybe something like this:

Suggested change
fmt.Println(resp.Protocols.HTTPFilecoinV1.URL)
if resp.Protocols.HTTPFilecoinV1 != nil {
fmt.Println(resp.Protocols.HTTPFilecoinV1.URL)
} else {
return fmt.Errorf("peer does not support HTTP retrievals")
}


// Write the retrieval query to the stream
// Write the re to the client
err = types.BindnodeRegistry.TypeToWriter(query, s, dagcbor.Encode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe at this point it should interrogate query to see if it's valid, i.e. has a PieceCID or a PayloadCID, before proceeding to send it since it'll be rejected anyway


# QueryResponseError indicates something went wrong generating a query response
| QueryResponseError ("2")
} representation int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string enums are a lot nicer to see when looking at message contents and they shouldn't cost much more if you choose short names; I think we have a couple of places of regret in the current filecoin stack where int enums exist and we could have gone with strings (IIRC there was a discussion in graphsync when we did the initial bindnode convert). Up to you, not a big deal but the message format here is already fairly verbose and would be pretty nice to look at if you're just decoding the raw messages because of all the maps, but then you get to these numbers and they're not self explanatory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I'd favour strings 👍

@hannahhoward hannahhoward force-pushed the feat/retrieval-query-ask-v2 branch 2 times, most recently from 43b6abf to f4e0672 Compare July 31, 2022 09:06
@hannahhoward hannahhoward requested a review from nonsense August 5, 2022 15:56
@jacobheun jacobheun requested a review from dirkmc August 5, 2022 16:35
@hannahhoward hannahhoward marked this pull request as ready for review August 10, 2022 03:38
@jacobheun jacobheun added this to the v1.4.0 milestone Aug 10, 2022
@hannahhoward hannahhoward force-pushed the feat/retrieval-query-ask-v2 branch from eabe821 to 62e8a24 Compare August 12, 2022 23:32
}
opts := []libp2p.Option{
libp2p.DefaultTransports,
libp2p.ListenAddrStrings("/ip4/127.0.0.1/tcp/0"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the ListenAddrStrings option if we're also supplying NoListenAddr?

}

// Query sends a retrieval query v2 to another peer
func (c *RetrievalClient) Query(ctx context.Context, providerID peer.ID, query types.Query) (*types.QueryResponse, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the RetrievalClient only has one method, I wonder if we can consolidate everything into the QueryClient?

// Start the Boost RP
log.Info("starting boost retrieval provider")
lp2pnet.Start(ctx)
log.Info("boost storage provider started successfully")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Info("boost storage provider started successfully")
log.Info("boost retrieval provider started successfully")

Comment on lines +392 to +394
// Wait for the legacy SP to fire the "ready" event before starting
// the boost SP.
// Boost overrides some listeners so it must start after the legacy SP.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment correct?

modules.HandleRetrieval(host, lc, lrp, j)
return nil
}
func HandleBoostRetrievals(lc fx.Lifecycle, h host.Host, prov *retrievalmarket.Provider, legacyRP lotus_retrievalmarket.RetrievalProvider) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename to HandleBoostRetrievalQueryAsk?

// add http payload url if we are supporting HTTP retrievals
if p.config.HTTPRetrievalURL != "" {
answer.Protocols.HTTPFilecoinV1 = &types.HTTPFilecoinV1Response{
URL: p.config.HTTPRetrievalURL + "/payload/" + q.PayloadCID.String() + ".car",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest we just respond with the base URL. That way we can add more options in future without having to change any code here.
The paths can be "well-known", eg /payload/<cid> vs /payload/<cid>.car

// Given the CID of a block, find a piece that contains that block.
// If the client has specified which piece they want, return that piece.
// Otherwise prefer pieces that are already unsealed.
func (p *Provider) getPieceInfoFromCid(ctx context.Context, payloadCID, clientPieceCID cid.Cid) (piecestore.PieceInfo, bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a lot of the code in this file is similar to the code in markets. Can we share it instead of copy and pasting?


# QueryResponseError indicates something went wrong generating a query response
| QueryResponseError ("2")
} representation int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I'd favour strings 👍

)

var retrievePieceCmd = &cli.Command{
Name: "retrieve-piece",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest adding a retrieve category, and renaming this action to query-ask, so the CLI command would be
boost retrieve query-ask --provider=t01234 --piece-cid=1234

I also don't think we really need to provide the piece cid. The path to get things can be "well-known", eg /piece/<cid>.car

if resp.Error != "" {
return fmt.Errorf("query response error: %s", resp.Error)
}
fmt.Println(resp.Protocols.HTTPFilecoinV1.URL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest outputting all protocols and let the user parse for the one they want to use.
Let's also implement output json if the --json flag is set, so that it's easy for people to write parsers (see boost deal for an example)

@dirkmc
Copy link
Contributor

dirkmc commented Aug 19, 2022

Current Design

  1. Client sends query over libp2p with piece cid / payload cid
    Server queries sealing subsystem to check if it's possible to retrieve the piece / block
    Server responds with either
  2. Client makes an HTTP request to get data
    Server queries sealing subsystem to check if it's possible to retrieve the piece / block
    Server responds with either
    • data
    • error message (eg no unsealed piece found)

Alternative Design

  1. Client sends query over libp2p to ask what protocols server supports
    Server responds with supported protocols (graphsync, http at http://1.2.3.4:1234/)
  2. Client makes an HTTP request to get data
    Server queries sealing subsystem to check if it's possible to retrieve the piece / block
    Server responds with either
    • data
    • error message (eg no unsealed piece found)

The reason I'm proposing the alternative design is because

  • It simplifies the libp2p part (no need to query sealing subsystem), just returns an IP & port (not the entire URL with the path to the specific car file)
  • Once the client knows the address of the miner's HTTP interface the client doesn't need to send 2 requests every time it asks for data (one request to libp2p and a second request to http). It can just directly query the http interface.

@nonsense nonsense deleted the feat/retrieval-query-ask-v2 branch January 17, 2023 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QueryAsk v2 Protocol
4 participants