Include per-provider bitswap interactions in response timing headers #348

willscott · 2023-07-06T16:48:43Z

With #332 we have response headers on a per-peer basis.
We should make sure these also include attempts / observed bitswap peers as we get visibility into those connections.

In particular, if we get back bitswap peers from IPNI, but fail to connect to them, the response headers should indicate that / which peers couldn't be contacted / could-be-contacted-but-didn't-serve-data

rvagg · 2023-07-07T07:15:12Z

How useful is this kind of information if we only get back data on peers up until the point at which someone starts serving us data and therefore we write data out and spit out the headers in their current state? It's not going to be very representative, or are we just looking for scraps of information about peers over a large enough number of calls to make some kind of inference?

Event recording or other instrumentation would be a more reliable mechanism for this wouldn't it?

willscott · 2023-07-07T15:01:35Z

it will still make it visible when a peer breaks / becomes unavailable, because we'll start to see a bunch of connection attempts that don't resolve until the overall request either errors or completes with someone else.

hannahhoward · 2023-07-13T23:44:52Z

@willscott you want to note when the actual dial attempt for a peer in Bitswap doesn’t work? I’m not sure that’s currently accessible from within Bitswap, which handles the actual dialing. I guess we could attempt to dial ourselves also and record the result -- if Bitswap already dialed, the connect attempt would happen instantaneously, same the other way if our dial succeeded first.

hannahhoward · 2023-07-13T23:46:42Z

@rvagg assuming this lands on you, you could put dial attempts in https://github.com/filecoin-project/lassie/blob/main/pkg/retriever/bitswaphelpers/indexerrouting.go and then dispatch events accordingly.

hannahhoward · 2023-07-13T23:47:43Z

#345 adds something like this, but I think is not what you all care about -- it enables tracking of data received by peer including through bitswap.

willscott · 2023-07-13T23:48:14Z

that seems plausible - if slightly inefficient.

Two goals:

in a failure case where saturn needs to render an error page, we want to say:

was found on providers <a, b, c>
Providers <a, b> could not be connected to
Provider was connected to but didn't provide the data.

as a trailer that is just consumed on the saturn l1, there's a desire to know how upstream data was obtained / from which peer IDs.

hannahhoward · 2023-07-19T02:16:19Z

when we say failure case, are we talking about only non-200 responses for Lassie or also premature stream close errors?

I assume for non-200 responses, we can say that those we could not get data from are simply the diff between providers found and providers we couldn't connect to

For premature stream close, we received some data from some peers presumably -- but eventually found a block where no data was returned from anyone. Should we simply say we failed with all peers? Or maybe do we need the CID we failed on?

hannahhoward · 2023-07-19T02:19:22Z

I believe #345 is ultimately a blocker here, which in turn is blocked on ipfs/boxo#308

rvagg · 2023-09-21T04:43:29Z

@hannahhoward @willscott thoughts on the status of this? We could separate it out into two separate questions—the continued use of server-timings and whether it's really the best tool to achieve the goals here, and the actual collecting of dial attempts within lassie.

We could do the latter separately to the former, maybe even record all the failed dial attempts as failed retrieval attempts in our database; although we're yet to see what kind of volume this information requires (I'm concerned it might be way too much information even just storing the successful block retrievals we've enabled with v0.18.0).

It might also be the case that pre-dialing the peers before handing off to bitswap could get us past some of the problems we have with bitswap that are currently requiring us to pin to a particular boxo commit.

hannahhoward assigned willscott Jul 13, 2023

hannahhoward assigned rvagg Jul 13, 2023

hannahhoward added saturn L1-node Related to filecoin-saturn L1-Nodes rhea Related to project Rhea labels Jul 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Include per-provider bitswap interactions in response timing headers #348

Include per-provider bitswap interactions in response timing headers #348

willscott commented Jul 6, 2023

rvagg commented Jul 7, 2023

willscott commented Jul 7, 2023

hannahhoward commented Jul 13, 2023

hannahhoward commented Jul 13, 2023

hannahhoward commented Jul 13, 2023

willscott commented Jul 13, 2023

hannahhoward commented Jul 19, 2023 •

edited

Loading

hannahhoward commented Jul 19, 2023

rvagg commented Sep 21, 2023

Include per-provider bitswap interactions in response timing headers #348

Include per-provider bitswap interactions in response timing headers #348

Comments

willscott commented Jul 6, 2023

rvagg commented Jul 7, 2023

willscott commented Jul 7, 2023

hannahhoward commented Jul 13, 2023

hannahhoward commented Jul 13, 2023

hannahhoward commented Jul 13, 2023

willscott commented Jul 13, 2023

hannahhoward commented Jul 19, 2023 • edited Loading

hannahhoward commented Jul 19, 2023

rvagg commented Sep 21, 2023

hannahhoward commented Jul 19, 2023 •

edited

Loading