Skip to content

Conversation

hsanjuan
Copy link
Contributor

@hsanjuan hsanjuan commented May 22, 2025

This PR adds an HTTP retrieval check (close #73)

  • The multiaddresses of the peer are passed to a function that performs HTTP retrieval check for the CID
  • If no multiaddresses are */http or cannot connect, the relevant errors are returned.
  • If there is no support for HEAD, this is indicated
  • We do not download the CID, we just perform a HEAD request and inform if the destination has the CID (200). Downloading the content opens us to a waste of bandwidth for a request that the user can perform themselves.

Related ipshipyard/roadmaps#12

@2color
Copy link
Member

2color commented May 23, 2025

We do not download the CID, we just perform a HEAD request and inform if the destination has the CID (200). Downloading the content opens us to a waste of bandwidth for a request that the user can perform themselves.

Do Boxo/Rainbow/Kubo also try a HEAD request first before trying to download? Or just optimistically try a GET?

@hsanjuan
Copy link
Contributor Author

Do Boxo/Rainbow/Kubo also try a HEAD request first before trying to download? Or just optimistically try a GET?

They do HEAD if the endpoint supports it.

@2color
Copy link
Member

2color commented May 23, 2025

They do HEAD if the endpoint supports it.

How do they know if the endpoint supports without trying?

Will they try to download if the endpoint doesn't support HEAD?

@hsanjuan hsanjuan force-pushed the http-retr branch 2 times, most recently from a746211 to 7db6ecf Compare May 26, 2025 08:13
@hsanjuan
Copy link
Contributor Author

Some screenshots:

image
image

@hsanjuan
Copy link
Contributor Author

They do HEAD if the endpoint supports it.

How do they know if the endpoint supports without trying?

Will they try to download if the endpoint doesn't support HEAD?

There is a first "Connect" step where we probe the peers/http addresses. We probe HEAD and if that doesn't work, GET. We request the identity CID. Whether the peer supports HEAD is stored in the peerstore. With that information, subsequent requests for WANT_HAVE are performed via HEAD if supported, if not, a GET requests is triggered.

In our case here, we don't trigger GET requests if HEAD is not supported (which we know after probing).

@hsanjuan hsanjuan marked this pull request as ready for review May 26, 2025 08:31
@hsanjuan hsanjuan changed the title Add HTTP Retrieval check (WIP) Add HTTP Retrieval check May 26, 2025
@hsanjuan
Copy link
Contributor Author

(can be reviewed while I make tests)

@2color
Copy link
Member

2color commented May 26, 2025

They do HEAD if the endpoint supports it.

How do they know if the endpoint supports without trying?
Will they try to download if the endpoint doesn't support HEAD?

There is a first "Connect" step where we probe the peers/http addresses. We probe HEAD and if that doesn't work, GET. We request the identity CID. Whether the peer supports HEAD is stored in the peerstore. With that information, subsequent requests for WANT_HAVE are performed via HEAD if supported, if not, a GET requests is triggered.

In our case here, we don't trigger GET requests if HEAD is not supported (which we know after probing).

From the large providers, do you know which already support HEAD?

@hsanjuan
Copy link
Contributor Author

From the large providers, do you know which already support HEAD?

Pinata and Storacha do for sure.

@hsanjuan hsanjuan requested a review from 2color May 26, 2025 19:00
web/index.html Outdated

if (respObj.DataAvailableOverHTTP.Enabled === true) {
if (respObj.DataAvailableOverHTTP.Error !== "") {
outText += "❌ There was an error downloading the CID via HTTP: " + respObj.DataAvailableOverHTTP.Error + "\n"
Copy link
Member

@lidel lidel May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hsanjuan Will this be how we show that the endpoint is missing HTTPS or that TLS certificate validation failed?

We need to surface that in a clear way, to help storage providers fix their setups without bothering us for analysis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't care about certificate validity, but we do care about http2 (which doesn't get setup unless certificate is valid I think).

In that case, when no http/2 present, we would return an error which would be displayed here.

However we return a standard "no valid addresses" error. We would need to be more specific regarding the error, but that needs to be fixed in boxo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't care about certificate validity

I might be missing something but why not? Without valid certs you won't be accessible from a browser or by default from boxo, kubo, etc. so why wouldn't we care?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm ok the browser, that thing... so we do care, sorry.

@2color
Copy link
Member

2color commented May 29, 2025

When running a check with an HTTPS maddr I consistently get a 500 with context deadline exceeded.

cid: bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi
multiaddr: /dns4/dag.w3s.link/tcp/443/https/p2p/QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp
ipniIndexer: https://cid.contact
timeoutSeconds: 60
httpRetrieval: on

Adds a mock trustless gateway server and a mock routing v1 server and checks
that finding a CID works via both.
@hsanjuan
Copy link
Contributor Author

When running a check with an HTTPS maddr I consistently get a 500 with context deadline exceeded.

cid: bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi
multiaddr: /dns4/dag.w3s.link/tcp/443/https/p2p/QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp
ipniIndexer: https://cid.contact
timeoutSeconds: 60
httpRetrieval: on

This is an issue with the existing code: it will timeout on DHT lookups and fail everything that comes after, including testing bitswap providers found via IPNI. In production the issue is not apparent because AcceleratedDHT is true, and the DHT lookup finishes quickly within the timeout. Fixing that warrants a bit of refactoring, merging of the two check paths (with and without one given multiaddr, it run the same checks) etc.

@hsanjuan hsanjuan requested review from 2color and lidel May 30, 2025 15:02
@2color
Copy link
Member

2color commented Jun 2, 2025

This is an issue with the existing code: it will timeout on DHT lookups and fail everything that comes after, including testing bitswap providers found via IPNI. In production the issue is not apparent because AcceleratedDHT is true, and the DHT lookup finishes quickly within the timeout. Fixing that warrants a bit of refactoring, merging of the two check paths (with and without one given multiaddr, it run the same checks) etc.

It's strange that the with one maddr + cid check would fail because of a DHT lookup, causing it to not achieve it's main goal of checking retrievability from the peer. I just pushed 8745dff (feel free to revert) thinking we can reduce the chances of a timeout by doing more work concurrently (though it didn't help with this).

What do you think about only performing the DHT checks after or while we check HTTP retrievability (the primary purpose of this check), since it's not dependent on the results from the DHT/IPNI lookups, and is likely a cheaper check to run anyways. This would result in a useful result, even if we don't have the results for ProviderRecordFromPeerInDHT, ProviderRecordFromPeerInIPNI, PeerFoundInDHT (we'd need to adjust for a more graceful failure).

@hsanjuan
Copy link
Contributor Author

hsanjuan commented Jun 2, 2025

probably 8745dff doesn't fix it. As the idea is also that DHT addresses are used to test retrieability later iirc. And I think this will still block.

My opinion is that:

  • things work in production due to accelerated dht
  • no need to make the code flow more complicated than it is and we can leave things as they are.
  • these issues warrant a refactor, and if that happens, it is worth doing a big refactor so that the same flow is followed in all checks, and the given peer address is just a hint that gets added to everything else discovered.

@hsanjuan
Copy link
Contributor Author

hsanjuan commented Jun 2, 2025

Otherwise:

  • HTTP check can be done in parallel to the rest.
  • DHT lookup should have its own timeout, so that other checks have time to proceed.

@2color
Copy link
Member

2color commented Jun 2, 2025

I think that this working in production or with the accelerated DHT client ins't good reason to not fix this as part of this PR.

I don't have a strong opinion on how, but adding a separate timeout for the DHT lookups seems reasonable, given that for routable peers/cids, we should be able to get a response under 5 seconds. If we don't find it in that time window, the chances are pretty slim that waiting out the default 60 second timeout is going to yield any results.

Also, we don't even support HTTP providers yet in the DHT, so we're doing all of this work right now, blocking on the main purpose of this check.

Copy link
Member

@2color 2color left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I pushed two fixes:

  • Add separate timeout for dht peer lookup in peer-specific check
  • make sure that when we don't filter out http providers from IPNI, when doing a peer-specific check (so that we can correctly render whether an http provider was found in the IPNI)

@hsanjuan hsanjuan merged commit 9d34f34 into main Jun 3, 2025
8 checks passed
@hsanjuan hsanjuan deleted the http-retr branch June 3, 2025 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for testing HTTP retrieval

4 participants