Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CCIP-read fallback on network errors #4842

Closed
eth-limo opened this issue Sep 23, 2024 · 14 comments
Closed

CCIP-read fallback on network errors #4842

eth-limo opened this issue Sep 23, 2024 · 14 comments
Labels
bug Verified to be an issue. fixed/complete This Bug is fixed or Enhancement is complete and published. next-patch Issues scheduled for the next arch release.

Comments

@eth-limo
Copy link

eth-limo commented Sep 23, 2024

Describe the Feature

Hello,

Per this thread: https://discuss.ens.domains/t/erc-3668-edge-case-for-clientside-http-ssl-errors/19617, it looks like network errors or non-http error responses can prevent CCIP clients from falling back to a secondary gateway URL. I believe this is also impacting the CCIP-read client in ethers.

This would be a great addition to ethers.js as user implemented offchain resolvers continue to grow.

Thanks!

Code Example

No response

@eth-limo eth-limo added the enhancement New feature or improvement. label Sep 23, 2024
@ricmoo
Copy link
Member

ricmoo commented Sep 30, 2024

Working on reproduction steps now, but if anyone has an existing CCIP-read contract set up that can replicate this issue, it'd be appreciated. :)

@ricmoo
Copy link
Member

ricmoo commented Oct 1, 2024

I've recreated the original CCIP-Read Test Contract; the original source was lost, but the new version should be completely compatible. It also now has a test to reproduce the problem in this issue, so a fix should be available soon.

Please see the test contract.

@ricmoo ricmoo added bug Verified to be an issue. on-deck This Enhancement or Bug is currently being worked on. next-patch Issues scheduled for the next arch release. and removed enhancement New feature or improvement. labels Oct 1, 2024
@ricmoo
Copy link
Member

ricmoo commented Oct 1, 2024

This has been fixed in v6.13.3.

Let me know if you still have any issue!

Thanks! :)

@eth-limo
Copy link
Author

eth-limo commented Oct 1, 2024

Thanks @ricmoo!

@ricmoo ricmoo added fixed/complete This Bug is fixed or Enhancement is complete and published. and removed on-deck This Enhancement or Bug is currently being worked on. labels Oct 16, 2024
@ricmoo ricmoo closed this as completed Oct 16, 2024
@adraffy
Copy link

adraffy commented Oct 17, 2024

This only catches networking setup issues.

try {
resp = await request.send();
} catch (error: any) {
// ...low-level fetch error (missing host, bad SSL, etc.),
// so try next URL
errorMessages.push(error.message);
this.emit("debug", { action: "receiveCcipReadFetchError", request, result: { error } });
continue;

The first gateway can still just return 4XX and nuke the entire request.

assert(resp.statusCode < 400 || resp.statusCode >= 500, `response not found during CCIP fetch: ${ errorMessage }`,

@ricmoo
Copy link
Member

ricmoo commented Oct 17, 2024

Isn’t that part of the specification? The goal of fallbacks is to address backend failure, but a 404 (for example) is meant to be authoritative that the resource doesn’t actually exist. Not existing is a valid value for a thing to be.

@adraffy
Copy link

adraffy commented Oct 17, 2024

I guess the spec technically says:

If the response code from step (5) is in the range 400-499, return an error to the caller and stop.

However, I think the intention was first success or any failure. ie. Promise.any()


In recursive CCIP, a single 4XX anywhere will terminate the entire request even though that might of only been of 1 of N parts.

In trust-minimized situations, we potentially can't trust the url set.

It prevents try-catch in Solidity when invoking an unknown CCIP-Read.

Maybe this will have to be revisited on the forums.

@ricmoo
Copy link
Member

ricmoo commented Oct 17, 2024

Yeah. I thought I saw a post in the forums regarding this as well, at some point.

If we can't trust the URL set for a 4xx result though, we also can't trust it for a 200 result. There is nothing special about 4xx compared to 200 regarding a resource's value.

A 5xx on the other hand indicates a legitimate "I don't know what's going on; please ask someone else". :)

@adraffy
Copy link

adraffy commented Oct 17, 2024

A malicious 200 response can be rejected and avoided by the contract. This is how you can have an untrusted url set where you only need 1 of N to return 2XX and be accepted by the contract.

The opposite is not true, a malicious 4XX just stops.

I have a proof of concept which can avoid everything except 4XX.

I think the issue is that 3668 was written with trusted gateways in mind, but with proof verification machinery, like OffchainDNSResolver and EVMGateway, the contract itself can verify and reject invalid responses.

The objective of CCIP-Read should be to feed the verifier responses until it accepts one, not assuming they're trusted and respecting the first failure.

@ricmoo
Copy link
Member

ricmoo commented Oct 18, 2024

Oh. Very good point. What we want then is an authenticated form of 4xx. A 4xx can still have a body; maybe data with length congruent 4 mod 32 could be used (à la Solidity errors) to indicate an error and proof the error is authentic…

I definitely think the plan (and I’ve always assumed) is trusted backends. But an intermediate link like a proxy could always lie with a 4xx.

@adraffy
Copy link

adraffy commented Oct 18, 2024

Ya, this probably requires an ERC update, but everything should be considered untrusted and malicious. I will reopen the issue on the ENS forums.

I think the proper algorithm is the following:

  1. if you run out of endpoints, call the callback with OffchainLookupUnanswered() eg. 0xc584553e
  2. if the server cannot be reached, try next endpoint
  3. regardless of status code, if the response is not JSON or lacks {data: HexString}, try next endpoint
  4. call contract callback
  5. if it reverts with OffchainTryNext() eg. 0x613746c6, try next endpoint
  6. if it reverts with OffchainLookup()
    • require sender match
    • replace endpoints with new endpoints
    • try next endpoint
  7. return the result

This is backwards compatible with existing standard but gives full control to the contract. This implementation is also tail-recursive and has no stack issues.

@ricmoo
Copy link
Member

ricmoo commented Oct 18, 2024

My concern with that is that it then becomes a DoS vector, since you cause every request to hit every URL in the set by, for example, triggering a link to an NFT that doesn’t exist.

CCIP is a fairly basic protocol. To add additional complexity, I think a whole new protocol should be considered. You can imagine adding round-robin, random-order, quick-fail to the URL set, adding authenticated errors and the feature I want most is “call another chain” URLs. An INFURA provider connected to mainnet knows how to connect to and talk to optimism; I’d love it if URLs could be something like "chain://id=1337&data=0x1234${data}&to=blah"…

@adraffy
Copy link

adraffy commented Oct 18, 2024

The above protocol lets the contract implement all of those things. The contract can test endpoints in any order. The contract can process, propagate, or recover from errors. The contract can reject bad endpoints.

I don't see how there's a DoS attack that doesn't already exist. If I give you N urls, that don't respond 2XX or 4XX, it still hits them all. And more generally, any browser can just for-loop spam any server.

The solution to cross-chain reads is EVMGateway, however it can only be decentralized, and offer untrusted URL set if a single malicious gateway doesn't nuke the whole request.

@adraffy
Copy link

adraffy commented Oct 18, 2024

A real world example of how this is broken (not your implementation, but 3668 as-is) can be seen by trying to wrap Coinbase's CCIP-Read servers for cb.id and base.eth

Their server incorrectly enforces that sender must be their contract. This blocks recursive CCIP, which is outlined in 3668. Since it blows up 4XX, it's unrecoverable.

Because a 4XX is unrecoverable, you can't even test if a contract you're trying to wrap can't be wrapped, meaning wrapping is dead (unless you control the whole stack, in which case you'd never wrap). ENS had to create an open proxy service to solve this, which proxies everyone's resolutions to their servers—huge information leak!

Even if all that was fixed, Coinbase server still return 4XX for functions that don't exist, which means you can't probe a remote server for capabilities. Imagine wanting to test for supportsInterface() or multicall() — well you can't because it blows up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Verified to be an issue. fixed/complete This Bug is fixed or Enhancement is complete and published. next-patch Issues scheduled for the next arch release.
Projects
None yet
Development

No branches or pull requests

3 participants