Add cache support #35

Merged on Oct 10, 2024 (5 commits)

nand2 (Contributor) commented Oct 4, 2024

Hi!

This PR adds support for two types of caching.
The aim of caching is to reduce the number of calls to the RPC providers.

The first type of caching is simple: a config entry, pageCache.immutableUrlRegexps, in which we declare a list of regexps matching URLs we know are immutable. The first time such a page is loaded via RPC, the result is saved in the cache, and later requests are served from the cache.
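
As an illustration of the immutable URL check, here is a minimal sketch. The function names and the example pattern are hypothetical, and the way pageCache.immutableUrlRegexps is actually loaded from the config is not shown in this PR description, so the patterns are simply passed in as strings.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileImmutablePatterns compiles the configured patterns once at startup.
// The patterns stand in for pageCache.immutableUrlRegexps; the function name
// and the input format are assumptions made for this sketch.
func compileImmutablePatterns(patterns []string) ([]*regexp.Regexp, error) {
	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid immutable URL pattern %q: %w", p, err)
		}
		compiled = append(compiled, re)
	}
	return compiled, nil
}

// isImmutableURL reports whether url matches at least one compiled pattern.
func isImmutableURL(url string, patterns []*regexp.Regexp) bool {
	for _, re := range patterns {
		if re.MatchString(url) {
			return true
		}
	}
	return false
}

func main() {
	patterns, err := compileImmutablePatterns([]string{
		`^web3://example\.eth/assets/.*$`, // hypothetical pattern
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(isImmutableURL("web3://example.eth/assets/logo.png", patterns))
	fmt.Println(isImmutableURL("web3://example.eth/index.html", patterns))
}
```

A URL that matches can be stored on first load and served from the cache afterwards without any validation round trip.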

The second type of caching is a partial implementation of standard HTTP proxy caching: if we view web3url-gateway as a proxy, and the web3protocol-go library as the remote server it proxies for, then we implement standard HTTP caching based on ETag.
The mechanism is basically:

  • A request arrives.
  • We check whether we have it in the cache.
  • If the request has no If-None-Match cache invalidation header and we have it in the cache, we inject If-None-Match: <ETag stored in cache> and record that the header was injected by us.
  • We forward the request to web3protocol-go.
  • If the response is a 304 (Not Modified) and we injected If-None-Match ourselves, we return the cached response and stop here.
  • If the response is a 200 with an ETag, we save the response in the cache.
  • If the response is a 200 without an ETag and there was a page cache entry, we delete that entry.
  • We forward the response to the client.
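
The steps above can be sketched roughly as follows. This is not the code from the PR: the gateway, response and fetch names are hypothetical, the cache is a plain map, and the real implementation additionally distinguishes cache entry types (for example PageCacheEntryTypeHttpCaching), which is left out here.

```go
package main

import "fmt"

// response is a simplified stand-in for what web3protocol-go returns.
type response struct {
	Status int
	ETag   string
	Body   string
}

// gateway holds a page cache and a function that forwards a request upstream.
// Both fields are assumptions made for this sketch.
type gateway struct {
	cache map[string]response
	fetch func(path, ifNoneMatch string) response
}

// handle applies the ETag-based flow: inject If-None-Match when only the
// gateway has a copy, then serve from cache, store, or evict accordingly.
func (g *gateway) handle(path, clientIfNoneMatch string) response {
	cached, inCache := g.cache[path]
	ifNoneMatch := clientIfNoneMatch
	injected := false
	if clientIfNoneMatch == "" && inCache {
		ifNoneMatch = cached.ETag
		injected = true
	}

	resp := g.fetch(path, ifNoneMatch)
	switch {
	case resp.Status == 304 && injected:
		// The client has no copy, so substitute the cached page for the 304.
		return cached
	case resp.Status == 200 && resp.ETag != "":
		g.cache[path] = resp
	case resp.Status == 200 && inCache:
		// 200 without an ETag: the site no longer wants this page cached.
		delete(g.cache, path)
	}
	return resp
}

func main() {
	g := &gateway{
		cache: map[string]response{},
		fetch: func(path, ifNoneMatch string) response {
			if ifNoneMatch == `"v1"` {
				return response{Status: 304}
			}
			return response{Status: 200, ETag: `"v1"`, Body: "hello"}
		},
	}
	first := g.handle("/", "")      // cache miss, stored
	second := g.handle("/", "")     // header injected, served from cache
	third := g.handle("/", `"v1"`)  // client has its own copy, 304 passes through
	fmt.Println(first.Status, second.Status, second.Body, third.Status)
}
```

In this sketch a 304 is only replaced by the cached body when the gateway itself added the If-None-Match header; a 304 answering a client-supplied header is passed through unchanged.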

So this is basically a partial implementation of standard HTTP caching. The cache is an LRU cache that can be configured (max number of entries, max entry size, TTL).

The more interesting part is in an update of web3protocol-go, which implements ERC-7774 (ethereum/ERCs#652; a bit of work is still needed). ERC-7774 allows resource request mode websites to emit cache invalidation events. That way, web3protocol-go listens for these events and can answer with HTTP code 200 or 304 (Not Modified). The most important part: as long as the content is not modified, a 304 (Not Modified) response does not require an RPC call.

Final result: a homepage I was working on was making 16 eth_call RPC calls.
After I implemented ERC-7774 in web3protocol-go and the HTTP cache in web3url-gateway, this was reduced to 2 eth_call RPC calls. After I added the immutable URL caching system in web3url-gateway (in which I cache some auto mode URLs), it is now down to 0 eth_call RPC calls!

So now web3url-gateway can handle heavy traffic on a web3:// website implementing ERC-7774.

@qzhodl qzhodl requested a review from syntrust October 8, 2024 10:06
if err != nil {
respondWithErrorPage(w, err)
return
}

// If cache invalidation headers were set from cache, and the response is 304, we can return
// the cached page
if cacheInvalidationHeadersSetFromCache && fetchedWeb3Url.HttpCode == 304 {

Do we still want to return it from the cache if the request header has If-None-Match field and cacheInvalidationHeadersSetFromCache == false?

Contributor Author

If the request header has If-None-Match (and so cacheInvalidationHeadersSetFromCache = false), then we want to return the 304 unmodified: in this scenario, the web client has already loaded the page previously and has stored it in its local browser cache, so there is no need to send the body again from the web3url-gateway cache.

So, let's say:

  • Client A requests path P, which returns an ETag. web3url-gateway caches it, and the browser of client A caches it too.
  • Client A requests path P again, with If-None-Match. Because there is an If-None-Match, web3url-gateway knows that client A has a local copy of the page, so we forward the call to web3protocol-go, and if the result is 304, we can forward it. --> Here, web3url-gateway acts as a transparent proxy.
  • Client B requests path P. Because it doesn't have it in its browser cache, it doesn't send an If-None-Match. web3url-gateway sees this, so it acts as the cache and injects an If-None-Match before forwarding the call to web3protocol-go. If web3url-gateway sees that the result is 304, then because we injected If-None-Match ourselves, we know client B doesn't have the page in its browser cache, so we replace the 304 response with the entry from the web3url-gateway cache. --> Here, web3url-gateway acts as a proxy injecting a cache layer.

Final note: with the implementation of ERC-7774, the calls to web3protocol-go don't trigger RPC calls (for websites that implement ERC-7774 and whose content has not been updated), so there is no need to try to avoid calling web3protocol-go.

Collaborator

So the case of "the request header has If-None-Match" is handled automatically by the HTTP server, so we do nothing about it, right?

Contributor Author

Yes; in this scenario, the browser client has its own browser cache, so we only act as a transparent proxy, and we let web3protocol-go and the browser client talk to each other.

We can see web3url-gateway as a proxy which tries to be helpful: if it sees that the browser client does not have a cached version but web3url-gateway does, it tries to make use of its own cached version. It acts as a "man in the middle" by injecting a header into the request, and sends its cached version back to the browser client if web3protocol-go answers 304 (no changes).

log.WithFields(logFields).Infof("Added page cache entry for %s", web3Url)
// If we got a HTTP 200 code, we don't cache the page, there was previously a cache entry,
// and the cache entry was of type PageCacheEntryTypeHttpCaching, we remove it from the cache
} else if fetchedWeb3Url.HttpCode == 200 && cacheEntryPresent && cacheEntry.Type == PageCacheEntryTypeHttpCaching {
Collaborator

What does it mean when fetchedWeb3Url.HttpCode == 200 without an ETag? Do you have an example of it? If cacheEntryPresent, should we replace the entry instead of removing it?

Contributor Author

@nand2 nand2 Oct 9, 2024

If the website does not return an ETag, it means it does not want the page to be cached (with the ETag mechanism).

So the scenario would be:

  • Browser A requests /path, web3protocol-go returns a body with an ETag, and web3url-gateway caches it.
  • Browser A requests /path again, with If-None-Match. web3url-gateway forwards the call to web3protocol-go, which returns 304. The 304 is sent back to the browser.
  • The website is modified by the author.
  • Browser A requests /path again, with If-None-Match. web3url-gateway forwards the call to web3protocol-go, which returns 200 with a new ETag. In this case, we enter if willCacheResponseAsType != "" { in line 316, and web3url-gateway updates its cache. It then forwards the response to the browser.
  • The website is modified again by the author.
  • Browser A requests /path again, with If-None-Match. web3url-gateway forwards the call to web3protocol-go, and this time the website decides to return a 200 without an ETag (because the website has decided it does not want this updated version to be cached, or because it cannot safely generate a unique ETag). In this case, we enter line 347 } else if fetchedWeb3Url.HttpCode == 200 && cacheEntryPresent && cacheEntry.Type == PageCacheEntryTypeHttpCaching { and we want to delete the cache entry, because the website just told us it no longer wants the page to be cached.

Now, thinking more about it, one thing I should change: I should clear the cache entry not only when the HTTP code is 200, but also for any 2xx, 4xx or 5xx HTTP code (maybe 3xx too, I need to research this a bit).
Example: in our last scenario, the website could have been modified by the author to unpublish a page, so now /path returns 404 (and 404 responses without an ETag are likely to be common). Here, we clearly need to clear the web3url-gateway cache entry.

I will change the HTTP code check a bit later.
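
A possible shape for the broadened check, as a sketch only: shouldEvict and its parameters are hypothetical names, and 3xx responses are deliberately left untouched here because that case is still undecided above.

```go
package main

import "fmt"

// shouldEvict reports whether an existing HTTP-caching entry should be
// removed after an upstream response. willCache is true when the response
// itself is about to be stored (for example a 200 with an ETag).
// 3xx codes, including 304, never trigger eviction in this sketch.
func shouldEvict(status int, willCache, entryPresent bool) bool {
	if !entryPresent || willCache {
		return false
	}
	switch status / 100 {
	case 2, 4, 5:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(shouldEvict(404, false, true)) // page unpublished: evict
	fmt.Println(shouldEvict(304, false, true)) // not modified: keep
	fmt.Println(shouldEvict(200, true, true))  // entry is replaced, not evicted
}
```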
