-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(gatsby-source-drupal): Disable caching + add http/2 agent #32012
Conversation
Talking to Gatsby/Drupal users — most set long max-age for the Drupal cache to keep their edge http cache as fresh as possible as Drupal can directly purge its edge http cache. But this has the unfortunate side-effect with the recent http client change in #31514 that API calls aren't revalidated. Meaning that a user could change some content in Drupal and not see those updates in their Gatsby site until the Drupal cache expires in the Gatsby cache. This PR removes the `cache-control` header from Drupal API responses so that we only can use `etag` for caching which forces revalidation on every request.
Some relevant Drupal content:
Because Drupal does not compute hashes for content (there are reasons for this) it's not the best idea to use ETag for cache validation. ETag values will change every time the public proxy clears so the value more or less just represents the last time the public proxy was cleared. On top of this most Drupal users set a long max-age for their public cache because fetching pages from Varnish or another cache proxy is orders of magnitude faster than a Drupal bootstrap. In addition, Drupal now has solid cache invalidation techniques via cache tags so it's much safer to trust the cache. Considering this, and following discussions in the Drupal slack, I think we are leaning towards not having a local cache for requests in Gatsby and instead always forwarding the requests through Varnish. There is one more thing to bear in mind related to caching and Varnish and that would be authentication. When making an authenticated request the public cache is bypassed and Drupal is always bootstrapped. @KyleAMathews and I talked about allowing for a config option which would determine which entities require authentication to be passed since it's usually only a few resource types that you need to use an admin role to fetch. eg. redirects, crops, etc. |
} | ||
|
||
async function worker([url, options]) { | ||
return got(url, { agent, ...options }) | ||
return got(url, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to pass request: http2wrapper.auto
here, as Got v11 doesn't use the newest version of http2-wrapper
. Got v12 is going to be released in the next few weeks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for weighing in @szmarczak !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I almost forgot. You need to set the http2
option to true
as well. It will pass the entire agent
object to http2wrapper
, otherwise it would pass agent: httpsAgent
which results in an error if the endpoint is HTTP/2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should fix it 5fbf618
Thanks btw for all your great work on Got! It's a fantastic piece of software.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks btw for all your great work on Got! It's a fantastic piece of software.
No problem :) I really enjoy improving it.
oh haha thanks!
This hasn't been documented, sorry. One of the ways to fix this in Got would be to always pass the entire agent object to the request
function, but then we would need to default request
to something like this:
request = (url, options) => {
if (url.protocol === 'https:') {
options.agent = options.agent.https;
return https.request(url, options);
}
options.agent = options.agent.http;
return http.request(url, options);
};
But then request: https.request
wouldn't work. We could detect native http
functions but then it would fail on custom functions that at the end return the same what https.request
does.
I'm not sure what's the best solution here. Alternatively we could make request
an object with protocols http2
https
and http
as keys...
For now http2: true
does the trick :P
It looks like normalizeArguments is using more than its fair share of CPU?
Indeed. Can you point it down to a few lines? I suppose it's caused by too intensive object creation. The upcoming Got version uses getters & setters in order to avoid re-doing normalization if not necessary. If you could create a reproducible repo, that would be awesome.
Also feel free to create an issue in the Got repo about this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
Talking to Gatsby/Drupal users — most set long max-age for the Drupal cache to
keep their edge http cache as fresh as possible as Drupal can directly purge
its edge http cache.
But this has the unfortunate side-effect with the recent http client change in
#31514 that API calls aren't
revalidated. Meaning that a user could change some content in Drupal and not
see those updates in their Gatsby site until the Drupal cache expires in the
Gatsby cache.
We tested using etag for validation but the overhead of reading/writing to the Gatsby
cache + http revalidation made it somewhat slower than just refetching every time.
While working on this PR, I also noticed that we were missing an http2 agent which
helped speed up requests on large Drupal sites.
I also noticed that we're using Array.concate which gets quite slow with large arrays so I replaced
that with
Array.push(...arr)
saving a couple of seconds on a large (70k entities) test site.This site now fetches all data on a warm varnish cache in ~12 seconds!