
Caching of API responses #1545

Closed
Changaco opened this issue Mar 5, 2018 · 14 comments
Labels
core: Server, BaseService, GitHub auth, Shared helpers
question: Support questions, usage questions, unconfirmed bugs, discussions, ideas

Comments

@Changaco

Changaco commented Mar 5, 2018

In the past two weeks we've been getting many requests for https://liberapay.com/Changaco/public.json, which I assume are coming from shields following the deployment of #1251.

The requests often arrive in groups of 4, which is the number of Liberapay badges on the homepage. Here's a log extract:

[05/Mar/2018:09:03:29 +0000] 200 71187us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:29 +0000] 200 72585us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:35 +0000] 200 17306us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:36 +0000] 200 18380us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
...
[05/Mar/2018:09:03:41 +0000] 200 17871us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:43 +0000] 200 18523us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:43 +0000] 200 15926us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:43 +0000] 200 18260us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"

This shows that there is very little caching of these API responses, even though they contain a Cache-Control header with a value of public, max-age=3600. It would be nice if that could be improved.

Similar closed issue: #266.

@paulmelnikow added the "question" and "core" labels Mar 6, 2018
@paulmelnikow
Member

Hi, thanks for raising this. We could definitely do a more thorough job of caching requests made to services and it's something I'm happy to talk about and try to improve.

I think the behavior you're seeing is expected based on the implementation, though I don't think it implies there is little caching of these requests. As @espadrine mentioned in #266:

That's normal. Until we have the response cached, since we cannot send the corresponding information, we ask.

Meaning, if we get four simultaneous requests for an unknown project, they will all fire. It's only once the first response has come back that we have a cached result.

I admit I haven't fully grokked the vendor caching code (in lib/request-handler.js). I think it fires a separate request for each distinct badge, meaning the cached result for liberapay/receives/Changaco will be used for subsequent requests to liberapay/receives/Changaco, but not for liberapay/gives/Changaco. In fact I'm pretty sure this is how it works, because we don't cache the responses from the API – which would be relatively large – but rather the badge's computed text.
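
To illustrate what I mean, here's a minimal sketch of that behaviour (the key format and helper names are illustrative assumptions, not the actual request-handler code):

// Minimal sketch only, not the real request-handler code: each distinct badge
// path gets its own cache entry holding the computed badge text, not the raw
// API response.
const badgeCache = new Map();

async function getBadgeText(badgePath, fetchFromApi, computeText) {
  if (badgeCache.has(badgePath)) {
    // A second hit on e.g. 'liberapay/receives/Changaco' is served from here...
    return badgeCache.get(badgePath);
  }
  // ...but 'liberapay/gives/Changaco' is a different key, so it still hits the API.
  const apiResponse = await fetchFromApi();
  const text = computeText(apiResponse);  // small computed string, cheap to keep around
  badgeCache.set(badgePath, text);
  return text;
}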

Another point to make is that we have three servers, and their caches are independent.

I'd love to improve our caching, possibly by using an external cache in redis or memcached, and possibly syncing those between servers. Though evolving that is substantially limited by our server resources. The cache size is tweaked to prevent OOM crashes. Our hosting budget is $18/month, which is incredibly small for the 500Mreq we handle in that time. That's not to say we can't or won't improve this, just to put it in perspective. (We're asking for one-time $10 donations from people who use and love Shields. Separately from this… please donate and spread the word!)
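
For what it's worth, a shared read-through cache of that sort could look roughly like the sketch below (ioredis and the key/TTL scheme are assumptions for illustration, not anything we currently run):

// Rough sketch of a shared, TTL-aware cache in Redis, so all servers see the
// same entries. ioredis is assumed here purely for illustration.
const Redis = require('ioredis');
const redis = new Redis();

async function cachedFetch(key, ttlSeconds, fetchFresh) {
  const hit = await redis.get(key);
  if (hit !== null) {
    return JSON.parse(hit);  // served from the shared cache
  }
  const fresh = await fetchFresh();  // e.g. the upstream API call
  // 'EX' sets a TTL so entries expire on their own instead of relying only on LRU.
  await redis.set(key, JSON.stringify(fresh), 'EX', ttlSeconds);
  return fresh;
}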

Finally there are two simple things we could do to address the symptom you're observing. We could prerender the example badges, and/or not display the entire list of badges until people start to drill down into a category.

@Changaco
Author

Changaco commented Mar 6, 2018

I've looked at the code and I think the first thing we should change is the underlying cache mechanism: it's currently a simple LRU instead of a time-aware algorithm. When the cache is full the "oldest" entry is dropped, but that entry may be valid for another hour while the "newest" may already be obsolete. This caching policy explains why raising the max-age of our API responses from 10 minutes to 1 hour didn't really help.

@paulmelnikow
Member

There is some heuristic logic that tries to reduce the frequency of refreshing data that is in the cache, max-age or no. So, it's possible this would cause a higher cache hit frequency overall, though it's also possible recency works well enough. Seems worth an experiment! What algorithm would you suggest instead?

Another issue is that I don't have access to the server logs, so @espadrine is in a better position to run experiments like this.

Also related: #1285.

@Changaco
Author

Changaco commented Mar 20, 2018

I have no experience with this kind of algorithm, but it seems to me that we need to modify the data structures like this:

 function CacheSlot(key, value) {
   this.key = key;
   this.value = value;
-  this.older = null;  // Newest slot that is older than this slot.
-  this.newer = null;  // Oldest slot that is newer than this slot.
+  this.less_recently_used = null;  // Neighbour towards the LRU end of the recency list.
+  this.more_recently_used = null;  // Neighbour towards the MRU end of the recency list.
+  this.earlier_expiring = null;    // Neighbour that expires sooner.
+  this.later_expiring = null;      // Neighbour that expires later.
 }
 
 function Cache(capacity, type) {
   if (!(this instanceof Cache)) { return new Cache(capacity, type); }
   type = type || 'unit';
   this.capacity = capacity;
   this.type = typeEnum[type];
   this.cache = new Map();  // Maps cache keys to CacheSlots.
-  this.newest = null;  // Newest slot in the cache.
-  this.oldest = null;
+  this.most_recently_used = null;   // Head of the recency list.
+  this.least_recently_used = null;  // Tail of the recency list.
+  this.earliest_expiring = null;    // Head of the expiry list (expires soonest).
+  this.last_expiring = null;        // Tail of the expiry list (expires last).
 }

and when cleaning the cache drop the earliest_expiring slot if it has expired, otherwise fall back to dropping the least_recently_used entry.

(Edit: on second thought this may not be the right data structure, because inserting a new cache slot would no longer run in constant time.)
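
For clarity, the cleanup step I had in mind would be something like this sketch (the expires field and unlink helper are hypothetical):

// Sketch of the eviction policy described above: drop an expired slot first,
// otherwise fall back to plain LRU eviction.
function evictOne(cache) {
  let slot = cache.earliest_expiring;
  if (slot === null || slot.expires > Date.now()) {
    slot = cache.least_recently_used;  // nothing has expired yet
  }
  if (slot !== null) {
    cache.cache.delete(slot.key);
    unlink(slot);  // hypothetical helper: detach the slot from both linked lists
  }
}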

@mattbk
Contributor

mattbk commented Apr 9, 2018

@Changaco, another option would be to set up static endpoints and point the example badges at those. Then page loads at shields.io wouldn't be causing database queries at Liberapay.

@PyvesB
Member

PyvesB commented May 5, 2018

Our backend is using three of the aforementioned LRU caches in total:

  • a request cache in the request-handler module.
  • an image cache in the svg-to-img module.
  • a badge key width cache in the make-badge module.

Their size is quite small: they can contain at most 1000 elements. With the high volume of different badge requests we're receiving, I'm not expecting cached elements to survive for very long before being forcibly removed.

According to the back-of-the-envelope calculations indicated in the code, their respective memory footprints are expected to be around 5MB, 1.5MB, and probably a negligible amount for the last one. That doesn't seem like much at all, especially for the request-handler one; even our limited servers could probably handle increasing those cache sizes significantly. This seems like a quick win to avoid some redundant computations and somewhat lower the number of API requests we're making. Any opinions?
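
As a rough sanity check on the headroom, using the in-code estimates above (the target capacity below is only an example, not a proposal):

// Back-of-the-envelope scaling, based on the ~5MB / 1000-entry estimate above.
const currentEntries = 1000;
const currentFootprintMB = 5;
const bytesPerEntryKB = (currentFootprintMB * 1024) / currentEntries;  // ≈ 5 KB per entry

const proposedEntries = 10000;  // example only
const proposedFootprintMB = (proposedEntries * bytesPerEntryKB) / 1024;  // ≈ 50 MB
console.log(`request cache at ${proposedEntries} entries ≈ ${proposedFootprintMB} MB`);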

@paulmelnikow
Member

While responding to #2424 it occurred to me that we could run our own downstream cache in front of individual services, particularly for information which doesn't change often (like Liberapay) and/or is served from very slow APIs (like libraries.io). During our outage earlier this year someone suggested using the Google cache downstream of us: #1568 (comment). I imagine we couldn't feed JSON endpoints to the Google cache and expect them to work, though we could do something similar by putting Cloudflare in front of these requests.

@paulmelnikow
Member

Cloudflare Free will not let you rewrite Host headers to reverse-proxy someone else's site at a different name: https://community.cloudflare.com/t/can-we-use-cloudflare-as-a-reverse-proxy-to-a-wordpress-site/31051/2

So we'd have to set up our own reverse proxy – though we could place Cloudflare Free in front of that.

@potatoqualitee

@paulmelnikow - if you're an OSS project (which it seems you are), Cloudflare will give you a Pro subscription for free.

@chris48s
Member

This is an old issue and a lot has changed since it was raised, but the question of whether we should cache upstream API responses came up again in the ops meeting yesterday, so now is probably a good time to close this issue and summarise where things stand today.

For some years now, we have used CloudFlare as a downstream cache. While we don't directly cache API responses, we do cache the rendered badges themselves, so if you call a badge (for example https://img.shields.io/liberapay/gives/Changaco) a whole bunch of times, it only hits https://liberapay.com/Changaco/public.json the first time; the subsequent calls are served from cache.

Examples on the home page are all static now - there are no "live" examples, so browsing the index doesn't generate any traffic on any upstream APIs.

We completely removed the old LRU cache some time ago. Removing it didn't seem to make any meaningful difference to performance or rate limit usage on our side, and we did not see any upstream API maintainers complaining of an increase in traffic.

Shields.io now runs on a larger number of webservers behind a load balancer, so the value of any sort of local cache stored on the webservers themselves is pretty limited. The likelihood that two requests to the same URL would be handled by the same server is low, so if we wanted to meaningfully cache API responses we would need a shared cache that all nodes in the cluster can read from/write to.

We accept there is an edge case here where caching the badges downstream doesn't help: if multiple different badges use the same API endpoint to get their data (for example, a version badge and a license badge that both ultimately come from the same endpoint), we will make two upstream requests when someone asks for both badges for the same project and both are a cache miss at CloudFlare (though any subsequent requests will be served from cache until max-age is reached). Still, we think the downstream cache covers the 90% case.

To wrap up, we think caching badges using a downstream CDN mostly solves this problem for both us and upstream API maintainers and we don't plan to implement a secondary upstream cache for the API responses themselves.

@paulmelnikow
Member

Thanks for writing that up, Chris!
