
Caching of API responses #1545

Closed
Changaco opened this issue Mar 5, 2018 · 14 comments
Labels
core: Server, BaseService, GitHub auth, Shared helpers
question: Support questions, usage questions, unconfirmed bugs, discussions, ideas

Comments

@Changaco

Changaco commented Mar 5, 2018

In the past two weeks we've been getting many requests for https://liberapay.com/Changaco/public.json, which I assume are coming from shields following the deployment of #1251.

The requests often arrive in groups of 4, which is the number of Liberapay badges on the homepage. Here's a log extract:

[05/Mar/2018:09:03:29 +0000] 200 71187us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:29 +0000] 200 72585us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:35 +0000] 200 17306us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:36 +0000] 200 18380us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
...
[05/Mar/2018:09:03:41 +0000] 200 17871us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:43 +0000] 200 18523us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:43 +0000] 200 15926us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:43 +0000] 200 18260us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"

This shows that there is very little caching of these API responses, even though they contain a Cache-Control header with a value of public, max-age=3600. It would be nice if that could be improved.

Similar closed issue: #266.

@paulmelnikow added the "question" and "core" labels Mar 6, 2018
@paulmelnikow
Member

Hi, thanks for raising this. We could definitely do a more thorough job of caching requests made to services and it's something I'm happy to talk about and try to improve.

I think the behavior you're seeing is expected based on the implementation, though I don't think it implies there is little caching of these requests. As @espadrine mentioned in #266:

That's normal. Until we have the response cached, since we cannot send the corresponding information, we ask.

Meaning, if we get four simultaneous requests for an unknown project, they will all fire. It's only once the first response has come back that we have a cached result.

I admit I haven't fully grokked the vendor caching code (in lib/request-handler.js). I think it fires a separate request for each distinct badge, meaning the cached result for liberapay/receives/Changaco will be used for subsequent requests to liberapay/receives/Changaco, but not for liberapay/gives/Changaco. In fact I'm pretty sure this is how it works, because we don't cache the responses from the API – which would be relatively large – but rather the badge's computed text.
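
To illustrate what I mean, here's a minimal sketch of that behaviour (the key format and helper names are illustrative assumptions, not the actual request-handler code):

// Minimal sketch only, not the real request-handler code: each distinct badge
// path gets its own cache entry holding the computed badge text, not the raw
// API response.
const badgeCache = new Map();

async function getBadgeText(badgePath, fetchFromApi, computeText) {
  if (badgeCache.has(badgePath)) {
    // A second hit on e.g. 'liberapay/receives/Changaco' is served from here...
    return badgeCache.get(badgePath);
  }
  // ...but 'liberapay/gives/Changaco' is a different key, so it still hits the API.
  const apiResponse = await fetchFromApi();
  const text = computeText(apiResponse);  // small computed string, cheap to keep around
  badgeCache.set(badgePath, text);
  return text;
}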

Another point to make is that we have three servers, and their caches are independent.

I'd love to improve our caching, possibly by using an external cache in redis or memcached, and possibly syncing those between servers. Though evolving that is substantially limited by our server resources. The cache size is tweaked to prevent OOM crashes. Our hosting budget is $18/month, which is incredibly small for the 500Mreq we handle in that time. That's not to say we can't or won't improve this, just to put it in perspective. (We're asking for one-time $10 donations from people who use and love Shields. Separately from this… please donate and spread the word!)
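
For what it's worth, a shared read-through cache of that sort could look roughly like the sketch below (ioredis and the key/TTL scheme are assumptions for illustration, not anything we currently run):

// Rough sketch of a shared, TTL-aware cache in Redis, so all servers see the
// same entries. ioredis is assumed here purely for illustration.
const Redis = require('ioredis');
const redis = new Redis();

async function cachedFetch(key, ttlSeconds, fetchFresh) {
  const hit = await redis.get(key);
  if (hit !== null) {
    return JSON.parse(hit);  // served from the shared cache
  }
  const fresh = await fetchFresh();  // e.g. the upstream API call
  // 'EX' sets a TTL so entries expire on their own instead of relying only on LRU.
  await redis.set(key, JSON.stringify(fresh), 'EX', ttlSeconds);
  return fresh;
}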

Finally there are two simple things we could do to address the symptom you're observing. We could prerender the example badges, and/or not display the entire list of badges until people start to drill down into a category.

@Changaco
Author

Changaco commented Mar 6, 2018

I've looked at the code and I think the first thing we should change is the underlying cache mechanism: it's currently a simple LRU instead of a time-aware algorithm. When the cache is full the "oldest" entry is dropped, but that entry may be valid for another hour while the "newest" may already be obsolete. This caching policy explains why raising the max-age of our API responses from 10 minutes to 1 hour didn't really help.

@paulmelnikow
Member

There is some heuristic logic that tries to reduce the frequency of refreshing data that is in the cache, max-age or no. So, it's possible this would cause a higher cache hit frequency overall, though it's also possible recency works well enough. Seems worth an experiment! What algorithm would you suggest instead?

Another issue is that I don't have access to the server logs, so @espadrine is in a better position to run experiments like this.

Also related: #1285.

@Changaco
Author

Changaco commented Mar 20, 2018

I have no experience with this kind of algorithm, but it seems to me that we need to modify the data structures like this:

 function CacheSlot(key, value) {
   this.key = key;
   this.value = value;
-  this.older = null;  // Newest slot that is older than this slot.
-  this.newer = null;  // Oldest slot that is newer than this slot.
+  this.less_recently_used = null;  // Neighbour towards the LRU end of the recency list.
+  this.more_recently_used = null;  // Neighbour towards the MRU end of the recency list.
+  this.earlier_expiring = null;    // Neighbour that expires sooner.
+  this.later_expiring = null;      // Neighbour that expires later.
 }
 
 function Cache(capacity, type) {
   if (!(this instanceof Cache)) { return new Cache(capacity, type); }
   type = type || 'unit';
   this.capacity = capacity;
   this.type = typeEnum[type];
   this.cache = new Map();  // Maps cache keys to CacheSlots.
-  this.newest = null;  // Newest slot in the cache.
-  this.oldest = null;
+  this.most_recently_used = null;   // Head of the recency list.
+  this.least_recently_used = null;  // Tail of the recency list.
+  this.earliest_expiring = null;    // Head of the expiry list (expires soonest).
+  this.last_expiring = null;        // Tail of the expiry list (expires last).
 }

and when cleaning the cache drop the earliest_expiring slot if it has expired, otherwise fall back to dropping the least_recently_used entry.

(Edit: on second thought this may not be the right data structure, because inserting a new cache slot would no longer run in constant time.)
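
For clarity, the cleanup step I had in mind would be something like this sketch (the expires field and unlink helper are hypothetical):

// Sketch of the eviction policy described above: drop an expired slot first,
// otherwise fall back to plain LRU eviction.
function evictOne(cache) {
  let slot = cache.earliest_expiring;
  if (slot === null || slot.expires > Date.now()) {
    slot = cache.least_recently_used;  // nothing has expired yet
  }
  if (slot !== null) {
    cache.cache.delete(slot.key);
    unlink(slot);  // hypothetical helper: detach the slot from both linked lists
  }
}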

@mattbk
Contributor

mattbk commented Apr 9, 2018

@Changaco, another option would be to set up static endpoints and point the example badges at those. Then page loads at shields.io wouldn't be causing database queries at Liberapay.

@PyvesB
Member

PyvesB commented May 5, 2018

Our backend is using three of the aforementioned LRU caches in total:

  • a request cache in the request-handler module.
  • an image cache in the svg-to-img module.
  • a badge key width cache in the make-badge module.

Their size is quite small: they can contain at most 1000 elements. With the high volume of different badge requests we're receiving, I'm not expecting cached elements to survive for very long before being forcibly removed.

According to the back-of-the-envelope calculations indicated in the code, their respective memory footprints are expected to be around 5MB, 1.5MB, and probably a negligible amount for the last one. That doesn't seem like much at all, especially for the request-handler one; even our limited servers could probably handle increasing those cache sizes significantly. This seems like a quick win to avoid some redundant computations and somewhat lower the number of API requests we're making. Any opinions?
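
As a rough sanity check on the headroom, using the in-code estimates above (the target capacity below is only an example, not a proposal):

// Back-of-the-envelope scaling, based on the ~5MB / 1000-entry estimate above.
const currentEntries = 1000;
const currentFootprintMB = 5;
const bytesPerEntryKB = (currentFootprintMB * 1024) / currentEntries;  // ≈ 5 KB per entry

const proposedEntries = 10000;  // example only
const proposedFootprintMB = (proposedEntries * bytesPerEntryKB) / 1024;  // ≈ 50 MB
console.log(`request cache at ${proposedEntries} entries ≈ ${proposedFootprintMB} MB`);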

@paulmelnikow
Member

While responding to #2424 it occurred to me that we could run our own downstream cache in front of individual services, particularly for information which doesn't change often (like Liberapay) and/or is served from very slow APIs (like libraries.io). During our outage earlier this year someone suggested using the Google cache downstream of us: #1568 (comment). I imagine we couldn't feed JSON endpoints to the Google cache and expect them to work, though we could do something similar by putting Cloudflare in front of these requests.

@paulmelnikow
Member

Cloudflare Free will not let you rewrite Host headers to reverse-proxy someone else's site at a different name: https://community.cloudflare.com/t/can-we-use-cloudflare-as-a-reverse-proxy-to-a-wordpress-site/31051/2

So we'd have to set up our own reverse proxy – though we could place Cloudflare Free in front of that.

@potatoqualitee

@paulmelnikow - if you're an OSS project (which it seems you are), Cloudflare will give you a Pro subscription for free.

@chris48s
Member

This is an old issue and a lot has changed since it was raised, but the question of whether we should cache upstream API responses came up again in the ops meeting yesterday, so now is probably a good time to close this issue and summarise where things stand today.

For some years now, we have used CloudFlare as a downstream cache. While we don't directly cache API responses, we do cache the rendered badges themselves, so if you call a badge (for example https://img.shields.io/liberapay/gives/Changaco) a whole bunch of times, it only hits https://liberapay.com/Changaco/public.json the first time; the subsequent calls are served from cache.

Examples on the home page are all static now - there are no "live" examples, so browsing the index doesn't generate any traffic on any upstream APIs.

We completely removed the old LRU cache some time ago. Removing it didn't seem to make any meaningful difference to performance or rate limit usage on our side, and we did not see any upstream API maintainers complaining of an increase in traffic.

Shields.io now runs on a larger number of webservers behind a load balancer, so the value of any sort of local cache stored on the webservers themselves is pretty limited. The likelihood that two requests to the same URL would be handled by the same server is low, so if we wanted to meaningfully cache API responses we would need a shared cache that all nodes in the cluster can read from/write to.

We accept there is an edge case here where caching the badges downstream doesn't help: if multiple different badges use the same API endpoint to get their data (for example, a version badge and a license badge that both ultimately come from the same endpoint), we will make two upstream requests when someone asks for both badges for the same project and both are a cache miss at CloudFlare (though any subsequent requests will be served from cache until max-age is reached). Still, we think the downstream cache covers the 90% case.

To wrap up, we think caching badges using a downstream CDN mostly solves this problem for both us and upstream API maintainers and we don't plan to implement a secondary upstream cache for the API responses themselves.

@paulmelnikow
Member

Thanks for writing that up, Chris!
