Rein in overly aggressive frontend cache purging #1155

harrislapiroff · 2021-11-19T15:29:33Z

Yesterday we noticed a spike in the thousands of Tracker nginx logs of the form

Purged page IncidentPage with title: Portland Tribune reporter pushed by police officer while covering far-right rally and slug: portland-tribune-reporter-pushed-police-officer-while-covering-far-right-rally

with a number of different pages specified. I'm fairly sure this was caused by the logic here

https://github.com/freedomofpress/pressfreedomtracker.us/blob/develop/incident/signals.py#L46

where we purge the cache for every incident that shares a category with the incident currently being edited. I believe we do this because it has the potential to affect the "related incidents" section of the page for all those incidents. @KioHerr reported that she recalls having to reload the page a few times while editing a particular incident because it was being slow. I'm guessing it was this incident, due to the overwhelming number of separate API requests we were sending for it.

I could imagine a couple ways of solving this and am open to discussion:

Be less aggressive with what we choose to purge. Maybe we can pinpoint which incidents might actually have this one in their related incident section? Or we could decide it doesn't matter if that section gets out of date (this might cause broken links if the URL of an incident changes or an incident is deleted—otherwise it should only cause an outdated headline or summary in the "Related incidents" section of another incident page)
Batch these requests: I believe cloudflare lets you purge a batch of urls with a single API request. If we could batch these, we could reduce the API request count to something reasonable.

The text was updated successfully, but these errors were encountered:

maeve-fpf · 2021-11-22T18:43:31Z

Documentation for the purge request is here: https://api.cloudflare.com/#zone-purge-files-by-url - This has to be how we are already purging individual pages, so it's a matter of stuffing multiple URLs into that array instead of making multiple requests. But it looks like the code to do that is implemented by Wagtail. Would we need to patch it?

Note that our non-profit account doesn't have Enterprise features, so we cannot use tags: https://api.cloudflare.com/#zone-purge-files-by-cache-tags,-host-or-prefix - I suppose the really overengineered right way to do this would be for each route to include all the IDs of Wagtail pages it refers to as a cache tag, then purge by tag.

It seems like Cloudflare doesn't have a problem with us sending too many requests, but will ignore more than 1000 URLs per minute no matter how they are requested (separately, or in one big request).

chigby · 2021-11-22T18:48:15Z

Wagtail does support the concept of purge "batches".

harrislapiroff · 2021-11-22T19:48:02Z

Let's start by purging batches and log number of pages purged and go from there. If more than 1000 are regularly purged we'll need to address that too

chigby · 2021-11-22T20:50:01Z

I looked a little into the wagtail implementation of batching, and from this it seems like Cloudflare has a secondary limit of 30 URLs per request (in addition to the 1,000 URLs per minute). Note the "constraints" column from the documentation link above.

Wagtail's batching does take this into account, though it might not be as big of a reduction in number of requests as we may have imagined.

harrislapiroff · 2022-05-02T17:11:30Z

Looking at this ticket again, I think the solution here is simply to remove the logic for purging related incidents. I think this is unlikely to be a huge issue—changing URLs should be uncommon, so I think this probably won't break links, and a 4 hour (or whatever our TTL is) delay on updating related incidents seems acceptable. Anyone else have thoughts on the side effects here?

If not, let's call this ticket ready for action.

chigby self-assigned this May 23, 2022

chigby mentioned this issue May 23, 2022

Remove related incident cache purging #1430

Merged

SaptakS closed this as completed in #1430 May 31, 2022

chigby mentioned this issue Oct 19, 2023

Rework frontend cache purging request model and signals #1776

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rein in overly aggressive frontend cache purging #1155

Rein in overly aggressive frontend cache purging #1155

harrislapiroff commented Nov 19, 2021 •

edited

Loading

maeve-fpf commented Nov 22, 2021

chigby commented Nov 22, 2021

harrislapiroff commented Nov 22, 2021

chigby commented Nov 22, 2021 •

edited

Loading

harrislapiroff commented May 2, 2022

Rein in overly aggressive frontend cache purging #1155

Rein in overly aggressive frontend cache purging #1155

Comments

harrislapiroff commented Nov 19, 2021 • edited Loading

maeve-fpf commented Nov 22, 2021

chigby commented Nov 22, 2021

harrislapiroff commented Nov 22, 2021

chigby commented Nov 22, 2021 • edited Loading

harrislapiroff commented May 2, 2022

harrislapiroff commented Nov 19, 2021 •

edited

Loading

chigby commented Nov 22, 2021 •

edited

Loading