Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rein in overly aggressive frontend cache purging #1155

Closed
2 tasks
harrislapiroff opened this issue Nov 19, 2021 · 5 comments · Fixed by #1430
Closed
2 tasks

Rein in overly aggressive frontend cache purging #1155

harrislapiroff opened this issue Nov 19, 2021 · 5 comments · Fixed by #1430
Assignees

Comments

@harrislapiroff
Copy link
Contributor

harrislapiroff commented Nov 19, 2021

Yesterday we noticed a spike in the thousands of Tracker nginx logs of the form

Purged page IncidentPage with title: Portland Tribune reporter pushed by police officer while covering far-right rally and slug: portland-tribune-reporter-pushed-police-officer-while-covering-far-right-rally

with a number of different pages specified. I'm fairly sure this was caused by the logic here

https://github.com/freedomofpress/pressfreedomtracker.us/blob/develop/incident/signals.py#L46

where we purge the cache for every incident that shares a category with the incident currently being edited. I believe we do this because it has the potential to affect the "related incidents" section of the page for all those incidents. @KioHerr reported that she recalls having to reload the page a few times while editing a particular incident because it was being slow. I'm guessing it was this incident, due to the overwhelming number of separate API requests we were sending for it.

I could imagine a couple ways of solving this and am open to discussion:

  • Be less aggressive with what we choose to purge. Maybe we can pinpoint which incidents might actually have this one in their related incident section? Or we could decide it doesn't matter if that section gets out of date (this might cause broken links if the URL of an incident changes or an incident is deleted—otherwise it should only cause an outdated headline or summary in the "Related incidents" section of another incident page)
  • Batch these requests: I believe cloudflare lets you purge a batch of urls with a single API request. If we could batch these, we could reduce the API request count to something reasonable.
@maeve-fpf
Copy link
Contributor

Documentation for the purge request is here: https://api.cloudflare.com/#zone-purge-files-by-url - This has to be how we are already purging individual pages, so it's a matter of stuffing multiple URLs into that array instead of making multiple requests. But it looks like the code to do that is implemented by Wagtail. Would we need to patch it?

Note that our non-profit account doesn't have Enterprise features, so we cannot use tags: https://api.cloudflare.com/#zone-purge-files-by-cache-tags,-host-or-prefix - I suppose the really overengineered right way to do this would be for each route to include all the IDs of Wagtail pages it refers to as a cache tag, then purge by tag.

It seems like Cloudflare doesn't have a problem with us sending too many requests, but will ignore more than 1000 URLs per minute no matter how they are requested (separately, or in one big request).

@chigby
Copy link
Contributor

chigby commented Nov 22, 2021

Wagtail does support the concept of purge "batches".

@harrislapiroff
Copy link
Contributor Author

Let's start by purging batches and log number of pages purged and go from there. If more than 1000 are regularly purged we'll need to address that too

@chigby
Copy link
Contributor

chigby commented Nov 22, 2021

I looked a little into the wagtail implementation of batching, and from this it seems like Cloudflare has a secondary limit of 30 URLs per request (in addition to the 1,000 URLs per minute). Note the "constraints" column from the documentation link above.

Wagtail's batching does take this into account, though it might not be as big of a reduction in number of requests as we may have imagined.

@harrislapiroff
Copy link
Contributor Author

Looking at this ticket again, I think the solution here is simply to remove the logic for purging related incidents. I think this is unlikely to be a huge issue—changing URLs should be uncommon, so I think this probably won't break links, and a 4 hour (or whatever our TTL is) delay on updating related incidents seems acceptable. Anyone else have thoughts on the side effects here?

If not, let's call this ticket ready for action.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants