Conversation
So I'm not sure I'd call this a points-of-pain overview so much as good docs ;P. We've discussed a bunch offline; two thoughts that are really only relevant here: your scheduler failure modes are simple bugs, and I think they should be fixed in situ because that can be done quickly. To wit:

- don't dequeue and requeue work when there is no work slot available; a full slot pool is not the task failing
- use system metrics to inform work slot availability (e.g. if there is IO overload, don't schedule more work)
- place work immediately when slots are freed up (e.g. schedule work at the end of your cleanup of a work slot)
- cap exponential backoff (e.g. at 5 minutes)
- discard work after (say) 10 attempts
- finally, implement a quick-reset mechanism to zero the queue and allow an immediate restoration of service without mucking around.
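The fixes above can be sketched together. This is a minimal illustration, not the scheduler's actual code; the class and constant names (`RetryQueue`, `MAX_BACKOFF`, `MAX_ATTEMPTS`) are my own, and the concrete numbers are the ones suggested above (5-minute cap, 10 attempts).

```python
import time

MAX_BACKOFF = 300   # cap exponential backoff at 5 minutes
MAX_ATTEMPTS = 10   # discard work after 10 failed attempts


def next_delay(attempt: int, base: float = 1.0) -> float:
    """Exponential backoff, capped so retries never wait more than MAX_BACKOFF."""
    return min(base * (2 ** attempt), MAX_BACKOFF)


class RetryQueue:
    def __init__(self):
        self.items = []  # (job, attempts, not_before) tuples

    def add(self, job, attempts=0):
        if attempts >= MAX_ATTEMPTS:
            return  # discard instead of requeueing forever
        self.items.append((job, attempts, time.time() + next_delay(attempts)))

    def pop_ready(self, slot_available: bool):
        # Only hand out work when a slot is actually free: a full slot
        # pool is not a task failure, so nothing gets de-and-requeued.
        if not slot_available:
            return None
        now = time.time()
        for i, (job, attempts, not_before) in enumerate(self.items):
            if not_before <= now:
                del self.items[i]
                return job, attempts
        return None

    def reset(self):
        """Quick-reset: zero the queue to restore service immediately."""
        self.items.clear()
```

The point of the shape is that backoff and attempt limits live in one place, and "no slot free" is handled by simply not popping, rather than by treating the job as failed.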
## Memory contention + exponential backoff = ever-growing retry queue
Until recently, this VM only had 224Gb RAM. It would relatively often […] log-scraping job. This could probably be avoided if we could do something like feed the logs into syslog or some other central logging system.
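For the "feed the logs into syslog" idea, the standard library already covers the scraper side. A minimal sketch, assuming a local syslog socket at `/dev/log` (the usual Linux location); a central logging host could be targeted instead by passing an `(host, port)` address.

```python
import logging
from logging.handlers import SysLogHandler


def syslog_logger(name: str, address="/dev/log") -> logging.Logger:
    """Send log records to syslog instead of files that need scraping.

    `address` can be a Unix socket path such as /dev/log, or a
    ("host", 514) tuple for a central logging system over UDP.
    """
    logger = logging.getLogger(name)
    handler = SysLogHandler(address=address)
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

With something like this in place, the separate log-scraping job (and its memory-contention failure mode) goes away entirely.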
## mitmproxy troubleshooting
Because mitmproxy intercepts HTTPS traffic, we've got a self-signed CA cert on it which needs to be trusted by the scrapers. Most of the customisation on our buildstep image is trying to get this cert in all the places and trusted by all the things.
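Getting the cert "in all the places" usually comes down to two mechanisms: loading it into each TLS context explicitly, or pointing the well-known environment variables at it. A minimal Python sketch of both; the cert path here is a placeholder, not the actual location in the buildstep image.

```python
import os
import ssl

# Placeholder path: substitute wherever the mitmproxy CA cert
# actually lives in the buildstep image.
MITM_CA = "/usr/local/share/ca-certificates/mitmproxy.crt"


def trusted_context(cafile: str = MITM_CA) -> ssl.SSLContext:
    """An SSL context that trusts the interception CA in addition to the
    system store, so HTTPS via mitmproxy verifies cleanly."""
    ctx = ssl.create_default_context()
    ctx.load_verify_locations(cafile=cafile)
    return ctx


# Many tools honour these environment variables, which is the cheap way
# to get the cert trusted by most HTTP clients in one go:
os.environ.setdefault("SSL_CERT_FILE", MITM_CA)       # Python ssl and others
os.environ.setdefault("REQUESTS_CA_BUNDLE", MITM_CA)  # python-requests
os.environ.setdefault("CURL_CA_BUNDLE", MITM_CA)      # curl
```

Baking the same few lines (or their shell equivalents) into the dev vagrant image would get the SSL setup much closer to prod.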
However, the dev vagrant image doesn't (unless my memory is playing tricks on me) set this up at all, which makes it hard to reproduce problems. Having a dev setup that set up SSL by default, as close as possible to the prod setup, would make it easier to troubleshoot problems.
Stepping back a little though: we really only have mitmproxy in place so that we can record the hosts the scraper is scraping from. If we had a different way to record those hosts, we might not need mitmproxy at all.
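As an illustration of that alternative: if every outgoing request went through a thin wrapper, the hosts could be recorded before the request is made, with no interception proxy involved. This is a hypothetical sketch (`fetch` and `scraped_hosts` are invented names, not existing scraper API).

```python
from urllib.parse import urlparse
from urllib.request import urlopen

# Hosts seen so far; in practice this would be persisted per scraper run.
scraped_hosts: set[str] = set()


def fetch(url: str, opener=urlopen):
    """Record the host of every request before making it — the one thing
    mitmproxy is currently in place to do."""
    host = urlparse(url).hostname
    if host:
        scraped_hosts.add(host)
    return opener(url)
```

The catch, of course, is that this only works for scrapers that route their requests through such a wrapper, whereas mitmproxy sees traffic from any HTTP client.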
## Scraper platform updates
At present all scrapers use openaustralia/buildstep:latest. This image is based on cedar:14, which is old, doesn't work at all with PHP, and is about to be EOL. We need to update to heroku:16, but this is a breaking change: a different set of system packages installed, different versions of languages available and so on. #1207 is the start of a plan to allow scrapers to pick a platform so that we can start with a soft migration to heroku:16, and later make other platforms available.
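One possible shape for the soft migration: an optional per-scraper platform file that defaults to the old platform, so nothing changes until a scraper opts in. Everything here is hypothetical (`platform.json`, the image tags, the `KNOWN_PLATFORMS` mapping); the real mechanism is being designed in #1207.

```python
import json
import os

# Hypothetical mapping; the heroku-16 image tag is a placeholder.
KNOWN_PLATFORMS = {
    "cedar-14": "openaustralia/buildstep:latest",    # current default, EOL soon
    "heroku-16": "openaustralia/buildstep:heroku-16",
}
DEFAULT_PLATFORM = "cedar-14"


def image_for_scraper(repo_dir: str) -> str:
    """Pick the build image from an optional platform file in the scraper
    repo, defaulting to the old platform so the migration is opt-in."""
    path = os.path.join(repo_dir, "platform.json")
    platform = DEFAULT_PLATFORM
    if os.path.exists(path):
        with open(path) as f:
            platform = json.load(f).get("platform", DEFAULT_PLATFORM)
    if platform not in KNOWN_PLATFORMS:
        raise ValueError(f"unknown platform: {platform}")
    return KNOWN_PLATFORMS[platform]
```

Defaulting to cedar-14 keeps every existing scraper working unchanged; adding a platform later is just another entry in the mapping.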
@jamezpolley Shall we merge this anyway? It's a good thing to have recorded.
@jamezpolley (cc @Br3nda) - I feel this should be reviewed to make sure it's still relevant and then merged; if not, it should be closed and marked abandoned. It's getting really long in the tooth.