Description
Welcome
- Yes, I've searched similar issues on GitHub and didn't find any.
How do you use lego?
Other
Detailed Description
Hiya! I'm an engineer at Let's Encrypt. Lately we've been seeing severe load spikes at 00:00 UTC every day. We've run into similar issues in the past, and historically the cause has been large numbers of users manually configuring their cron jobs to run renewals at exactly midnight. It gets worse: because many of these jobs fail under the overload, the next day's load spike may be even bigger!
In Certbot, the problem is partly solved by packaging: all the major Certbot packages arrange for it to run at a random time of day. However, some people still set up Certbot manually in crontab, and they often choose 00:00 UTC to run it. So, some years ago, Certbot added some code:
certbot/certbot#6391
certbot/certbot#6596
certbot/certbot#6599
When certbot renew is run non-interactively, it sleeps for a random amount of time, up to 8 minutes. This helps spread out the midnight load spike significantly.
Right now, lego-cli is the top contributor to the load spike. In the first 30 seconds after 00:00 UTC today, lego-cli accounted for 33k new-order requests, while the next biggest contributor accounted for only 5.6k. Across all request types (not just new-order), lego-cli accounted for 173k requests (60% of the total), versus 19k for the next biggest contributor.
Considering all lego-cli traffic to Let's Encrypt, it's very spiky:
So, two questions:
- Could you implement a randomized delay for non-interactive renewals, similar to Certbot's?
- Are you aware of any major integrations or packages for lego-cli that include a cron job or systemd unit that runs at 00:00?
Thanks,
Jacob