fix(OCMDiscoveryService): Also cache error results during discovery #49311
Conversation
/backport to stable30 |
/backport to stable29 |
/backport to stable28 |
not a huge fan of caching for so long, might be interesting (and not too much overkill) to have a background process that checks the status of faulty remote instances on its own and resets the faulty cache on success |
I can also reduce the error caching to 5m or even lower, but we need to prevent this lookup from happening on every request, which can lead to this kind of self-DoS. |
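A minimal sketch of the background-recheck idea suggested above, assuming Nextcloud's OCP\BackgroundJob\TimedJob, ICacheFactory, and IClientService APIs. The class name, cache prefix (`ocm_discovery_errors`), and probe URL are hypothetical illustrations, not the PR's actual code:

```php
<?php
// Hedged sketch: a background job that re-probes remotes whose OCM
// discovery previously failed and clears their cached error entry on
// success. The job is assumed to be queued via IJobList::add() with
// ['remote' => $host] whenever an error result is cached.

namespace OCA\Example\BackgroundJob;

use OCP\AppFramework\Utility\ITimeFactory;
use OCP\BackgroundJob\TimedJob;
use OCP\Http\Client\IClientService;
use OCP\ICache;
use OCP\ICacheFactory;

class RecheckFaultyRemotesJob extends TimedJob {
	private ICache $cache;

	public function __construct(
		ITimeFactory $time,
		ICacheFactory $cacheFactory,
		private IClientService $clientService,
	) {
		parent::__construct($time);
		$this->setInterval(15 * 60); // re-check every 15 minutes
		$this->cache = $cacheFactory->createDistributed('ocm_discovery_errors');
	}

	protected function run($argument): void {
		$remote = $argument['remote'];
		try {
			$response = $this->clientService->newClient()->get(
				'https://' . $remote . '/ocm-provider/', // hypothetical probe URL
				['timeout' => 5],
			);
			if ($response->getStatusCode() === 200) {
				// Remote recovered: drop the cached error so normal
				// discovery resumes on the next request.
				$this->cache->remove($remote);
			}
		} catch (\Exception $e) {
			// Still down; keep the error cached until the next run.
		}
	}
}
```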
Signed-off-by: provokateurin <kate@provokateurin.de>
Force-pushed from 746094e to cc8e69c
I now experienced the opposite. While working in that area, one of my instances was pending an update, so it returned a 503. I think some errors should be left out of caching. |
True, maybe it could be more precise. But the previous behavior was also completely unacceptable for end users, with the entire instance being super slow on every request. |
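A sketch of the distinction discussed above: cache only errors that look persistent, and skip transient ones such as a 503 from an instance pending an update. The helper names and the exact status-code list are illustrative assumptions, not part of the PR:

```php
<?php
// Hedged sketch: decide which HTTP error results are worth caching.

function isTransientError(int $statusCode): bool {
	// 503 = temporarily unavailable (e.g. pending update), 429 = throttled.
	return in_array($statusCode, [429, 503], true);
}

function shouldCacheError(int $statusCode): bool {
	// Persistent failures (404, 410, ...) get cached briefly so that not
	// every request re-probes the remote; transient ones do not.
	return $statusCode >= 400 && !isTransientError($statusCode);
}

var_dump(shouldCacheError(404)); // true  - remote has no OCM endpoint
var_dump(shouldCacheError(503)); // false - likely just updating
```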
Summary
On my personal instance I'm currently facing the problem that a folder shared from another instance is no longer available, as the remote server went down. Unfortunately the remote server doesn't terminate the connection instantly; it only gets stopped by the client timeout.
This makes literally every request to my own Nextcloud server 10 seconds longer, as the timeout happens every time.
Decreasing the timeout would help a little, but in other scenarios it might lead to unintended consequences, so it isn't a good way to work around the problem.
When the remote server has a problem we must also cache the result, but for a much shorter duration (e.g. 1h), just to prevent every request from trying to contact the remote server again.
Now we still check once in a while whether the server is back, but not on every request that uses the Filesystem, which was slowing everything down dramatically.
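A minimal sketch of the caching behavior described in this summary, assuming Nextcloud's OCP\ICache interface; the TTL values, cache keys, and the fetchOcmProvider() helper are hypothetical, not the PR's exact implementation:

```php
<?php
// Hedged sketch: cache both success and error results of OCM discovery,
// with a long TTL for working remotes and a short TTL for faulty ones,
// so a down remote is re-probed only occasionally instead of on every
// filesystem request.

use OCP\ICache;

const TTL_SUCCESS = 24 * 60 * 60; // assumed: 1 day for working remotes
const TTL_ERROR = 60 * 60;        // assumed: 1 hour for faulty remotes

function discover(ICache $cache, string $remote): array {
	$cached = $cache->get($remote);
	if ($cached !== null) {
		// Both success and error entries are served from cache; an error
		// entry short-circuits without contacting the remote at all.
		return json_decode($cached, true);
	}

	try {
		$result = fetchOcmProvider($remote); // hypothetical HTTP probe
		$cache->set($remote, json_encode($result), TTL_SUCCESS);
		return $result;
	} catch (\Exception $e) {
		$error = ['error' => $e->getMessage()];
		$cache->set($remote, json_encode($error), TTL_ERROR);
		return $error;
	}
}
```

The key design point is that the error TTL only bounds how stale the "remote is down" verdict can get; even a short value like 5 minutes already collapses thousands of per-request timeouts into one probe per TTL window.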
Checklist