
fix(OCMDiscoveryService): Also cache error results during discovery #49311

Merged
1 commit merged on Nov 25, 2024

Conversation

@provokateurin (Member) commented Nov 15, 2024

Summary

On my personal instance I'm currently facing the problem that a folder shared from another instance is no longer available because that server went down. Unfortunately the remote server doesn't terminate the connection immediately; the request is only stopped by the client timeout.
This makes literally every request to my own Nextcloud server 10 seconds slower, because the timeout is hit every time.

Decreasing the timeout would help a little, but in other scenarios it could lead to unintended consequences, so it isn't a good way to work around the problem.
When the remote server has a problem we should also cache that result, but for a much shorter duration (e.g. 1h), just to prevent every request from trying to contact the remote server again.
That way we still check once in a while whether the server is back, but not on every request that uses the Filesystem, which was slowing everything down dramatically. A rough sketch of the approach follows below.
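As a rough illustration of the approach (a minimal sketch in PHP; the class names, the cache interface, and the TTL values are assumptions, not the actual OCMDiscoveryService code):

```php
<?php

// Minimal sketch of the idea from the summary (illustrative only, not the
// actual OCMDiscoveryService code): cache successful discovery results for
// a long TTL, but also cache failures for a much shorter TTL so that a dead
// remote does not add a client timeout to every single request.

interface SimpleCache {
	public function get(string $key): ?string;
	public function set(string $key, string $value, int $ttl): void;
}

class DiscoveryClient {
	private const TTL_SUCCESS = 86400; // 24h for successful lookups
	private const TTL_ERROR = 300;     // 5m for failures, so we retry soon

	public function __construct(
		private SimpleCache $cache,
	) {
	}

	/** @return array|null the provider info, or null if the remote is unreachable */
	public function discover(string $remote): ?array {
		$cached = $this->cache->get('discovery::' . $remote);
		if ($cached !== null) {
			$data = json_decode($cached, true);
			// An empty array marks a previously cached failure.
			return $data === [] ? null : $data;
		}

		try {
			$data = $this->fetchProviderInfo($remote);
			$this->cache->set('discovery::' . $remote, json_encode($data), self::TTL_SUCCESS);
			return $data;
		} catch (\RuntimeException $e) {
			// Cache the failure too, but only briefly, so the remote is
			// re-checked once in a while instead of on every request.
			$this->cache->set('discovery::' . $remote, json_encode([]), self::TTL_ERROR);
			return null;
		}
	}

	private function fetchProviderInfo(string $remote): array {
		// Placeholder: the real code would issue an HTTP GET against the
		// remote's OCM provider endpoint with a short timeout and throw a
		// RuntimeException on connection errors or bad responses.
		return ['endPoint' => $remote . '/ocm'];
	}
}
```

The TTL values here are just placeholders: the summary above suggests roughly 1h for errors, while the discussion below considers 5m or even lower.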


@provokateurin added the "bug" and "3. to review (Waiting for reviews)" labels Nov 15, 2024
@provokateurin added this to the Nextcloud 31 milestone Nov 15, 2024
@provokateurin requested review from ArtificialOwl, a team, nfebe and sorbaugh and removed the request for a team on November 15, 2024 13:27
@provokateurin (Member, Author):

/backport to stable30

@provokateurin (Member, Author):

/backport to stable29

@provokateurin (Member, Author):

/backport to stable28

@ArtificialOwl (Member):

Not a huge fan of caching for so long. It might be interesting (and not too much overkill) to have a background process that checks the status of faulty remote instances on its own and resets the faulty cache entries on success.
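A rough sketch of that background-process idea (illustrative only; the job class and callables are hypothetical, and this is not part of this PR):

```php
<?php

// Rough sketch of the background-job idea above (illustrative only, not
// part of this PR and not the Nextcloud background job API): periodically
// probe remotes whose discovery previously failed and clear the cached
// error once they respond again, so regular requests never pay the probe
// cost themselves.

class RecheckFaultyRemotesJob {
	/**
	 * @param callable(string): bool $probe  returns true once discovery succeeds again
	 * @param callable(string): void $clear  removes the cached error entry for a remote
	 * @param string[] $faultyRemotes        remotes that currently have a cached error
	 */
	public function run(callable $probe, callable $clear, array $faultyRemotes): void {
		foreach ($faultyRemotes as $remote) {
			if ($probe($remote)) {
				$clear($remote); // remote is healthy again, stop serving the cached error
			}
		}
	}
}
```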

@provokateurin (Member, Author):

I can also reduce the error caching to 5m or even lower, but we need to prevent this discovery request from happening on every request, which can otherwise lead to this kind of self-DoS.

Signed-off-by: provokateurin <kate@provokateurin.de>
@provokateurin force-pushed the fix/ocmdiscoveryservice/cache-errors branch from 746094e to cc8e69c on November 25, 2024 09:29
@skjnldsv (Member):

I now experienced the opposite. While working on that area, one of my instances was pending an update and was therefore returning a 503.
I couldn't understand why discovery was still throwing even though I had already completed the update.

I think some errors should be left out from caching.
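A possible shape for that refinement (a sketch under the assumption that the discovery code can see the HTTP status code of the failure; this is not what the merged change does):

```php
<?php

// Sketch of the refinement suggested above (an assumption, not what this
// PR implements): only cache a discovery failure when it looks permanent.
// Transient conditions like 503 (maintenance/update mode) or 429 are not
// cached, so the next request retries as soon as the remote is back.

function shouldCacheDiscoveryFailure(?int $statusCode): bool {
	if ($statusCode === null) {
		// No HTTP status at all: connection refused or client timeout.
		// Cache briefly to avoid hammering a dead host on every request.
		return true;
	}

	// 503 Service Unavailable and 429 Too Many Requests are transient.
	return !in_array($statusCode, [429, 503], true);
}
```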

@provokateurin (Member, Author):

True, maybe it could be more precise. But the previous behavior was also completely unacceptable for end users, with the entire instance being very slow on every request.

@skjnldsv mentioned this pull request Jan 7, 2025