GC should be more resilient for flaky backends #15469
We are now running on 2.3.1. The long-term problem is that when there is a permanent problem with a blob or manifest, the registry keeps on growing.
I understand your problem; it could be a GC backlog. However, IMO, since GC is a high-risk task within the system and it currently runs well, we should be cautious about updating the code.
Just chiming in here, but we are having several persistent issues with GC:
I think these issues deserve more immediate attention. We are reaching a point where GC is simply broken due to a combination of all of them, and I don't think we are the only ones who will hit this.
This is a massive issue for us. We have hundreds of thousands of tags and 200+ TB of data to delete, and because the GC task fails despite retries and must rebuild the proposed deletion list on every run, we cannot get through GC.
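One way to avoid that rebuild cost would be to checkpoint the candidate list once the mark phase completes, so a failed run can resume the sweep instead of re-marking everything. Below is a minimal sketch of the idea in Go (the registry's own language); the `gc-candidates.json` path and the digest list are hypothetical, and the registry's GC does not currently persist its mark results, which is precisely the complaint here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// checkpointPath is a hypothetical location for persisted GC state.
const checkpointPath = "gc-candidates.json"

// saveCheckpoint persists the digests still awaiting deletion, so a
// failed run does not force a full re-mark on the next attempt.
func saveCheckpoint(digests []string) error {
	data, err := json.Marshal(digests)
	if err != nil {
		return err
	}
	return os.WriteFile(checkpointPath, data, 0o644)
}

// loadCheckpoint returns the digests left over from a previous run, or
// nil when there is no checkpoint and a fresh mark phase is needed.
func loadCheckpoint() ([]string, error) {
	data, err := os.ReadFile(checkpointPath)
	if os.IsNotExist(err) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	var digests []string
	return digests, json.Unmarshal(data, &digests)
}

func main() {
	remaining, err := loadCheckpoint()
	if err != nil {
		panic(err)
	}
	if remaining == nil {
		// No checkpoint: this is where the expensive mark phase would run.
		remaining = []string{"sha256:aaa", "sha256:bbb"}
	}
	for len(remaining) > 0 {
		fmt.Println("deleting", remaining[0]) // real code: storage-driver delete
		remaining = remaining[1:]
		if err := saveCheckpoint(remaining); err != nil {
			panic(err)
		}
	}
	os.Remove(checkpointPath) // finished cleanly: clear the checkpoint
}
```

Saving after every deletion keeps the checkpoint trivially consistent; batching the writes would trade some durability for fewer round trips.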
This appears to be #12948.
Object storage backends such as S3, Swift, and their various implementations can behave inconsistently and even fail to deliver data, especially under load or with many objects.
Imagine GC running over 5 TB of data when a timeout or other issue occurs somewhere: the whole GC process just stops.
Here is such an example:
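Transient timeouts like this are usually retryable, so one mitigation is to wrap each storage read in a bounded retry with exponential backoff rather than letting a single failure abort the run. A minimal sketch in Go, where `withRetry` and the flaky `fetchManifest` closure are hypothetical stand-ins for the registry's actual storage-driver calls:

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// withRetry runs op up to attempts times, sleeping with exponential
// backoff plus jitter between tries. It returns the last error if all
// attempts fail, so the caller can decide to skip the object and move on.
func withRetry(attempts int, base time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break
		}
		// Backoff: base, 2*base, 4*base, ... plus up to base of jitter.
		time.Sleep(base<<uint(i) + time.Duration(rand.Int63n(int64(base))))
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

func main() {
	// Hypothetical flaky storage read standing in for an S3/Swift call
	// that times out twice before succeeding.
	calls := 0
	fetchManifest := func() error {
		calls++
		if calls < 3 {
			return errors.New("request timeout")
		}
		return nil
	}

	if err := withRetry(5, 100*time.Millisecond, fetchManifest); err != nil {
		fmt.Println("giving up on this object:", err)
		return
	}
	fmt.Printf("succeeded after %d calls\n", calls)
}
```

Only transient errors (timeouts, 5xx responses) should be retried this way; a permanent problem such as a missing blob should instead be skipped and reported.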
My recommendation is to make GC more resilient so that it can carry on even if there are errors with individual repositories.
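Concretely, that could mean having the sweep loop record per-repository failures and continue, reporting them only at the end. A minimal sketch, assuming hypothetical `repositories` and `collectRepo` placeholders for the registry's actual enumeration and per-repository collection:

```go
package main

import (
	"errors"
	"fmt"
)

// collectRepo stands in for marking and sweeping a single repository.
// Hypothetical: here every second repository fails with a backend error.
func collectRepo(name string, i int) error {
	if i%2 == 1 {
		return errors.New("backend timeout")
	}
	return nil
}

func main() {
	repositories := []string{"alpha", "beta", "gamma", "delta"}

	// Record failures and keep going, so one flaky repository cannot
	// stop garbage collection for everything else.
	var failed []string
	for i, repo := range repositories {
		if err := collectRepo(repo, i); err != nil {
			fmt.Printf("gc: skipping repository %s: %v\n", repo, err)
			failed = append(failed, repo)
			continue
		}
		fmt.Printf("gc: collected repository %s\n", repo)
	}

	if len(failed) > 0 {
		// Report at the end (and exit non-zero) instead of aborting early.
		fmt.Printf("gc: finished with %d failed repositories: %v\n", len(failed), failed)
	}
}
```

The run still signals failure at the end, so operators notice the skipped repositories without losing the progress made on the healthy ones.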