Images not pullable due to wrong cache info in redis #19695
Comments
What's the Harbor version, and which Redis key did you use to query the size?
Hey, Harbor 2.8.4 and Redis 5.0.3 (as shipped with RHEL 8.8).
The problem already occurred on Harbor 2.8.4; we have since updated, and it happened again today. The 0-byte response from Harbor is easy to see in the proxy.log.
Querying the blob showed that it was listed in the Redis registry_db_index with size 0. At that time replication was running (it runs every 10 minutes) and produced lots of log entries. Could the two collide in some way?
Another example.
@wy65701436 any hint on what to check?
In the meantime we investigated further and found out more about the circumstances that lead to the issue. It occurs when an image is pushed and then pulled immediately afterwards (only a few seconds apart). However, the problem only occurs when the image is pulled via the Docker image digest (not the manifest digest), or when one of the image layers is pulled directly via its layer digest. These failing pulls (HTTP 200 response with size 0 in the log, as mentioned above) seem to leave the corresponding faulty entries in the Redis cache. But we still don't know why these pulls fail in the first place or where the wrong size information originates.
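For anyone trying to reproduce this, a blob can be probed directly right after a push. The following is only a minimal sketch: it assumes the standard Registry HTTP API v2 blob endpoint (`/v2/<repository>/blobs/<digest>`), and the registry URL, repository name, and `auth_header` value are placeholders, not values from our setup. It reports the symptom described above, a 200 response whose `Content-Length` is 0.

```python
import urllib.request
from typing import Mapping, Optional


def blob_url(registry: str, repository: str, digest: str) -> str:
    """Build the Registry HTTP API v2 URL for a blob (layer or image config)."""
    return f"{registry.rstrip('/')}/v2/{repository}/blobs/{digest}"


def is_empty_blob_response(status: int, headers: Mapping[str, str]) -> bool:
    """True for the symptom in this issue: HTTP 200 with Content-Length 0."""
    length = headers.get("Content-Length")
    return status == 200 and length is not None and int(length) == 0


def head_blob(registry: str, repository: str, digest: str,
              auth_header: Optional[str] = None) -> bool:
    """HEAD the blob and report whether the response looks empty.

    auth_header is a hypothetical ready-made Authorization header value;
    how it is obtained depends on the Harbor authentication setup.
    """
    request = urllib.request.Request(
        blob_url(registry, repository, digest), method="HEAD"
    )
    if auth_header:
        request.add_header("Authorization", auth_header)
    with urllib.request.urlopen(request) as response:
        return is_empty_blob_response(response.status, response.headers)
```

Calling `head_blob(...)` in a short loop a few seconds after a push, once per layer digest, might help narrow down the time window in which the empty response appears.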
We had temporarily deactivated the automatic vulnerability scan after push for selected projects, hoping that this would prevent the issue, but it still occurred.
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.
This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.
Hi,
we are facing an issue that occurs from time to time and we are stuck with it.
Environment:
Harbor 2.8.4
RHEL 8.8
3 Node HA Setup
external redis (5.0.3) with redis sentinel
external postgres 13 with pgpool
harbor.yml (redacted ansible template)
The issue shows up as the following error:
filesystem layer verification failed for digest sha256:******cb5e
The problematic blob exists in the Postgres DB and also in the S3 storage (with the correct checksum).
While going through the logs we found that, in the problematic case, the proxy answers the HEAD and GET requests with "200 0", which means 0 bytes for the requested blob.
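To find other affected blobs, the proxy.log can be scanned for this "200 0" pattern. The following is a minimal sketch, assuming an nginx-style access log in which the quoted request line is immediately followed by the status code and the body size; that layout is an assumption and may need adjusting. It only flags GET requests, on the assumption that HEAD responses carry no body and therefore log 0 bytes even when everything is fine.

```python
import re
from typing import Iterable, List

# Matches: "GET <path containing /blobs/sha256:...> <protocol>" 200 0
# The field order (request, status, body bytes) is an assumed log layout.
EMPTY_BLOB_GET = re.compile(
    r'"GET (\S*/blobs/(sha256:[0-9a-f]+))[^"]*" 200 0(?:\s|$)'
)


def find_empty_blob_gets(lines: Iterable[str]) -> List[str]:
    """Return the digests of blobs that were served with status 200 and 0 bytes."""
    digests = []
    for line in lines:
        match = EMPTY_BLOB_GET.search(line)
        if match:
            digests.append(match.group(2))
    return digests
```

It could be used as `find_empty_blob_gets(open("proxy.log"))`, and the resulting digests compared against the Redis cache entries.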
We then searched in Redis and are now fairly sure that every case of a non-pullable image is due to incorrect cache information in Redis. In every case we investigated, the Redis query for the blob hashes of a "defective" image returned the value "0" for the key "size". After deleting the Redis entry, the image is immediately pullable again.
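The manual check described above could be automated roughly as follows. This is only a sketch: the key pattern `blobs::*` is an assumption about how the registry names its cached blob descriptors, and `client` is expected to behave like a redis-py client created with `decode_responses=True` and pointed at the database configured as registry_db_index. Only the `size` field is taken from what we actually observed.

```python
from typing import Any, List


def find_zero_size_blobs(client: Any, pattern: str = "blobs::*") -> List[str]:
    """Return the keys of cached blob hashes whose "size" field is "0".

    client must provide scan_iter(match=...), type(key) and hget(key, field),
    as a redis-py client does, and must return str rather than bytes.
    """
    zero_sized = []
    for key in client.scan_iter(match=pattern):
        # Skip anything that is not a hash, so that HGET cannot fail
        # with a wrong-type error.
        if client.type(key) != "hash":
            continue
        if client.hget(key, "size") == "0":
            zero_sized.append(key)
    return sorted(zero_sized)
```

The function only reports keys. Deleting them is left as a manual step, since that is what makes the image pullable again and should be done deliberately.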
Our problem now is that we do not know under which circumstances this happens or how to debug it further.
We would be grateful for any suggestions.