[GC performance] The performance of v2 manifest deletion is not good in S3 environment #12948
Hey @wy65701436, following our chat in Slack I'd like to share a similar performance issue we're experiencing with a GCS storage backend. We're running Harbor v2.1.1 and we replicated a GCR registry's content to Harbor, but we forgot to exclude a repo that at the time had >60,000 tags. After the replication completed we deleted the repo in Harbor and ran GC, but the job keeps failing due to a timeout while deleting a manifest:
Looking at the registry logs, we see that it takes over an hour to delete a manifest:
We enabled debug logging in the registry and saw that it was spending most of the time iterating through the tags with
for example:
I can share the full GC job and registry debug logs if needed; I'm also happy to provide more information.
Experiencing exactly the same issue as @dkulchinsky. We're unable to complete a GC run; it always ends with a context deadline exceeded. We have more than 1 TB to clean (~130k objects). We can't resume, so we have to start from scratch again.
Hello friends, I'd like to ask that the priority of this issue be raised. We are running several instances of Harbor (we use a GCS backend, but I think the root cause here is the same) and our usage is growing rapidly. We are starting to reach capacities that GC simply cannot handle: in repositories with more than a few thousand tags it takes ~2 minutes to delete a single manifest during GC, GC now takes 10~14 hours, and the problem is getting worse every day since we are adding more tags than we are deleting. On our test/certification Harbor instance we've reached over 20,000 tags on some repositories, and GC just times out on the first manifest since the lookup takes >20 minutes. We are concerned about increasing storage costs, since we can't clean it up, as well as other potential issues that may arise from having all these blobs/manifests lingering with no ability to properly clean them up. This issue was tagged as a candidate for v2.2.0, and we're already seeing v2.4.0 going out the door. I'm happy to provide additional context, information, and logs, but I hope we can get some attention on this issue, since I think it will impact any user that needs Harbor to work at scale.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I believe this is still an active issue being tracked, so probably shouldn't get closed yet.
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.
This is still an issue.
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.
This is still an issue.
@wy65701436 any hints on when the team may find some time to look at this? This seems like an issue that desperately needs attention, yet it hasn't seen any traction in over 2 years now.
Just to further explain the crux of this issue. Harbor uses the docker distribution project for its (harbor-registry) registry component. The Harbor GC calls into the docker registry to delete the manifest, which in turn looks up all tags that reference the deleted manifest. To find the tag references, the docker registry iterates over every tag in the repository and reads its link file to check whether it matches the deleted manifest (i.e. whether it uses the same sha256 digest). So, the more tags you have in your repository, the worse the performance will be (as there will be more S3 API calls occurring for the tag directory lookups and tag file reads).
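To make that cost concrete, here is a minimal Go sketch of the per-tag scan. It is not the actual distribution source; `readTagLink` is a hypothetical stand-in for the storage driver's link-file read, but it shows why a single manifest delete issues at least one backend read per tag.

```go
package main

import "fmt"

// findReferencingTags walks every tag in the repository and reads its link
// file to see whether it points at the digest being deleted. Each iteration
// costs at least one object read against the S3/GCS backend, so a repository
// with 60,000 tags needs tens of thousands of API calls for a single delete.
func findReferencingTags(allTags []string, readTagLink func(tag string) (string, error), deletedDigest string) ([]string, error) {
	var referencing []string
	for _, tag := range allTags {
		digest, err := readTagLink(tag) // one backend GET per tag
		if err != nil {
			return nil, err
		}
		if digest == deletedDigest {
			referencing = append(referencing, tag)
		}
	}
	return referencing, nil
}

func main() {
	// Toy in-memory example standing in for the storage driver.
	links := map[string]string{"v1": "sha256:aaa", "v2": "sha256:bbb", "v3": "sha256:aaa"}
	tags := []string{"v1", "v2", "v3"}
	refs, _ := findReferencingTags(tags, func(t string) (string, error) { return links[t], nil }, "sha256:aaa")
	fmt.Println(refs) // [v1 v3]
}
```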
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.
not stale.
@wy65701436, take a look at #12948 (comment)
Any update?
Has anyone tested @hemanth132's solution? Or are there any updates from the Harbor team regarding the API or GC? @Vad1mo @chlins Bumping because this is still an important issue for usage with every S3 backend.
Any update? This is still causing huge pain; GC works out slower than data is added, which means we constantly have to extend disks.
Harbor uses distribution for its (harbor-registry) registry component. The Harbor GC calls into the registry to delete the manifest, which in turn looks up all tags that reference the deleted manifest. To find the tag references, the registry iterates over every tag in the repository and reads its link file to check whether it matches the deleted manifest (i.e. whether it uses the same sha256 digest). So, the more tags in a repository, the worse the performance will be (as there will be more S3 API calls occurring for the tag directory lookups and tag file reads). Therefore, we can use concurrent lookup and untag to optimize performance as described in goharbor/harbor#12948. This optimization was originally contributed by @Antiarchitect; now I would like to take it over. Thanks to @Antiarchitect for the efforts in PR distribution#3890. Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
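For reference, a minimal sketch of what "concurrent lookup" means in practice. This is not the code from distribution#3890 or distribution/distribution#4329; `readTagLink` and the worker limit are assumptions, but it illustrates bounding a number of in-flight tag-link reads instead of reading them strictly one by one.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// concurrentLookup reads tag link files with at most `workers` reads in flight.
// The total number of backend GETs is unchanged, but wall-clock time drops
// roughly by the concurrency factor.
func concurrentLookup(ctx context.Context, tags []string, deletedDigest string,
	readTagLink func(context.Context, string) (string, error), workers int) ([]string, error) {

	var (
		mu          sync.Mutex
		referencing []string
		wg          sync.WaitGroup
		sem         = make(chan struct{}, workers) // limits in-flight backend reads
		firstErr    error
	)

	for _, tag := range tags {
		wg.Add(1)
		sem <- struct{}{}
		go func(tag string) {
			defer wg.Done()
			defer func() { <-sem }()
			digest, err := readTagLink(ctx, tag) // still one GET per tag, but N at a time
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			if digest == deletedDigest {
				referencing = append(referencing, tag)
			}
		}(tag)
	}
	wg.Wait()
	return referencing, firstErr
}

func main() {
	// Toy in-memory example standing in for the storage driver.
	links := map[string]string{"v1": "sha256:aaa", "v2": "sha256:bbb", "v3": "sha256:aaa"}
	refs, _ := concurrentLookup(context.Background(),
		[]string{"v1", "v2", "v3"}, "sha256:aaa",
		func(_ context.Context, t string) (string, error) { return links[t], nil }, 8)
	fmt.Println(refs) // order may vary, e.g. [v1 v3]
}
```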
hi @karmicdude, I have taken over @Antiarchitect's efforts on concurrent lookup and untag in PR distribution/distribution#4329. You can try it and check whether it brings an improvement, thanks.
hi @karmicdude, @sebglon, @jwojnarowicz, @sidewinder12s, distribution/distribution#4329 has already been merged; could you please try it and check whether it meets your expectations?
Nice, I'll definitely check it out
@wy65701436 @Vad1mo Can we have this change in v2.11.1, as it will improve GC efficiency?
Running GC on S3-like storage is very expensive. Many of our customers have more than 2 PiB of storage and 900 million objects, and they want fast, high-performance GC. I am interested in this work.
@kofj FYI, it seems the concurrent tag lookup feature kindly picked up by @microyahoo was merged into distribution: distribution/distribution#4329, but as far as I can tell there is no release tag that includes this change yet.
Whilst concurrent tag lookup is slightly better, it does not solve the underlying performance issue (it still has to look up and read all tag files on the S3 filesystem for the referenced repository). For better solutions, I'd like to see one of:
I prefer option 3, where Harbor doesn't perform the tag deletion at all. The reason Harbor still leverages this API is that, on push, the tag still lands on the distribution side. However, this is not actually necessary, since Harbor uses its database for tag CRUD. In summary, we can store all artifacts tag-less on the distribution side; then the deletion is not needed, though we need to consider existing artifacts when we update the logic. Another quick solution is to give end users an option to decide whether to remove the tag from the backend. It does no harm, but it generates some garbage on the storage side.
In an S3 backend environment, we found that it takes about 39 seconds to delete a manifest via the v2 API.
[why still use v2 to handle manifest deletion]
As Harbor cannot know which tags belong to a manifest in the storage, the GC job needs to leverage the v2 API to clean them up. However, the v2 API looks up all of the tags and removes them one by one, which can cause performance issues.
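For context, this is the shape of the per-manifest call the GC job issues against the registry's v2 API (DELETE /v2/<name>/manifests/<digest>). A minimal sketch assuming an unauthenticated registry at a hypothetical local endpoint, with placeholder repository name and digest; the time-to-response is what grows with the repository's tag count.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Hypothetical values; substitute your registry endpoint, repository, and digest.
	registryURL := "http://127.0.0.1:5000"
	repo := "library/demo"
	digest := "sha256:0000000000000000000000000000000000000000000000000000000000000000"

	// Manifest deletion in the v2 API is addressed by digest, not by tag.
	req, err := http.NewRequest(http.MethodDelete,
		fmt.Sprintf("%s/v2/%s/manifests/%s", registryURL, repo, digest), nil)
	if err != nil {
		panic(err)
	}

	client := &http.Client{Timeout: 5 * time.Minute} // the tag scan can take minutes on large repos
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// 202 Accepted on success; anything else indicates the delete did not go through.
	fmt.Println("status:", resp.Status)
}
```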
[what we can do next]
1. Investigate how many requests are sent to S3 storage during a v2 manifest deletion.
2. Investigate the possibility of not storing the first tag in the backend, so the GC job can skip this step.
Log