@lukaszraczylo

What type of PR is this?

Feature

Which issue does this PR fix:

Adds the ability to specify which image architectures to mirror.

What does this PR do / Why do we need it:

Instead of pulling tens of GB of images for every architecture, you can now pull only the architectures you use.

If an issue # is not available please add repro steps and logs showing the issue:

Testing done on this change:

Automation added to e2e:

Will this break upgrades or downgrades?

Meh.

Does this PR introduce any user-facing change?:


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

{
  "prefix": "/repo4/**",
  "tags": {
    "excludeRegex": ".*-(amd64|arm64)$"
Contributor

tag-based exclusion if platform info is in the image name?

Author

I'd say "meh". Platform info in the image name does not mean much, to be honest. I've seen (and even accidentally released myself) quite a few images tagged with one platform while the image itself, or the binaries within it, were built for a different platform.
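
To illustrate the concern, here is a minimal Go sketch that applies the `excludeRegex` pattern from the config hunk above to tag names only. The `image` struct, its fields, and the sample tags are hypothetical and are not zot types; the sketch only shows that a tag-based rule never inspects the architecture recorded in the image itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagExclude is the pattern from the config hunk under review.
var tagExclude = regexp.MustCompile(`.*-(amd64|arm64)$`)

// image is a hypothetical stand-in for a tagged image; it is not a zot type.
type image struct {
	Tag          string // tag as published
	Architecture string // architecture actually recorded in the image
}

// excludedByTag reports whether a tag-based rule would exclude the image.
// It looks at the tag string only, never at the real architecture.
func excludedByTag(tag string) bool {
	return tagExclude.MatchString(tag)
}

func main() {
	images := []image{
		{Tag: "v1.0.0-arm64", Architecture: "arm64"},
		{Tag: "v1.0.0-s390x", Architecture: "amd64"}, // mislabeled on purpose
	}
	for _, img := range images {
		fmt.Printf("%s (really %s): excluded=%v\n",
			img.Tag, img.Architecture, excludedByTag(img.Tag))
	}
}
```

In this sketch the mislabeled image slips through the exclusion even though its real architecture is one the pattern was meant to exclude.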

@rchincha
Contributor

@lukaszraczylo thanks for your PR. This has been a long standing ask from the community.

@lukaszraczylo
Author

@rchincha I'll work on the suggestions as well, and I'd PR it faster, but I've only been using zotregistry for the past two days :D

@rchincha
Contributor

> @rchincha I'll work on the suggestions as well, and I'd PR it faster, but I'm using zotregistry for past two days :D

If you are able to test locally, you can iterate a lot quicker.
Also do take a look at the CI and failures to understand what would matter to get this PR merged.

@lukaszraczylo
Author

@rchincha sorted. I also have an idea for a resource usage improvement, but I'll leave it for the next PR to avoid scope creep :)


// ParsePlatform parses a platform string into a Platform struct
// The string can be in the format "os/arch" or just "arch"
func ParsePlatform(platform string) Platform {
Contributor

Ok, instead of a nested map, it is a flat string with a '/' separator
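
The flat-string format described in the doc comment can be sketched as follows. This is only an illustration under stated assumptions: the `Platform` struct and its two fields are hypothetical, since the PR's actual definition is not shown in this thread, and the handling of any input other than "os/arch" is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Platform is a hypothetical struct; the PR's real field set is not shown here.
type Platform struct {
	OS           string
	Architecture string
}

// ParsePlatform parses a platform string into a Platform struct.
// The string can be in the format "os/arch" or just "arch".
func ParsePlatform(platform string) Platform {
	parts := strings.Split(platform, "/")
	if len(parts) == 2 {
		return Platform{OS: parts[0], Architecture: parts[1]}
	}
	// Assumption: any other shape is treated as an architecture-only spec.
	return Platform{Architecture: platform}
}

func main() {
	fmt.Printf("%+v\n", ParsePlatform("linux/amd64"))
	fmt.Printf("%+v\n", ParsePlatform("arm64"))
}
```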

@rchincha
Contributor

> @rchincha sorted. I also have an idea re: resources usage improvement, but I'll leave it for the next PR to not scope creep too much :)

Looking forward to that also.

@codecov

codecov bot commented Apr 30, 2025

Codecov Report

Attention: Patch coverage is 14.92537% with 171 lines in your changes missing coverage. Please review.

Project coverage is 90.34%. Comparing base (06a0cd5) to head (3eecc29).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/extensions/sync/service.go | 11.00% | 95 Missing and 2 partials ⚠️ |
| pkg/extensions/sync/destination.go | 19.56% | 71 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3128      +/-   ##
==========================================
- Coverage   90.79%   90.34%   -0.46%     
==========================================
  Files         172      172              
  Lines       32385    32577     +192     
==========================================
+ Hits        29405    29431      +26     
- Misses       2241     2403     +162     
- Partials      739      743       +4     


@lukaszraczylo
Author

@rchincha I resolved all the issues / suggestions from what I see :) If anything else - I'll pick it up tomorrow

Contributor

@rchincha rchincha left a comment

Call this config-sync-platforms.json instead

Contributor

@rchincha rchincha left a comment

test/blackbox tests - just make one file right?

@lukaszraczylo
Author

@rchincha that's done. Also rewrote the commit messages to adhere to the rules

Contributor

@andaaron andaaron left a comment

@eusebiu-constantin-petu-dbk , please review this PR

@rchincha when sync-ing an index, do we want an exact copy, or a new index containing just a subset of the synced manifests? The image would contain invalid references in the 1st case, and digest would change in the 2nd case. There is also a 3rd option of saving the new content with the old digest, but I don't know what the implications would be, since it doesn't respect the content addressability.

tempStoreController storage.StoreController, // temp store controller
metaDB mTypes.MetaDB,
log log.Logger,
config ...*syncconf.RegistryConfig, // optional config for filtering
Contributor

Why is this optional?

return err
}

manifestContent = updatedContent
Contributor

Note this will break any referrer pointing to the original index digest.
Unless the same digest is kept in the filename, but a different one would be computed based on the new filename, are we doing this?


for _, spec := range platformSpecs {
	parts := strings.Split(spec, "/")
	if len(parts) == 2 {
Contributor

  1. Do we need to handle the variant here as well?
  2. Maybe we should have a platform method for matching against a different platform object (including partial matches in case filtering is done just for a subset of platform attributes of course)?
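
A matching method along the lines suggested in point 2 might look like the sketch below. It assumes a `Platform` struct with `OS`, `Architecture`, and `Variant` fields and treats empty filter fields as wildcards. The struct layout, the method name `Matches`, and the wildcard rule are all assumptions for illustration, not code from the PR.

```go
package main

import "fmt"

// Platform is a hypothetical struct with an optional variant (e.g. "v7" for arm).
type Platform struct {
	OS           string
	Architecture string
	Variant      string
}

// Matches reports whether candidate satisfies the filter p.
// Assumption: an empty field in p acts as a wildcard, which allows partial
// filters such as "arch only" or "os/arch without variant".
func (p Platform) Matches(candidate Platform) bool {
	if p.OS != "" && p.OS != candidate.OS {
		return false
	}
	if p.Architecture != "" && p.Architecture != candidate.Architecture {
		return false
	}
	if p.Variant != "" && p.Variant != candidate.Variant {
		return false
	}
	return true
}

func main() {
	filter := Platform{Architecture: "arm", Variant: "v7"}
	fmt.Println(filter.Matches(Platform{OS: "linux", Architecture: "arm", Variant: "v7"})) // true
	fmt.Println(filter.Matches(Platform{OS: "linux", Architecture: "arm", Variant: "v6"})) // false
}
```

Keeping the comparison in one method would let both the upstream filter and any logging path share the same rules.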

}

// shouldIncludeArchitecture determines if an architecture should be included in the sync
// DEPRECATED: Use shouldIncludePlatform instead
Contributor

Why commit this code if it is deprecated?

if !service.shouldIncludePlatform(platform) {
	platformDesc := platform.Architecture
	if platform.OS != "" {
		platformDesc = platform.OS + "/" + platform.Architecture
Contributor

  1. Do we need to handle variant here?
  2. Maybe we should have a Platform method to return the os/arch/variant string for that instance.
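
The second suggestion could be sketched as a `String` method that joins whichever fields are set. As before, the `Platform` struct with a `Variant` field is a hypothetical shape, not the PR's actual type.

```go
package main

import (
	"fmt"
	"strings"
)

// Platform is a hypothetical struct used only for this sketch.
type Platform struct {
	OS           string
	Architecture string
	Variant      string
}

// String returns the non-empty fields joined as "os/arch/variant".
// Note: with an empty OS and a set variant the result ("arm/v7") has the
// same shape as an os/arch pair, so callers may want to require OS then.
func (p Platform) String() string {
	parts := make([]string, 0, 3)
	for _, field := range []string{p.OS, p.Architecture, p.Variant} {
		if field != "" {
			parts = append(parts, field)
		}
	}
	return strings.Join(parts, "/")
}

func main() {
	fmt.Println(Platform{OS: "linux", Architecture: "arm", Variant: "v7"}) // linux/arm/v7
	fmt.Println(Platform{Architecture: "amd64"})                           // amd64
}
```

With such a method, the `platformDesc` string built by hand in the hunk above would reduce to a single call.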

@eusebiu-constantin-petu-dbk
Collaborator

eusebiu-constantin-petu-dbk commented May 15, 2025

Sorry for being late to the party :D.

Why do you filter platforms in destination.go ? It will commit whatever it finds inside the synced repo, so if you filter upstream there's no need to filter again. Or am I missing something?

Also there is an API in regclient to filter platforms, I'll make a PR based on this one.

@rchincha
Contributor

@lukaszraczylo more folks are hitting this and complaining.

https://cloud-native.slack.com/archives/C03EGRE4QGH
^ pls join the slack channel.

@lukaszraczylo
Author

@rchincha I may try to sit down with it this weekend, although I'm a little busy with family and a baby on the way, so no promises.

@rchincha
Contributor

rchincha commented Oct 8, 2025

@lukaszraczylo pls do let us know if you are able to continue work on this PR.

@lukaszraczylo
Author

@rchincha, sure - I can resume the work, although I'm disappointed with the review style and last-minute submissions of the change requests, as they've already been working on the previous comments, making it a bit of a wild goose chase.

@rchincha
Contributor

rchincha commented Oct 8, 2025

> @rchincha, sure - I can resume the work, although I'm disappointed with the review style and last-minute submissions of the change requests, as they've already been working on the previous comments, making it a bit of a wild goose chase.

@lukaszraczylo feedback noted. Just so you are aware, the contributors and maintainers of this project work from diverse locations and timezones and mostly contribute in their free time, so the process can feel a little choppy at times. Overall, the goal is to continue delivering a high-quality project.

@andaaron
Contributor

andaaron commented Oct 9, 2025

> @rchincha when sync-ing an index, do we want an exact copy, or a new index containing just a subset of the synced manifests? The image would contain invalid references in the 1st case, and digest would change in the 2nd case. There is also a 3rd option of saving the new content with the old digest, but I don't know what the implications would be, since it doesn't respect the content addressability

@rchincha let's sync on this before more effort is put in

@rchincha
Contributor

rchincha commented Oct 10, 2025

For example, upstream repo has:
original_index[manifest_archA, manifest_archB, manifest_archC]

Assuming we want to sync/mirror only archA, we can either make:

  1. original_index[manifest_archA, manifest_archB, manifest_archC] with missing manifest_archB and manifest_archC blobs, but this may require GC fixes potentially. Referrers etc which are downloaded will remain intact.

(OR)

  2. new_index[manifest_archA], but this may have client implications because of digest mismatches potentially.

@rchincha
Contributor

There appears to be a notion of "sparse index" which is option 1. We should probably implement that. We should fix any GC issues that pop up.
However, let's also ask the question, what happens if configuration is changed to reduce/increase/alter the arch pattern?
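
The digest trade-off between the two options can be seen in a small Go sketch. It uses a deliberately simplified index format (a flat `architecture` field and placeholder digests such as `sha256:aaa`), not the real OCI descriptor layout. All type and function names here are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// descriptor and index are simplified, hypothetical shapes for this sketch.
type descriptor struct {
	Digest       string `json:"digest"`
	Architecture string `json:"architecture"`
}

type index struct {
	Manifests []descriptor `json:"manifests"`
}

// sampleIndex returns the bytes of an upstream index with three manifests.
func sampleIndex() []byte {
	return []byte(`{"manifests":[` +
		`{"digest":"sha256:aaa","architecture":"archA"},` +
		`{"digest":"sha256:bbb","architecture":"archB"},` +
		`{"digest":"sha256:ccc","architecture":"archC"}]}`)
}

// digestOf computes the content-addressable digest of raw bytes.
func digestOf(content []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(content))
}

// filterIndex builds a new index containing only the wanted architecture
// (option 2). The re-serialized bytes differ, so the digest differs too.
func filterIndex(original []byte, arch string) ([]byte, error) {
	var idx index
	if err := json.Unmarshal(original, &idx); err != nil {
		return nil, err
	}
	kept := make([]descriptor, 0, len(idx.Manifests))
	for _, m := range idx.Manifests {
		if m.Architecture == arch {
			kept = append(kept, m)
		}
	}
	idx.Manifests = kept
	return json.Marshal(idx)
}

func main() {
	original := sampleIndex()
	filtered, err := filterIndex(original, "archA")
	if err != nil {
		panic(err)
	}
	// Option 1 (sparse index) stores the original bytes untouched.
	fmt.Println("sparse index digest:", digestOf(original))
	fmt.Println("new index digest:   ", digestOf(filtered))
}
```

Because the sparse index keeps the original bytes, anything that refers to the upstream index by digest, such as referrers, still resolves; the cost is that some listed manifests are absent locally.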

@rchincha
Contributor

@andaaron
Contributor

Ok, let's go for the sparse index. We'll need to make a list of all cases where we are walking repo content and check if we handle missing references nicely.
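
As a rough picture of what handling missing references nicely could mean, the sketch below walks the digests referenced by an index and separates those present locally from those that were never synced, instead of failing on the first miss. The `blobStore` map and `splitByPresence` function are hypothetical stand-ins and do not reflect zot's storage API.

```go
package main

import "fmt"

// blobStore is a hypothetical in-memory stand-in for local storage,
// keyed by digest. It is not zot's storage interface.
type blobStore map[string][]byte

// splitByPresence walks the digests referenced by an index and returns
// which ones exist locally and which are missing (expected for a sparse index).
func splitByPresence(referenced []string, store blobStore) (present, missing []string) {
	for _, digest := range referenced {
		if _, ok := store[digest]; ok {
			present = append(present, digest)
		} else {
			missing = append(missing, digest)
		}
	}
	return present, missing
}

func main() {
	store := blobStore{"sha256:aaa": []byte("manifest for archA")}
	referenced := []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}

	present, missing := splitByPresence(referenced, store)
	fmt.Println("present:", present)
	fmt.Println("skipped (not synced):", missing)
}
```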

@andaaron
Contributor

@lukaszraczylo I have merged #3503 today.
It should take care of all the issues sparse indexes may cause in other features after they get copied to zot storage.
