Skip to content

Conversation

@rchincha
Copy link
Contributor

Currently, our dedupe scheme is very complicated since master copy is kept in one of the repo dirs and if that image is deleted, the owning repo is changed, requiring multiple repo locks causing unnecessary contention.

What type of PR is this?

Which issue does this PR fix:

What does this PR do / Why do we need it:

If an issue # is not available please add repro steps and logs showing the issue:

Testing done on this change:

Automation added to e2e:

Will this break upgrades or downgrades?

Does this PR introduce any user-facing change?:


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@rchincha
Copy link
Contributor Author

Should pair with #2968

@rchincha rchincha force-pushed the fix-dedupe branch 5 times, most recently from 6edb9fb to b4c3c46 Compare September 4, 2025 05:23
DefaultGCInterval = 1 * time.Hour
S3StorageDriverName = "s3"
LocalStorageDriverName = "local"
GlobalBlobsRepo = "_blobstore"
Copy link
Contributor

@andaaron andaaron Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure this blobs repo is per store and substore. Because substores may be on different partitions.

@rchincha rchincha force-pushed the fix-dedupe branch 6 times, most recently from 4b2da37 to da2884a Compare September 11, 2025 05:24
@rchincha rchincha force-pushed the fix-dedupe branch 3 times, most recently from 49521a6 to 23f1ebc Compare October 10, 2025 06:47
Copy link
Contributor

@andaaron andaaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logging is too verbose, and should be trimmed

substore := c.StoreController.SubStore[route]
if substore != nil {
substore.RunDedupeBlobs(time.Duration(0), c.taskScheduler)
//substore.RunDedupeBlobs(time.Duration(0), c.taskScheduler)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just temporary, or we're dropping dedupe altogether?

return nil
}

// create nested deduped bucket where we store all the deduped blobs + original blob
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// create nested deduped bucket where we store all the deduped blobs + original blob
// create nested deduped bucket where we store all the deduped blobs while excluding the original blob

Correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logging is too verbose for merge. It would be too much information to include logs for the code paths not returning errors.


dedupeBlob := d.getOne(deduped)
if dedupeBlob != nil {
d.log.Debug().Str("digest", digest.String()).Str("path", path).Msg("more in dedupe bucket, leaving original alone")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous logic was replacing the old original blob with one of the duplicates, and then delete the blob.
This new logic would keep the original blob around forever? How/when would it be GCed?

}

return nil
return zerr.ErrCacheMiss
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under which scenario would this return the error?
The updated code is a bit confusing as it is returning nil in several other places and it is unclear in which scenario this line is reached.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Are there no other needed changes in this file?
  2. Also the redis implementation should be reviewed

}

origin := bucket.Bucket([]byte(constants.OriginalBucket))
if origin != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if origin is nil, then we should return nil, because the blob does not exist.

It is not in the duplicates list and it is not the original, so it does not exist, so the deletion should be marked as successful, correct?

}

// skip the global blobs repo
if repo == constants.GlobalBlobsRepo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is fishy.
Why is this here? constants.GlobalBlobsRepo is not an actual repo because it has an invalid repo name. How is GC detecting it when "walking" the disk if it doesn't have a valid repo name? It shouldn't call this function at all for constants.GlobalBlobsRepo?


return err
}
goto retry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're refactoring this, can you pick up #2969?
Or do you want me to rebase and merge that in advance?

Is this code working right now, wouldn't it potentially execute the code multiple times?

		if err := is.storeDriver.Move(src, gdst); err != nil {
			is.log.Error().Err(err).Str("src", src).Str("dst", gdst).Str("component", "dedupe").
				Msg("failed to rename blob")

			return err
		}

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the deduplication scheme to simplify blob management by introducing a centralized global blob storage location (_blobstore) instead of maintaining master copies within individual repository directories. This eliminates complex repository ownership changes when images are deleted and reduces lock contention.

Key Changes:

  • Introduces a GlobalBlobsRepo constant (_blobstore) to serve as a centralized master copy location for all deduplicated blobs
  • Refactors validation logic by moving it from initRepo to InitRepo to allow internal initialization of the special global blobs repository
  • Updates cache deletion logic in BoltDB and DynamoDB drivers to handle the new deduplication scheme
  • Modifies garbage collection to include the global blobs repository

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
pkg/storage/constants/constants.go Adds GlobalBlobsRepo constant defining the centralized blob storage location
pkg/storage/imagestore/imagestore.go Refactors deduplication logic to use global blobs repo, moves validation from initRepo to InitRepo
pkg/storage/cache/boltdb.go Updates cache deletion logic to check duplicates bucket first and handle global blob storage
pkg/storage/cache/dynamodb.go Adds duplicate blob check before deletion in DynamoDB cache driver
pkg/storage/cache/boltdb_test.go Updates test expectations for new cache behavior (contains debug statements)
pkg/storage/cache_test.go Updates test expectations for cache deletion behavior
pkg/storage/storage_test.go Adds GlobalBlobsRepo to test candidates list and skips it in image writing loop
pkg/storage/gc/gc.go Adds garbage collection logic for global blobs repository (contains commented code and FIXMEs)
pkg/api/controller.go Comments out RunDedupeBlobs calls (temporarily disabled functionality)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


// Enable running dedupe blobs both ways (dedupe or restore deduped blobs)
c.StoreController.DefaultStore.RunDedupeBlobs(time.Duration(0), c.taskScheduler)
// c.StoreController.DefaultStore.RunDedupeBlobs(time.Duration(0), c.taskScheduler)
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented out dedupe functionality should be removed if it's no longer needed, or should have a clear comment explaining why it's temporarily disabled with a tracking issue reference.

Copilot uses AI. Check for mistakes.
}

func (d *BoltDBDriver) HasBlob(digest godigest.Digest, blob string) bool {
d.log.Debug().Str("digest", digest.String()).Str("blob", "blob").Msg("checking blob in cache")
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name "blob" is logged as the string literal "blob" instead of the actual variable value. This should be Str("blob", blob) instead of Str("blob", "blob").

Suggested change
d.log.Debug().Str("digest", digest.String()).Str("blob", "blob").Msg("checking blob in cache")
d.log.Debug().Str("digest", digest.String()).Str("blob", blob).Msg("checking blob in cache")

Copilot uses AI. Check for mistakes.
So(err, ShouldBeNil)

So(blobs, ShouldResemble, []string{"second"})
//So(blobs, ShouldResemble, []string{"second"})
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented out code should be removed. If the expected behavior has changed, remove the old assertion entirely rather than commenting it out.

Suggested change
//So(blobs, ShouldResemble, []string{"second"})

Copilot uses AI. Check for mistakes.
Comment on lines +114 to +115
// FIXME: also keep cache in sync
// FIXME: we don't have dedupe flag here!
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME comments indicate incomplete work. These should be addressed before merging or converted to TODO comments with tracking issue references.

Suggested change
// FIXME: also keep cache in sync
// FIXME: we don't have dedupe flag here!
// TODO: also keep cache in sync (see issue #1234)
// TODO: we don't have dedupe flag here! (see issue #1234)

Copilot uses AI. Check for mistakes.

// cache record exists, but due to GC and upgrades from older versions,
// disk content and cache records may go out of sync
is.log.Debug().Bool("relpath?", is.cache.UsesRelativePaths()).Msg("check this")
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug logging statement with hardcoded string "check this" should be removed or given a more meaningful message before merging.

Suggested change
is.log.Debug().Bool("relpath?", is.cache.UsesRelativePaths()).Msg("check this")
is.log.Debug().Bool("relpath?", is.cache.UsesRelativePaths()).Msg("checking if cache uses relative paths for dedupe")

Copilot uses AI. Check for mistakes.
err = cacheDriver.DeleteBlob("key", "bogusValue")
So(err, ShouldBeNil)
So(err, ShouldEqual, errors.ErrCacheMiss)
//So(err, ShouldBeNil)
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented out code should be removed. If this assertion is no longer valid, remove it entirely rather than commenting it out.

Suggested change
//So(err, ShouldBeNil)

Copilot uses AI. Check for mistakes.

So(blobs, ShouldResemble, []string{"second", "third"})
So(blobs, ShouldResemble, []string{"first", "second", "third"})
//So(blobs, ShouldResemble, []string{"second", "third"})
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented out code should be removed. If the expected behavior has changed, remove the old assertion entirely rather than commenting it out.

Suggested change
//So(blobs, ShouldResemble, []string{"second", "third"})

Copilot uses AI. Check for mistakes.

err = cacheDriver.DeleteBlob("key", "bogusValue")
So(err, ShouldBeNil)
//So(err, ShouldBeNil)
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented out code should be removed. If the expected behavior has changed, remove the old assertion entirely rather than commenting it out.

Suggested change
//So(err, ShouldBeNil)

Copilot uses AI. Check for mistakes.
Comment on lines +122 to +132
/*
if err := gc.cleanRepo(ctx, constants.GlobalBlobsRepo); err != nil {
errMessage := "failed to run GC for " + path.Join(gc.imgStore.RootDir(), constants.GlobalBlobsRepo)
gc.log.Error().Err(err).Str("module", "gc").Msg(errMessage)
gc.log.Info().Str("module", "gc").
Msg("gc unsuccessfully completed for " + path.Join(gc.imgStore.RootDir(), constants.GlobalBlobsRepo))
return err
}
*/

Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large block of commented out code should be removed. If this logic is not needed yet, consider removing it or creating a TODO with a tracking issue instead of leaving commented code.

Suggested change
/*
if err := gc.cleanRepo(ctx, constants.GlobalBlobsRepo); err != nil {
errMessage := "failed to run GC for " + path.Join(gc.imgStore.RootDir(), constants.GlobalBlobsRepo)
gc.log.Error().Err(err).Str("module", "gc").Msg(errMessage)
gc.log.Info().Str("module", "gc").
Msg("gc unsuccessfully completed for " + path.Join(gc.imgStore.RootDir(), constants.GlobalBlobsRepo))
return err
}
*/
// TODO: Consider running cleanRepo for constants.GlobalBlobsRepo if/when needed. See issue tracker for details.

Copilot uses AI. Check for mistakes.
Comment on lines +1178 to +1179
// FIXME: this seems incorrect
return nil
Copy link

Copilot AI Nov 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME comment with "this seems incorrect" indicates a known issue. This should be resolved before merging, as the code appears to be silently returning nil on validation errors.

Suggested change
// FIXME: this seems incorrect
return nil
// return the validation error so the caller is aware of the failure
return err

Copilot uses AI. Check for mistakes.
Currently, zot uses one of the repos as the master copy for a blob to
achieve dedupe. However, blobs can be deleted from repos and this
complicates dedupe tracking logic.

Now use a single hidden global repo as a blob store instead.

Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Signed-off-by: Ramkumar Chinchani <rchincha.dev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants