Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the blob storage in pkg/registry pluggable. #1158

Closed
wants to merge 2 commits into from

Conversation

mattmoor
Copy link
Collaborator

With this change folks can use registry.WithBlobHandler(...) to substitute a custom blob handling layer.

@codecov-commenter
Copy link

codecov-commenter commented Oct 26, 2021

Codecov Report

Merging #1158 (2a9295a) into main (e7cd6af) will decrease coverage by 0.25%.
The diff coverage is 58.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1158      +/-   ##
==========================================
- Coverage   75.22%   74.96%   -0.26%     
==========================================
  Files         108      108              
  Lines        7850     7982     +132     
==========================================
+ Hits         5905     5984      +79     
- Misses       1379     1424      +45     
- Partials      566      574       +8     
Impacted Files Coverage Δ
pkg/registry/registry.go 92.98% <50.00%> (-7.02%) ⬇️
pkg/registry/blobs.go 77.34% <58.45%> (-11.18%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e7cd6af...2a9295a. Read the comment docs.

With this change folks can use `registry.WithBlobHandler(...)` to substitute a custom blob handling layer.
Copy link
Collaborator

@imjasonh imjasonh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, if you're interested we can do manifests next (roughly the same way), and just leave uploads for later when/if we ever need it.

@@ -102,10 +151,20 @@ func (b *blobs) handle(resp http.ResponseWriter, req *http.Request) *regError {
}
}

resp.Header().Set("Content-Length", fmt.Sprint(len(b)))
b, ok := b.bh.Get(repo, h)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can wrap the returned ReadCloser in a internal/verify.ReadCloser to automagically check the digest matches before returning it.

If we do it in here, we'll need to be sure to check the error on defer b.Close().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a bunch of additional checks we could add, so I'm sorta reluctant to start pulling that particular thread in this PR. Another notable bit is that we don't check the possible err return value on io.Copy() below, which at one point was the source of a fun bug in the Docker client (around digest verification!).

I think my personal bias would be towards leaning on internal consistency and checking things on their way in (vs. on the way out), but I'd be happy to iterate with you on follow up PRs here.

@mattmoor
Copy link
Collaborator Author

mattmoor commented Nov 2, 2021

@imjasonh I'll optimistically start to look at what we might be able to do with manifests tomorrow.

@mattmoor
Copy link
Collaborator Author

mattmoor commented Nov 2, 2021

@jonjohnsonjr PTAL

"math/rand"
"net/http"
"path"
"strings"
"sync"

"github.com/google/go-containerregistry/pkg/name"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to keep this from having any dependencies outside the stdlib.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too late:

v1 "github.com/google/go-containerregistry/pkg/v1"
"github.com/google/go-containerregistry/pkg/v1/types"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I agree with this goal for implementations of the interfaces, but it seems wasteful to reimplement validation that exists in the name and similar packages.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jon beat me to it 🙈

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment stands. It's foolish to reimplement the validation we have in name and other places that let us avoid dealing with stringly typed data, and I don't see the risk of a dependency cycle for these superficial things because they should absolutely NOT need a full-blown registry implementation to test interfaces.

I could absolutely get behind a depcheck style test that this never depend on pkg/v1/remote in particular.

pkg/registry/blobs.go Outdated Show resolved Hide resolved
pkg/registry/blobs.go Show resolved Hide resolved
Message: err.Error(),
}
}
repo, err := name.NewRepository(req.URL.Host + path.Join(elem[1:len(elem)-2]...))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Resurrecting this because I'm sad we couldn't make it happen)

Aside from the question of whether we need to depend on pkg/name here, I wonder if req.URL.Host is adding anything besides letting the string pass through NewRepository. Really all blob storers want is the subrepo path.

I hear the concern about duplicating validation logic though. Would it be reasonable to export a method from pkg/name to validate a subrepo path just so pkg/registry could use it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sad we couldn't make it happen

Me too, but alas I've unblocked myself (and it is glorious 😉 ).

I hear the concern about duplicating validation logic though

We have a bigger issue here. This repo is now apparently governed by unwritten rules in one person's head, which also have had no rationale articulated anywhere that was arrived at by consensus of the stakeholders in this library.

I've clearly demonstrated a willingness to send PRs here (and elsewhere) to fix things, move things forward, and I'm happy to engage in productive discussions about them, but waiting a week to get a Nack based on unwritten rules (that are already violated with what is checked in), is frankly both aggravating and unsustainable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Package registry implements a docker V2 registry and the OCI distribution specification.
//
// It is designed to be used anywhere a low dependency container registry is needed, with an
// initial focus on tests.
//
// Its goal is to be standards compliant and its strictness will increase over time.
//
// This is currently a low flightmiles system. It's likely quite safe to use in tests; If you're using it
// in production, please let us know how and send us CL's for integration tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order for this to disqualify this change, "low dependency" must be interpreted to exclude what it currently imports. 🙃

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a bigger issue here. This repo is now apparently governed by unwritten rules in one person's head, which also have had no rationale articulated anywhere that was arrived at by consensus of the stakeholders in this library.

Hoping to address this in #1166

Taking the issue of dependency philosophy aside, I think I'm still convinced that the dependency on pkg/name is unnecessary, or at least having a full name.Repository in the interface is. If not for validation of the repository path, you don't actually need a full name.Repository, and could just get by with Get(string path, h v1.Hash), and not add any new dependencies. Path validation itself is fairly simple, and could be moved to internal/something and shared by pkg/name and pkg/registry.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are assuming details of the blob layer by stating that it can only care about path. Every single cloud vendor registry uses host to route storage requests, Amazon even gives each customer their own hostname per region. I doubt this is how you wrote kontain.me, but….

Hopefully you get my point.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what you're saying. The same code depending on pkg/registry with some blob storage impl, running at foo.registry.example and bar.registry.example, might decide to store those blobs differently based on the host.

Can we document this expectation in the doc comments for BlobHandler? It wasn't immediately clear to me (because I'm a big dumb dummy) why the host was included, except to make path validation easier.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document this expectation in the doc comments for BlobHandler

Absolutely.


// Get returns true and a reader for consuming the blob specified with the hash,
// if it exists. It now, it returns (nil, error).
Get(repo name.Repository, h v1.Hash) (io.ReadCloser, error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something occurred to me about this API, thinking through how I could use it with kontain.me -- this API would require kontain.me to proxy and serve blobs from GCS, instead of HTTP redirecting like I do today, which is a lot faster and cheaper than proxying through Cloud Run.

This might just mean that pkg/registry isn't going to be a good fit for kontain.me, and that's fine, but I wanted to surface that here since I think it's a pretty nice use case that isn't supported by this proposal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants