Implement proxy mode #860
Conversation
…ic#861) Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.12.2 to 1.13.0. - [Release notes](https://github.com/prometheus/client_golang/releases) - [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md) - [Commits](prometheus/client_golang@v1.12.2...v1.13.0) --- updated-dependencies: - dependency-name: github.com/prometheus/client_golang dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [go.uber.org/zap](https://github.com/uber-go/zap) from 1.21.0 to 1.22.0. - [Release notes](https://github.com/uber-go/zap/releases) - [Changelog](https://github.com/uber-go/zap/blob/master/CHANGELOG.md) - [Commits](uber-go/zap@v1.21.0...v1.22.0) --- updated-dependencies: - dependency-name: go.uber.org/zap dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Update favicon file with the new Elastic Package Registry logo. A new "img" folder has also been added to include other available formats of this new logo (png, svg and ico).
Bumps [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) from 1.24.0 to 1.25.0. - [Release notes](https://github.com/googleapis/google-cloud-go/releases) - [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md) - [Commits](googleapis/google-cloud-go@pubsub/v1.24.0...spanner/v1.25.0) --- updated-dependencies: - dependency-name: cloud.google.com/go/storage dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#866) Bumps [github.com/fsouza/fake-gcs-server](https://github.com/fsouza/fake-gcs-server) from 1.38.3 to 1.38.4. - [Release notes](https://github.com/fsouza/fake-gcs-server/releases) - [Commits](fsouza/fake-gcs-server@v1.38.3...v1.38.4) --- updated-dependencies: - dependency-name: github.com/fsouza/fake-gcs-server dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@dependabot merge upstream
Looks good, added some comments.
My only concern is the need to add logic to all handlers for this. I wonder if we can implement it as an indexer; I added a comment about this. In any case, I'd also be OK if you prefer not to use indexers for this.
```diff
 if err != nil {
 	logger.Error("getting package path failed", zap.Error(err))
 	http.Error(w, "internal server error", http.StatusInternalServerError)
 	return
 }
-if len(packages) == 0 {
+if len(pkgs) == 0 && proxyMode.Enabled() {
```
I don't like so much that we need to add logic to all handlers for the proxy mode.
I guess another option would be to implement it as an indexer, but as indexers are now, we would be making unneeded calls to the remote registry even when a local package already matches. Is this the reason not to use indexers?
Maybe a way to overcome this could be to add a `GetFirst() (*packages.Package, error)` method to the indexer interface, which could be used to optimize the several cases where we only want the first package. Most indexers would implement `GetFirst()` as returning the first package in the result of `Get()`, but the `CombinedIndexer` could implement it as breaking the loop once it finds a package in one indexer.
A similar approach would be to add a new `MaxResults` option to `GetOptions`, to return only the first n packages found. But an option can be ignored or forgotten, while a method in the interface must be implemented.
> I guess another option would be to implement it as an indexer, but as indexers are now, we would be making unneeded calls to the remote registry even when a local package already matches. Is this the reason not to use indexers?

I analyzed this case. If we want to implement an indexer, then we would have to call the search endpoint (with `all=true`) on every API call. Responses would be heavy, and heavier in the future. I wouldn't like to implement an artificial cache that syncs periodically with the remote Package Registry, but I'm happy to discuss other options.
> I analyzed this case. If we want to implement an indexer, then we would have to call the search endpoint (with `all=true`) on every API call.

Would this still be true with the `GetFirst()` approach? The logic would be similar to what is done here in each handler, but it would be centralized in the `CombinedIndexer`: it would return the first package found without needing to look in all indexers, so it wouldn't call the proxy for packages available locally. And when it did call the proxy, it would be with the parameters of the request, not with `all=true`.
Another advantage of using an indexer is that we can add it only when the feature is enabled, without needing to check on every call whether the proxy is enabled.
Don't get me wrong, I'd love to use a dedicated indexer here, but I'm not sure it's possible to apply the concept here.
The problem with an indexer is that it isn't aware of the calling context, and it can't return correct/full package metadata. Consider the following flows:

Search packages:

1. Call `indexer.Get(options)`:
   1. Marshal the filter to a query.
   2. Make a call to `/search?all=true` (heavy HTTP response).
   3. Apply the filter.
   4. Fill private fields (`fsBuilder`, `resolver`, `semver`) or skipped fields (`BasePath`).

Problem: unfortunately, the returned packages don't contain all properties of policy templates (`Package` vs `BasePackage`), so the result returned by the indexer is always incorrect.

Package resources/signature:

1. Call `indexer.Get(options)`:
   1. Marshal the filter to a query. Based on the filter, we would have to figure out which endpoint to proxy the request to.
   2. Make a call, for example: `/package/apm/8.4.0/`.
   3. Apply the filter.
   4. Fill private fields (`fsBuilder`, `resolver`, `semver`) or skipped fields (`BasePath`).

Categories:

1. Call `indexer.Get(options)`:
   1. Marshal the filter to a query.
   2. Make a call to `/search?all=true` (heavy HTTP response).
   3. ...

Problem: now we need to pull all packages to map them into categories. We would also need to visit the categories in policy templates, BUT the `/search` endpoint doesn't expose them (all categories are grouped together in `categories: [ ... ]`). For the sake of `/categories` we could use those buggy packages, but they would only work accidentally (we wouldn't analyze policy templates).

I might have missed something while I tried to implement this approach. Feel free to share your thoughts!
> I'd love to use a dedicated indexer here, but I'm not sure it's possible to apply the concept here.

Yeah, it is a bit beyond their concept to use indexers for this. More than using indexers specifically, I was wondering about a way to separate the proxy mode better without needing to modify all the handlers, so that no additional code is executed when it is disabled.
To search packages, I don't think we would need to search all and filter: we could translate the `options.Filter` into an equivalent search query, and no filtering would be needed afterwards. From the result, the private fields would be filled as they are now in this PR. https://github.com/elastic/package-registry/pull/860/files#diff-d3d83ca5e1802bcf5d721013c2c366abb0d682c3e100e4307630c34f85cf6f89R93-R100
Though for categories it is true that we would need to search for all packages, and even this could be inaccurate, as you mention.
Two more ideas to separate the proxy mode:
- Use a middleware in the router. This middleware would need to be able to handle every API endpoint, capture the response from the actual handlers in the response writer and interpret it, decide whether a request to the proxied registry is needed, and merge the results. It may be cumbersome, and I'm not sure it is always possible, but it would allow injecting the proxy only when enabled.
- Follow an approach similar to the storage indexer: each registry exposes a new API with an "index" of the packages it serves. Proxy registries can periodically query this API and update their local index.
But we can continue with the current approach; every option seems to have its cons. Thanks for the discussion!
Ok, let's continue with the current approach, unless my last comment rings a bell on how to do it without modifying the handlers 🙂 Thanks for the discussion!
Issue: #770

This PR implements the proxy mode to connect to EPR and combine remote resources with existing local packages.

Testing:

```shell
go run . --feature-proxy-mode --log-level debug
```