-
Notifications
You must be signed in to change notification settings - Fork 171
Add embedding engine #3280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: jerm/2026-01-13-optimizer-in-vmcp
Are you sure you want to change the base?
Add embedding engine #3280
Conversation
This PR addresses several goals: Configuration Consolidation: Move CRD-specific fields into config.Config to eliminate duplication and establish a single source of truth Validation Unification: Consolidate scattered validation logic into shared, reusable functions Type Simplification: Remove redundant CRD-to-config conversion functions by embedding config types directly in CRDs Bug Fixes: Fix telemetry config field loss during CRD conversion Documentation: Clarify field precedence and add proper kubebuilder annotations --------- Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## jerm/2026-01-13-optimizer-in-vmcp #3280 +/- ##
====================================================================
Coverage ? 62.60%
====================================================================
Files ? 378
Lines ? 37427
Branches ? 0
====================================================================
Hits ? 23431
Misses ? 12082
Partials ? 1914 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…ilures (#3084) * add comprehensive tests for remote MCP server health check failures * updated the tests to table driven approach * refacr the tests * refactored the tests * reduce healthcheck time interval for testing * updated halthcheck test for retry mechanism * refactor to fix linting issue
We have received reports of the remote workload healthchecks being brittle and putting workloads into an unhealthy state when they are still working. Disable for now as we investigate the problem further.
In some places, we are using the background context when we could use request context instead.
Move the HTTP client interface to pkg/networking so it can be shared across packages. This consolidates the previously private httpClient interface from pkg/auth/oauth.
Co-authored-by: JAORMX <145564+JAORMX@users.noreply.github.com>
111676e to
3ba5cc0
Compare
Move HTTPError type from operator httpclient package to the shared networking package for reuse across the codebase. Changes: - Add pkg/networking/http_error.go with HTTPError, NewHTTPError, and IsHTTPError helper - Add pkg/networking/http_error_test.go with equivalent test coverage - Update operator httpclient to import from networking package - Remove duplicate types.go and types_test.go from operator Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Changes New Error Package (pkg/errors) Added WithCode(error, int) to wrap errors with HTTP status codes Added Code(error) to extract HTTP status codes from errors (defaults to 500) Added New(msg, code) to create new errors with codes Added HasCode(error) to check if an error has an associated code Supports Go's standard errors.Is() and errors.As() through proper Unwrap() implementation New API Error Handler (pkg/api/errors) Added ErrorHandler decorator that wraps handlers returning error Automatically converts errors to appropriate HTTP responses based on their codes Returns generic "Internal Server Error" for 5xx errors (avoids leaking sensitive details) Returns specific error messages for 4xx errors Updated Sentinel Errors Wrapped existing sentinel errors with appropriate HTTP status codes: groups.ErrGroupAlreadyExists → 409 Conflict groups.ErrGroupNotFound → 404 Not Found groups.ErrInvalidGroupName → 400 Bad Request runtime.ErrWorkloadNotFound → 404 Not Found workloads/types.ErrInvalidWorkloadName → 400 Bad Request workloads/types/errors.ErrRunConfigNotFound → 404 Not Found retriever.ErrBadProtocolScheme → 400 Bad Request retriever.ErrImageNotFound → 404 Not Found retriever.ErrInvalidRunConfig → 400 Bad Request secrets.ErrUnknownManagerType → 400 Bad Request secrets.ErrSecretsNotSetup → 404 Not Found secrets.ErrKeyringNotAvailable → 400 Bad Request session.ErrSessionNotFound → 404 Not Found Refactored API Handlers Updated all API handlers in pkg/api/v1 to: Return error instead of writing http.Error directly Use ErrorHandler decorator in router setup Rely on error codes for HTTP status mapping instead of errors.Is checks Affected files: pkg/api/v1/workloads.go pkg/api/v1/groups.go pkg/api/v1/clients.go pkg/api/v1/secrets.go Documentation Updated docs/error-handling.md to reflect the new error handling strategy Testing Added unit tests for pkg/errors package covering WithCode, Code, HasCode, New, and error unwrapping Added unit tests for ErrorHandler decorator verifying correct status codes and response body content Updated existing API tests to work with the new handler signatures --------- Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com> Co-authored-by: Eleftheria Stein-Kousathana <eleftheria@stacklok.com>
3ba5cc0 to
bca112f
Compare
Add type-safe HTTP fetch utilities with functional options pattern for
JSON API requests. Provides FetchJSON for GET requests and
FetchJSONWithForm for POST with form-encoded bodies.
Features:
- Generic type parameter for compile-time type safety
- Automatic Accept header and Content-Type validation
- Response size limiting
- Custom error handlers for domain-specific error parsing
- Secure error messages (uses status text, not response body)
OIDC Discovery:
```go
result, err := networking.FetchJSON[OIDCEndpoints](
ctx,
p.httpClient,
discoveryURL,
networking.WithErrorHandler(errorHandler),
)
```
OAuth Token Exchange:
```go
result, err := networking.FetchJSONWithForm[tokenResponse](
ctx,
p.httpClient,
endpoint,
params,
networking.WithErrorHandler(errorHandler),
)
```
The existing FetchResourceMetadata function in pkg/auth/discovery/discovery.go
has ~60 lines of boilerplate for HTTP/JSON fetching that could be reduced to:
```go
result, err := networking.FetchJSON[auth.RFC9728AuthInfo](ctx, client, metadataURL)
if err != nil {
return nil, err
}
if result.Data.Resource == "" {
return nil, fmt.Errorf("metadata missing required 'resource' field")
}
return &result.Data, nil
```
Similar refactoring opportunities exist in:
- pkg/auth/oauth/oidc.go (OIDC discovery)
- pkg/auth/oauth/dynamic_registration.go (DCR responses)
therealnb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but mostly looks good.
Signed-off-by: Dan Barr <6922515+danbarr@users.noreply.github.com> Co-authored-by: Dan Barr <6922515+danbarr@users.noreply.github.com>
bca112f to
e290799
Compare
e290799 to
2165ce8
Compare
Title says it all. There are a few TODOs left in the VMCP.Spec from #3125 which can be deleted. It does not make sense to pull these fields into the Config object.
* Add OAuth2 authorization server configuration and JWKS utilities Implement the client-facing OAuth2 authorization server configuration using Ory Fosite's composable handler architecture. The core integration centers on AuthorizationServerConfig, a wrapper around fosite.Config that extends it with JWK/JWKS support for JWT signing. A Factory pattern enables pluggable handler composition where each factory receives the full config, storage, and strategy, returning handlers that are auto- registered based on which fosite interfaces they implement (AuthorizeEndpointHandler, TokenEndpointHandler, TokenIntrospector, RevocationHandler, PushedAuthorizeEndpointHandler). Security settings include enforced PKCE with S256-only per MCP specification, token lifespan bounds validation, and HMAC secret rotation support via RotatedGlobalSecrets for zero-downtime key changes. This module serves as the configuration layer in the authserver's federation model, where ToolHive acts as an intermediary between MCP clients and upstream identity providers. MCP clients authenticate to this authorization server, which then federates authentication to upstream IDPs. AuthorizationServerConfig bridges AuthorizationServerParams (cryptographic materials, token lifespans, issuer URL) with the fosite framework, and is consumed by both the storage layer (fosite's authorization/token/PKCE handlers) and the HTTP handlers serving OAuth2 endpoints. PublicJWKS() enables safe key exposure at /.well-known/jwks.json for client token verification. * Address PR review feedback for OAuth2 server config - Validate rotated HMAC secrets meet minimum length requirement, not just the current secret. This is defense-in-depth to catch misconfiguration when HMACSecrets is constructed directly. - Rename GetSigningJWKS to GetPrivateSigningJWKS to make it explicit that this method returns private key material, reducing risk of accidental exposure. - Add test case for weak rotated HMAC secret validation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Add in-memory authserver storage implementation This is a preparatory commit for the auth-proxy feature. The storage package provides the persistence layer for the OAuth 2.0 authorization server. In the complete branch, this storage is: - Passed to authserver.New() and used by the fosite OAuth2 provider for authorization codes, access tokens, refresh tokens, and PKCE - Used by handlers to validate clients and store pending authorizations while users authenticate with the upstream identity provider - Exposed via Server.IDPTokenStorage() for the token swap middleware, which replaces client JWTs with upstream IDP tokens for backend requests This commit adds: - Storage interface embedding fosite's OAuth2 storage interfaces plus ToolHive-specific interfaces for upstream tokens, pending authorizations, and client registration - MemoryStorage implementation with thread-safe maps and background cleanup - Comprehensive documentation in doc.go explaining fosite's type system (Requester, Session, signature vs request ID) to help developers understand why the API is designed as it is The in-memory implementation is suitable for single-instance deployments and development. Production deployments will use a distributed storage backend (Redis or database). * Address PR #3298 review feedback This commit addresses three review comments from the PR: - Remove authorization code from debug log - Remove internal state from debug logs - Refactor cleanupExpired() to collect-then-delete pattern The original implementation held a write lock while iterating all 8 storage maps, blocking all read and write operations during cleanup. The new implementation: - Acquires READ lock to collect expired keys into slices - Releases read lock and returns early if nothing to delete - Acquires WRITE lock only for the deletion phase This reduces write lock hold time from O(total entries) to O(expired entries), improving concurrency for read-heavy workloads.
c88d4e5 to
4dc64c0
Compare
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
Large PR justification has been provided. Thank you!
2377edf to
9f438b8
Compare
This is intended to prevent the server and client from getting overloaded by an excessively-large log file. Fixes: #2213
9f438b8 to
bdf43ee
Compare
bdf43ee to
ec9aae8
Compare
Introduces a new MCPEmbedding custom resource to deploy HuggingFace embedding models as MCP servers in Kubernetes. This enables semantic search and similarity features for MCP tools and resources. Key Features: - Custom resource definition for embedding model deployments - Integration with HuggingFace text-embeddings-inference - Support for model caching via PersistentVolumeClaims - Flexible resource configuration and pod customization - GroupRef support for organizational grouping - Comprehensive status conditions and phase tracking Components: - MCPEmbedding CRD with validation and webhook support - Controller for managing deployment lifecycle - Generated CRD manifests and Helm chart templates - RBAC permissions for managing embeddings - Example configurations for various use cases This change is based on the original commit by rebasing onto jerm/2026-01-13-optimizer-in-vmcp to remove intermediate commits. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
97bc58c to
8d13462
Compare
8d13462 to
6203f70
Compare
Add MCPEmbedding CRD for embedding model deployment in operator
Introduces a new MCPEmbedding custom resource to deploy HuggingFace
embedding models as MCP servers in Kubernetes. This enables semantic
search and similarity features for MCP tools and resources.
Key additions:
The embedding server uses huggingface-embedding-inference with persistent
volume caching for downloaded models and integrates with MCPGroups for
organizing backend workloads.
Large PR Justification
Multiple related changes that would break if separated