Skip to content

Conversation

@ptelang
Copy link

@ptelang ptelang commented Jan 13, 2026

Add MCPEmbedding CRD for embedding model deployment in operator

Introduces a new MCPEmbedding custom resource to deploy HuggingFace
embedding models as MCP servers in Kubernetes. This enables semantic
search and similarity features for MCP tools and resources.

Key additions:

  • MCPEmbedding CRD with model caching, resource management, and grouping
  • Controller managing deployment lifecycle and model initialization
  • Helm chart templates and CRD definitions
  • Example configurations for basic and advanced use cases
  • CRD API documentation updates

The embedding server uses huggingface-embedding-inference with persistent
volume caching for downloaded models and integrates with MCPGroups for
organizing backend workloads.

Large PR Justification

Multiple related changes that would break if separated

This PR addresses several goals:

Configuration Consolidation: Move CRD-specific fields into config.Config to eliminate duplication and establish a single source of truth
Validation Unification: Consolidate scattered validation logic into shared, reusable functions
Type Simplification: Remove redundant CRD-to-config conversion functions by embedding config types directly in CRDs
Bug Fixes: Fix telemetry config field loss during CRD conversion
Documentation: Clarify field precedence and add proper kubebuilder annotations
---------

Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>
@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Jan 13, 2026
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@ptelang ptelang requested review from jerm-dro and therealnb January 13, 2026 23:15
@ptelang ptelang marked this pull request as draft January 13, 2026 23:17
@codecov
Copy link

codecov bot commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 3.83333% with 577 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (jerm/2026-01-13-optimizer-in-vmcp@cb723c3). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...operator/controllers/embeddingserver_controller.go 0.87% 564 Missing and 1 partial ⚠️
cmd/thv-operator/main.go 0.00% 8 Missing ⚠️
...thv-operator/api/v1alpha1/embeddingserver_types.go 81.81% 4 Missing ⚠️
Additional details and impacted files
@@                         Coverage Diff                          @@
##             jerm/2026-01-13-optimizer-in-vmcp    #3280   +/-   ##
====================================================================
  Coverage                                     ?   62.60%           
====================================================================
  Files                                        ?      378           
  Lines                                        ?    37427           
  Branches                                     ?        0           
====================================================================
  Hits                                         ?    23431           
  Misses                                       ?    12082           
  Partials                                     ?     1914           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

amirejaz and others added 5 commits January 14, 2026 10:16
…ilures (#3084)

* add comprehensive tests for remote MCP server health check failures

* updated the tests to table driven approach

* refacr the tests

* refactored the tests

* reduce healthcheck time interval for testing

* updated halthcheck test for retry mechanism

* refactor to fix linting issue
We have received reports of the remote workload healthchecks being
brittle and putting workloads into an unhealthy state when they are
still working. Disable for now as we investigate the problem further.
In some places, we are using the background context when we could use
request context instead.
Move the HTTP client interface to pkg/networking so it can be shared
across packages. This consolidates the previously private httpClient
interface from pkg/auth/oauth.
Co-authored-by: JAORMX <145564+JAORMX@users.noreply.github.com>
@ptelang ptelang force-pushed the add-embedding-engine branch from 111676e to 3ba5cc0 Compare January 14, 2026 15:00
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 14, 2026
jhrozek and others added 2 commits January 14, 2026 16:01
Move HTTPError type from operator httpclient package to the shared
networking package for reuse across the codebase.

Changes:
- Add pkg/networking/http_error.go with HTTPError, NewHTTPError, and
  IsHTTPError helper
- Add pkg/networking/http_error_test.go with equivalent test coverage
- Update operator httpclient to import from networking package
- Remove duplicate types.go and types_test.go from operator

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Changes
New Error Package (pkg/errors)
Added WithCode(error, int) to wrap errors with HTTP status codes
Added Code(error) to extract HTTP status codes from errors (defaults to 500)
Added New(msg, code) to create new errors with codes
Added HasCode(error) to check if an error has an associated code
Supports Go's standard errors.Is() and errors.As() through proper Unwrap() implementation
New API Error Handler (pkg/api/errors)
Added ErrorHandler decorator that wraps handlers returning error
Automatically converts errors to appropriate HTTP responses based on their codes
Returns generic "Internal Server Error" for 5xx errors (avoids leaking sensitive details)
Returns specific error messages for 4xx errors
Updated Sentinel Errors
Wrapped existing sentinel errors with appropriate HTTP status codes:

groups.ErrGroupAlreadyExists → 409 Conflict
groups.ErrGroupNotFound → 404 Not Found
groups.ErrInvalidGroupName → 400 Bad Request
runtime.ErrWorkloadNotFound → 404 Not Found
workloads/types.ErrInvalidWorkloadName → 400 Bad Request
workloads/types/errors.ErrRunConfigNotFound → 404 Not Found
retriever.ErrBadProtocolScheme → 400 Bad Request
retriever.ErrImageNotFound → 404 Not Found
retriever.ErrInvalidRunConfig → 400 Bad Request
secrets.ErrUnknownManagerType → 400 Bad Request
secrets.ErrSecretsNotSetup → 404 Not Found
secrets.ErrKeyringNotAvailable → 400 Bad Request
session.ErrSessionNotFound → 404 Not Found
Refactored API Handlers
Updated all API handlers in pkg/api/v1 to:

Return error instead of writing http.Error directly
Use ErrorHandler decorator in router setup
Rely on error codes for HTTP status mapping instead of errors.Is checks
Affected files:

pkg/api/v1/workloads.go
pkg/api/v1/groups.go
pkg/api/v1/clients.go
pkg/api/v1/secrets.go
Documentation
Updated docs/error-handling.md to reflect the new error handling strategy
Testing
Added unit tests for pkg/errors package covering WithCode, Code, HasCode, New, and error unwrapping
Added unit tests for ErrorHandler decorator verifying correct status codes and response body content
Updated existing API tests to work with the new handler signatures

---------

Signed-off-by: Jeremy Drouillard <jeremy@stacklok.com>
Co-authored-by: Eleftheria Stein-Kousathana <eleftheria@stacklok.com>
@ptelang ptelang force-pushed the add-embedding-engine branch from 3ba5cc0 to bca112f Compare January 14, 2026 15:29
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 14, 2026
Add type-safe HTTP fetch utilities with functional options pattern for
JSON API requests. Provides FetchJSON for GET requests and
FetchJSONWithForm for POST with form-encoded bodies.

Features:
- Generic type parameter for compile-time type safety
- Automatic Accept header and Content-Type validation
- Response size limiting
- Custom error handlers for domain-specific error parsing
- Secure error messages (uses status text, not response body)

OIDC Discovery:
```go
result, err := networking.FetchJSON[OIDCEndpoints](
    ctx,
    p.httpClient,
    discoveryURL,
    networking.WithErrorHandler(errorHandler),
)
```

OAuth Token Exchange:
```go
result, err := networking.FetchJSONWithForm[tokenResponse](
    ctx,
    p.httpClient,
    endpoint,
    params,
    networking.WithErrorHandler(errorHandler),
)
```

The existing FetchResourceMetadata function in pkg/auth/discovery/discovery.go
has ~60 lines of boilerplate for HTTP/JSON fetching that could be reduced to:

```go
result, err := networking.FetchJSON[auth.RFC9728AuthInfo](ctx, client, metadataURL)
if err != nil {
    return nil, err
}
if result.Data.Resource == "" {
    return nil, fmt.Errorf("metadata missing required 'resource' field")
}
return &result.Data, nil
```

Similar refactoring opportunities exist in:
- pkg/auth/oauth/oidc.go (OIDC discovery)
- pkg/auth/oauth/dynamic_registration.go (DCR responses)
Copy link
Contributor

@therealnb therealnb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, but mostly looks good.

Signed-off-by: Dan Barr <6922515+danbarr@users.noreply.github.com>
Co-authored-by: Dan Barr <6922515+danbarr@users.noreply.github.com>
@ptelang ptelang force-pushed the add-embedding-engine branch from bca112f to e290799 Compare January 14, 2026 17:57
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 14, 2026
@ptelang ptelang force-pushed the add-embedding-engine branch from e290799 to 2165ce8 Compare January 14, 2026 18:26
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 14, 2026
Title says it all. There are a few TODOs left in the VMCP.Spec from #3125 which can be deleted. It does not make sense to pull these fields into the Config object.
jhrozek and others added 2 commits January 16, 2026 12:13
* Add OAuth2 authorization server configuration and JWKS utilities

Implement the client-facing OAuth2 authorization server configuration
using Ory Fosite's composable handler architecture. The core integration
centers on AuthorizationServerConfig, a wrapper around fosite.Config that
extends it with JWK/JWKS support for JWT signing. A Factory pattern
enables pluggable handler composition where each factory receives the
full config, storage, and strategy, returning handlers that are auto-
registered based on which fosite interfaces they implement
(AuthorizeEndpointHandler, TokenEndpointHandler, TokenIntrospector,
RevocationHandler, PushedAuthorizeEndpointHandler). Security settings
include enforced PKCE with S256-only per MCP specification, token
lifespan bounds validation, and HMAC secret rotation support via
RotatedGlobalSecrets for zero-downtime key changes.

This module serves as the configuration layer in the authserver's
federation model, where ToolHive acts as an intermediary between MCP
clients and upstream identity providers. MCP clients authenticate to
this authorization server, which then federates authentication to
upstream IDPs. AuthorizationServerConfig bridges AuthorizationServerParams
(cryptographic materials, token lifespans, issuer URL) with the fosite
framework, and is consumed by both the storage layer (fosite's
authorization/token/PKCE handlers) and the HTTP handlers serving OAuth2
endpoints. PublicJWKS() enables safe key exposure at /.well-known/jwks.json
for client token verification.

* Address PR review feedback for OAuth2 server config

- Validate rotated HMAC secrets meet minimum length requirement,
  not just the current secret. This is defense-in-depth to catch
  misconfiguration when HMACSecrets is constructed directly.
- Rename GetSigningJWKS to GetPrivateSigningJWKS to make it explicit
  that this method returns private key material, reducing risk of
  accidental exposure.
- Add test case for weak rotated HMAC secret validation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Add in-memory authserver storage implementation

This is a preparatory commit for the auth-proxy feature. The storage
package provides the persistence layer for the OAuth 2.0 authorization
server. In the complete branch, this storage is:

- Passed to authserver.New() and used by the fosite OAuth2 provider
  for authorization codes, access tokens, refresh tokens, and PKCE
- Used by handlers to validate clients and store pending authorizations
  while users authenticate with the upstream identity provider
- Exposed via Server.IDPTokenStorage() for the token swap middleware,
  which replaces client JWTs with upstream IDP tokens for backend requests

This commit adds:

- Storage interface embedding fosite's OAuth2 storage interfaces plus
  ToolHive-specific interfaces for upstream tokens, pending authorizations,
  and client registration
- MemoryStorage implementation with thread-safe maps and background cleanup
- Comprehensive documentation in doc.go explaining fosite's type system
  (Requester, Session, signature vs request ID) to help developers
  understand why the API is designed as it is

The in-memory implementation is suitable for single-instance deployments
and development. Production deployments will use a distributed storage
backend (Redis or database).

* Address PR #3298 review feedback

This commit addresses three review comments from the PR:

- Remove authorization code from debug log
- Remove internal state from debug logs

- Refactor cleanupExpired() to collect-then-delete pattern

The original implementation held a write lock while iterating all 8
storage maps, blocking all read and write operations during cleanup.

The new implementation:
- Acquires READ lock to collect expired keys into slices
- Releases read lock and returns early if nothing to delete
- Acquires WRITE lock only for the deletion phase

This reduces write lock hold time from O(total entries) to O(expired
entries), improving concurrency for read-heavy workloads.
@ptelang ptelang force-pushed the add-embedding-engine branch from c88d4e5 to 4dc64c0 Compare January 16, 2026 14:43
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 16, 2026
@github-actions
Copy link
Contributor

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@github-actions github-actions bot dismissed their stale review January 16, 2026 14:44

Large PR justification has been provided. Thank you!

@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 16, 2026
@ptelang ptelang force-pushed the add-embedding-engine branch from 2377edf to 9f438b8 Compare January 16, 2026 15:19
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 16, 2026
This is intended to prevent the server and client from getting
overloaded by an excessively-large log file.

Fixes: #2213
@ptelang ptelang force-pushed the add-embedding-engine branch from 9f438b8 to bdf43ee Compare January 16, 2026 15:37
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jan 16, 2026
@ptelang ptelang force-pushed the add-embedding-engine branch from bdf43ee to ec9aae8 Compare January 16, 2026 17:55
ptelang and others added 7 commits January 16, 2026 12:58
Introduces a new MCPEmbedding custom resource to deploy HuggingFace
embedding models as MCP servers in Kubernetes. This enables semantic
search and similarity features for MCP tools and resources.

Key Features:
- Custom resource definition for embedding model deployments
- Integration with HuggingFace text-embeddings-inference
- Support for model caching via PersistentVolumeClaims
- Flexible resource configuration and pod customization
- GroupRef support for organizational grouping
- Comprehensive status conditions and phase tracking

Components:
- MCPEmbedding CRD with validation and webhook support
- Controller for managing deployment lifecycle
- Generated CRD manifests and Helm chart templates
- RBAC permissions for managing embeddings
- Example configurations for various use cases

This change is based on the original commit by rebasing onto
jerm/2026-01-13-optimizer-in-vmcp to remove intermediate commits.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ptelang ptelang force-pushed the add-embedding-engine branch 2 times, most recently from 97bc58c to 8d13462 Compare January 16, 2026 18:04
@ptelang ptelang force-pushed the add-embedding-engine branch from 8d13462 to 6203f70 Compare January 16, 2026 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.