Skip to content
Closed
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
304 changes: 304 additions & 0 deletions apix/v1/inferencepool_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,304 @@
/*
Copyright 2025 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// InferencePool is the Schema for the InferencePools API.
//
// +kubebuilder:object:root=true
// TODO: change the annotation once it gets officially approved
// +kubebuilder:metadata:annotations="api-approved.kubernetes.io=https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1173"
// +kubebuilder:resource:shortName=infpool
// +kubebuilder:subresource:status
// +kubebuilder:storageversion
// +genclient
type InferencePool struct {
metav1.TypeMeta `json:",inline"`

// +optional
metav1.ObjectMeta `json:"metadata,omitempty"`

// Spec defines the desired state of the InferencePool.
//
// +required
Spec InferencePoolSpec `json:"spec,omitzero"`

// Status defines the observed state of the InferencePool.
//
// +optional
//nolint:kubeapilinter // status should not be a pointer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? It's the same as any other struct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We followed the existing k8s convention. See https://github.com/kubernetes/api/blob/master/batch/v1/types.go#L84

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelSpeed I agree that Status should be a pointer type, but we could not find other K8s APIs that follow this convention. We prefer to follow established patterns.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory every optional struct-field should be a pointer. In practice, we do not do that (in part because we have had no linter).

I am OK with this because it is "not worse than" everything else in the system. We should refine the rules (the doc which emerged from this discussion)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joel, are you OK to resolve this one?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should preserve the right to the projects to decide to switch to the new conventions meanwhile they adhere to the long established patterns that we know are safe

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with adding an exception, but can we please use the more specific golangci-lint exceptions in the config file that ignore specific messages rather than blanket ignoring the whole KAL suite for this field?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelSpeed can you share a pointer for how we could do that?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for getting to this so late, but, an example would be how CAPI has done so, https://github.com/kubernetes-sigs/cluster-api/blob/0db4bf4fd9ec5cea39d1853fd4b2e877788b46df/.golangci-kal.yml#L86-L91

It's directly in the golangci config with a path regex and text regex to match on.

Status InferencePoolStatus `json:"status,omitempty"`
}

// InferencePoolList contains a list of InferencePools.
//
// +kubebuilder:object:root=true
type InferencePoolList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []InferencePool `json:"items"`
}

// InferencePoolSpec defines the desired state of the InferencePool.
type InferencePoolSpec struct {
// Selector determines which Pods are members of this inference pool.
// It matches Pods by their labels only within the same namespace; cross-namespace
// selection is not supported.
//
// The structure of this LabelSelector is intentionally simple to be compatible
// with Kubernetes Service selectors, as some implementations may translate
// this configuration into a Service resource.
//
// +required
Selector LabelSelector `json:"selector,omitzero"`

// TargetPorts defines a list of ports that are exposed by this InferencePool.
// Currently, the list may only include a single port definition.
//
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=1
// +listType=atomic
// +required
TargetPorts []Port `json:"targetPorts,omitempty"`

// EndpointPickerRef is a reference to the Endpoint Picker extension and its
// associated configuration.
//
// +required
EndpointPickerRef EndpointPickerRef `json:"endpointPickerRef,omitzero"`
}

// Port defines the network port that will be exposed by this InferencePool.
type Port struct {
// Number defines the port number to access the selected model server Pods.
// The number must be in the range 1 to 65535.
//
// +required
Number PortNumber `json:"number,omitempty"`
}

// EndpointPickerRef specifies a reference to an Endpoint Picker extension and its
// associated configuration.
type EndpointPickerRef struct {
// Group is the group of the referent API object. When unspecified, the default value
// is "", representing the Core API group.
//
// +optional
// +kubebuilder:default=""
Group *Group `json:"group,omitempty"`

// Kind is the Kubernetes resource kind of the referent.
//
// Required if the referent is ambiguous, e.g. service with multiple ports.
//
// Defaults to "Service" when not specified.
//
// ExternalName services can refer to CNAME DNS records that may live
// outside of the cluster and as such are difficult to reason about in
// terms of conformance. They also may not be safe to forward to (see
// CVE-2021-25740 for more information). Implementations MUST NOT
// support ExternalName Services.
//
// +optional
// +kubebuilder:default=Service
Kind Kind `json:"kind,omitempty"`

// Name is the name of the referent API object.
//
// +required
Name ObjectName `json:"name,omitempty"`

// PortNumber is the port number of the Endpoint Picker extension service. When unspecified,
// implementations SHOULD infer a default value of 9002 when the kind field is "Service" or
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rob, we discussed this, but I forget what we came to. Do we have good precedent for "should infer" as opposed to just setting a default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the default value of 9002 only applies when the kind filed is a service or unspecified (defaults to "Service").

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that telling clients to infer a value is generally not what we do. We either give it an explicit default or we require the user to set it.

You can bundle an MAP which looks at Group and Kind, and assigns 9002, but the validation for it would be "required". Or you can assign a static default of 9002 and leave it optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @robscott for the context of inferring here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'd forgotten about this one! Fundamentally we're trying to leave room open for a type of backend where port defined at this level would either be meaningless or repetitive. IE imagine that we want to point to a different kind of backend that only ever has one port, and that port is inferred by the type. Why would we require port here?

We could follow the model that Gateway API uses here which would be to require port in cases where it's ambiguous, such as multi-port Services. With that said, unlike Gateway API, we already know that the most common case will be a Service listening on port 9002, so it seems like a shame to require that input from users in the common case.

Unfortunately "only require a field when the target has multiple ports, and one of them is not 9002" is not something that I can find a good way to express with any of the validations we have available to us today.

You can bundle an MAP which looks at Group and Kind, and assigns 9002
I think that makes a lot of sense as MAP becomes more broadly available, but it will still be in alpha when this release happens. We can file a follow up issue to bundle a MAP when it reaches beta.

So that's a long way of saying that I think the current state is the least bad way to achieve the following combination:

  1. Sensible defaults that cover the common case and avoid unnecessary inputs - this default can be translated from controller-inferred to MAP in the future as it's available
  2. Room for targeting other backend types where port would not be meaningful
  3. Consistency with existing APIs (alpha version of this API + Gateway API BackendRef)

Copy link
Member

@robscott robscott Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen no other APIs do this

Obviously very biased here, but IMO this same pattern has worked reasonably well for Gateway API. Some examples of where this has been valuable:

  1. GKE, Envoy Gateway, and others support targeting ServiceImport from Routes to effectively provide "Multi-Cluster Gateways"
  2. AgentGateway uses Gateway API Routes and targets a custom MCP Backend type
  3. Gateway for Mesh is based on a creative use of parentRefs on Routes. See Istio's docs for an example, but this pattern has become widespread across meshes

Carrying this a fuller normalization, why is it not even more abstract, like

type EndpointPicker struct {
    // One of the following must be specified

    // A reference to a kubernetes Service
    Service *ServiceRef

    // An arbitrary thing on the network
    Network *NetworkRef

    // A reference to an arbitrary kubernetes object
    Object *ObjectRef

Doesn't the presence of ObjectRef in that example invalidate the idea of ServiceRef? It would provide more than one way to point to a Service which feels suboptimal. If on the other hand we exclude ObjectRef, we're missing out on some very real opportunity for innovation and extension from the implementations of this API. Honestly I think one of the strongest parts of Gateway API is that there's a well defined core with plenty of extension points that allow you to plug in your own custom type of Route or Backend while still retaining the rest of the API. I'd hate to give that up here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some examples of where this has been valuable

What is "this". To me "this" is defining an implcit default value in the documentation, and imposing that work on clients and/or implementations.

Doesn't the presence of ObjectRef in that example invalidate the idea of ServiceRef?

No - a ServiceRef can a) be simpler and b) carry more assumptions. You don't need group or kind, and you can safely assume that "port" is meaningful. Or you can just assert a fixed port number (maybe?). You can, of course, represent a Service as an ObjectRef, but the API is less precise and more generic.

I'm not saying to necessarily do exactly this, I am saying that an implicit, conditional default is more than a little bit smelly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted offline with @thockin and we agreed that although this is not perfect, mirroring Gateway API behavior here is a reasonable path. That means two things:

  1. We should remove the inferred default of 9002
  2. We should require port to be specified when the target kind is Service (this will require some CEL validation)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change needed is included in #1484

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated PR to reflect the change. Let me know whether it works now or not.

// unspecified (defaults to "Service").
//
// +optional
//nolint:kubeapilinter // ignore kubeapilinter here as we want to use pointer as zero means all ports in convention, we don't make to use 0 to indicate not set.
PortNumber *PortNumber `json:"portNumber,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why "PortNumber" vs "Port" ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we need protocol here eventually? Did we discuss this one?

Why not use the same Port struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robscott Do you have any context about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think the ideal would be to mirror BackendObjectReference in GW API where we have a port field with the same validation instead of portNumber. I think this started as portNumber to avoid any confusion over portName, but given that we've used port successfully for years in GW API without any issues or need for struct/other fields, I'd expect that to work well here as well.

I'm a bit split on whether or not we should make the change from portNumber to port at this late stage. Will defer to @danehans, @kfswain, and @nirrozenbaum.

Copy link
Contributor

@nirrozenbaum nirrozenbaum Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m fine with this change. portNumber was introduced only in v1 (different field exists in v1alpha2) and as long as the api has not been finalized, that’s ok to change.
we do need to keep in mind that v1 rc1 was already cut, but if we feel strongly that this is an improvement let’s go for it.

Copy link
Contributor Author

@capri-xiyue capri-xiyue Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the PR of approach 3 to make two different ports symmetric #1484

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why "PortNumber" vs "Port" ?

To reduce the nesting within EndpointPickerRef. To your point, if named ports and app protocol are anticipated future requirements, then +1 to use type Port instead of PortNumber. Currently, this spec defines the Gateway <> EPP protocol as ext-proc (gRPC), so I'm unsure if any additional app protocols are needed. Thoughts @robscott @kfswain @nirrozenbaum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated PR to reflect the change. Let me know whether it works now or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, +1. I think further app protocols may be useful at some point in the future, but it's not necessary now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by @capri-xiyue, this has been addressed by #1484.


// FailureMode configures how the parent handles the case when the Endpoint Picker extension
// is non-responsive. When unspecified, defaults to "FailClose".
//
// +optional
// +kubebuilder:default="FailClose"
FailureMode EndpointPickerFailureMode `json:"failureMode,omitempty"`
}

// EndpointPickerFailureMode defines the options for how the parent handles the case when the
// Endpoint Picker extension is non-responsive.
//
// +kubebuilder:validation:Enum=FailOpen;FailClose
type EndpointPickerFailureMode string

const (
// EndpointPickerFailOpen specifies that the parent should forward the request to an endpoint
// of its picking when the Endpoint Picker extension fails.
EndpointPickerFailOpen EndpointPickerFailureMode = "FailOpen"
// EndpointPickerFailClose specifies that the parent should drop the request when the Endpoint
// Picker extension fails.
EndpointPickerFailClose EndpointPickerFailureMode = "FailClose"
)

// InferencePoolStatus defines the observed state of the InferencePool.
type InferencePoolStatus struct {
// Parents is a list of parent resources, typically Gateways, that are associated with
// the InferencePool, and the status of the InferencePool with respect to each parent.
//
// A controller that manages the InferencePool, must add an entry for each parent it manages
// and remove the parent entry when the controller no longer considers the InferencePool to
// be associated with that parent.
//
// A maximum of 32 parents will be represented in this list. When the list is empty,
// it indicates that the InferencePool is not associated with any parents.
//
// +kubebuilder:validation:MaxItems=32
// +optional
// +listType=atomic
Parents []ParentStatus `json:"parents,omitempty"`
}

// ParentStatus defines the observed state of InferencePool from a Parent, i.e. Gateway.
type ParentStatus struct {
// Conditions is a list of status conditions that provide information about the observed
// state of the InferencePool. This field is required to be set by the controller that
// manages the InferencePool.
//
// Supported condition types are:
//
// * "Accepted"
// * "ResolvedRefs"
//
// +optional
// +listType=map
// +listMapKey=type
// +kubebuilder:validation:MaxItems=8
Conditions []metav1.Condition `json:"conditions,omitempty"`

// ParentRef is used to identify the parent resource that this status
// is associated with. It is used to match the InferencePool with the parent
// resource, such as a Gateway.
//
// +required
ParentRef ParentReference `json:"parentRef,omitzero"`
}

// InferencePoolConditionType is a type of status condition for the InferencePool.
type InferencePoolConditionType string

// InferencePoolReason is the reason for a type of InferencePool status condition.
type InferencePoolReason string

const (
// InferencePoolConditionAccepted is a type of condition that indicates whether
// the InferencePool has been accepted or rejected by a Parent, and why.
//
// Possible reasons for this condition to be True are:
//
// * "SupportedByParent"
//
// Possible reasons for this condition to be False are:
//
// * "Accepted"
// * "HTTPRouteNotAccepted"
//
// Possible reasons for this condition to be Unknown are:
//
// * "Pending"
//
// Controllers MAY raise this condition with other reasons, but should
// prefer to use the reasons listed above to improve interoperability.
InferencePoolConditionAccepted InferencePoolConditionType = "Accepted"

// InferencePoolReasonAccepted is a reason used with the "Accepted" condition
// when the InferencePool is accepted by a Parent because the Parent supports
// InferencePool as a backend.
InferencePoolReasonAccepted InferencePoolReason = "Accepted"

// InferencePoolReasonNotSupportedByParent is a reason used with the "Accepted"
// condition when the InferencePool has not been accepted by a Parent because
// the Parent does not support InferencePool as a backend.
InferencePoolReasonNotSupportedByParent InferencePoolReason = "NotSupportedByParent"

// InferencePoolReasonHTTPRouteNotAccepted is an optional reason used with the
// "Accepted" condition when the InferencePool is referenced by an HTTPRoute that
// has been rejected by the Parent. The user should inspect the status of the
// referring HTTPRoute for the specific reason.
InferencePoolReasonHTTPRouteNotAccepted InferencePoolReason = "HTTPRouteNotAccepted"
)

const (
// InferencePoolConditionResolvedRefs is a type of condition that indicates whether
// the controller was able to resolve all the object references for the InferencePool.
//
// Possible reasons for this condition to be True are:
//
// * "ResolvedRefs"
//
// Possible reasons for this condition to be False are:
//
// * "InvalidExtensionRef"
//
// Controllers MAY raise this condition with other reasons, but should
// prefer to use the reasons listed above to improve interoperability.
InferencePoolConditionResolvedRefs InferencePoolConditionType = "ResolvedRefs"

// InferencePoolReasonResolvedRefs is a reason used with the "ResolvedRefs"
// condition when the condition is true.
InferencePoolReasonResolvedRefs InferencePoolReason = "ResolvedRefs"

// InferencePoolReasonInvalidExtensionRef is a reason used with the "ResolvedRefs"
// condition when the Extension is invalid in some way. This can include an
// unsupported kind or API group, or a reference to a resource that cannot be found.
InferencePoolReasonInvalidExtensionRef InferencePoolReason = "InvalidExtensionRef"
)

// ParentReference identifies an API object. It is used to associate the InferencePool with a
// parent resource, such as a Gateway.
type ParentReference struct {
// Group is the group of the referent API object. When unspecified, the referent is assumed
// to be in the "gateway.networking.k8s.io" API group.
//
// +optional
// +kubebuilder:default="gateway.networking.k8s.io"
Group *Group `json:"group,omitempty"`

// Kind is the kind of the referent API object. When unspecified, the referent is assumed
// to be a "Gateway" kind.
//
// +optional
// +kubebuilder:default=Gateway
Kind Kind `json:"kind,omitempty"`

// Name is the name of the referent API object.
//
// +required
Name ObjectName `json:"name,omitempty"`

// Namespace is the namespace of the referenced object. When unspecified, the local
// namespace is inferred.
//
// Note that when a namespace different than the local namespace is specified,
// a ReferenceGrant object is required in the referent namespace to allow that
// namespace's owner to accept the reference. See the ReferenceGrant
// documentation for details: https://gateway-api.sigs.k8s.io/api-types/referencegrant/
//
// +optional
Namespace Namespace `json:"namespace,omitempty"`
}
Loading