Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
174be49
configuration implementation (after rebase...)
shmuelk Jun 15, 2025
84cde29
Moved plugin registry back to pkg/epp/plugins
shmuelk Jun 15, 2025
a1494f3
Removed unneeded 'forced imports' of scorers
shmuelk Jun 15, 2025
4b2e8ba
Changed 'profilepicker' to 'profilehandler' in new and old code
shmuelk Jun 15, 2025
3153f57
Pass the configured SchedulingProfiles to LoadSchedulerConfig
shmuelk Jun 15, 2025
3fb0945
Ensure that both the configText and configFile flags are not specified
shmuelk Jun 15, 2025
07406b0
Load RequestControl plugins from the configuration
shmuelk Jun 15, 2025
d11f45b
Register all plugin factories
shmuelk Jun 15, 2025
eb94355
Review fixes
shmuelk Jun 15, 2025
a9bd54d
Reverted unneeded change
shmuelk Jun 16, 2025
bce595b
Updates from review comments
shmuelk Jun 16, 2025
70b9b74
Added a stub interface for plugins to get data from the EPP
shmuelk Jun 16, 2025
75e8fb6
Added a temporary implementation of plugins.Handle
shmuelk Jun 16, 2025
8792e58
Added pluginName and plugins.Handle to plugin factory interface
shmuelk Jun 16, 2025
f44163b
Updated plugin factory signatures to reflect new API
shmuelk Jun 16, 2025
32b53fd
Updated plugin instantiation to reflect new API
shmuelk Jun 16, 2025
6bf9bd7
Updated plugin instantiation to reflect new API
shmuelk Jun 16, 2025
b8e0b4d
Updated tests to reflect new API
shmuelk Jun 16, 2025
58cd2fa
Do not rename the imported package
shmuelk Jun 16, 2025
b91488d
Only upper layer of code should log errors
shmuelk Jun 16, 2025
2bcd718
Only pass what is needed to instantiate the plugins
shmuelk Jun 16, 2025
b4980ef
Review updates
shmuelk Jun 16, 2025
b7443b2
Review update
shmuelk Jun 16, 2025
3eb408f
Review update. Make more clear that the code only checks for already …
shmuelk Jun 16, 2025
0ac2718
fixed e2e doc in makefile (does not require GPUs) (#976)
nirrozenbaum Jun 15, 2025
8ebec80
API: Adds 5xx Status Code for Invalid ExtRef (#991)
danehans Jun 16, 2025
2072d23
feat(conformance): Add test for invalid EPP service reference (#959)
SinaChavoshi Jun 16, 2025
511a5c2
moved the creation of the context to main.go. (#995)
nirrozenbaum Jun 16, 2025
6386cd6
fix dead links (#989)
caozhuozi Jun 16, 2025
d108223
feat: add health check for epp cluster (#966)
zhengkezhou1 Jun 17, 2025
17d6993
Merge branch 'main' into config-loading
shmuelk Jun 17, 2025
642b9e5
Server unit test and utility to help with such tests (#820)
irar2 Jun 17, 2025
9dbb2d5
Update dynamic-lora-sidecar to expose metrics to track loaded adapter…
shotarok Jun 17, 2025
3d37de1
refactor: Replace prefix cache structure with golang-lru (#928)
kfirtoledo Jun 17, 2025
189c5fe
feat(conformance): Add HTTPRouteMultipleRulesDifferentPools test (#834)
SinaChavoshi Jun 17, 2025
a15557e
configuration implementation (after rebase...)
shmuelk Jun 15, 2025
e1fc8c9
After review, made code more obvious
shmuelk Jun 18, 2025
c5f78f9
Merge branch 'main' into config-loading
shmuelk Jun 18, 2025
ba530cf
Fixed merge issues
shmuelk Jun 18, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions PROJECT
Original file line number Diff line number Diff line change
Expand Up @@ -24,4 +24,12 @@ resources:
kind: InferenceModel
path: sigs.k8s.io/gateway-api-inference-extension/api/v1alpha1
version: v1alpha1
- api:
crdVersion: v1
namespaced: true
domain: x-k8s.io
group: inference
kind: EndpointPickerConfig
path: sigs.k8s.io/gateway-api-inference-extension/api/config/v1alpha1
version: v1alpha1
version: "3"
29 changes: 29 additions & 0 deletions api/config/v1alpha1/defaults.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
/*
Copyright 2025 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

// SetDefaults_EndpointPickerConfig sets default values in a
// EndpointPickerConfig struct.
//
// This naming convension is required by the defalter-gen code.
func SetDefaults_EndpointPickerConfig(cfg *EndpointPickerConfig) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use this function to default the parameters of the various in-tree plugins because defaulting is associated with the api version. I believe this function will be automatically invoked the file is decoded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this "extension point" is used to set default values in the CR in question. Look at the examples you pointed me to. I don't think it should be used to specify default values for parameters of plugins. The plugins should handle that themselves.

This could be used, if we wanted to create default configurations, albeit in a round about way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The in-tree plugins config types should be part of the versioned api, their defaulting should be as well.

See as an example: https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/apis/config/v1/defaults.go

We can handle that as a followup though to keep this PR focused on defining the api.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: is this an autogenerated function name?
Asking since it's not standard.

Copy link
Contributor Author

@shmuelk shmuelk Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function name is required by the generated code. See api/config/v1alpha1/zz_generated_defaulys.go

I agree its not a cool name

for idx, pluginConfig := range cfg.Plugins {
if pluginConfig.Name == "" {
cfg.Plugins[idx].Name = pluginConfig.PluginName
}
}
}
22 changes: 22 additions & 0 deletions api/config/v1alpha1/doc.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
/*
Copyright 2025 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Package v1alpha1 contains API Schema definitions for the
// inference.networking.x-k8s.io API group.
//
// +kubebuilder:object:generate=true
// +groupName=inference.networking.x-k8s.io
package v1alpha1
92 changes: 92 additions & 0 deletions api/config/v1alpha1/endpointpickerconfig_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
/*
Copyright 2025 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
"encoding/json"

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +k8s:defaulter-gen=true
// +kubebuilder:object:root=true

// EndpointPickerConfig is the Schema for the endpointpickerconfigs API
type EndpointPickerConfig struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the general plugin registration instead of being explicit and have filters, scorers and picker entries under a profile. generally speaking, I think the explicit approach will solve a lot of issues (I raised some of them in other comments, like how to configure a plugin that implements multiple extension points to use only some of the extension points). I understand current approach may minimize the file, but it comes with the price of readability. I'm reading the scheduling config and I can't understand which plugins are going to be used as filters, scorers and picker. when looking at the tradeoff being explicit here makes more sense to me.

leaving that aside and commenting on the existing approach, I was expecting to see the structure here a bit different (and the configuration file structure as well).
to be more concrete, I expect a section per extensible layer in the configuration file.
as of today we have extension points in scheduling and one extension point in requestcontrol (PostResponse).
so something like:

plugins:
- name: lowQueue
  pluginName: low-queue
  parameters:
    threshold: 10
- name: prefixCache
  pluginName: prefix-cache
  parameters:
    hashBlockSize: 32
- name: maxScore
  pluginName: max-score
- name: singleProfileHandler
  pluginName: single-profile
requestcontrol:
  postResponse:
  - pluginRef: prefixCache
scheduling:
  profileHandler: singleProfileHandler
  profiles:
  - name: default
    plugins:
    - pluginRef: lowQueue
    - pluginRef: prefixCache
      weight: 50
    - pluginRef: maxScore

and then I expect to see an appropriate structure here.
pay attention that in the above example I defined prefixCache plugin, which is an instance of prefix-cache, and the same instance is used in two different layers (requestcontrol and scheduling).

additionally, in current example config file and in this struct I don't see profileHandler as a first level configurable plugin in the scheduling layer, which IMO is missing. when looking on the Scheduler Config we can see that the above is the required structure.

clarification - I do NOT expect to see all layers atm. I expect to see here only the layers that have extension points today which is requestcontrol and scheduling.
all other layers can be added as we make progress and more extension points are defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to answer several questions:

  1. How much does the person who is configuring an instance of the GIE have to know? In particular does he/she need to know specifically which extension points in various layers a plugin implements?
  2. How much do we want the GIE to just simply do the "right" thing?
  3. How verbose do we want the config to be? Granted that this is probably a less important consideration.

This subject has been discussed before, either here in the PR or on the proposal.

Copy link
Contributor

@nirrozenbaum nirrozenbaum Jun 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think there’s a ”right” thing.
as stated in the example above, if I’m running performance tests I may want to run once with my plugin as filter+scorer and another run as scorer only. what’s the right thing here?

IMO it’s expected from whoever configure GIE to understand what he is configuring. if he doesn’t, he should read the documentation and learn.
sounds to me like a basic thing.

I was deeply involved in scheduler and I can’t understand from this file what’s the setup of the scheduler, so how can we expect someone who never touched the scheduler code to understand which plugins are going to be configured?

would be happy to hear more thoughts about this - @kfswain @elevran @ahg-g

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shared my view at #839 (comment)

In summary:

  1. I prefer to not breakdown the config by extensions or layers, those are implementation details of the plugins
  2. Treat plugins as features, and so the user is enabling/disabling a feature, not a specific extension.
  3. We should set a principle that a plugin implements one feature; and so a plugin should not be implementing a filter and a score if they are two distinct features, they should be implemented as two plugins.

In the above, I am trying to shift the complexity to the plugin developer and simplify it for the plugin admin/user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shmuelk I recommend to document the decision that we will reach in the doc, and also publish the doc as a proposal under https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals, perhaps under epp-config/.

Copy link
Contributor

@nirrozenbaum nirrozenbaum Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hard disagree here.
the structure that is used in this PR is problematic in my view from several reasons.
it uses all plugin instances under a scheduling profile, while this is obviously not the case.
for example profile handler is NOT part of any profile.
another example- if I want to implement response guardrails as a PostResponse plugin, where does it belong here? it’s not part of any scheduling profile. there are more examples of course.

hard disagree also on the point that plugin should implement a single extension.
if we would want to have a single extension allowed for a plugin, the plugin interface should have looked different to include “Type” (which extension point it implements) and a “Run” function (that runs the single extension point). obviously this is also not the case here, as we already have use cases for plugins that should implement multiple extensions (e.g., all plugins that have some cache/state, like prefix, session, etc).

and I wasn’t even mentioning the readability point. IMO when reading a configuration file one should be able to understand what is the configuration that is going to be used.

this looks to me as if we’re trying to get inspiration from the kube-scheduler and make use of the same principles, although some of them don’t represent GIE layers.

an inspiration example is good until the point it doesn’t.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I prefer to not breakdown the config by extensions or layers, those are implementation details of the plugins

I'm okay with that. User should be reading documentation anyhow to know what a plugin does.
Splitting configuration into layers could help in catching errors (e.g., addition of Scorer plugin under Filters section) which would otherwise be allowed. I'm not sure this is sufficiently important to warrant the split, esp. for Plugins that could span layers.

  1. Treat plugins as features, and so the user is enabling/disabling a feature, not a specific extension.

+1. This means that Plugins are defined in their own section, not under scheduling specific sections, right?

  1. We should set a principle that a plugin implements one feature; and so a plugin should not be implementing a filter and a score if they are two distinct features, they should be implemented as two plugins.

If this refers only to the scheduling layer - this is fine. However, this would be an issue for plugins that span subsystems (e.g., prefix/kv-cache implement a scheduling interface as well as a post response handler).

Copy link
Contributor

@ahg-g ahg-g Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it uses all plugin instances under a scheduling profile, while this is obviously not the case.
for example profile handler is NOT part of any profile.
another example- if I want to implement response guardrails as a PostResponse plugin, where does it belong here? it’s not part of any scheduling profile. there are more examples of course.

Scheduling profiles are a special case that we need to clearly document what plugins can be listed there. We can also add validation

hard disagree also on the point that plugin should implement a single extension.

I didn't say a single extension, I said a single feature. A feature can be composed of more than one extension, in which case disabling an extension breaks the feature (e.g., prefix cache can't function without both the post-response and the score extensions).

To give an example where I think breaking things into two features would be recommended: In kube-scheduler, we have preferred affinity and required affinity, they are both affinity features, but from user perspective they can be enabled/disable independently, so I would not implement them under as a single plugin, but as two.

and I wasn’t even mentioning the readability point. IMO when reading a configuration file one should be able to understand what is the configuration that is going to be used.

I view this as a configuration of features, not extensions. Having to explicitly configure each extension point a feature requires is not good UX and error prone. What if I forget to configure the postResponse for the prefix cache feature? what if in a future release I add another optimization to prefix-cache that required implementing another extension point?

If this refers only to the scheduling layer - this is fine. However, this would be an issue for plugins that span subsystems (e.g., prefix/kv-cache implement a scheduling interface as well as a post response handler).

See response above, it is not related to a specific layer, more about the idea that a plugin should not implement more than one user facing feature that could be disable/enabled independently.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I view this as a configuration of features, not extensions. Having to explicitly configure each extension point a feature requires is not good UX and error prone.

I agree with this in spirit. But I worry that 'defining a feature' defines too many opinions. If we want to include in-tree plugins & have the feature approach, that means that we have to have in-tree plugins implement exactly 1 extension point. As we do not know if a user will want to make different decisions than the feature creator did.

But, I do understand the idea of setting a minimum atomic unit: i.e. if you implement a Scorer plugin that relies on a data layer extension to work, full stop. Then yes, I absolutely could see those being considered an atomic unit.

So perhaps it's a matter of best practices, plugin development should only implement what is strictly necessary. So even if there is a KV-cache util scorer, & a KV-cache util filter. They should not be implemented together, b/c a user does not necessarily want to implement them both. However that seems very easy to footgun.

although some of them don’t represent GIE layers.

++ I think with the existence of multiple profile runs per request (some of which might just be dark-launches; aka doesnt actually impact scheduling decisions). We may want to make sure we aren't relying principles for a different set of requirements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although some of them don’t represent GIE layers.
We may want to make sure we aren't relying principles for a different set of requirements.

None of the three points I mentioned above referenced kube-scheduler or argued that we should copy exactly what we did there. But I understand this impression since I referenced kube-scheduler many times throughout the whole process of designing extensibility.

Speaking of requirements, I am arguing that the primary persona for the config api is the platform admin, and my thinking is that platform admins think of features, not implementation details of the component (EPP in this case).

The idea is to shift defining what a feature is to the feature developer since they understand its dependencies the best, rather than pushing that to the platform admin.

metav1.TypeMeta `json:",inline"`

// +required
// +kubebuilder:validation:Required
// Plugins is the list of plugins that will be instantiated.
Plugins []PluginSpec `json:"plugins"`

// +required
// +kubebuilder:validation:Required
// SchedulingProfiles is the list of named SchedulingProfiles
// that will be created.
SchedulingProfiles []SchedulingProfile `json:"schedulingProfiles"`
}

// PluginSpec contains the information that describes a plugin that
// will be instantiated.
type PluginSpec struct {
// +optional
// Name provides a name for plugin entries to reference. If
// omitted, the value of the PluginName field will be used.
Name string `json:"name"`

// +required
// +kubebuilder:validation:Required
// PluginName specifies the plugin to be instantiated.
PluginName string `json:"pluginName"`
Copy link
Contributor

@ahg-g ahg-g Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another follow up is to probably change this to "type" @elevran


// +optional
// Parameters are the set of parameters to be passed to the plugin's
// factory function. The factory function is responsible
// to parse the parameters.
Parameters json.RawMessage `json:"parameters"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in kube-scheduler, we used runtime.Object, probably allows us to use k8s api-machinery functions for decoding https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/apis/config/types.go#L207

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code in the K8S scheduler, runtime.Object is simply cast to the appropriate type used by the plugin. Those types seem to include a standard K8S TypeMeta stanza. You want every plugin's parameters object to start with apiVersion and kind? You were trying to same extra name tags, isn't here even more acute?

Also if we go with runtime.Object, every plugin with parameters would need to register it's schema in the global schema for the config...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a secondary schema that this not used/known to the k8s client and only used in configuration?

I do think that the use of Object makes sense as it would natively separate the plugin type from its name, which is currently unclear, IMO. This is a side note - I'll open an issue to discuss to avoid sidetracking the feedback.

}

// SchedulingProfile contains the information to create a SchedulingProfile
// entry to be used by the scheduler.
type SchedulingProfile struct {
// +kubebuilder:validation:Required
// Name specifies the name of this SchedulingProfile
Name string `json:"name"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we verify name uniqueness somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. In the function validateConfiguration in pkg/epp/common/config/configloader.go


// +required
// +kubebuilder:validation:Required
// Plugins is the list of plugins for this SchedulingProfile. They are assigned
// to the appropriate "slots" based on their type.
Plugins []SchedulingPlugin `json:"plugins"`
}

// SchedulingPlugin describes a plugin that will be associated with a
// SchedulingProfile entry.
type SchedulingPlugin struct {
// +required
// +kubebuilder:validation:Required
// PluginRef specifies a partiular Plugin instance to be associated with
// this SchedulingProfile. The reference is to the name of an
// entry of the Plugins defined in the configuration's Plugins
// section
PluginRef string `json:"pluginRef"`

// +optional
// Weight is the weight fo be used if this plugin is a Scorer.
Weight *int `json:"weight"`
}
126 changes: 126 additions & 0 deletions api/config/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

38 changes: 38 additions & 0 deletions api/config/v1alpha1/zz_generated.defaults.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

69 changes: 69 additions & 0 deletions api/config/v1alpha1/zz_generated.register.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading