Enable pluggable datalayer as experimental feature #1391

elevran · 2025-08-17T15:33:22Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
This is the last in a series of PRs addressing the requirements for the Pluggable Data Layer (#703).
It enables the use of a new experimental data layer, which is active when `ENABLE_EXPERIMENTAL_DATALAYER_V2` is set to `true`.

Related PRs: #1351, #1237, #1195, #1154.

Which issue(s) this PR fixes:
Fixes #703

Does this PR introduce a user-facing change?:

Enable an optional and experimental data layer for collecting endpoint information from pluggable data sources.
This is experimental and should not be used in *production*. To enable, set the environment variable `ENABLE_EXPERIMENTAL_DATALAYER_V2` to `true`.

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

netlify · 2025-08-17T15:33:28Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`48347d4`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68a5d5932a9e170008a013e9
😎 Deploy Preview	https://deploy-preview-1391--gateway-api-inference-extension.netlify.app

pkg/epp/datalayer/collector.go

pkg/epp/datalayer/enabled.go

pkg/epp/datalayer/metrics/client.go

pkg/epp/datastore/datastore.go

pkg/epp/datalayer/factory.go

pkg/epp/datalayer/enabled.go

pkg/epp/datalayer/factory.go

pkg/epp/datalayer/metrics/logger.go

cmd/epp/runner/runner.go

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

pkg/epp/datalayer/collector.go

pkg/epp/datalayer/datasource.go

pkg/epp/datalayer/factory.go

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

kfswain · 2025-08-20T19:46:08Z

cmd/epp/runner/runner.go

+const (
+	// enableExperimentalDatalayerV2 defines the environment variable
+	// used as feature flag for the pluggable data layer.
+	enableExperimentalDatalayerV2 = "ENABLE_EXPERIMENTAL_DATALAYER_V2"


Doesn't have to be done in this PR, but I'm wondering if we should extend:

gateway-api-inference-extension/apix/config/v1alpha1/endpointpickerconfig_types.go

Line 29 in 0eaf592

type EndpointPickerConfig struct {

to include a field for features to be enabled. We will soon have Flow Control, SLO prediction, and now the pluggable data layer as experimental, opt-in features. So rather than having various env vars, we would keep all feature gating in one place.

The intention is to have everything configured through the config file eventually.
Do you suggest adding free-form key-value pairs as variables (similar to env vars) for the transitional state when a feature is experimental?
I proposed this change recently, which I think aligns with your intention: #1288

Maybe not everything but would be good to discuss the cutline. Def out of scope for this PR

Right, maybe not everything.
I meant the extension points of those parts :)
Anyway, the parameters section that was suggested in #1288 can be leveraged as a solution for experimental features, so instead of having env vars we can just configure a parameter, e.g.:

- name: enable-experimental-datalayer
  value: true

Yes, I definitely agree there

And we document these features similarly to K8s feature gates. We should also drop any "experimental" references and simply call them by feature name under featureGates. The associated docs will detail the feature level, e.g., Alpha, and a short description of the feature. For example:

apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
featureGates:
  dataLayer: true # Defaults to false
  flowControl: true # Defaults to false until the feature graduates to beta
...

cc @shmuelk: another useful use case for adding a parameters section (key-value pairs) to the config API.
To answer the above: those parameters should be consumable not only through the plugins but also by runner.go (as an example) to enable or disable features. Generally speaking, this could replace the use of env vars (e.g., today we have env vars in the saturation detector).

I like @danehans's suggestion of a featureGates section. From a config point of view it is simply a map[string]bool.

The code that consumes the feature gates should validate the set and apply the values.
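A minimal sketch of what "validate the set and apply the values" could look like for a `map[string]bool`. The gate names `dataLayer` and `flowControl` are taken from the example earlier in this thread; the `knownGates` table and `resolveGates` function are hypothetical names used only for illustration, not the project's API.

```go
package main

import (
	"fmt"
	"sort"
)

// knownGates lists the recognized gate names with their default values.
// Hypothetical table: names follow the featureGates example in this thread.
var knownGates = map[string]bool{
	"dataLayer":   false,
	"flowControl": false,
}

// resolveGates rejects unknown gate names, then overlays the configured
// values on top of the defaults. Hypothetical function for illustration.
func resolveGates(configured map[string]bool) (map[string]bool, error) {
	var unknown []string
	for name := range configured {
		if _, ok := knownGates[name]; !ok {
			unknown = append(unknown, name)
		}
	}
	if len(unknown) > 0 {
		sort.Strings(unknown) // deterministic error message
		return nil, fmt.Errorf("unknown feature gates: %v", unknown)
	}

	resolved := make(map[string]bool, len(knownGates))
	for name, def := range knownGates {
		resolved[name] = def
	}
	for name, enabled := range configured {
		resolved[name] = enabled
	}
	return resolved, nil
}

func main() {
	gates, err := resolveGates(map[string]bool{"dataLayer": true})
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("dataLayer:", gates["dataLayer"], "flowControl:", gates["flowControl"])
}
```

Rejecting unknown names up front means a typo in the config fails loudly instead of silently leaving a feature disabled.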

Overloading a shared parameters section with so-called well-known parameter names is a bad idea and will lead to conflicts and confusion.

kfswain · 2025-08-20T19:59:57Z

This LGTM. I just brought this up with @elevran via DM but wanted to bubble it up here: do we want to include this in v1.0 or, since it's experimental, should it wait for a fast follow in v1.1? We have a few experimental features (that I mention in another comment) that are also going to land soon.

I prefer to wait but I don't hold that opinion strongly. Holding so this can be discussed/ack'd

/lgtm
/approve
/hold

nirrozenbaum · 2025-08-20T20:52:01Z

finally finished reviewing this PR (had some limited time in the last couple of days).

/lgtm
/approve

This LGTM. I just brought this up with @elevran via DM but wanted to bubble it up here: do we want to include this in v1.0 or, since it's experimental, should it wait for a fast follow in v1.1? We have a few experimental features (that I mention in another comment) that are also going to land soon.

I prefer to wait but I don't hold that opinion strongly. Holding so this can be discussed/ack'd

I understand the concern. IMO the risk here is low, since most of the changes are in the datalayer package and the only changes in the bootstrap path are minor updates for selecting between the old metrics collection and the new data layer. By default the old one is used.
This will allow us in llm-d to get a feel for this feature and make it more robust for the next release.
So +1 from me for getting this into this version.

k8s-ci-robot · 2025-08-20T20:52:11Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elevran, kfswain, nirrozenbaum

Needs approval from an approver in each of these files:

~~OWNERS~~ [kfswain,nirrozenbaum]

kfswain · 2025-08-20T21:14:26Z

SGTM
/unhold

elevran added 2 commits August 17, 2025 18:21

enable global metrics logging

3bcb68a

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

enable v2 data layer

b986fef

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 17, 2025

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 17, 2025

k8s-ci-robot requested review from liu-cong and robscott August 17, 2025 15:33

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 17, 2025