Contributor

@elevran elevran commented Aug 17, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:
This is the last of a series of PRs addressing requirements for Pluggable Data Layer (#703).
It enables use of a new experimental data layer. The new data layer is enabled when ENABLE_EXPERIMENTAL_DATALAYER_V2 is set to true.

Related PRs: #1351, #1237, #1195, #1154.

Which issue(s) this PR fixes:
Fixes #703

Does this PR introduce a user-facing change?:

Enable an optional and experimental data layer for collecting endpoint information from pluggable data sources.
This is experimental and should not be used in *production*. To enable, set the environment variable `ENABLE_EXPERIMENTAL_DATALAYER_V2` to `true`.

Signed-off-by: Etai Lev Ran <elevran@gmail.com>
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 17, 2025

netlify bot commented Aug 17, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 48347d4
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68a5d5932a9e170008a013e9
😎 Deploy Preview https://deploy-preview-1391--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 17, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 17, 2025
const (
	// enableExperimentalDatalayerV2 defines the environment variable
	// used as feature flag for the pluggable data layer.
	enableExperimentalDatalayerV2 = "ENABLE_EXPERIMENTAL_DATALAYER_V2"
Collaborator

Doesn't have to be done in this PR, but I'm wondering if we should extend:

to include a field for features to be enabled. We will soon have Flow Control, SLO prediction, and now the pluggable data layer as experimental, opt-in features. So rather than having various env vars, we would keep all feature gating in one place.

Contributor

@nirrozenbaum nirrozenbaum Aug 20, 2025

The intention is to have everything configured through the config file eventually.
Do you suggest adding free-form key-value pairs as variables (kind of similar to env vars) for the transitional state when a feature is experimental?
I proposed this change recently, which I think aligns with your intention: #1288

Collaborator

Maybe not everything, but it would be good to discuss the cutline. Definitely out of scope for this PR.

Contributor

Right, maybe not everything.
I meant the extension points of those parts :)
Anyway, the parameters section that was suggested in #1288 can be leveraged as a solution for experimental features, so instead of having env vars we can just configure a parameter, e.g.,

- name: enable-experimental-datalayer
  value: true
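A rough sketch of how a runner could consume such a name/value parameter once the config is loaded. The `Parameter` type and the `boolParameter` helper are hypothetical and only illustrate the idea; the actual shape of the parameters section is whatever #1288 settles on.

```go
package main

import (
	"fmt"
	"strconv"
)

// Parameter is a hypothetical name/value pair, mirroring the YAML
// snippet above. It is an assumption, not the API proposed in #1288.
type Parameter struct {
	Name  string
	Value string
}

// boolParameter looks up name in params and parses its value as a bool.
// It returns def when the parameter is absent, and def plus an error
// when the value is present but not a valid boolean.
func boolParameter(params []Parameter, name string, def bool) (bool, error) {
	for _, p := range params {
		if p.Name != name {
			continue
		}
		val, err := strconv.ParseBool(p.Value)
		if err != nil {
			return def, fmt.Errorf("parameter %q: %w", name, err)
		}
		return val, nil
	}
	return def, nil
}

func main() {
	params := []Parameter{{Name: "enable-experimental-datalayer", Value: "true"}}
	enabled, err := boolParameter(params, "enable-experimental-datalayer", false)
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println("experimental data layer enabled:", enabled)
}
```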

Collaborator

Yes, I definitely agree there

Contributor

And we document these features similarly to K8s feature gates. We should also drop any "experimental" references and simply call them by feature name under featureGates. The associated docs will detail the feature level, e.g., Alpha, and give a short description of the feature. For example:

apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
featureGates:
  dataLayer: true # Defaults to false
  flowControl: true # Defaults to false until the feature graduates to beta
  ...

Contributor

cc @shmuelk: another useful use case for adding a parameters section (key-value pairs) to the config API.
To answer the above, we should have those parameters consumable not necessarily through the plugins but also by runner.go (as an example) to enable/disable features. Generally speaking, it could replace the usage of env vars (e.g., today we have env vars in the saturation detector).

Contributor

@shmuelk shmuelk Aug 21, 2025

I like @danehans's suggestion of a featureGates section. From a config point of view it is simply a map[string]bool.

The code that consumes the feature gates should validate the set and apply the values.

Overloading shared parameters with so-called well-known parameter names is a bad idea and will lead to conflicts and confusion.
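To make the "validate the set and apply the values" step concrete, here is a minimal sketch that treats the gates as a map[string]bool. The gate names `dataLayer` and `flowControl` come from the YAML example earlier in this thread; `defaultGates` and `resolveFeatureGates` are hypothetical names, and rejecting unknown gates outright is one possible policy, not a decision made in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// defaultGates lists the known gates and their defaults. The names are
// taken from the example in this thread; the registry itself is an
// assumption for illustration.
var defaultGates = map[string]bool{
	"dataLayer":   false,
	"flowControl": false,
}

// resolveFeatureGates validates the configured gates against the known
// set, then overlays the configured values on top of the defaults.
func resolveFeatureGates(configured map[string]bool) (map[string]bool, error) {
	var unknown []string
	for name := range configured {
		if _, ok := defaultGates[name]; !ok {
			unknown = append(unknown, name)
		}
	}
	if len(unknown) > 0 {
		sort.Strings(unknown) // stable error message regardless of map order
		return nil, fmt.Errorf("unknown feature gates: %v", unknown)
	}

	resolved := make(map[string]bool, len(defaultGates))
	for name, def := range defaultGates {
		resolved[name] = def
	}
	for name, val := range configured {
		resolved[name] = val
	}
	return resolved, nil
}

func main() {
	gates, err := resolveFeatureGates(map[string]bool{"dataLayer": true})
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println("dataLayer:", gates["dataLayer"], "flowControl:", gates["flowControl"])
}
```

Keeping the known names in one registry means a misspelled gate fails at startup rather than being silently ignored, which is the conflict-and-confusion risk raised above for free-form parameter names.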

Collaborator

kfswain commented Aug 20, 2025

This LGTM. I just brought this up with @elevran via DM but wanted to bubble it up here: do we want to include this in v1.0, or, since it's experimental, should it wait for a fast follow in v1.1? We have a few experimental features (that I mention in another comment) that are also going to be landing soon.

I prefer to wait but I don't hold that opinion strongly. Holding so this can be discussed/ack'd

/lgtm
/approve
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 20, 2025
@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 20, 2025
@nirrozenbaum
Contributor

finally finished reviewing this PR (had some limited time in the last couple of days).

/lgtm
/approve

> This LGTM. I just brought this up with @elevran via DM but wanted to bubble it up here: do we want to include this in v1.0, or, since it's experimental, should it wait for a fast follow in v1.1? We have a few experimental features (that I mention in another comment) that are also going to be landing soon.
>
> I prefer to wait but I don't hold that opinion strongly. Holding so this can be discussed/ack'd

I understand the concern. IMO the risk here is low, since most of the changes are in the datalayer package and the only changes in the bootstrap path are minor updates for selecting between the old metrics collection and the new data layer. By default, the old one should be used.
This will allow us in llm-d to get a feel for this feature and make it more robust for the next release.
So +1 from me for getting this into this version.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elevran, kfswain, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kfswain,nirrozenbaum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Collaborator

kfswain commented Aug 20, 2025

SGTM
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 20, 2025
@k8s-ci-robot k8s-ci-robot merged commit 091ebea into kubernetes-sigs:main Aug 20, 2025
10 checks passed
Development

Successfully merging this pull request may close these issues.

extensible data layer: EPP should allow configurable metrics collection

6 participants