Description
Definition
The "Unknown Fields" situation (which we've often colloquially called the "Dead Fields" situation) occurs when the API provides one or more fields that an implementation doesn't actually support, while that implementation is otherwise schema-compatible with the rest of the fields.
Note: In this context we're referring very specifically to the Gateway API CRDs.
In the best case the user doesn't need or use the dead fields, and the problem is non-impacting (at least for that moment).
In the bad case the user populates the fields and the API accepts them, but the implementation doesn't support them and so they are "dropped silently". This is not obvious to the user. The implementation provisions something that is only a partial reflection of the user's intent.
The worst case is the same as the bad case, except that the dropped fields are critical and it is broken and/or unsafe to provision the rest without them. For instance, security-related fields such as authentication could be populated, and the implementation could drop them and expose a route without its intended authentication, with the user having no way to tell.
What?
I'm advocating for an upstream solution to the "dead fields" problem to help Gateway API implementations and Kubernetes platforms create a safer and more consistent experience for users. This should include both documentation and tooling.
Note: This problem is not unique to Gateway API, and in fact I'm aware of it existing for at least Cluster API (CAPI) and potentially some other projects. However, we were early adopters of CRDs for an official API so the impact is particularly strong here. See more context below.
Why?
Note: Originally discussed in #3576 (comment)
In Gateway API, we have several dimensions where the "dead fields" problem comes into play:
- Clusters housing multiple implementations
- Extended feature support
- Experimental channel
We'll explain each of these situations in more detail.
Clusters housing multiple implementations
The Gateway API CRDs have become ubiquitous, and as such it is becoming more and more common for multiple implementations to run on the same cluster and need to be supported there. Consequently, platforms are inclined to provide the Gateway API CRDs as "core-like" APIs on these clusters.
Core-Like: meaning that in effect the API is like a "core API": it's provided by default and can't be managed by the cluster admin or any user; instead the full life-cycle of the API is managed by the platform. The biggest difference is that you can't necessarily expect a specific version of Gateway API to match a specific version of Kubernetes (today).
This gives implementations at least two obvious choices. They can either:
a) crash until the schema that matches is present
b) accept the CRDs as long as they are newer than the minimum required, and forward compatible (e.g. same major release)
Option (a) is nobody's favorite. The implication is that implementations might need to create specific releases for each combination of platform version and corresponding Gateway API version, matching the Gateway API version there exactly. This pushes implementations to option (b), which is where the dead fields dragons lie.
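To make the second choice concrete, here is a minimal Go sketch of the startup check an implementation might run. It is an illustration only: `checkBundle` and `parseVersion` are hypothetical helpers, and the sketch assumes the installed CRD version is available as a `vMAJOR.MINOR.PATCH` string (for example read from a version annotation on the CRDs) and that the minimum required version is the one the implementation was built against. The second return value flags the case where the installed schema is newer than expected, which is exactly where dead fields can show up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v1.2.3" (optionally with a "-rc1" style suffix, which
// this sketch ignores) into its three numeric components.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	core := strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(core, "-+"); i >= 0 {
		core = core[:i]
	}
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unexpected version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// checkBundle (hypothetical) reports whether the installed CRD version is
// acceptable under the "newer and same major" rule, and whether it is
// strictly newer than the version the implementation was built against,
// in which case additive fields the implementation doesn't know may exist.
func checkBundle(installed, builtAgainst string) (acceptable, mayHaveDeadFields bool, err error) {
	in, err := parseVersion(installed)
	if err != nil {
		return false, false, err
	}
	built, err := parseVersion(builtAgainst)
	if err != nil {
		return false, false, err
	}
	if in[0] != built[0] {
		return false, false, nil // different major: not treated as forward compatible
	}
	for i := 1; i < 3; i++ {
		if in[i] != built[i] {
			newer := in[i] > built[i]
			return newer, newer, nil // older is rejected, newer is accepted with a warning flag
		}
	}
	return true, false, nil // exact match: no unknown fields expected
}

func main() {
	ok, dead, err := checkBundle("v1.3.0", "v1.1.0")
	fmt.Println("acceptable:", ok, "may have dead fields:", dead, "err:", err)
}
```

Note that a check like this only tells the implementation that dead fields are possible; it says nothing about whether a user actually populated one.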
Extended feature support
We have a concept in Gateway API called "Support Levels". This explicitly adds fields that implementations are not required to support. These are not exactly the same as "unknown fields", but they are related.
A result of this enhancement should be to define the standard for what implementations do when they have a field populated which they can't use, whether that comes from an "unknown field" or a "known field" (i.e. extended field).
Experimental channel
This is probably one of the least problematic situations, since we anticipate if you're using something labeled Experimental then what you're doing is testing or something adjacent to that, so the impact of dropped fields is limited to non-production cases here. This is mainly a problem because we originally "layered" experimental features on top of the same GVK as standard and had users choose which one to deploy. A solution to that problem is being discussed.
How?
I'm open to suggestion on the "how".
Note: Though the "how" is very open for discussion, I do have some thoughts on what I think might not work or might work that seem worth sharing now as the basis for that discussion.
Telling implementations that they should schema check the CRDs and crash on additive schema, and target releases against specific combinations of platform versions and their Gateway API versions, is... well... maybe on the table, but it's sure going to be unpopular. People are going to prefer a more flexible solution, since this isn't actually a core API.
I currently don't think Validating Webhooks or Validating Admission Policies (VAP) or anything like that is going to be particularly viable. We could in theory add something to the specification (e.g. something like supported features) to inform validation, but then we still have to contend with race conditions and the "best effort nature" of cross-resource validation since we'll have to follow the GatewayClass <-> Gateway <-> *Route resource chain to do that validation.
A solution I have not tested but I think might be viable is if we provide tooling for implementations to inject schema checking into object deserialization in the API. That is to say, we might be able to provide a custom decoder which would be a "dead fields detector": implementations could employ this in their controllers, and it could tell them whether the actual API schema is different from the one they expect, and furthermore if the user specified values in those "dead fields". Again this still needs to be tested out, but if it can work then implementations can just identify and bark at dead fields themselves and we can provide conformance tests to ensure they bark.
One downside of this is that it's not a universal solution right out of the box. We can provide Golang and Rust tooling (via gateway-api-rs), but we simply can't feasibly provide full solutions for all languages and instead would have to provide guidance for them to add it themselves.