Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid schema generated for #[serde(flatten)] HashMap<String, serde_json::Value> #844

Closed
jcaesar opened this issue Mar 2, 2022 · 11 comments · Fixed by #845
Closed

Invalid schema generated for #[serde(flatten)] HashMap<String, serde_json::Value> #844

jcaesar opened this issue Mar 2, 2022 · 11 comments · Fixed by #845
Labels
bug Something isn't working

Comments

@jcaesar
Copy link
Contributor

jcaesar commented Mar 2, 2022

Current and expected behavior

The following CRDT

#[derive(CustomResource, Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[kube(
    group = "playground.liftm.de",
    version = "v1alpha1",
    kind = "Test",
    namespaced
)]
pub struct TestSpec {
    foo: String,
    #[serde(flatten)]
    arbitrary: HashMap<String, serde_json::Value>,
}
derives an openAPIV3Schema that contains a property for foo, and additionalProperties.
{
  "description": "Auto-generated derived type for TestSpec via `CustomResource`",
  "properties": {
    "spec": {
      "additionalProperties": true,
      "properties": {
        "foo": {
          "type": "string"
        }
      },
      "required": [
        "foo"
      ],
      "type": "object"
    }
  },
  "required": [
    "spec"
  ],
  "title": "Test",
  "type": "object"
}

Trouble is that k8s has some extra rules over JSON schemas, the problem in this case is:

The field additionalProperties is mutually exclusive with properties.

So the generated schema is invalid.

Possible solution

It may be possible to somehow configure schemars to generate the wanted schema, but I haven't looked deeply into schemars' GenVisitor.

As a workaround, one can convert the generated schema to a serde_json::Value and manually replace all "additionalProperties": true by "x-kubernetes-preserve-unknown-fields": true. Code example. But that's maybe to ugly a hack for a PR?

Additional context

(Oddly, this doesn't happen with #[serde(flatten)] arbitrary: HashMap<String, String>, for which nothing at all is generated, which also seems wrong.)

Environment

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.2", GitCommit:"9d142434e3af351a628bffee3939e64c681afa4d", GitTreeState:"archive", BuildDate:"2022-01-19T22:09:46Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.23) and server (1.21) exceeds the supported minor version skew of +/-1

Configuration and features

kube = { version = "0.69.1", features = ["runtime", "derive"] }
serde = { version = "1.0.136", features = ["derive"] }
serde_json = "1.0.79"
k8s-openapi = { version = "0.14.0", features = ["v1_21"] }
schemars = "0.8.8"

Affected crates

kube-derive

Would you like to work on fixing this bug?

maybe

@jcaesar jcaesar added the bug Something isn't working label Mar 2, 2022
@jcaesar jcaesar changed the title Invalid schema generated for #[serde(flatten)] HashMap<String, serde_json::Value> Invalid schema generated for #[serde(flatten)] HashMap<String, serde_json::Value> Mar 2, 2022
@clux
Copy link
Member

clux commented Mar 2, 2022

Ah, thanks for this. Super helpful.

So, basically:

  • additionalProperties is used for maps, whereas properties are used for structs
  • #[serde(flatten)] on a map should turn the field values into properties on the parent struct
  • using #[serde(flatten)] must cause something to remove the additionalProperties after such a conversion

This sounds to me like it's worth raising an issue against schemars for, unless there are some use cases that actually use both properties and additionalProperties.

At the very least the proposed solution won't work when users embed legit (non-flattened) BTreeMap or HashMap types into their CRD (which we see in kopium).

I can see there was one fix for flattening in the last schemars release, but it looks unrelated, and don't see any other related flattening issues there.

@clux
Copy link
Member

clux commented Mar 2, 2022

It also makes sense that it doesn't happen for #[serde(flatten)] arbitrary: HashMap<String, String> because the schema for that case is additionalProperties (to indicate map), but when the additionals have no sub properties, that afaikr indicates String -> String.

@clux
Copy link
Member

clux commented Mar 2, 2022

Btw, what schema do you get if you instead use the following struct:

pub struct TestSpec {
    foo: String,
    #[serde(flatten)]
    arbitrary: serde_json::Value,
}

i don't think you need to nest the Value in a map? this is what we do in DynamicObject

@jcaesar
Copy link
Contributor Author

jcaesar commented Mar 2, 2022

When replacing the HashMap with just a serde_json::Value, I get:

properties:
  spec:
    properties:
      foo:
        type: string
    required:
      - foo
    type: object
required:
  - spec
title: Test
type: object

But with that, I'm lost in several ways:

If I try to just create a resource "normally", with kubectl apply, I get the following:

$ cat examplefoo.yaml
---
apiVersion: playground.liftm.de/v1alpha1
kind: Test
metadata:
  name: test-test
spec:
  foo: works
  bar: rejected

$ kubectl apply -f examplefoo.yaml 
error: error validating "examplefoo.yaml": error validating data: ValidationError(Test.spec): unknown field "bar" in de.liftm.playground.v1alpha1.Test.spec; if you choose to ignore these errors, turn validation off with --validate=false

That's about to be expected, kubernetes doesn't like properties that aren't explicitly allowed. (So for me, just using serde_json::Value is out, because I do want to be able to create these CRs with kubectl apply.)

But it works with DynamicObject, right? So I tried creating the object via kube-rs. That seems to work at first, but when I inspect the object, the additional properties are gone:

$ kubectl get test asdf -o yaml
apiVersion: playground.liftm.de/v1alpha1
kind: Test
metadata:
  creationTimestamp: "2022-03-02T09:54:40Z"
  generation: 1
  name: asdf
  namespace: kafka
  resourceVersion: "48831"
  uid: edde3ebc-d8c8-45c2-94f3-784cd3a59267
spec:
  foo: works

So, it just skips validation, but then drops the properties. Is DynamicObject actually working as intended?

(Nevermind that I tried spawning kubectl from rust, which for some reason implicitly specifies --validate=false, but otherwise behaves equal to the kube-rs api call.)

@clux
Copy link
Member

clux commented Mar 2, 2022

You could try to specify preserve-unknown-fields. Maybe that helps your use case even if it's a bit of cheat code. See kube-rs/kopium#31

@jcaesar
Copy link
Contributor Author

jcaesar commented Mar 2, 2022

Is… is there a way to specify preserve-unknown-fields on a derived CRD?
Because that's what I was hoping to get from the beginning, instead of additionalProperties.

@clux
Copy link
Member

clux commented Mar 2, 2022

Yeah, should possible by customizing parts of the schema or the entire schema.

There is an example of that in crd_derive_schema, e.g. this particular part:

https://github.com/kube-rs/kube-rs/blob/0b5e803bc5f7c7829960eb06bfd99bfcbc000c45/examples/crd_derive_schema.rs#L86-L103

i haven't gotten a specific example with x-preserve-unknown-fields, but the process should be analogous.

@jcaesar
Copy link
Contributor Author

jcaesar commented Mar 3, 2022

Okay, this took me a good long 1½ hours to figure out (I had mistakenly thought this would be easy…): In case of a #[serde(flatten)], schemars generates a schema for the struct without the flattened field, then a schema for the type of the flattened field, and then calls flatten to merge the second schema into the first. So, to get a valid merged schema, the schema for the flattened field should only contain an extension and be otherwise empty.

So this is a passable workaround (for both serde_json::Value and HashMap<String, …>):

pub struct TestSpec {
    foo: String,
    #[serde(flatten)]
    #[schemars(schema_with = "preserve_arbitrary")]
    arbitrary: HashMap<String, serde_json::Value>,
}

fn preserve_arbitrary(_gen: &mut schemars::gen::SchemaGenerator) -> Schema {
    let mut obj = SchemaObject::default();
    obj.extensions
        .insert("x-kubernetes-preserve-unknown-fields".into(), true.into());
    Schema::Object(obj)
}

So for the real fix, what kind of issue do you want to raise against schemars? I still think that handling that in a visitor is the right direction. StructuralSchemaRewriter is already doing something quite similar.

@clux
Copy link
Member

clux commented Mar 3, 2022

Hm, yeah, we are doing something specifically for structural schemas, but that's a kubernetes specific restriction. I feel like there's probably a good case for schemars clearing some empty containers on flatten to avoid both properties and additionalProperties getting set. I'm not super familiar with the internals of schemars so not sure how this would work though.

Still, I am not sure we need to jump through these hoops (even if schemars could be improved here). If you only use a flattenend Value and use a schema_with on that you avoid the additionalProperties problem. I.e.

pub struct TestSpec {
    foo: String,
    #[serde(flatten)]
    #[schemars(schema_with = "preserve_arbitrary")]
    arbitrary: serde_json::Value,
}

you do get the following schema:

        openAPIV3Schema:
          description: "Auto-generated derived type for TestSpec via `CustomResource`"
          properties:
            spec:
              properties:
                foo:
                  type: string
              required:
                - foo
              type: object
              x-kubernetes-preserve-unknown-fields: true
          required:
            - spec
          title: Test
          type: object

which i think is an easier way to describe what you want?

@jcaesar
Copy link
Contributor Author

jcaesar commented Mar 4, 2022

(The schema_with = "preserve_arbitrary" generates the same schema for Value and HashMap<String, Value>. The former will fail serialization if you set non-map values like arbitrary = Value::String("nah".into()), so I prefer the latter.)

jcaesar added a commit to jcaesar/kube-rs that referenced this issue Mar 5, 2022
jcaesar added a commit to jcaesar/kube-rs that referenced this issue Mar 5, 2022
jcaesar added a commit to jcaesar/kube-rs that referenced this issue Mar 5, 2022
…o serde(flatten)

See also: kube-rs#844

Signed-off-by: Julius Michaelis <gitter@liftm.de>
@clux clux closed this as completed in #845 Mar 24, 2022
clux added a commit that referenced this issue Mar 24, 2022
* Fix schemas containing both properties and additionalProperties due to serde(flatten)

See also: #844

Signed-off-by: Julius Michaelis <gitter@liftm.de>

* comment suggestion

Co-authored-by: Eirik A <sszynrae@gmail.com>

Co-authored-by: Eirik A <sszynrae@gmail.com>
@jcaesar
Copy link
Contributor Author

jcaesar commented Mar 25, 2022

Thanks for your efforts.

I don't know if you want to keep this open for the #[serde(flatten)] serde_json::Value case, or make that a separate issue?
(I have opened GREsau/schemars#133 in the meantime.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants