Specify process for validating a compound schema document against meta-schema(s) #936

Closed
handrews opened this issue May 22, 2020 · 24 comments · Fixed by #977

@handrews
Contributor

PR #914 is blocked on the question of how to validate a schema document containing embedded schema resources with differing meta-schemas.

In my view, when loading a schema document, an implementation should be able to recognize embedded schema resources and treat them separately, as if they were $ref'd.

There are two concerns:

  1. Do we support embedding draft-04 schemas? This requires recognizing id.

I think we should say no: if you want to use draft-04 with draft 2020-NN, you have to $ref it. Since draft-04 did not have vocabularies, there is no way to tell the difference between a custom draft-04 meta-schema relying on id and a custom meta-schema of another draft that happens to have id next to it, either because of a typo or because someone did a really weird extension.

  2. How does the meta-schema look?

@Relequestual says:

When processing a schema document with any embedded schema resources, for the purposes of schema validation against meta-schemas (confirming the JSON Schema document is likely to be processable), embedded schema resources SHOULD be validated within their own JSON Schema feature set (using the appropriate meta-schema). For enclosing schema resources (which is likely the document root schema), an embedded resource SHOULD be considered as a valid schema document, with the value of true, for the purposes of validating the enclosing schema resource as a valid JSON Schema.

Which I almost follow but it's late and I think with the other points this deserves to be visible in an issue.

@Relequestual
Member

  1. Do we support embedding draft-04 schemas? This requires recognizing id.

No, we do not. draft-04 is not forwards compatible in this way.

Besides the fact that we want a hard "you should upgrade if you can" message, supporting this would put a burden on developers that I do not want.

Realistically we should RECOMMEND that schema authors do not embed or reference JSON Schema documents which are constructed for different versions.

  2. How does the meta-schema look?

What I was attempting to convey, in a very sleepy state, is the following...

Say your root schema is 2020-whatever. You process the schema against the meta-schema (if you do that kinda thing), and if you encounter any embedded resource which identifies as using a different feature set, you treat it as its own document and validate it accordingly.

Effectively, if you identify an embedded schema resource, you don't validate it as part of the enclosing document's meta-schema validation.

This is achievable using a meta-schema, although it may look a little gnarly: you'd look for a $schema const, and a not > $schema const at the schema root.
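
For illustration only, here's a rough sketch of how that might look as a meta-schema fragment (entirely hypothetical; no published meta-schema does this, and the root carve-out is exactly the gnarly part):

{
  "$comment": "Hypothetical sketch: wherever the meta-schema would normally recurse into a subschema, treat anything declaring its own $schema (an embedded resource root) as valid, and apply the normal rules otherwise. The document root legitimately contains $schema, so it would need a separate not-based carve-out.",
  "if": { "type": "object", "required": ["$schema"] },
  "then": true,
  "else": { "$recursiveRef": "#" }
}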

It means that you kind of take the embedded documents back apart and process them individually.


We cannot have a scenario where bundling an external reference as an embedded schema resource changes the behavior from best effort ("I have no idea what this is but I'll pretend it's the standard core+validation and give it a shot") to an error. - @handrews #914

My thoughts were, if no $schema is defined for the embedded resource, it should be treated the same as the enclosing document.

In the case where $schema is provided, if it's not known, the meta-schema is also not known, so it's unknown if the implementation understands all the vocabularies, which it needs to determine (if it knows them or not, and what support is required for processing the associated schema) in order to know if it should process said schema or throw an error (because it doesn't support a required vocabulary).
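
As a sketch of that decision logic (hypothetical names throughout; knownDialects maps a $schema URI to its meta-schema's declared vocabularies, and none of this is any real library's API):

// Hypothetical sketch of the decision described above.
// knownDialects: $schema URI -> map of vocabulary URI -> required?
function canProcess(
  schemaUri: string,
  knownDialects: Map<string, Record<string, boolean>>,
  supportedVocabularies: Set<string>
): boolean {
  const vocabularies = knownDialects.get(schemaUri);
  if (vocabularies === undefined) {
    // Unknown $schema means an unknown meta-schema, so required vocabularies
    // cannot be determined; behavior is implementation-defined (this sketch throws).
    throw new Error(`Unrecognized $schema: ${schemaUri}`);
  }
  for (const [uri, required] of Object.entries(vocabularies)) {
    if (required && !supportedVocabularies.has(uri)) {
      // A required vocabulary is unsupported: refuse to process the schema.
      return false;
    }
  }
  return true;
}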

@handrews
Contributor Author

@Relequestual

It means that you kind of take the embedded documents back apart and process them individually.

I think this makes sense. @jdesrosiers is this what you said you're doing already?

In the case where $schema is provided, if it's not known, the meta-schema is also not known, so it's unknown if the implementation understands all the vocabularies, which it needs to determine (if it knows them or not, and what support is required for processing the associated schema) in order to know if it should process said schema or throw an error (because it doesn't support a required vocabulary).

This is covered in 2019-09 for independent documents, although it took me a while to find it (we don't talk about it under $schema, only under $vocabulary, even for behavior of unknown $schema values, so maybe that needs tidying up).

TL;DR: behavior when the vocabulary can't be determined is implementation-defined, because that's essentially what it always was before.

My thoughts were, if no $schema is defined for the embedded resource, it should be treated the same as the enclosing document.

Yes, that's in the PR and AFAICT not controversial.

Realistically we should RECOMMEND that schema authors do not embed or reference JSON Schema documents which are constructed for different versions.

Part of the reason for doing this at all is acknowledging the reality that in a large ecosystem you do not necessarily control all schemas involved. If you're relying on stuff from some officially maintained set of schemas (say, 3rd-party data format schemas) but need to use newer features in your local schemas, this will happen.

With the OAS folks, we'd talked about being able to retrofit an id to indicate that an external schema file was still using old OAS 3.0 (or 2.0) rules. I'm not sure how in-demand that was or if it was more of a hypothetical. Also, maybe saying it's fine for external files but you can't embed it (OAS 3.0 or draft-04) covers the part that actually needs covering. I think @philsturgeon as an OpenAPI+JSON Schema vendor person can best speak to this. I'd certainly rather not support embedded changes from 2020-NN to draft-04 or OAS 3.0/2.0. I agree that we're not trying to provide perfect compatibility.

Technically, the way the spec has always been written, each schema document is loaded under its own rules, because nothing in the spec says that processing is determined by the $ref source. Everything in the spec prior to 2019-09 talks about schema documents and $schema. In draft-07, we talk about not changing processing rules within a schema document, which implies that changing processing rules across schema documents is a thing that happens. Nothing in that draft (or, to the best of my knowledge, any earlier draft) says otherwise.

It also might be worth noting that transcluding an older draft resource into a 2020-NN resource does not mean that that older embedded resource suddenly gains the ability to further embed resources with different $schemas. 2019-09 and earlier leave such behavior as undefined, or just kind of punt in a CREF. So I suppose someone could support it but there's no way to justify requiring that.

@karenetheridge
Member

karenetheridge commented May 23, 2020

Do we support embedding draft-04 schemas?

I don't think we should require (as in the RFC "MUST" language) any earlier draft to be supported via $schema at all. IMO the advantage of $schema keywords in subschemas is to switch vocabularies, not to switch keyword semantics to older specification definitions. Properly supporting earlier schema versions will be a real PITA -- e.g. earlier drafts even allowed $refs to point to anything, not just a subschema (for example, {"multipleOf": {"$ref":"#/definitions/foo/multipleOf"}} used to be legal). When writing a new implementation, I don't really want to have to go back and look for all the subtle language differences in earlier draft specs to bugwardly support them. When a new spec document comes out, it should be able to entirely replace the old ones. Is that too idealistic to be practical?

@handrews
Contributor Author

@karenetheridge yes, in the long run I expect this to be more about switching among different meta-schemas and/or vocabularies in bundled resource documents.

I don't personally care if anyone implements old drafts or not. However, draft-06 and later would not be particularly hard ($id was introduced, and $ref became only allowed where schemas are allowed). But I still plan to recommend folks start with 2020-NN / OAS 3.1 for new implementations.

I would be fine with restricting this to 2020-NN and later, but either way we still need to handle the rest of the questions raised here.

When a new spec document comes out, it should be able to entirely replace the old ones. Is that too idealistic to be practical?

Continued usage of old drafts suggests that the answer is yes, that's too idealistic. :(

@jdesrosiers
Member

  1. Do we support embedding draft-04 schemas? This requires recognizing id.

This is actually an edge case that I missed. My implementation doesn't handle this properly, but I don't see why it couldn't. I just need to check for a sibling $schema keyword and switch the identifier keyword I'm looking for (see the sketch below). If no $schema is present, the dialect is assumed to be that of the enclosing schema.
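
Something like this hypothetical sketch (the dialect table is illustrative, not an exhaustive or real configuration):

// Hypothetical sketch: pick the identifier keyword based on the dialect.
const identifierKeyword: Record<string, string> = {
  "http://json-schema.org/draft-04/schema#": "id",
  "http://json-schema.org/draft-06/schema#": "$id",
  "http://json-schema.org/draft-07/schema#": "$id"
};

function getEmbeddedId(subschema: any, enclosingDialect: string): string | undefined {
  // No sibling $schema: the dialect is that of the enclosing schema.
  const dialect = subschema.$schema ?? enclosingDialect;
  const keyword = identifierKeyword[dialect] ?? "$id";
  return subschema[keyword];
}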

As for custom dialects/vocabularies/meta-schemas, I have two ways of dealing with those. If the custom meta-schema uses the $schema of the dialect it is extending/modifying, I can assume that the configuration for that dialect is to be used. In the following example, my implementation knows that the https://example.com/my-dialect dialect should use the draft-04 rules.

Schema:

{
  "id": "https://exmaple.com/my-schema",
  "$schema": "https://example.com/my-dialect",
  ...
}

Meta-schema:

{
  "id": "https://example.com/my-dialect",
  "$schema": "http://json-schema.org/draft-04/schema#",
  ...
}

This is just a nicety to make it easier to make simple meta-schema changes without needing to write any configuration.

However, if the custom meta-schema uses some of its new keywords in the meta-schema itself (like the hyper-schema meta-schema), you can't use the extended/modified schema as the $schema. In that case, you have to register the dialect with the library and configure which type of identifier (among a few other things) the dialect uses. It's not hard, but it can't be done automatically if you don't assume a default set of rules (which I don't, but it might be reasonable to do).

  2. How does the meta-schema look?

Meta-validation can be handled properly without any change to meta-schemas if schemas are processed in a certain way when they are loaded. My implementation splits embedded schemas into the equivalent two schemas with a reference when schemas are loaded. The schemas are then validated separately against the meta-schema that applies to that schema.

(example uses old version of $ref because the new version gets complicated, but the idea is the same)

{
  "$id": "https://example.com/schema1",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "foo": {
      "$id": "https://example.com/schema2",
      "$schema": "http://json-schema.org/draft-06/schema#",
      "type": "string"
    }
  }
}

This gets converted to two schemas when loaded.

{
  "$id": "https://example.com/schema1",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "foo": { "$ref": "https://example.com/schema2" }
  }
}

{
  "$id": "https://example.com/schema2",
  "$schema": "http://json-schema.org/draft-06/schema#",
  "type": "string"
}

This all works great if you know you are working with a schema and can split it out when needed. The problem comes when you validate a schema as an instance against a meta-schema. The validator doesn't know that an instance is a schema and should therefore be processed as a schema rather than plain JSON.

I can think of two solutions, but they are both awkward. The first is to define bundled schemas as a transport encoding and not a valid schema by itself. Sticking that bundle into a validator and expecting it to validate against a meta-schema is just wrong.

The other solution is to introduce some kind of special keyword or type that indicates that a value is a schema of any dialect (which is something a recursive reference can't do). You could then replace all recursive references in meta-schemas with the new keyword/type and validators would have the context necessary to validate a bundled schema. I would probably go with "type": "schema" for this. It's awkward to add a non-JSON type to type, but it's also not unprecedented considering "integer" is not a JSON type either. (Hyperjump Validation uses this strategy, so I know it's a feasible option.)
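
As a rough sketch, a meta-schema fragment under this proposal would swap the recursive reference for the new type, something like (hypothetical; no published draft or meta-schema supports this value):

{
  "$comment": "Hypothetical: instead of 'items': { '$recursiveRef': '#' }, the keyword accepts a schema of any dialect, validated against the meta-schema named by that subschema's declared or inherited $schema.",
  "properties": {
    "items": { "type": "schema" }
  }
}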

@Relequestual
Member

@jdesrosiers I'm not totally clear on the second solution which you've implemented in your comment. Could you maybe expand it a little please?

What we're effectively saying is, Schemas need to have a different processing model than simply applying the meta-schema, right?

I don't think we CAN form a correct meta-schema for these situations, unless we re-create old meta-schemas in newer versions of JSON Schema (Which I think would cause some confusion).

It also might be worth noting that transcluding an older draft resource into a 2020-NN resource does not mean that that older embedded resource suddenly gains the ability to further embed resources with different $schemas. - @handrews

Yes, I think we need to make sure that's covered! Good call.

@jdesrosiers
Member

What we're effectively saying is, Schemas need to have a different processing model than simply applying the meta-schema, right?

That's true, but it only addresses the part of the problem that's effectively solved. There are two distinct situations.

  1. Loading a schema
    I know it's a schema and I can do the necessary processing before applying it to one or more meta-schemas as needed. I think everyone's in agreement here that splitting out embedded documents into their own schema and meta-validating each separately solves this problem well.

  2. Validating an instance that happens to be a schema against a schema that happens to be a meta-schema
    In this case, the instance just looks like a schema, but it's actually just plain old JSON. No special processing takes place because it's a schema, because it's not a schema. This is the case that needs a solution. The schema processing model is irrelevant because the schema is not really a schema.

I'm not totally clear on the second solution which you've implemented in your comment. Could you maybe expand it a little please?

Sure. The problem is that we can't treat an instance as a schema because we don't know that it's a schema. The solution I was proposing is to annotate it somehow so the validator knows that it's a schema and can process it differently. My suggestion was to add a new type value "schema". So, meta-schemas would replace all { "$recursiveRef": "#" } with { "type": "schema" }. That tells the validator that that part of the instance should be interpreted as a schema rather than just plain JSON. It then kicks off a meta-validation based on the declared or inherited $schema.

An instance is valid against "type": "schema" if it is valid against the meta-schema identified by its $schema.

This solution would also eliminate the need for special processing when loading a schema. If it works when we don't know it's a schema, we don't need to do anything special when we do know it's a schema.

I don't think we CAN form a correct meta-schema for these situations, unless we re-create old meta-schemas in newer versions of JSON Schema (Which I think would cause some confusion).

Right. That's the biggest problem with "type": "schema". Previous drafts and their meta-schemas would have to change for this to work with previous drafts. It could be added to the next draft, but we would only get the benefit for future drafts. Given the expectation of more custom vocabularies, it might be worth adding for future mixing of schema dialects, but it won't help with past drafts.

Honestly, at this point I like the other solution better, "define bundled schemas as a transport encoding and not a valid schema by itself". It's simple, doesn't require new keywords, and it's backwards compatible.

@Relequestual
Member

Relequestual commented Jul 2, 2020

I've been away from this issue for a while so I'm going to try and summarise what I feel is the consensus here, ask people to agree or disagree on that summary of consensus, and look to move this to a PR.

In my view, when loading a schema document, an implementation should be able to recognize embedded schema resources and treat them separately, as if they were $ref'd. - @handrews

Yup, in fact, I came to this conclusion about 5 mins before reading the first comment of this issue again.

    How does the meta-schema look?

Meta-validation can be handled properly without any change to meta-schemas if schemas are processed in a certain way when they are loaded. My implementation splits embedded schemas into the equivalent two schemas with a reference when schemas are loaded. The schemas are then validated separately against the meta-schema that applies to that schema. - @jdesrosiers

So, an implementation needs to recognise when they are handling a Schema which has an embedded resource, and act as if that resource has been $refed (as @handrews said).

[Where the $schema or $vocabulary is unknown] is covered in 2019-09 for independent documents, although it took me a while to find it (we don't talk about it under $schema, only under $vocabulary, even for behavior of unknown $schema values, so maybe that needs tidying up).

TL;DR: behavior when the vocabulary can't be determined is implementation-defined, because that's essentially what it always was before. - @handrews

I think what's being said here is, in this regard, we don't need to change anything. (Although as an aside, I find it worrying that when presented with an unknown $schema value, an implementation could "legally" just claim the instance isn't invalid, and the user would be none the wiser.)

I'm not sure my last quote and comment requires any action.

This issue is blocking #914
I'd like that to be unblocked! =]

I'm not 100% sure if there's anything relating to handling meta-schemas in this issue or not... I'd need to re-parse.
If there is, and it doesn't impact #914, then I'd like to punt it to ANOTHER issue, and unblock #914 as a priority first.

@Relequestual
Member

Relequestual commented Jul 3, 2020

I merged #914 because I think a follow up PR is fine, and the basis of that PR is correct.

@jdesrosiers
Member

So, an implementation needs to recognise when they are handling an instance which has an embedded resource, and act as if that resource has been $refed.

You mean, "handling a schema", not "handling an instance", right? We can't make any assumptions about the semantics of instances.

The important bit that's missing from your summary is addressing the problem that a meta-schema can't properly describe a schema with embedded schemas. It should be included in the spec that when validating a schema (as the instance) against a meta-schema (as the schema), the schema can't have embedded schemas.

@Relequestual
Member

Relequestual commented Jul 31, 2020

Thanks @jdesrosiers, I've fixed s/instance/schema/.

Ah yes, good point... taking an earlier quote....

Honestly, at this point I like the other solution better, "define bundled schemas as a transport encoding and not a valid schema by itself". It's simple, doesn't require new keywords, and it's backwards compatible.

Essentially, JSON Schema makes no provision for validating a bundled schema by simply applying a meta-schema, without the foreknowledge that the instance is a schema.

Simply saying

...when validating a schema (as the instance) against a meta-schema (as the schema), the schema can't have embedded schemas.

I'm not sure that's what we want. We DO want implementations to be able to validate a bundled schema document... but only by virtue of handling each resource individually with the appropriate JSON Schema dialect.

To paraphrase: a bundled schema document may be formed out of resources with different JSON Schema dialects, and as such, an implementation that applies a meta-schema to the bundled schema document as if it were simply JSON is likely to encounter incorrect validation results.

Implementers may provide a configuration option to allow a user to identify the provided instance as a bundled schema document, which allows the implementation to apply meta-schemas to the individual schema resources for which they know the meta-schema associated with the dialect identifier ($schema value) provided.

This is sort of paraphrasing from #936 (comment) also.

I guess the main difference to the processing model is: if the instance is identified as a schema, I don't know exactly how you define the processing of the root instance location which contains the embedded resource. Extracting from the link above, I suggested that the validation result be true, because actual validation of that value is going to be handled by a different feature set (aka dialect).

Does this make any sense @jdesrosiers?

@jdesrosiers
Member

@Relequestual This doesn't feel right to me. It violates one of the core principles of JSON Schema. JSON Schema validates plain JSON instances only. Nothing in the schema has any semantics that affects validation. This is the principle we cite when we say $schema doesn't have meaning in an instance. We would be violating that principle by saying we're now going to support a different mode of validation where instances are interpreted as schemas instead of plain JSON. I think violating a core principle is a pretty strong indicator that we're not on the right track.

Because we are violating a core principle, it could be challenging for implementers to adapt their implementations to support a completely different validation mode. If this is a path we want to take, I think there should be some proof-of-concept to make sure it's feasible.

@Relequestual
Member

OK. Assuming you meant "nothing in the instance has any semantics that affects validation" as opposed to "nothing in the schema has any semantics that affects validation".
Yes. I agree. I wasn't suggesting that the instance SELF identifies as a schema in any way, but that, as a configuration option of the library, you could signify "the instance you're about to validate contains bundled schemas from potentially different dialects".

If this were so, by default, the validation process would remain as is, and validation of the bundle (as an instance) against the meta-schema might fail.

If the config option is set, I see two approaches to processing the instance we now know is a bundled schema document. Both approaches avoid adding a new type, which I'd like to avoid: it has implications for tooling beyond just validation, and I don't feel it's necessary.

Two possible approaches similar to your suggested approach:

  1. When processing the recursive references, the implementation could choose the correct function to use depending on whether that config option was set when calling the validation function.

  2. Each identified schema resource is extracted and validated on its own against the appropriate meta-schema; where the schema resource was embedded, it is converted to true (see the sketch below).
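
To illustrate the second approach with made-up URIs, a bundled document like this:

{
  "$id": "https://example.com/outer",
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "properties": {
    "foo": {
      "$id": "https://example.com/inner",
      "$schema": "http://json-schema.org/draft-07/schema#",
      "type": "string"
    }
  }
}

would have https://example.com/inner validated on its own against the draft-07 meta-schema, while the enclosing resource is meta-validated with the embedded resource replaced by true:

{
  "$id": "https://example.com/outer",
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "properties": {
    "foo": true
  }
}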

Does the 2nd approach sound feasible and similar enough to your currently working approach?

@jdesrosiers
Member

Assuming you meant "nothing in the instance has any semantics that affects validation" as opposed to "nothing in the schema has any semantics that affects validation".

Ha. Yes, that's what I meant.

I wasn't suggesting that the instance SELF identifies as a schema in any way

I wasn't suggesting that either. The point isn't how it identifies as a schema. The point is that it needs to be treated as something other than plain JSON. In this case, it's $id + $schema that have special semantics that need to be taken into account by the validator. It doesn't matter if it self-identified as a schema or some config identified it as a schema; the effect on the validation algorithm is the same.

After thinking about implementation a bit more, I realized that if we treat an instance as a schema, then the schema is not necessary. No matter what the schema (meta-schema) is, we are actually validating against all meta-schemas the implementation supports, not just that meta-schema. The instance (schema) determines what schema ($schema) the instance will be validated against, not the given schema (meta-schema). It doesn't fit into the instance-validates-against-schema pattern. Hopefully that makes sense.

My suggestion of adding a "type": "schema" doesn't solve that problem either. The only option left is to define bundled schemas as a transport encoding and not a valid schema by itself.

@Relequestual
Member

The issue here is that people want to validate a bundled schema.

We can specify that AS IS you cannot, but that it must be deconstructed into the individual schema resources first, then validated.

I guess the key question here is, will implementations be required to provide a means / function to do this?

If we do require implementations to provide a means, then we have to define the approach.

A viable approach would be as was quoted in the first comment of this issue.

If we do NOT require implementations to provide a means, then we leave it up to each implementation to implement it if they choose, and end up with N potential solutions, where the resulting output may differ depending on approach.

My expectation (if my suggested approach was followed) is to end up with multiple validation results (because the validation mode has been indicated as validating a schema document with bundled schema resources).

Because we are violating a core principle, it could be challenging for implementers to adapt their implementations to support a completely different validation mode. If this is a path we want to take, I think there should be some proof-of-concept to make sure it's feasible.

I believed the simplest approach would be to extract out each schema resource, replacing them with true, and validate each schema resource individually. (This is the distilled version of the initial comment's quote.)

@jdesrosiers
Member

The issue here is that people want to validate a bundled schema.

They can still do that, just not through the standard (schema, instance) => boolean mechanism. Is that good enough? For example, in my implementation, you can validate a bundled schema by compiling the schema with the meta-validate flag turned on (it's on by default). Implementations that don't have a compile step would have to provide some other mechanism. For example, they could provide a validateSchema function that looks something like (schema) => boolean. Or, maybe make the instance an optional function argument and only do meta-validation if there is no instance. I wouldn't specify an approach. Implementors can choose what makes the most sense for their implementation.
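
For example, a hypothetical validateSchema entry point might look like the following sketch (decompose, metaSchemaFor, and validate are assumed helpers, not any real library's API; decompose is assumed to fill in each resource's inherited $schema):

// Hypothetical sketch: meta-validate a possibly-bundled schema document
// outside the standard (schema, instance) => boolean mechanism.
declare function decompose(doc: object): Array<Record<string, unknown>>;
declare function metaSchemaFor(dialect: unknown): object;
declare function validate(schema: object, instance: unknown): boolean;

function validateSchema(schemaDocument: object): boolean {
  // Split out embedded resources, leaving equivalent $refs behind.
  const resources = decompose(schemaDocument);
  // Validate each resource against the meta-schema for its own dialect.
  return resources.every((resource) =>
    validate(metaSchemaFor(resource["$schema"]), resource)
  );
}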

I believed the simplest approach would be to extract out each schema resource, replacing them with true, and validate each schema resource individually.

Even simpler than replacing embedded schemas with true is to replace them with an equivalent $ref. That way you can just validate the top level schema and follow $refs like normal during validation. You only need to validate once and you don't need to combine the results of multiple validations somehow when you're done.

If it helps, this is what my implementation does during each phase of the process.

  1. Add Schema
  • Decompose the schema into a set of equivalent schemas with references in place of embedded schemas (sketched below).
  • Convert all schemas to an internal representation and add them to internal schema storage.
  2. Compile Schema
  • Walk the schema (including following references) using keyword implementations to construct an AST.
  • If the shouldMetaValidate flag is true, validate any schema encountered during the compilation process. Throw an error and abort compilation if a schema fails validation.
  3. Validate Instance
  • Walk the AST using keyword implementations to validate the instance.
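
For concreteness, here's a stripped-down sketch of the decomposition in phase 1 (hypothetical code: a real implementation would only look for identifiers where schemas are expected, handle per-dialect identifier keywords, and resolve relative $ids against the base URI):

// Hypothetical sketch: replace embedded schema resources with $refs,
// collecting each extracted resource into `extracted`.
function decompose(schema: any, extracted: any[] = []): any {
  if (schema === null || typeof schema !== "object") return schema;
  const copy: any = Array.isArray(schema) ? [] : {};
  for (const [key, value] of Object.entries(schema)) {
    if (value !== null && typeof value === "object" && "$id" in value) {
      // Embedded resource: pull it out and leave a reference behind.
      extracted.push(decompose(value, extracted));
      copy[key] = { "$ref": (value as any)["$id"] };
    } else {
      copy[key] = decompose(value, extracted);
    }
  }
  return copy;
}

// Usage (hypothetical): const extracted: any[] = [];
// const enclosing = decompose(bundle, extracted);
// then meta-validate `enclosing` and each element of `extracted` separately.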

@Relequestual
Member

I think we got some cross-talk happening here, because I think we're on the same page based on your last comment! =D

Schema resource bundles can be validated, but cannot be validated using the standard "apply schema (meta-schema) to instance (schema resource bundle)" mechanism. Instead, implementations may provide another means by which to allow validation of schema resource bundles.

And yes, your approach is actually a LOT cleaner, because once decomposed, you can then apply the standard validation process.

I imagine there are some implications here for verbose validation output...
@gregsdennis could you provide any thoughts on this?

Anyway, I'm going to mark this as accepted and move to writing a PR.

@Relequestual
Member

Progress: I've written some notes on this based on the above discussions. I will look to have a PR ready before mid next week.

@Relequestual
Member

Relequestual commented Aug 29, 2020

I've made some progress, and have a further consideration which I think may have been mentioned on slack or another issue...

When bundling a schema, you cannot simply replace the schema object which contains the $ref reference.
Given we're defining a Compound Schema Document, we must provide guidance on how to bundle schemas.

I started to reason about how to do this.
We CANNOT suggest the schema object which contains $ref is wrapped in an allOf where another item in the allOf array is the embedded schema document, as this will result in incorrect evaluation of keywords which rely on annotations derived from sibling keywords.
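
To illustrate with made-up URIs: in a schema like this, unevaluatedProperties can see annotations from the adjacent $ref:

{
  "$ref": "https://example.com/base",
  "unevaluatedProperties": false
}

but naively bundling by wrapping in an allOf moves them into sibling branches, where unevaluatedProperties can no longer see annotations produced by the inlined resource:

{
  "allOf": [
    { "unevaluatedProperties": false },
    {
      "$id": "https://example.com/base",
      "properties": { "name": { "type": "string" } }
    }
  ]
}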

I considered an approach where you must wrap the embedded schema resource in an allOf, or combine with existing allOf values in the instance where they exist. This is a little messy.

I considered an approach where we allow $ref to have a value of an object which is the embedded schema resource, which could be allowable only for compound documents... but messing with the model feels like a very non-ideal solution.

Finally, I settled on an approach where embedded schema resources MUST be put into $defs and referenced, but I'm uncertain how to determine the keys, in order to avoid conflict and to mandate a consistent approach (to make sure output is consistent).

My only idea here is the $id of the embedded schema resource, which is OK I guess, but I think the references would then have to URI-encode the URI... =s which is ugly... "$ref": "#/$defs/https%3A%2F%2Fjson-schema.org%2Fdraft%2F2019-09%2Fschema", and I don't know if that will make things equally horribly confusing to try and understand.

Please give me your suggestions!
Also happy for rebuttals or further considerations / comments on the issue in this comment.


Alternatively, we say NOTHING on how to construct a Compound Schema Document. But I KNOW the most popular schema de-referencing library is full of holes that people WILL trip over, as it JUST replaces the schema object containing $ref, with a pre-"$ref plays nicely with other keywords now" mentality.

I would feel bad to say nothing, but even if we DO specify, people are still likely to try to use that library, and run into problems. At least if we said something, it would be easier for someone to create a standardised bundler.

(Please do not re-open unrelated concerns around this issue thread, for now. PR in progress.)

Relequestual added a commit to Relequestual/json-schema-spec that referenced this issue Aug 29, 2020
Define Compound Schema Document and associated concerns
@jdesrosiers
Member

jdesrosiers commented Aug 29, 2020

@Relequestual It was @handrews that brought up these bundling issues when making the $ref changes for draft 2019-09. I think the resolution at the time was to do nothing, wait for feedback from implementors, and incorporate any necessary changes in the next draft. However, I don't think anyone has tackled a bundler for draft 2019-09 yet, so there hasn't been any feedback.

I like the option of allowing $ref to be either a URI or a sub-schema. The change to the model isn't really different from the original definition of $ref, it just looks like it is because the type change is more obvious. To explain, I'll need to share the mental model I use for references.

Let's take this example of a draft-07 schema.

{
  "type": "object",
  "properties": {
    "foo": { "$ref": "/another-schema" }
  }
}

In this example, { "$ref": "/another-schema" } is a reference. It's not a schema, it's not even an object even though it looks like both. (It's not that adjacent keywords aren't allowed, it's that they are not keywords because a reference is not a schema, but this is a digression). The only reason it uses that syntax is because it needs to be represented in JSON compatible syntax. To help our brains see this as something other than a schema let's change the syntax to use angle brackets.

{
  "type": "object",
  "properties": {
    "foo": <ref="/another-schema">
  }
}

So, the values of properties can be either a schema or a reference.

Now let's interpret the same schema as a draft 2019-09 schema. Here, the reference (the part that can be replaced) is no longer an object, it's a string: "/another-schema". Let's use the same angle bracket trick to visualize it.

{
  "type": "object",
  "properties": {
    "foo": { "$ref": <ref="/another-schema"> }
  }
}

At this point the value of $ref can only be a reference. But, if we change that to say it can be a schema or a reference, it's exactly the same behavior as draft-07, we just redefined how we represent a reference. $ref is now just a keyword that behaves like a singleton allOf.

The problem we have with 2019-09 references is that $ref is a special case. It would be nicer if the same rule applied everywhere. That would mean that, just like in draft-07, anywhere a schema is expected, a reference is allowed. Although the conceptual change to references is small, it would be a big change to the look and feel of schemas. But it really cleans up a lot of the verbosity, which I consider a pretty big win.

{
  "type": "object",
  "properties": {
    "foo": "/another-schema"
  }
}

The way I see it, the only downside to allowing this syntax is that tools can't identify a reference without knowing the vocabulary of the schema it's working with. I only know that /properties/foo is a reference because I know that properties is a keyword whose values are schemas. An example would be a bundler. Ideally, a bundler shouldn't have to know anything about your schema other than how to identify and inline references. I'd be able to use an extension vocabulary of additional applicators and the bundler doesn't care. That would be the only reason I think it might be worth $ref being a special case.

In summary, I think it makes sense to allow the value of $ref to be a schema. It's consistent with how references have always worked.

@jdesrosiers
Member

jdesrosiers commented Aug 29, 2020

My only idea here is the $id of the embedded schema resource, which is OK I guess, but I think the references would then have to URI-encode the URI... =s which is ugly... "$ref": "#/$defs/https%3A%2F%2Fjson-schema.org%2Fdraft%2F2019-09%2Fschema", and I don't know if that will make things equally horribly confusing to try and understand.

I didn't quite understand this at first, but I'm coming around. I think this should work. How this would be referenced is irrelevant, right? The actual references won't change. They will still reference an identifier. The names of these $defs just have to be something that won't conflict with a name someone might use naturally because they won't actually be used. A UUID would be a good choice.

The only thing I'm not sure of is if the keywordLocation would be preserved in the standard output results.

@Relequestual
Member

Addressing the latest comment only... Kinda.
The reference would need to change to be prefixed with #/$defs, AND the URI in the ref would need to be encoded. So you would end up with this...

{
  "$defs": {
    "https://jsonschema.dev/test.json": true
  },
  "allOf": [
    {
      "$ref": "#/$defs/https%3A%2F%2Fjsonschema.dev%2Ftest.json"
    }
  ]
}

I don't like it, but I can't think of an alternative solution that doesn't mess with other things (such as "the value of $ref can now also be a schema", which I think will cause ENDLESS confusion).

@Relequestual
Member

I'm a moron. Let me update this comment before reply. Spoke to Henry. It's obvious and I'm dumb.

@Relequestual
Member

My example was a really bad example.
We do not need to modify the $ref reference, because the $id in the transcluded schema can be used as the identifier string directly!

It's kind of obvious, we (I) just forgot because you can't do it within a SINGLE resource anymore.
So...

{
  "$defs": {
    "https://jsonschema.dev/test.json": {
      "$id": "https://jsonschema.dev/test.json",
      "title": "The transcluded test schema..."
    }
  },
  "allOf": [
    {
      "$ref": "https://jsonschema.dev/test.json"
    }
  ]
}

The key of the definition COULD be anything. It doesn't matter.
We can suggest making sure it doesn't clash, and using "meh whatever just something else" if it does. (A UUID would be fine I guess).
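
So the same bundle with an arbitrary non-clashing key works identically (UUID invented purely for illustration):

{
  "$defs": {
    "c1a8b5e2-4f3d-4a6b-9c0e-7d2f8a1b3c5d": {
      "$id": "https://jsonschema.dev/test.json",
      "title": "The transcluded test schema..."
    }
  },
  "allOf": [
    {
      "$ref": "https://jsonschema.dev/test.json"
    }
  ]
}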

Relequestual added a commit to Relequestual/json-schema-spec that referenced this issue Oct 6, 2020
Define Compound Schema Document and associated concerns
Relequestual added a commit that referenced this issue Oct 25, 2020
Define compound schema documents (#936)