Profiles and Sets

Austin Wright edited this page Feb 14, 2021 · 4 revisions

When classifying items into categories, note the distinction between a "set" (in the mathematical sense) and a "profile" (in the media-type sense).

Types of taxonomy relationships

In JSON Schema, we can classify an instance as a member of a category in one of three ways:

  1. Sets - Every JSON Schema has an associated set of valid instances: the set of all JSON documents that are valid against the schema. For true or {}, this is the set of all JSON documents. For false or {"not": {}} or {"type": []}, this is the empty set.
  2. Collections - A collection is a closed set of documents; the collection itself defines which documents belong to it. This behavior is similar to that of profiles, below.
  3. Profiles - A profile is a set of documents with a profile relationship to the profile. A JSON document has a profile only when explicitly indicated as such. A profile may additionally restrict what data the JSON document may contain. That is, you can assert that members of a profile must also be valid against a schema.

These forms of association exhibit different behaviors.

Sets

Sets are motivated by the mathematical concept of a set and members of a set.

An instance in a set is uniquely identified by its contents.

JSON Schema can be used to create subsets - in which every member of a subset is by definition also a member of the superset.

If X and Y are sets described by schemas, where Y is a subset of X, then any instance of Y will also be an instance of X.
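The subset relationship can be sketched in a few lines of Python. This is a minimal illustration, not a conforming validator: the `is_valid` helper is hypothetical and understands only boolean schemas, `"type": "object"`, `required`, and `allOf`. The schemas X and Y and the sample documents are made up for the example. Y is built as a subset of X by requiring everything X requires (via `allOf`) plus one more property.

```python
def is_valid(instance, schema):
    """Hypothetical mini-validator: booleans, "type": "object", required, allOf."""
    if schema is True:
        return True
    if schema is False:
        return False
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    if isinstance(instance, dict):
        # "required" only constrains objects
        for name in schema.get("required", []):
            if name not in instance:
                return False
    # every subschema in allOf must also accept the instance
    return all(is_valid(instance, sub) for sub in schema.get("allOf", []))

# X: objects with a "name". Y: everything in X that also has an "email".
X = {"type": "object", "required": ["name"]}
Y = {"allOf": [X], "required": ["email"]}

docs = [
    {"name": "Ada"},
    {"name": "Ada", "email": "ada@example.org"},
    {"email": "x"},
    42,
]
in_x = [d for d in docs if is_valid(d, X)]
in_y = [d for d in docs if is_valid(d, Y)]

# Every member of Y is a member of X, but not the other way around.
print(all(d in in_x for d in in_y))
print(len(in_x), len(in_y))
```

Because membership is decided purely by validation, no document needs to declare that it belongs to X or Y.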

Profiles

Profiles are similar in function to media types. A media type is a short, standardized string that is associated with a string of octets to form a document. Profiles are somewhat more abstract as a concept.

Profiles are often used to tell others that the document carries certain semantics that are only meaningful when talking about that subset of documents. For example, asserting that a document is about a person, and therefore will describe that person's name, contact information, etc. Using a profile tells you what the document means, rather than how the document is structured (perhaps with ambiguous meanings). (See also "polyglot programming" where a program is valid in two different languages: often one benign, and one malicious.)

An instance of a profile can determine how it is uniquely identified; frequently a URI is used as an identifier, and there can be multiple instances with identical data.
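The difference in identity can be illustrated with a short Python sketch. It assumes a simple canonical serialization (sorted keys) to compare documents by content, and uses made-up `example.org` URIs as identifiers; neither is prescribed by this page.

```python
import json

def canonical(doc):
    """Serialize with sorted keys so equal contents give equal strings."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))

a = {"name": "Ada", "born": 1815}
b = {"born": 1815, "name": "Ada"}  # same contents, different key order

# Set semantics: a member is identified by its contents, so a and b
# are the same member.
set_members = {canonical(a), canonical(b)}

# Profile semantics: each instance is identified by a URI (hypothetical
# URIs here), so two instances may carry identical data.
profile_instances = {
    "https://example.org/people/1": a,
    "https://example.org/people/2": b,
}

print(len(set_members), len(profile_instances))
```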

A resource can be described by multiple profiles. Rules can be applied to profiles and implications made about their membership. If X and Y are profiles, where Y is a subclass of X, any instance of Y is also an instance of X.

Sometimes we want to use JSON to describe properties only when they're an instance of a profile, using this subclass logic. For example, instances of X have property A, and A is only found in instances of X; instances of Y have property B, and property B is only found in instances of Y.

How do we describe data like this? There are a few options:

  1. Use one JSON document per profile per resource. If I have a resource Q that is an instance of Y, then have two JSON documents <Q.X.json> and <Q.Y.json>.

  2. Allow additionalProperties on a document, and ignore unknown properties. List the document as having every profile, including superclasses (though it might sometimes be possible to omit superclasses or any profiles implied from other profiles). Properties across profiles must not overlap.

  3. Prohibit additionalProperties on a document. List the document as having only one media-type profile. Duplicate properties from superclasses, and optionally specify each property as copied/inherited from that superclass.
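The trade-off between options 2 and 3 can be sketched in Python. This is an illustration under stated assumptions: the `is_valid` helper is a hypothetical mini-validator that checks only `required`, the names listed under `properties`, and `"additionalProperties": false` (the keyword that governs unknown object properties). The property names `a` and `b` follow the X/Y example above; how a document lists its profiles is left out.

```python
def is_valid(instance, schema):
    """Hypothetical mini-validator: required, properties names, additionalProperties: False."""
    if not isinstance(instance, dict):
        return False
    if any(name not in instance for name in schema.get("required", [])):
        return False
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        if any(name not in allowed for name in instance):
            return False
    return True

q = {"a": 1, "b": 2}  # resource Q, an instance of Y (and therefore of X)

# Option 2: open schemas, non-overlapping properties, unknown ones ignored.
X_open = {"properties": {"a": {}}, "required": ["a"]}
Y_open = {"properties": {"b": {}}, "required": ["b"]}
option2_ok = is_valid(q, X_open) and is_valid(q, Y_open)

# Option 3: closed schemas; Y duplicates property "a" from its superclass X.
X_closed = {"properties": {"a": {}}, "required": ["a"],
            "additionalProperties": False}
Y_closed = {"properties": {"a": {}, "b": {}}, "required": ["a", "b"],
            "additionalProperties": False}
option3_y = is_valid(q, Y_closed)
option3_x = is_valid(q, X_closed)  # closed X rejects the unknown "b"

print(option2_ok, option3_y, option3_x)
```

With closed schemas, the document validates against Y but no longer against X, even though every instance of Y is meant to be an instance of X. That gap is what the proposals below try to address.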

Solutions

How do we solve this?

Option for additional keywords

  1. Create a keyword that explicitly creates a subclass/superclass relationship. "properties" and some other keywords would get imported to the current document according to a well-defined behavior.

  2. Create a keyword that specifies a property matches (was inherited from) a property in another schema.

  3. Create a keyword that validates the current instance against another schema, with special instructions to ignore all properties not from a certain list - side-stepping additionalProperties: false if it exists.
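As a rough sketch of how the third idea might behave, the Python below validates an instance against a closed superclass schema after dropping every property that is not on a given list. No such keyword is defined on this page; the function name `valid_ignoring_unlisted`, its `listed` parameter, and the mini-validator `is_valid` are all hypothetical.

```python
def is_valid(instance, schema):
    """Hypothetical mini-validator: required, properties names, additionalProperties: False."""
    if not isinstance(instance, dict):
        return False
    if any(name not in instance for name in schema.get("required", [])):
        return False
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        if any(name not in allowed for name in instance):
            return False
    return True

def valid_ignoring_unlisted(instance, schema, listed):
    """Validate only the listed properties of an object against the schema."""
    if not isinstance(instance, dict):
        return is_valid(instance, schema)
    projected = {k: v for k, v in instance.items() if k in listed}
    return is_valid(projected, schema)

X_closed = {"properties": {"a": {}}, "required": ["a"],
            "additionalProperties": False}
q = {"a": 1, "b": 2}  # instance of subclass Y, carrying Y's property "b"

plain = is_valid(q, X_closed)                            # fails on "b"
projected = valid_ignoring_unlisted(q, X_closed, ["a"])  # "b" is ignored

print(plain, projected)
```

The design choice here is to project the instance before validating rather than to rewrite the schema, which leaves the superclass schema untouched.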