This file provides a guide for contributing to the Stencila Schema definitions and associated documentation. For tips on developing the programming language bindings please see the relevant README files for the language package.
For each type in this schema, there are two canonical files from which code and documentation are generated:

- `<type>.schema.yaml`: the JSON Schema, written in YAML, for the type
- `<type>.md`: documentation for the type, including examples and a description of design considerations for the schema
The schema for a type is defined, using JSON Schema, in a <type>.schema.yaml file. We use YAML, which is a superset of JSON, because it is more readable than JSON.
See the excellent Understanding JSON Schema for guides on writing a JSON Schema. The following sections describe the most important, and custom, keywords in the schema in the order that they normally appear.
Each type schema MUST begin with the `title` of the type e.g.

```yaml
title: Organization
```

The `@id` keyword is a custom keyword used to define a term within the vocabulary of the Stencila JSON-LD `@context`.
Where possible use terms from existing vocabularies. Currently, the Stencila context allows you to refer to the following external vocabularies:
- `schema`: https://schema.org/
- `bioschemas`: http://bioschemas.org
- `codemeta`: https://doi.org/10.5063/schema/codemeta-2.0
You MUST declare the `@id` keyword for each type using the format `<context>:<type>`. Note that because this property name begins with the special character `@`, it needs to be surrounded by quotes e.g.

```yaml
'@id': schema:Person
```

Use existing type names from other vocabularies as much as possible. For example, the type schema to represent a laboratory protocol might use the `@id` of the Bioschemas `LabProtocol`:

```yaml
'@id': bioschemas:LabProtocol
```

When a type is not represented in another vocabulary, or has a sufficiently different structure to a similar type elsewhere, define the id within the Stencila context i.e. `'@id': stencila:<type>`.
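The `title` and `@id` rules above can be checked mechanically. Below is a minimal Python sketch of such a check, run over a type schema that has already been loaded into a dictionary. It assumes the loader preserves key order, as Python dicts do. The function name `check_type_schema` and the set of known context prefixes are assumptions made for this illustration. This is not part of the repository's tooling.

```python
import re
from typing import List

# Prefixes of the external vocabularies listed above, plus Stencila's own.
KNOWN_CONTEXTS = {"schema", "bioschemas", "codemeta", "stencila"}
ID_PATTERN = re.compile(r"^(?P<context>[a-z]+):(?P<name>[A-Za-z][A-Za-z0-9]*)$")


def check_type_schema(schema: dict) -> List[str]:
    """Return the problems found with a type schema's `title` and `@id`."""
    problems = []
    # Relies on key order being preserved when the YAML was loaded.
    if next(iter(schema), None) != "title":
        problems.append("schema must begin with 'title'")
    type_id = schema.get("@id")
    if type_id is None:
        problems.append("missing '@id'")
    else:
        match = ID_PATTERN.match(type_id)
        if match is None:
            problems.append(f"'@id' {type_id!r} is not of the form <context>:<type>")
        elif match["context"] not in KNOWN_CONTEXTS:
            problems.append(f"unknown context {match['context']!r}")
    return problems
```

For example, `check_type_schema({"title": "Person", "@id": "schema:Person"})` returns an empty list.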
You MUST declare the @id keyword for each property of a type using the format <context>:<property>.
Often, the `@id` will be the same as the property name. However, you should reuse property names from other vocabularies where possible. For example, the `Person` type schema has a property `givenNames` (note the plural) which is an array of strings:

```yaml
givenNames:
  '@id': schema:givenName
  type: array
  items:
    type: string
```

By declaring the `@id` of that property as `schema:givenName` we are saying "within this vocabulary, when we use the term 'givenNames', we mean the same as http://schema.org/givenName".

Sometimes, a property name is not represented in another vocabulary. In these cases, define the property name as a new term within the Stencila vocabulary i.e. `'@id': stencila:<property>`.
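A compact `@id` such as `schema:givenName` works by prefix expansion, roughly as a JSON-LD processor would do it. The minimal Python sketch below shows this. The `schema` prefix URL comes from the vocabulary list above. The `stencila` base URL and the helper names `expand` and `property_iris` are hypothetical placeholders.

```python
from typing import Dict

# The schema prefix is taken from the vocabulary list in this guide; the
# stencila base URL is a hypothetical placeholder, not the real one.
PREFIXES: Dict[str, str] = {
    "schema": "https://schema.org/",
    "stencila": "https://stencila.example/",
}


def expand(compact_iri: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Expand a compact IRI such as 'schema:givenName' into a full IRI."""
    prefix, separator, term = compact_iri.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"cannot expand {compact_iri!r}")
    return prefixes[prefix] + term


def property_iris(schema: dict) -> Dict[str, str]:
    """Map each property name of a type schema to the full IRI of its term."""
    return {
        name: expand(prop["@id"])
        for name, prop in schema.get("properties", {}).items()
    }
```

Here the property name (`givenNames`) and the expanded term (`.../givenName`) differ. Recording that difference is exactly what the `@id` declaration is for.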
The `extends` keyword is a custom keyword which allows your type schema to inherit the `properties` and `required` keywords of a parent type schema. It should be the name of another type e.g.

```yaml
extends: Entity
```

The `role` keyword is a recommended custom keyword to indicate the role of the type schema:

- `base`: base types, not usually instantiated but required for other types e.g. `Entity` or `Thing`
- `primary`: types that are usually the root of a tree generated from a file e.g. `Article`, `Datatable`, `Collection`
- `secondary`: types usually only referred to by primary types e.g. `Organization` is used for the `publisher` property on an `Article`
- `tertiary`: types usually only referred to by secondary types e.g. `ContactPoint` is used for the `contactPoints` property on an `Organization`
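The following minimal Python sketch resolves a type schema against its ancestors, to show what inheritance through `extends` means in practice. It rests on hypothetical assumptions:

- a child's own properties override inherited ones with the same name;
- `required` lists are concatenated without duplicates;
- no cycle detection is needed.

The repository's build scripts may behave differently.

```python
from typing import Dict


def resolve(name: str, schemas: Dict[str, dict]) -> dict:
    """Return the named schema with inherited properties and required keywords merged in."""
    schema = schemas[name]
    parent_name = schema.get("extends")
    if parent_name is None:
        inherited_properties, inherited_required = {}, []
    else:
        parent = resolve(parent_name, schemas)  # recurse up the chain of parents
        inherited_properties, inherited_required = parent["properties"], parent["required"]
    # Child properties win over inherited ones of the same name.
    properties = {**inherited_properties, **schema.get("properties", {})}
    # Parent's required keywords first, then any new ones from the child.
    required = inherited_required + [
        key for key in schema.get("required", []) if key not in inherited_required
    ]
    # Build a new dict so the input schemas are not mutated.
    return {**schema, "properties": properties, "required": required}
```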
The `status` keyword is a recommended custom keyword to indicate the development status of a type schema e.g.

- `experimental`: extension types (i.e. not defined on schema.org or elsewhere) that are still under development and for which the likelihood of breaking changes is relatively high
- `unstable`: types that are defined elsewhere (e.g. on http://bioschemas.org) but for which the schema definition is still being developed; breaking changes are possible but less likely than for `experimental` types
- `stable`: types for which the schema definition can be considered stable and breaking changes are unlikely
If a type schema is marked as experimental it will not be published as being part of the schema. This is to avoid breaking changes, and thus new major version numbers, to the schema as a whole.
When a type is promoted from experimental to unstable or stable, the change should be associated with a feat commit to increment the minor version number.
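The following minimal Python sketch checks the `role` and `status` values and applies the rule that `experimental` types are not published. The function names are hypothetical, and the schemas are plain dictionaries.

```python
from typing import Dict, List

ROLES = {"base", "primary", "secondary", "tertiary"}
STATUSES = {"experimental", "unstable", "stable"}


def check_role_and_status(schema: dict) -> List[str]:
    """Flag unknown values; both keywords are recommended, so absence is allowed."""
    problems = []
    role = schema.get("role")
    if role is not None and role not in ROLES:
        problems.append(f"unknown role {role!r}")
    status = schema.get("status")
    if status is not None and status not in STATUSES:
        problems.append(f"unknown status {status!r}")
    return problems


def publishable(schemas: Dict[str, dict]) -> Dict[str, dict]:
    """Drop experimental types, which are not published as part of the schema."""
    return {
        name: schema
        for name, schema in schemas.items()
        if schema.get("status") != "experimental"
    }
```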
Expansion of type schemas with new properties and other non-breaking changes is allowed. Renaming or removal of schema classes or any of their properties is considered a breaking change and should be done with careful consideration. If any such changes need to be made, they must first be marked as deprecated but not removed. Once a major version is to be released, properties and helpers marked as deprecated should be removed all at once.
You must add a `description` for all types and properties. Descriptions must be plain text and less than 120 characters. We apply this rule so that descriptions can be rendered in a variety of contexts, including documentation strings in different programming languages. If you need to add more details, or want to use Markdown, put them in the `$comment` property.
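The description rule can also be checked mechanically. The minimal Python sketch below flags three things:

- missing descriptions;
- descriptions of 120 characters or more;
- descriptions containing characters that usually signal Markdown.

The Markdown test is only a heuristic assumed for this example, and `check_descriptions` is a hypothetical helper.

```python
import re
from typing import List

MAX_DESCRIPTION_LENGTH = 120
# Heuristic: characters that usually indicate Markdown rather than plain text.
MARKDOWN_HINTS = re.compile(r"[`*\[\]#]|\n")


def check_descriptions(schema: dict) -> List[str]:
    """Check the type and each of its properties for a short, plain-text description."""
    nodes = [(schema.get("title", "<type>"), schema)]
    nodes += list(schema.get("properties", {}).items())
    problems = []
    for name, node in nodes:
        description = node.get("description")
        if not description:
            problems.append(f"{name}: missing description")
        elif len(description) >= MAX_DESCRIPTION_LENGTH:
            problems.append(
                f"{name}: description is {len(description)} characters "
                f"(must be less than {MAX_DESCRIPTION_LENGTH})"
            )
        elif MARKDOWN_HINTS.search(description):
            problems.append(f"{name}: description looks like Markdown (move it to $comment)")
    return problems
```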
The `aliases` keyword is a custom keyword which allows you to define aliases for properties. For example,

```yaml
properties:
  ...
  familyNames:
    '@id': schema:familyName
    aliases:
      - familyName
      - surname
      - surnames
      - lastName
      - lastNames
```

The `parser` keyword is a custom keyword which allows you to define allowable shorthand strings for a property or type. It can be used by validators to coerce strings into more complex objects e.g. an array or a `Person`.
Parsers always take a string but differ in the type that they produce, for example:
- `ssi`: decodes a space separated list of items to an array of strings
- `csi`: decodes a comma separated list of items to an array of strings
- `person`: decodes a personal name, email or URL to a `Person`
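The following minimal Python sketch shows how a validator might apply `aliases` and parsers:

- The `ssi` and `csi` parsers follow the descriptions above.
- The `person` parser is a deliberately simplistic, hypothetical rule: an optional trailing `<email>`, the last word as the family name, and any earlier words as given names. It is not Encoda's actual implementation, and it ignores URLs.
- A property's parser is looked up from one of its `anyOf` alternatives, following the convention for property parsers used in this guide.

```python
import re
from typing import Callable, Dict, List, Optional


def ssi(text: str) -> List[str]:
    """Space separated items: split on runs of whitespace."""
    return text.split()


def csi(text: str) -> List[str]:
    """Comma separated items: split on commas, trim, and drop empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


# Hypothetical, simplistic rule: names, optionally followed by "<email>".
PERSON_PATTERN = re.compile(r"^\s*(?P<names>[^<]*?)\s*(?:<(?P<email>[^>]+)>)?\s*$")


def person(text: str) -> dict:
    """Decode e.g. "Bob C. Adams <bob@example.org>" into a Person-like dict."""
    match = PERSON_PATTERN.match(text)
    if match is None:
        raise ValueError(f"cannot parse a person from {text!r}")
    names = match["names"].split()
    result: dict = {"type": "Person"}
    if match["email"]:
        result["emails"] = [match["email"]]
    if names:
        result["familyNames"] = names[-1:]  # last word taken as the family name
    if len(names) > 1:
        result["givenNames"] = names[:-1]  # preceding words taken as given names
    return result


PARSERS: Dict[str, Callable[[str], object]] = {"ssi": ssi, "csi": csi, "person": person}


def property_parser(prop: dict) -> Optional[Callable[[str], object]]:
    """Return the parser named in one of the property's anyOf alternatives, if any."""
    for alternative in prop.get("anyOf", []):
        if "parser" in alternative:
            return PARSERS[alternative["parser"]]
    return None


def coerce_properties(data: dict, properties: dict) -> dict:
    """Rename aliased keys to canonical property names, then parse string values."""
    aliases = {
        alias: name
        for name, prop in properties.items()
        for alias in prop.get("aliases", [])
    }
    result = {}
    for key, value in data.items():
        name = aliases.get(key, key)
        parser = property_parser(properties.get(name, {}))
        if parser is not None and isinstance(value, str):
            value = parser(value)
        result[name] = value
    return result
```

Values that are already structured (for example, a list given for `givenNames`) are passed through unchanged. Only strings are handed to a parser.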
You can specify a parser for both types and properties. To specify a parser for a type, add the `parser` keyword at the top level e.g.

```yaml
title: Person
...
parser: person
```

Specifying a parser for a type means that it will be used to attempt to coerce a string to that type. For example, a `CreativeWork` has an `authors` property which could be specified in the schema like this:

```yaml
authors:
  '@id': schema:author
  description: The authors of this creative work.
  type: array
  items:
    $ref: Person
```

This allows users to enter the authors of a `CreativeWork` as strings, but have them coerced to structured, semantic content. For example, Encoda will convert the following simple YAML description of a work,
```yaml
type: CreativeWork
authors:
  - Jan Jones
  - Bob C. Adams <bob@example.org>
```

into this JSON:

```json
{
  "type": "CreativeWork",
  "authors": [
    {
      "type": "Person",
      "familyNames": ["Jones"],
      "givenNames": ["Jan"]
    },
    {
      "type": "Person",
      "emails": ["bob@example.org"],
      "familyNames": ["Adams"],
      "givenNames": ["Bob", "C."]
    }
  ]
}
```

You can specify a parser for a property using `anyOf`. For example, to allow `givenNames` to be provided either as a space separated string or as an array of strings:
```yaml
title: Person
...
properties:
  ...
  givenNames:
    ...
    anyOf:
      - parser: ssi
      - type: array
        items:
          type: string
```

The `anyOf` keyword is one of the ways to combine JSON Schemas. It is used to specify alternatives for the type of a property: data must be valid against one or more of the given subschemas.
Usually, the order in which subschemas are listed under `anyOf` is of no importance. However, in the Stencila Schema, it is. That is because, when coercing data to the schema, the JSON Schema validator that we use in Encoda, Ajv, will attempt to coerce to each type in the order that they appear in `anyOf`. That generally means that more complex types should be listed before simpler types. For example, `number` should come before `boolean`, and `string` should generally come last.
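A simplified model shows why the order matters: try each alternative in order and keep the first one that succeeds. The Python sketch below loosely mimics a few of Ajv's scalar coercion rules. Numbers 0 and 1 can become booleans, `"true"`/`"false"` can become booleans, and scalars can become strings. It is an illustration under those assumptions, not Ajv's actual algorithm.

```python
from typing import Any, Callable, Dict, List


class CoercionError(ValueError):
    """Raised when a value cannot be coerced to a given type."""


def to_number(value: Any) -> Any:
    if isinstance(value, bool):  # test bool first: in Python, bool is a subclass of int
        return int(value)
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            pass
    raise CoercionError(f"cannot coerce {value!r} to number")


def to_boolean(value: Any) -> bool:
    if isinstance(value, bool):
        return value
    if value in ("true", "false"):
        return value == "true"
    if isinstance(value, (int, float)) and value in (0, 1):
        return value == 1
    raise CoercionError(f"cannot coerce {value!r} to boolean")


def to_string(value: Any) -> str:
    if isinstance(value, str):
        return value
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    raise CoercionError(f"cannot coerce {value!r} to string")


COERCERS: Dict[str, Callable[[Any], Any]] = {
    "number": to_number,
    "boolean": to_boolean,
    "string": to_string,
}


def coerce_any_of(value: Any, types: List[str]) -> Any:
    """Return the value coerced to the first alternative (in order) that accepts it."""
    for type_name in types:
        try:
            return COERCERS[type_name](value)
        except CoercionError:
            continue
    raise CoercionError(f"{value!r} matches none of {types}")
```

With `boolean` listed first, the number `1` is coerced to `True`. Listing `number` first keeps it as `1`. Listing `string` first would turn it into `"1"`, which is why `string` belongs at the end.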
Documentation for a schema is written in a markdown file, named to match the corresponding schema. For example, schema/Link.schema.yaml is documented in schema/Link.md. For the rest of this document we will refer to the Link schema but this should be replaced with the name of whatever schema you are creating.
Each documentation file should contain a title, a list of authors, a schema property documentation table and some examples of the content. Much of this can be automatically generated with Encoda once an empty documentation Markdown file has been created, and there is some standard text that can simply be included in the file.
A Markdown file containing a table of the schema properties for your new schema can be generated by running `npm run docs:build`. Then, you can create a Markdown file next to the schema (i.e. `schema/<type>.md`).

The public properties table can be included in the file with an `include` directive:

```markdown
include: ../public/<type>.schema.md
:::
:::
```

Process the file with Encoda to fill in this table and generate the metadata heading:

```bash
$ npx encoda process schema/<type>.md
```

You can now complete the title (it should match the `<type>`) and the authors section.
The documentation should start with a level one heading containing the name of the type, followed by a short description of how to use the type.
Following the introduction, include some examples of this schema in use. With Encoda, it is possible to automatically generate these examples in other formats from a single JSON example.
Start by creating a code block with the example in JSON format (this example refers to the `Link.md` documentation). Make sure the language of the code block is `json` and add an `import=...` argument to name the code chunk as a variable.
````markdown
```json import=ex1
{
  "type": "Link",
  "content": ["Stencila’s website"],
  "target": "https://stenci.la"
}
```
````

Now the code chunk can be referred to with the variable name given (in this case, `ex1`) using the `export` argument. To convert to HTML, for example, just create a code chunk with the format and exported variable name:

````markdown
```html export=ex1
```
````

Now, run `encoda process` again:

```bash
$ npx encoda process schema/Link.md
```

Any exported code chunks should be automatically converted and included:

````markdown
```html export=ex1
<a href="https://stenci.la">Stencila’s website</a>
```
````

This can be done for most types supported by Encoda. In some instances where full documents are generated (e.g. Docx or ODF), you should instead export to a standalone file, like this:

```markdown
[`odt`](link-ex1.out.odt){export=ex1}
```

The file `link-ex1.out.odt` will be created.
When writing Markdown examples within a Markdown file, a sequence of three (n) backticks can be escaped by enclosing it in four (n+1) backticks e.g.
````
```py
a + 6 * b
```
````
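The n+1 rule generalises to any content. Use a fence one backtick longer than the longest run of backticks inside the content, with a minimum of three. The following small Python helper applies this rule; it is hypothetical and not part of the repository.

```python
import re


def fence(content: str, info: str = "") -> str:
    """Wrap content in a backtick fence longer than any backtick run inside it."""
    longest_run = max((len(run) for run in re.findall(r"`+", content)), default=0)
    ticks = "`" * max(3, longest_run + 1)
    return f"{ticks}{info}\n{content}\n{ticks}"
```

For example, wrapping the three-backtick `py` block shown above produces exactly the four-backtick version.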
Each example should include a short description of what the element has been converted to. It is also good to include a link to the definition of the element that the schema is encoded as. For example, for Markdown examples, link to the MDAST definition. For HTML examples, developer.mozilla.org is a good resource.
`schema/Link.md` and `schema/Delete.md` are good examples of documentation that you can refer to when writing your own.
We derive language binding files such as types.ts, types.py, and types.R from the schema YAML files.
Despite those files being auto-generated, there are a couple of reasons why it is useful to include them in this repository.
- It allows packages to be installed directly from the Git repository e.g.

  ```r
  devtools::install_github("stencila/schema", subdir = "r", upgrade = "ask")
  ```

  ```bash
  pip install -e git+https://github.com/stencila/schema.git#subdirectory=python
  ```

- It allows us to more easily compare changes to the bindings and potentially identify issues.
However, since these files are derived from the YAML files, there is a risk that the type bindings and the schema YAML drift out of sync.
One remedy we have in place is to use Git hooks. These automatically build the schema, detect changes in the bindings, and commit them before each push.
To see the exact steps being performed, please see the check-bindings command in the Makefile.
If you don't want to install npm on your host directly, you can build the provided `Dockerfile`:

```bash
make build-image
```

Then bind your local working directory and run commands from inside the container. For example, here is how we would run `make docs`:

```bash
$ make run-image
# Now we are inside the container...
> make docs
```

Commit messages should follow the Conventional Commits specification. This is important because commit messages are used to determine the semantic version of releases and to generate the project's `CHANGELOG.md`. If appropriate, use the sentence case name of the affected type, language or theme as the scope (to help make both `git log` and the `CHANGELOG` more readable). Some previous examples:
- `fix(BlockContent): Add Figure and Collection as valid types`
- `fix(R): Fix and improve generated bindings`
- `feat(Elife): Use eLife corresponding author envelope icon`
- `feat(MathBlock): Add label property`
- `feat(Python bindings): Add node_type utility function`
- `feat(Typescript factory functions): Only first required prop is unnamed`
- `docs(CreativeWork): Add some references`
- `docs(Python): Add doc string to types.py`
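As a rough aid, the header format and the mapping from commit types to version bumps can be sketched in Python. The bump rules below follow the Conventional Commits specification:

- `fix` gives a patch release;
- `feat` gives a minor release;
- `!` or a `BREAKING CHANGE:` footer gives a major release.

The project's release tooling may apply additional rules, and the helper names here are hypothetical.

```python
import re
from typing import Optional

# <type>(<scope>)!: <subject>, where the scope and "!" are optional.
HEADER = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
)
BREAKING_FOOTER = re.compile(r"^BREAKING[ -]CHANGE: ", re.MULTILINE)


def parse_header(message: str) -> dict:
    """Split the first line of a commit message into type, scope, breaking flag and subject."""
    first_line = message.splitlines()[0] if message else ""
    match = HEADER.match(first_line)
    if match is None:
        raise ValueError(f"not a conventional commit header: {first_line!r}")
    return match.groupdict()


def release_type(message: str) -> Optional[str]:
    """Return the semantic version bump implied by a commit, or None for no release."""
    header = parse_header(message)
    if header["breaking"] or BREAKING_FOOTER.search(message):
        return "major"
    if header["type"] == "feat":
        return "minor"
    if header["type"] == "fix":
        return "patch"
    return None  # e.g. docs or chore commits do not trigger a release on their own


def has_sentence_case_scope(message: str) -> bool:
    """True if there is no scope, or the scope starts with a capital letter."""
    scope = parse_header(message)["scope"]
    return scope is None or scope[:1].isupper()
```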