schema/CONTRIBUTING.md at master · stencila/schema

Contributing

Overview

This file provides a guide for contributing to the Stencila Schema definitions and associated documentation. For tips on developing the programming language bindings please see the relevant README files for the language package.

For each type in this schema, there are two canonical files from which code and documentation are generated:

<type>.schema.yaml is the JSON Schema, written in YAML, for the type.
<type>.md is the documentation for the type, including examples and a description of design considerations for the schema.

Writing `<type>.schema.yaml` files

The schema for a type is defined, using JSON Schema, in a <type>.schema.yaml file. We use YAML, which is a superset of JSON, because it is more readable than JSON.

See the excellent Understanding JSON Schema for guides on writing a JSON Schema. The following sections describe the most important, and custom, keywords in the schema in the order that they normally appear.

The `title` keyword

Each type schema MUST begin with the title of the type e.g.

title: Organization

The `@id` keyword

This is a custom keyword used to define a term within the vocabulary of the Stencila JSON-LD @context.

Where possible use terms from existing vocabularies. Currently, the Stencila context allows you to refer to the following external vocabularies:

schema: https://schema.org/
bioschemas: http://bioschemas.org
codemeta: https://doi.org/10.5063/schema/codemeta-2.0

Type `@id`s

You MUST declare the @id keyword for each type using the format <context>:<type>. Note that because this property name begins with the special character @, it needs to be surrounded by quotes, e.g.

'@id': schema:Person

Use existing type names from other vocabularies as much as possible. For example, the type schema to represent a laboratory protocol might use the @id of the Bioschemas LabProtocol.

'@id': bioschemas:LabProtocol

When a type is not represented in another vocabulary, or has a sufficiently different structure to a similar type elsewhere, define the id within the Stencila context i.e. '@id': stencila:<type>

Property `@id`s

You MUST declare the @id keyword for each property of a type using the format <context>:<property>.

Often, the @id will be the same as the property name. However, you should reuse property names from other vocabularies where possible. For example, the Person type schema has a property givenNames (note the plural) which is an array of strings.

givenNames:
  '@id': schema:givenName
  type: array
  items:
    type: string

By declaring the @id of that property as schema:givenName we are saying "within this vocabulary, when we use the term 'givenNames', we mean the same as http://schema.org/givenName".

Sometimes, a property name is not represented in another vocabulary. In these cases, define the property name as a new term within the Stencila vocabulary i.e. '@id': stencila:<property>
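As a rough illustration of what a compact @id stands for, the following Python sketch expands a `<context>:<term>` string into a full IRI. The `expand_id` function and the `PREFIXES` table are hypothetical helpers written for this guide, not part of the Stencila tooling, and only the `schema` prefix from the list of external vocabularies is included.

```python
# Hypothetical prefix table; only the `schema` prefix listed earlier in this
# guide is included to keep the sketch short.
PREFIXES = {"schema": "https://schema.org/"}


def expand_id(compact_id: str) -> str:
    """Expand a compact '@id' such as 'schema:givenName' to a full IRI."""
    prefix, sep, term = compact_id.partition(":")
    if not sep or not prefix or not term:
        raise ValueError(f"expected <context>:<term>, got {compact_id!r}")
    if prefix not in PREFIXES:
        raise ValueError(f"unknown context prefix {prefix!r}")
    return PREFIXES[prefix] + term


print(expand_id("schema:givenName"))
```

A check like this could be used to catch an @id that is missing its context prefix before it reaches the generated JSON-LD context.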

The `extends` keyword

This is a custom keyword which allows your type schema to inherit the properties and required keywords of a parent type schema. It should be the name of another type e.g.

extends: Entity
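To make the inheritance idea concrete, here is a minimal Python sketch of how `properties` and `required` might be merged down a chain of `extends` declarations. The `resolve` function and the two toy schemas are assumptions made for illustration; the actual build step in this repository may merge schemas differently.

```python
def resolve(name, schemas):
    """Return the merged 'properties' and 'required' for a type,
    following its chain of 'extends' declarations (illustrative only)."""
    schema = schemas[name]
    own_properties = schema.get("properties", {})
    own_required = schema.get("required", [])

    parent_name = schema.get("extends")
    if parent_name is None:
        return {"properties": dict(own_properties), "required": list(own_required)}

    parent = resolve(parent_name, schemas)
    # Child properties override parent properties with the same name.
    properties = {**parent["properties"], **own_properties}
    # Keep the parent's order and append any new required names.
    required = parent["required"] + [
        prop for prop in own_required if prop not in parent["required"]
    ]
    return {"properties": properties, "required": required}


# Toy schemas, not the real Stencila definitions.
schemas = {
    "Entity": {"properties": {"id": {"type": "string"}}, "required": ["id"]},
    "Thing": {
        "extends": "Entity",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    },
}

print(resolve("Thing", schemas))
```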

The `role` keyword

A recommended custom keyword to indicate the role of the type schema:

base: base types, not usually instantiated but required for other types e.g. Entity or Thing
primary: types that are usually the root of a tree generated from a file e.g. Article, Datatable, Collection
secondary: types usually only referred to by primary types e.g. Organization is used for the publisher property on an Article
tertiary: types usually only referred to by secondary types e.g. ContactPoint is used for the contactPoints property on an Organization

The `status` keyword

A recommended custom keyword to indicate the development status of a type schema e.g.

experimental: extension types (i.e. not defined on schema.org or elsewhere) that are still under development and for which the likelihood of breaking changes is relatively high
unstable: types that are defined elsewhere (e.g. on http://bioschemas.org) but for which the schema definition is still being developed; breaking changes are possible but less likely than experimental types
stable: types for which the schema definition can be considered stable and breaking changes unlikely.

If a type schema is marked as experimental it will not be published as being part of the schema. This is to avoid breaking changes, and thus new major version numbers, to the schema as a whole.

When a type is promoted from experimental to unstable or stable, the change should be associated with a feat commit to increment the minor version number.

Expansion of type schemas with new properties and other non-breaking changes is allowed. Renaming or removing schema classes or any of their properties is considered a breaking change and should be done with careful consideration. If any such changes need to be made, the affected classes or properties must first be marked as deprecated, not removed. When a major version is to be released, properties and helpers marked as deprecated should be removed all at once.

The `description` keyword

You must add a description for all types and properties. Descriptions must be plain text and less than 120 characters. We apply this rule so that descriptions can be rendered in a variety of contexts, including documentation strings in a variety of languages. If you need to add more details, or want to use Markdown, put it in the $comment property.
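The description rule can be checked mechanically. The sketch below is a hypothetical helper, not an existing script in this repository; the Markdown detection is a rough heuristic that only looks for a few common markers.

```python
from typing import List

MAX_LENGTH = 120  # descriptions must be shorter than this

# Heuristic markers that suggest Markdown rather than plain text.
MARKDOWN_MARKERS = ("`", "**", "](")


def check_description(description: str) -> List[str]:
    """Return a list of problems found in a description (empty if none)."""
    problems = []
    if len(description) >= MAX_LENGTH:
        problems.append(f"too long: {len(description)} characters")
    if "\n" in description:
        problems.append("contains a line break")
    if any(marker in description for marker in MARKDOWN_MARKERS):
        problems.append("looks like Markdown; move details to $comment")
    return problems


print(check_description("The authors of this creative work."))
```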

The `aliases` keyword

A custom keyword which allows you to define aliases for properties. For example,

properties:
  ...
  familyNames:
    '@id': schema:familyName
    aliases:
      - familyName
      - surname
      - surnames
      - lastName
      - lastNames
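One way to picture what aliases are for: before validation, keys in the incoming data that match an alias can be renamed to the canonical property name. The `apply_aliases` function below is a hypothetical sketch of that step, using the alias list from the example above; it is not the implementation used by Encoda.

```python
# Alias table mirroring the familyNames example above.
ALIASES = {
    "familyNames": ["familyName", "surname", "surnames", "lastName", "lastNames"],
}


def apply_aliases(data: dict, aliases: dict) -> dict:
    """Rename aliased keys in `data` to their canonical property names."""
    lookup = {
        alias: canonical
        for canonical, names in aliases.items()
        for alias in names
    }
    return {lookup.get(key, key): value for key, value in data.items()}


person = {"type": "Person", "surname": ["Jones"], "givenNames": ["Jan"]}
print(apply_aliases(person, ALIASES))
```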

The `parser` keyword

A custom keyword which allows you to define allowable shorthand strings for a property or type. The parser keyword can be used by validators to coerce strings into more complex objects e.g. array, Person.

Parsers always take a string but differ in the type that they produce, for example:

ssi: decodes a space separated list of items to an array of strings
csi: decodes a comma separated list of items to an array of strings
person: decodes a personal name, email or url to a Person
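The two list parsers are simple enough to sketch in a few lines of Python. These functions only illustrate the behaviour described above; the function names mirror the parser names, but the handling of extra whitespace and empty items is an assumption rather than documented behaviour.

```python
from typing import List


def ssi(value: str) -> List[str]:
    """Decode a space separated list of items to a list of strings."""
    return value.split()


def csi(value: str) -> List[str]:
    """Decode a comma separated list of items to a list of strings.

    Assumption: surrounding whitespace is trimmed and empty items dropped.
    """
    return [item.strip() for item in value.split(",") if item.strip()]


print(ssi("Bob C."))
print(csi("biology, chemistry,physics"))
```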

You can specify a parser for both types and properties. To specify a parser for a type, add the parser keyword at the top level e.g.

title: Person
---
parser: person

Specifying a parser at the type level means that it will be used to attempt to coerce a string to that type. For example, a CreativeWork has an authors property which could be specified in the schema like this:

authors:
  '@id': schema:author
  description: The authors of this creative work.
  type: array
  items:
    $ref: Person

This allows users to enter authors of a CreativeWork as strings, but have them coerced to structured, semantic content. For example, Encoda will convert the following simple YAML description of a work,

type: CreativeWork
authors:
  - Jan Jones
  - Bob C. Adams <bob@example.org>

into this JSON:

{
  "type": "CreativeWork",
  "authors": [
    {
      "type": "Person",
      "familyNames": ["Jones"],
      "givenNames": ["Jan"]
    },
    {
      "type": "Person",
      "emails": ["bob@example.org"],
      "familyNames": ["Adams"],
      "givenNames": ["Bob", "C."]
    }
  ]
}

You can specify a parser for a property using anyOf. For example, to allow givenNames to be provided as either a string of space separated values or as an array of strings:

title: Person
...
properties:
  ...
  givenNames:
    ...
    anyOf:
      - parser: ssi
      - type: array
        items:
          type: string

The `anyOf` keyword

The anyOf keyword is one of the ways to combine JSON Schemas. It is used to specify alternatives for the type of a property: data must be valid against one or more of the given subschemas.

Usually, the order that subschemas are listed under anyOf is of no importance. However, in the Stencila Schema, it is. That is because when coercing data to the schema, the JSON Schema validator that we use in Encoda, Ajv, will attempt to coerce to each type in the order that they appear in anyOf. That generally means that more complex types should be listed before more simple types. For example, number should be before boolean, and string should generally come last in all circumstances.
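The effect of ordering can be seen in a small Python sketch. This is a deliberately simplified model of "try each alternative in turn", not Ajv's actual coercion algorithm, and the `coerce_to` and `coerce_any_of` functions are hypothetical.

```python
def coerce_to(value, type_name):
    """Try to coerce `value` to a simple type; return (ok, coerced)."""
    if type_name == "string":
        # Almost anything can be turned into a string, so this rarely fails.
        return True, str(value)
    if type_name == "number":
        try:
            return True, float(value)
        except (TypeError, ValueError):
            return False, None
    if type_name == "boolean":
        if value in ("true", "false"):
            return True, value == "true"
        return False, None
    raise ValueError(f"unsupported type {type_name!r}")


def coerce_any_of(value, type_names):
    """Return the result of the first alternative that accepts `value`."""
    for type_name in type_names:
        ok, coerced = coerce_to(value, type_name)
        if ok:
            return coerced
    raise ValueError(f"{value!r} matches none of {type_names}")


print(repr(coerce_any_of("42", ["string", "number"])))
print(repr(coerce_any_of("42", ["number", "string"])))
```

With `string` listed first the value stays a string, because the string alternative accepts it before the number alternative is ever tried. This is the reason for listing `string` last.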

Writing `<type>.md` documentation files

Documentation for a schema is written in a markdown file, named to match the corresponding schema. For example, schema/Link.schema.yaml is documented in schema/Link.md. For the rest of this document we will refer to the Link schema but this should be replaced with the name of whatever schema you are creating.

Each documentation file should contain a title, list of authors, schema property documentation table and some examples of the content. Much of this can be automatically generated with encoda once an empty documentation Markdown file has been created, and there is some standard text that can just be included in the file.

Generating skeleton

A Markdown file containing a table of the schema properties for your new schema can be generated by running npm run docs:build. Then, you can create a markdown file next to the schema (i.e. schema/<type>.md).

The public properties table can be included in the file with an include directive:

include: ../public/<type>.schema.md
:::
:::

Process the file with encoda to fill in this table and generate the metadata heading.

$ npx encoda process schema/<type>.md

You can now complete the title (it should match the <type>) and authors section.

Intro Paragraph

The documentation should start with a level one header with the title, and then a short description of how to use the type.

Examples

Following the introduction, include some examples of the implementation of this schema. With the use of encoda it is possible to automatically generate these examples from a single implementation.

Start by creating a code block with the example in JSON format (this example will refer to the Link.md documentation). Make sure the type is json and add an import=... argument to name the code chunk as a variable.

// extra spaces added between backticks for escaping, these are not necessary
`` `json import=ex1 { "type": "Link", "content": ["Stencila’s website"], "target": "https://stenci.la" }` ``

Now the code chunk can be referred to with the variable name given (in this case, ex1) with the export argument. To convert to HTML, for example, just create a code chunk with the format and exported variable name:

// extra spaces added between backticks for escaping, these are not necessary
`` `html export=ex1` ``

Now, run encoda process again:

$ npx encoda process schema/Link.md

And any exported code chunks should be automatically converted and included:

// extra spaces added between backticks for escaping, these are not necessary
`` `html export=ex1 <a href="https://stenci.la">Stencila’s website</a>` ``

This can be done for most types supported by encoda. In some instances where full documents are generated (e.g. Docx or ODF) you should instead export to a standalone file, like this:

[`odt`](link-ex1.out.odt){export=ex1}

The file link-ex1.out.odt will be created.

Markdown Examples

When writing Markdown examples within a Markdown file, a sequence of three (n) backticks can be escaped by enclosing it in four (n+1) backticks e.g.

````
```py
a + 6 * b
```
````
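The n+1 rule generalises to any content. As a sketch, the hypothetical helper below finds the longest run of backticks in a piece of text and builds a fence one backtick longer, with a minimum of three; it is not part of encoda.

```python
import re


def fence_for(content: str) -> str:
    """Return a backtick fence long enough to safely enclose `content`."""
    runs = re.findall(r"`+", content)
    longest = max((len(run) for run in runs), default=0)
    return "`" * max(3, longest + 1)


def wrap(content: str) -> str:
    """Wrap `content` in a fence that its own backticks cannot close."""
    fence = fence_for(content)
    return f"{fence}\n{content}\n{fence}"


# Build the inner example without writing a literal fence in this file.
inner_fence = "`" * 3
example = f"{inner_fence}py\na + 6 * b\n{inner_fence}"
print(wrap(example))
```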

Example Metadata

Each example should include a little bit about what the element has been converted to. It is also good to include a link to a definition of which element a schema encodes to. For example, for Markdown examples, link to an MDAST definition. For HTML definitions, developer.mozilla.org is a good resource.

Documentation Examples

The schema/Link.md and schema/Delete.md are good examples of documentation that can be referred to when writing your own.

Autogenerated Schema Bindings

We derive language binding files such as types.ts, types.py, and types.R from the schema YAML files.

Despite those files being auto-generated, there are a couple of reasons why it is useful to include them in this repository.

It allows packages to be installed from the Git repo e.g.

devtools::install_github("stencila/schema", subdir = "r", upgrade = "ask")

pip install -e git+https://github.com/stencila/schema.git#subdirectory=python

It allows us to more easily compare changes to the bindings and potentially identify issues.

However, since these files are derived from the YAML files, there is a risk of the type bindings and schema YAML drifting out of sync.

One remedy we have in place is to use git hooks. This allows us to automatically build the schema, detect changes in the bindings, and commit them prior to each push command.

To see the exact steps being performed, please see the check-bindings command in the Makefile.

Developing with Docker

If you don't want to install npm on your host directly, you can build the provided Dockerfile:

make build-image

Then bind your local working directory and run commands from inside the container. For example, here is how we would run make docs:

$ make run-image
# Now we are inside the container...
> make docs

Committing

Commit messages should follow the conventional commits specification. This is important because commit messages are used to determine the semantic version of releases and to generate the project's CHANGELOG.md. If appropriate, use the sentence case theme name as the scope (to help make both git log and the CHANGELOG more readable). Some previous examples,

fix(BlockContent): Add Figure and Collection as valid types
fix(R): Fix and improve generated bindings
feat(Elife): Use eLife corresponding author envelope icon
feat(MathBlock): Add label property
feat(Python bindings): Add node_type utility function
feat(Typescript factory functions): Only first required prop is unnamed
docs(CreativeWork): Add some references
docs(Python): Add doc string to types.py
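To check a message against this shape before committing, a small regular expression goes a long way. The sketch below is a hypothetical helper that covers only the `type(scope): subject` header line, not the full conventional commits specification (bodies and footers are ignored).

```python
import re

# type, optional (scope), optional "!", then ": " and a subject.
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(\((?P<scope>[^()]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<subject>.+)$"
)


def parse_commit(message: str):
    """Parse the first line of a commit message; return a dict or None."""
    first_line = message.split("\n", 1)[0]
    match = HEADER.match(first_line)
    if match is None:
        return None
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),
        "breaking": match.group("breaking") is not None,
        "subject": match.group("subject"),
    }


print(parse_commit("feat(MathBlock): Add label property"))
print(parse_commit("Added label property"))
```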
