
Metadata Proposal for Docs #166

Open
@ovflowd

Description

FYI: This description is outdated! (Needs an update)

At the 2022 edition of our Collaborator Summit, we discussed a series of proposals for changing the way we currently structure the metadata of our API docs. This proposal will eventually deprecate specific proposed changes here.

Within this issue, we will refer to the proposal as the "API metadata proposal" in all further references.


The API Metadata Proposal

Proposal Demo: https://github.com/ovflowd/node-doc-proposal

Introduction

What is this proposal about? Our API docs currently face a few issues, ranging from maintainability and organization to the extra tooling required to make them work. Namely:

  • The current infrastructure for doc generation is non-standard and hard for newcomers to contribute to or update, as it builds complex ASTs with unified. This makes it harder to debug, update, or change how things are done.
  • We use a specifically crafted Remark plugin (and ESLint config) to make some non-conforming rules work. Ultimately, the ESLint plugin does not even ensure that certain things are valid Markdown.
  • Our API docs use non-conforming, non-standard Markdown. As most Markdown parsers and linters are becoming stricter, it will eventually fail (and already does for specific parsers such as MDX). For example, our inline YAML snippets are not validated; hence, some have "invalid" YAML syntax.
  • We require our infrastructure to extract content from Markdown and guess what is being described, for example, to get the Stability Index, the heading level, or whether the section refers to a class or a method.
  • Some Markdown files are far too big. This makes the build process complex, and some pages become massive for the Web, which is unreasonable for metered internet connections.
    • Not to mention that this is unsustainable from a maintainability standpoint.
  • This proposal will also produce better generated doc metadata that projects such as TypeScript can use.
  • This proposal will also enable internationalisation, as the metadata is separated from the actual Markdown files.

There are many other issues with the current API docs, from non-standard conventions and the difficulty of ensuring that rules are applied properly, to maintaining those files and creating sustainable docs that are well detailed and inclusive for newcomers.

The Proposal

This proposal, at its core, boils down to the following simple changes:

  • All the actual API structure/metadata gets extracted to dedicated YAML files
    • Each YAML file has its corresponding Markdown file
    • E.g., doc/api/modules/fs/promises.metadata.yml has doc/api/modules/fs/promises.en.content.md
  • The folder structure for API docs gets updated in a tree fashion for the modules
    • Each class has its YAML and Markdown file
    • TL;DR: files are broken down into their smallest section (a class)
  • The Markdown file is responsible for:
    • Descriptions
    • Introductions
    • Examples
    • References
    • Real-world usages

Re-structuring the existing file directory

In this proposal, the tree of files gets updated by adopting a node approach (pun intended) for how we structure the files of our API docs and how we name them.

Notably, these are the significant changes:

  • The nature of a file determines its top-level folder; for example, anything related to a Node.js module will reside within modules, while globals will reside within globals.
    • There's no concrete list of all the possible top-level folders for now; for example, "About this documentation," "How to install Node.js," or other kinds of general documentation related to Node.js would probably not fit in any of these folders. A suggestion would be a misc folder, but this is open for debate as it is not a crucial point.
  • The second level of folders, in the case of modules, is the name of the module's (top-level) import. For example, "File Systems" would be "fs", resulting in doc/api/modules/fs.
  • Any further level of sub-directories would be a sub-namespace of the module. For example, node:fs/promises would be doc/api/modules/node/fs/promises.
  • Finally, the last level would be the name of a class, e.g., doc/api/modules/node/fs/promises/file-handle.yaml, whereas for the promises import itself, it would be doc/api/modules/node/fs/promises.yaml.
    • You will notice that in the first case promises is a folder and in the second a YAML file; that's because we're following a node approach, just like the nodes of a tree.
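The path rules above can be sketched as a small mapping function. This is only an illustration: the helper names (`toKebabCase`, `metadataPath`) are hypothetical, and the kebab-case conversion of class names is an assumption inferred from the `file-handle.yaml` example.

```typescript
// Hypothetical helper: converts a class name to the kebab-case file name
// used in the example above, e.g. "FileHandle" becomes "file-handle".
function toKebabCase(name: string): string {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

// Hypothetical helper: maps an import specifier (and an optional class name)
// to the metadata file path described in the list above.
function metadataPath(specifier: string, className?: string): string {
  // "node:fs/promises" is split into ["node", "fs", "promises"]
  const segments = specifier.split(/[:/]/).filter((s) => s.length > 0);
  if (className !== undefined) {
    // A class adds one more level below the module's folder
    segments.push(toKebabCase(className));
  }
  return `doc/api/modules/${segments.join("/")}.yaml`;
}

console.log(metadataPath("node:fs/promises"));
console.log(metadataPath("node:fs/promises", "FileHandle"));
```

With these assumptions, the two calls yield the two paths quoted in the last bullet: the module file and the class file nested in a folder of the same name.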

Accomplishing this change

This can be done quickly with an automated script that breaks down the existing files and generates the new ones. Such a script would, in the best case, work for all the files currently in doc/api and, in the worst case, for 98% of them, depending on how consistently the modules follow these patterns.

Extracting the metadata

As mentioned before, the Markdown files should be free of the actual metadata, containing only the Description, Introduction (when needed), Examples (both for CJS and MJS), more in-depth details of when a class/method should be used, and external references that might be useful.

Extracting the metadata allows our contributors and maintainers to focus on writing quality documentation and not get lost in the specificities of the metadata.

What happens with the extracted metadata?

It will be added to a dedicated YAML file containing all the metadata of, for example, a particular class. (We created a new tooling infrastructure that would facilitate this.)

The metadata structure will be explained in another section below.

The extraction and categorization process can be automated for all modules and classes, reducing (and erasing) the manual work needed to adopt this proposal.

Enforcing the Adoption of best practices

The actual content of the Markdown files will be "enforced" by Documentation reviewers and the WGs for specific Node.js parts, possibly through the adoption of this PR.

The Metadata (YAML) schema

Similarly to the existing YAML schema, it would be structured as follows:

name: 'api/modules/crypto/certificate'
source: "lib/crypto.js"
stability: stable
tags:
  - "certificates"
  - "digital certificates"
history:
  - type: added
    versions: [v0.11.8]
methods:
  - name: exportChallenge
    stability: deprecated
    static: true
    history:
      - type: added
        versions: [v9.0.0]
        pullRequest: "https://github.com/nodejs/node/pull/35093"
        details: "crypto.certificate.method.exportChallenge.history.[0].details"
    params:
      - name: spkac
        optional: false
        types:
          - String
          - ArrayBuffer
          - Buffer
          - TypedArray
          - DataView
      - name: encoding
        details: "crypto.certificate.method.exportChallenge.params.[1].details"
        optional: true
        types:
          - String
        defaults:
          - "UTF-8"
    returns:
      - type: Buffer
        details: "crypto.certificate.method.exportChallenge.returns.[0].details"
constants:
  - name: S_IXUSR
    import: "fs.constants.S_IXUSR"

The structure above makes it easy to organise the metadata of each method available within a class and to quickly describe the types, return types, parameters, and history of a method, class, or anything related.

I18n and ICU on YAML files

The structure is also I18n friendly, as precise text details that should not be defined within the Markdown file can be easily referenced using the ICU format. These details live in files at the same level as the specific module. For example, doc/api/modules/node/fs/promises.en.i18n.json contains entries that follow the ICU format, such as:

{
  "fs.promises.tags": ["writing files", "creating files", "file systems"],
  "fs.promises.method.lchmod.returns.[0].details": "The lchmod method returns a Boolean when specific parameters are ....",
  ...
}
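As a rough sketch of how tooling might resolve these translation IDs, the snippet below looks up a Lang ID in a table shaped like the JSON above. The `resolveLangId` helper and its fallback behaviour (returning the ID itself when no translation exists) are assumptions for illustration, not part of the proposal.

```typescript
// A translation table maps Lang IDs to a text or a list of texts (e.g. tags).
type Translations = Record<string, string | string[]>;

// Entries copied from the JSON example above (the elided text is kept as-is).
const en: Translations = {
  "fs.promises.tags": ["writing files", "creating files", "file systems"],
  "fs.promises.method.lchmod.returns.[0].details":
    "The lchmod method returns a Boolean when specific parameters are ....",
};

// Hypothetical helper: returns the translation for a Lang ID, or the ID
// itself when the table has no entry (assumed fallback behaviour).
function resolveLangId(table: Translations, id: string): string | string[] {
  return table[id] ?? id;
}

console.log(resolveLangId(en, "fs.promises.tags"));
console.log(resolveLangId(en, "fs.promises.method.unknown.details"));
```

Falling back to the ID keeps a missing translation visible in the rendered page instead of failing the build; a stricter tool could throw instead.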

Specification Table

The tables below describe the full proposed YAML schema.

Note.: All the properties of type Enum will have their possible values discussed in the future, as this is just a high-level specification proposal.

Top Level Properties

| Field | Optional | Type | Description |
| --- | --- | --- | --- |
| name | No | String | The Heading ID identifier for that module; it should usually be the path of the module in the doc folder. |
| import | No | String | The canonical import of the module (i.e. the string used to import this class/module). This will be used to generate CJS/MJS import usages. |
| stability | No | Enum | The Stability of a given module. It follows the widely adopted "Stability Index" from our existing docs. |
| tags | Yes | Lang ID | A translation ID for tags used to identify this module or help users find it with search engines. |
| history | Yes | Array<History> | An array of history entries describing the notable historical changes of that module. |
| methods | Yes | Array<Method> | The methods of that class/module. |
| constants | Yes | Array<Constant> | The constants of that class/module. |
| source | Yes | String | The path to the source of that class/module. |

History

| Field | Optional | Type | Description |
| --- | --- | --- | --- |
| type | No | Enum | The type of the change. |
| pullRequest | Yes | String | An optional Pull Request for the related landed change. |
| issue | Yes | String | An optional Issue link for the related landed change. |
| details | Yes | Lang ID | A translation ID for extra short details of that change. Actual details should usually link to a PR or Issue. |
| versions | Yes | Array<String> | An array containing the versions this change initially impacted. |
| when | Yes | String | A date string following ISO-8601 (https://en.wikipedia.org/wiki/ISO_8601). |

Method

| Field | Optional | Type | Description |
| --- | --- | --- | --- |
| name | No | String | The Heading ID identifier for the method. It should also reflect the actual name that is imported. |
| stability | No | Enum | The Stability of a given method. It follows the widely adopted "Stability Index" from our existing docs. |
| tags | Yes | Lang ID | A translation ID for tags used to identify this method or help users find it with search engines. |
| history | Yes | Array<History> | An array of history entries describing the notable historical changes of that method. |
| returns | No | Array<ReturnType \| Enum> | An array containing the return types of the method. |
| params | Yes | Array<MethodParam> | An array containing the parameters of the method. |

MethodParam

| Field | Optional | Type | Description |
| --- | --- | --- | --- |
| name | No | String | The name of the parameter of the method. |
| optional | No | Boolean | Whether the parameter is optional or not. |
| defaults | Yes | Array<ParameterDefault> | An array containing the default values of the parameter. |
| types | No | Array<ParameterType \| Enum> | An array containing the types of the parameter. |

ReturnType, ParameterType, ParameterDefault

| Field | Optional | Type | Description |
| --- | --- | --- | --- |
| details | Yes | Lang ID | A translation ID for the details of this return type. |
| type | No | Enum | The type of the return type. |
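For readers who prefer types to tables, the specification can be approximated with TypeScript interfaces. This is a sketch under assumptions: Enum values are modelled as plain strings because their values are still to be discussed, the `Constant` shape is inferred from the YAML example rather than from a table, and `missingTopLevelFields` is a hypothetical helper.

```typescript
type LangId = string; // a translation ID resolved through the i18n files
type EnumValue = string; // Enum values are not specified yet

interface HistoryEntry {
  type: EnumValue;
  pullRequest?: string;
  issue?: string;
  details?: LangId;
  versions?: string[];
  when?: string; // ISO-8601 date string
}

// Shared shape of ReturnType, ParameterType and ParameterDefault
interface TypedValue {
  type: EnumValue;
  details?: LangId;
}

interface MethodParam {
  name: string;
  optional: boolean;
  defaults?: Array<TypedValue | string>; // the YAML example uses plain strings
  types: Array<TypedValue | EnumValue>;
}

interface Method {
  name: string;
  stability: EnumValue;
  tags?: LangId;
  history?: HistoryEntry[];
  returns: Array<TypedValue | EnumValue>;
  params?: MethodParam[];
}

// Assumption: inferred from the `constants` entry in the YAML example
interface Constant {
  name: string;
  import: string;
}

interface ModuleMetadata {
  name: string;
  import: string;
  stability: EnumValue;
  tags?: LangId;
  history?: HistoryEntry[];
  methods?: Method[];
  constants?: Constant[];
  source?: string;
}

const REQUIRED_TOP_LEVEL = ["name", "import", "stability"] as const;

// Hypothetical helper: lists the required top-level fields a document lacks
function missingTopLevelFields(doc: Partial<ModuleMetadata>): string[] {
  return REQUIRED_TOP_LEVEL.filter((field) => doc[field] === undefined);
}

const example: Partial<ModuleMetadata> = {
  name: "api/modules/crypto/certificate",
  source: "lib/crypto.js",
  stability: "stable",
};
console.log(missingTopLevelFields(example));
```

The `example` object mirrors the top-level keys of the YAML sample earlier in this issue, which has no top-level `import`, so the helper reports that field as missing.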

Incorporating the Metadata within the Markdown files

As each Class has numerous methods (possibly constants) and more, the parser needs to know where to attach the data within the final generated result when, for example, building for the web.

This can be done easily by using Markdown-compatible Heading IDs:

# File Systems {#api/modules/node/fs/promises}

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Quisque non tellus orci ac. Maecenas accumsan lacus vel facilisis volutpat est velit egestas. Placerat in egestas erat imperdiet sed euismod. Egestas maecenas pharetra convallis posuere morbi leo urna molestie at. Ultricies mi eget mauris pharetra et ultrices neque ornare aenean. Sodales ut etiam sit amet nisl purus in. Nunc pulvinar sapien et ligula ullamcorper malesuada. Pulvinar neque laoreet suspendisse interdum. Lectus proin nibh nisl condimentum id. Habitant morbi tristique senectus et netus et malesuada fames ac. Nulla porttitor massa id neque aliquam vestibulum morbi.

## Method: LCHMOD {#lchmod}

Curabitur gravida arcu ac tortor dignissim convallis. Urna id volutpat lacus laoreet non curabitur. Sem integer vitae justo eget. Amet purus gravida quis blandit. Posuere urna nec tincidunt praesent semper feugiat nibh sed pulvinar. Nunc eget lorem dolor sed viverra ipsum nunc. Dignissim cras tincidunt lobortis feugiat. Maecenas pharetra convallis posuere morbi leo. Volutpat lacus laoreet non curabitur gravida arcu. Leo a diam sollicitudin tempor id.

....

The parser would map each YAML entry's name field to the associated Heading ID, allowing you to write the Heading text as you wish while keeping the Heading ID intact.
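A minimal sketch of that mapping step, assuming headings use the `{#id}` suffix shown in the example. The function names and the regular expression are illustrative only; the real parser may work on an AST instead of raw lines.

```typescript
interface Heading {
  level: number; // number of leading "#" characters
  text: string; // free-form heading text
  id: string; // Heading ID written as {#id}
}

// Hypothetical helper: collects every heading that carries a Heading ID.
function extractHeadingIds(markdown: string): Heading[] {
  const pattern = /^(#{1,6})\s+(.*?)\s*\{#([^}]+)\}\s*$/;
  const headings: Heading[] = [];
  for (const line of markdown.split("\n")) {
    const match = pattern.exec(line);
    if (match) {
      const [, hashes = "", text = "", id = ""] = match;
      headings.push({ level: hashes.length, text, id });
    }
  }
  return headings;
}

// Hypothetical helper: returns the metadata names with no matching heading.
function namesWithoutHeading(names: string[], headings: Heading[]): string[] {
  const ids = new Set(headings.map((h) => h.id));
  return names.filter((name) => !ids.has(name));
}

const sample = [
  "# File Systems {#api/modules/node/fs/promises}",
  "",
  "## Method: LCHMOD {#lchmod}",
].join("\n");

const found = extractHeadingIds(sample);
console.log(found);
console.log(namesWithoutHeading(["lchmod", "chmod"], found));
```

Because only the ID is matched, the heading text ("Method: LCHMOD") can be reworded or translated freely without breaking the link to the YAML entry.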

Naming for Markdown files

To ensure a 1:1 mapping between YAML and Markdown, the Markdown files should reside in the same folder as the YAML ones and share the same name; the only difference is that the Markdown files use the lowercase .md extension and are suffixed by their language, e.g. .en.md.

Note.: The Markdown files will default to the .en.md extension.
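The 1:1 naming rule can be expressed as a small function. The sketch below assumes the `.metadata.yml` and `.<lang>.content.md` suffixes used elsewhere in this issue (e.g. promises.metadata.yml and promises.en.content.md); `contentFileFor` is a hypothetical name.

```typescript
// Hypothetical helper: derives the Markdown file that pairs with a metadata
// file, defaulting to English as described in the note above.
function contentFileFor(metadataFile: string, lang: string = "en"): string {
  const suffix = ".metadata.yml";
  if (!metadataFile.endsWith(suffix)) {
    throw new Error(`Not a metadata file: ${metadataFile}`);
  }
  // Same folder and base name; only the suffix changes
  const base = metadataFile.slice(0, -suffix.length);
  return `${base}.${lang}.content.md`;
}

console.log(contentFileFor("doc/api/modules/fs/promises.metadata.yml"));
console.log(contentFileFor("doc/api/modules/fs/promises.metadata.yml", "pt"));
```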

The Build Process

Generating the final result in a tangible, readable format for humans and IDEs is no easy feat.

The new tooling build process would consist of two different outputs:

  • Generating JSON files from the YAML metadata.
    • These are mainly used for JSDoc or IDE scanning/IntelliSense, such as TypeScript (cc @nodejs/typescript)
  • Generating MDX Buffers that our Websites can use
    • MDX is a JSX-in-Markdown format that allows us to insert React components within our content
    • The idea here is, during the build process, to generate a Buffer that is the combination of the plain Markdown and the React components that are used to render the metadata.
    • This tooling is needed for the end-users of the documentation and is also helpful for previewing the documentation. It must be discussed in a separate Issue to address topics such as:
      • Where the tooling should reside
      • How to generate documentation previews containing just the documentation (not the whole website), and how to generate docs only for what you changed (e.g., generating a preview of a specific file)
      • How the files would be categorized
      • How links to the files, and redirects from the old API schema to the new one, would work
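To make the MDX output more concrete, here is a rough sketch of combining a Markdown file with its metadata. Everything specific in it is an assumption: the `<MethodMetadata>` component name is hypothetical, and the sketch strips the `{#id}` suffix from headings on the assumption that MDX treats curly braces as JavaScript expressions.

```typescript
// Hypothetical build step: after each heading whose Heading ID matches a
// known method, insert a component that would render that method's metadata.
function toMdx(markdown: string, methodIds: ReadonlySet<string>): string {
  const heading = /^(#{1,6}\s+.*?)\s*\{#([^}]+)\}\s*$/;
  const out: string[] = [];
  for (const line of markdown.split("\n")) {
    const match = heading.exec(line);
    if (!match) {
      out.push(line); // plain content is passed through untouched
      continue;
    }
    const [, title = "", id = ""] = match;
    out.push(title); // heading without the {#id} suffix
    if (methodIds.has(id)) {
      // <MethodMetadata> is a made-up component name for illustration
      out.push("", `<MethodMetadata id="${id}" />`);
    }
  }
  return out.join("\n");
}

const page = "## Method: LCHMOD {#lchmod}\n\nSome description.";
console.log(toMdx(page, new Set(["lchmod"])));
```

The JSON output for IDEs is not shown, since it would mostly be a direct serialization of the parsed YAML.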

Example of the file structure

To make it easier to visualize how this proposal would change the current folder structure, the snippet below illustrates how it would look with all the changes applied.

Note.: The root directory below would be doc/api.

├── api
│   ├── en.navigation.md
│   ├── documentation.en.content.mdx
│   ├── modules
│   │   ├── en.navigation.md
│   │   ├── fs
│   │   │   ├── en.navigation.md
│   │   │   ├── index.metadata.yml
│   │   │   ├── index.en.content.md
│   │   │   ├── promises.metadata.yml
│   │   │   ├── promises.en.content.md
│   │   │   └── ...
│   │   ├── streams
│   │   ├── crypto
│   │   │   ├── en.navigation.md
│   │   │   ├── webcrypto.metadata.yml
│   │   │   ├── webcrypto.en.i18n.json
│   │   │   └── webcrypto.en.content.md
│   │   └── ...
│   ├── globals
│   ├── others
│   ├── packages.en.content.md
│   └── ...
└── ...

The Navigation (Markdown) Schema

Navigating through the API docs is as essential as displaying the content correctly. The idea here is to allow each module to define its Navigation entries and then generate the whole Navigation by aggregating all the navigation files.

Book of Rules for the Navigation System

  • The Navigation file is made in Markdown and has a reserved name (navigation.md)
  • A navigation file can be on any sub-level of any directory
  • Navigation files are not imported automatically
  • The build-tools specify the main Navigation file (e.g.: build-docs --navigation-entry=doc/api/v18/navigation.md)
  • The order of items is respected as-is
  • Each Item can be either a:
    • Heading without a link
    • Heading referring to an entry (YAML file)
    • Heading referring to another Navigation file (To import the entries there)
  • The cool part is that Navigation items can be anything you want; they are not limited to generated entries.

Note.: The Navigation source would be on Markdown, using a Markdown List format with a maximum of X-indentation levels.

The Schema of Navigation

The code snippet below shows all examples of the Schema and how it would be generated in the end.

File: doc/api/v18/en.navigation.md

* [About this Documentation](documentation.en.content.md)
* [Modules](modules/en.navigation.md)
* Some Header
  * Sub-Levels Supported
    * To a certain-max-level
    * [An External Link](https://nodejs.org/certification)

File: doc/api/v18/modules/en.navigation.md

* [File System](fs/en.navigation.md)
* [Streams](streams/en.navigation.md)

File: doc/api/v18/modules/fs/en.navigation.md

* [About File Systems](fs.en.content.md)
* [File System Promises](promises.en.content.md)
* ....

Example output in Markdown

* [About this Documentation](documentation.en.content.md)
* Modules
  * File System
    * [About File Systems](fs.en.content.md)
    * [File System Promises](promises.en.content.md)
  * Streams
    * ....
* Some Header
  * Sub-Levels Supported
    * To a certain-max-level
    * [An External Link](https://nodejs.org/certification)

It is essential to mention that the final output of the Navigation would be Markdown and can be used by the build tools to either generate an output on MDX or plain HTML or JSON.
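The aggregation described in this section can be sketched as a recursive expansion of navigation files. The sketch works on an in-memory map of file paths to contents so that it stays self-contained; `expandNavigation` and the path helpers are hypothetical names, and cycle detection and the maximum indentation level are left out.

```typescript
type Files = Map<string, string>;

function parentDir(path: string): string {
  const i = path.lastIndexOf("/");
  return i === -1 ? "" : path.slice(0, i);
}

function joinPath(dir: string, rel: string): string {
  return dir === "" ? rel : `${dir}/${rel}`;
}

// Expands a navigation file into a flat list of Markdown list lines.
// Items linking to another navigation file become plain headings followed
// by the entries of that file, indented one level deeper.
function expandNavigation(files: Files, entry: string, indent = ""): string[] {
  const source = files.get(entry);
  if (source === undefined) {
    throw new Error(`Missing navigation file: ${entry}`);
  }
  const item = /^(\s*)\* (?:\[([^\]]+)\]\(([^)]+)\)|(.*))$/;
  const out: string[] = [];
  for (const line of source.split("\n")) {
    const match = item.exec(line);
    if (!match) continue; // skip blank lines and anything that is not a list item
    const [, own = "", label, target] = match;
    if (target !== undefined && target.endsWith("navigation.md")) {
      out.push(`${indent}${own}* ${label}`);
      const next = joinPath(parentDir(entry), target);
      out.push(...expandNavigation(files, next, `${indent}${own}  `));
    } else {
      out.push(`${indent}${line}`); // order and content are kept as-is
    }
  }
  return out;
}

const files: Files = new Map([
  [
    "doc/api/v18/en.navigation.md",
    "* [About this Documentation](documentation.en.content.md)\n* [Modules](modules/en.navigation.md)\n* Some Header\n  * Sub-Levels Supported",
  ],
  ["doc/api/v18/modules/en.navigation.md", "* [File System](fs/en.navigation.md)"],
  ["doc/api/v18/modules/fs/en.navigation.md", "* [About File Systems](fs.en.content.md)"],
]);

console.log(expandNavigation(files, "doc/api/v18/en.navigation.md").join("\n"));
```

As in the example output above, links to content files are emitted unchanged; a real build tool would likely also rewrite them relative to the entry file.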

Conclusion

As explained before, the proposal has several benefits and would be a significant addition to our Codebase. Given that the benefits range from tooling, build process, maintainability, and adoption to ease of documentation, translations, and more, this proposal is fated to succeed! Also, all the items explained here can be automated, ensuring a smooth transition process.
