Skip to content

Move to Electron's docs-parser tooling for Node.js documentation #32206

Closed as not planned
@bnb

Description

@bnb

Over the past four months, I've gotten more and more involved in Electron as a project. Generally, they are extremely forward on automation and tooling intended to reduce maintainer burden since they are such a small team maintaining such a large project.

One of the tools they created, docs-parser, is especially interesting and I think Node.js could benefit from adopting it as a fundamental piece of our documentation tooling.

Why

I see quite significant benefits to adopting docs-parser:

  • More consistency in Node.js documentation. Docs parser requires you to write in according to the documentation styleguide that Electron uses. As a reader and consumer I've never had a particularly good experience using our docs, and have . Whether it's the hierarchical improvements or the relative consistency from section to section, docs-parser's enforced writing style helps ensure that our users' experience while consuming documentation is consistent.
  • Straightforward detection of missing context. In the example above, you can see that each of the properties on the object returned by getHeapStatistics() have an empty "description" property. This emptiness is extremely useful in finding places where additional context can be added. For example, it's trivial to write a tool that checks for empty "description" properties. This provides an excellent path forward to enriching our documentation with context that's useful to users who don't have the same understanding that we do.
  • Potential internal tooling to reduce maintainer burden. Electron doesn't just ship this by itself. They have two other tools, typescript-definitions and archaeologist which - when used together - provide a GitHub check that surfaces a representation of the documentation API as a .d.ts, diffing the current PR's representation with the representation of the documentation in master. This provides a way for maintainers to parse out the changes to APIs in code in addition to just personally reading the docs themselves.

How does docs-parser differ from what we currently have?

Document Structure

docs-parser requires a specific document structure that is currently documented in the Electron Styleguide API Reference section.

This structure has specific expectations about titles, descriptions, modules, methods, events, classes (undocumented here: multi-class mode, which would be needed in Node.js and is currently supported by docs-parser), static properties, instance methods, and instance properties.

I've worked on converting Node.js's querystring, v8, and worker-threads docs in a personal repo which you can find here in the docs/api directory: bnb/node-docs-parser. Please take a look if you're interested in what the differences are - the first commit on each file was the state I started with and the final commit in each is the docs-parser version. Additionally, there's a directory with the original versions that you can compare side-by-side if you'd prefer to approach it that way.

In doing this I found that - while a few things did need to be shuffled around to correctly parse - the overall updated structure was more clear and approachable with minimal additional effort.

Actual Markdown

docs-parser requires a comparatively strict structure around markdown input, since it directly parses markdown.

Here's an example from Node.js:

## `querystring.escape(str)`
<!-- YAML
added: v0.1.25
-->

* `str` {string}

The `querystring.escape()` method performs URL percent-encoding on the given
`str` in a manner that is optimized for the specific requirements of URL
query strings.

The `querystring.escape()` method is used by `querystring.stringify()` and is
generally not expected to be used directly. It is exported primarily to allow
application code to provide a replacement percent-encoding implementation if
necessary by assigning `querystring.escape` to an alternative function.

And here's the current equivalent in docs-parser:

### `querystring.escape(str)`

- `str` String

The `querystring.escape()` method performs URL percent-encoding on the given
`str` in a manner that is optimized for the specific requirements of URL
query strings.

The `querystring.escape()` method is used by `querystring.stringify()` and is
generally not expected to be used directly. It is exported primarily to allow
application code to provide a replacement percent-encoding implementation if
necessary by assigning `querystring.escape` to an alternative function.

They seem nearly identical, and indeed they basically are. This is a good example of how minor some of the necessary changes are. Here's another slightly more complicated example:

Node.js version:

### `v8.getHeapStatistics()`

Returns `Object`

* `total_heap_size` number
* `total_heap_size_executable` number
* `total_physical_size` number
* `total_available_size` number
* `used_heap_size` number
* `heap_size_limit` number
* `malloced_memory` number
* `peak_malloced_memory` number
* `does_zap_garbage` number
* `number_of_native_contexts` number
* `number_of_detached_contexts` number

`does_zap_garbage` is a 0/1 boolean, which signifies whether the
`--zap_code_space` option is enabled or not. This makes V8 overwrite heap
garbage with a bit pattern. The RSS footprint (resident memory set) gets bigger
because it continuously touches all heap pages and that makes them less likely
to get swapped out by the operating system.

`number_of_native_contexts` The value of native_context is the number of the
top-level contexts currently active. Increase of this number over time indicates
a memory leak.

`number_of_detached_contexts` The value of detached_context is the number
of contexts that were detached and not yet garbage collected. This number
being non-zero indicates a potential memory leak.

<!-- eslint-skip -->
` ` `js
{
  total_heap_size: 7326976,
  total_heap_size_executable: 4194304,
  total_physical_size: 7326976,
  total_available_size: 1152656,
  used_heap_size: 3476208,
  heap_size_limit: 1535115264,
  malloced_memory: 16384,
  peak_malloced_memory: 1127496,
  does_zap_garbage: 0,
  number_of_native_contexts: 1,
  number_of_detached_contexts: 0
}
` ` `

docs-parser version:

### `v8.getHeapStatistics()`

Returns `Object`

* `total_heap_size` number
* `total_heap_size_executable` number
* `total_physical_size` number
* `total_available_size` number
* `used_heap_size` number
* `heap_size_limit` number
* `malloced_memory` number
* `peak_malloced_memory` number
* `does_zap_garbage` number
* `number_of_native_contexts` number
* `number_of_detached_contexts` number

`does_zap_garbage` is a 0/1 boolean, which signifies whether the
`--zap_code_space` option is enabled or not. This makes V8 overwrite heap
garbage with a bit pattern. The RSS footprint (resident memory set) gets bigger
because it continuously touches all heap pages and that makes them less likely
to get swapped out by the operating system.

`number_of_native_contexts` The value of native_context is the number of the
top-level contexts currently active. Increase of this number over time indicates
a memory leak.

`number_of_detached_contexts` The value of detached_context is the number
of contexts that were detached and not yet garbage collected. This number
being non-zero indicates a potential memory leak.

<!-- eslint-skip -->
` ` `js
{
  total_heap_size: 7326976,
  total_heap_size_executable: 4194304,
  total_physical_size: 7326976,
  total_available_size: 1152656,
  used_heap_size: 3476208,
  heap_size_limit: 1535115264,
  malloced_memory: 16384,
  peak_malloced_memory: 1127496,
  does_zap_garbage: 0,
  number_of_native_contexts: 1,
  number_of_detached_contexts: 0
}
` ` `

However, docs-parser has an interesting technical benefit. Like our current setup, it outputs JSON. Compare the two following JSON outputs:

Node.js JSON output:

{
          "textRaw": "`v8.getHeapStatistics()`",
          "type": "method",
          "name": "getHeapStatistics",
          "meta": {
            "added": [
              "v1.0.0"
            ],
            "changes": [
              {
                "version": "v7.2.0",
                "pr-url": "https://github.com/nodejs/node/pull/8610",
                "description": "Added `malloced_memory`, `peak_malloced_memory`, and `does_zap_garbage`."
              },
              {
                "version": "v7.5.0",
                "pr-url": "https://github.com/nodejs/node/pull/10186",
                "description": "Support values exceeding the 32-bit unsigned integer range."
              }
            ]
          },
          "signatures": [
            {
              "return": {
                "textRaw": "Returns: {Object}",
                "name": "return",
                "type": "Object"
              },
              "params": []
            }
          ],
          "desc": "<p>Returns an object with the following properties:</p>\n<ul>\n<li><code>total_heap_size</code> <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type\" class=\"type\">&lt;number&gt;</a></li>\n<li><code>total_heap_size_executable</code> <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type\" class=\"type\">&lt;number&gt;</a></li>\n<li><code>total_physical_size</code> <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type\" class=\"type\">&lt;number&gt;</a></li>\n<li><code>total_available_size</code> <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type\" class=\"type\">&lt;number&gt;</a></li>\n<li><code>used_heap_size</code> <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type\" class=\"type\">&lt;number&gt;</a></li>\n<li><code>heap_size_limit</code> <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type\" class=\"type\">&lt;number&gt;</a></li>\n<li><code>malloced_memory</code> <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type\" class=\"type\">&lt;number&gt;</a></li>\n<li><code>peak_malloced_memory</code> <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type\" class=\"type\">&lt;number&gt;</a></li>\n<li><code>does_zap_garbage</code> <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type\" class=\"type\">&lt;number&gt;</a></li>\n<li><code>number_of_native_contexts</code> <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type\" class=\"type\">&lt;number&gt;</a></li>\n<li><code>number_of_detached_contexts</code> <a href=\"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures#Number_type\" class=\"type\">&lt;number&gt;</a></li>\n</ul>\n<p><code>does_zap_garbage</code> is a 0/1 boolean, which signifies whether the\n<code>--zap_code_space</code> option is enabled or not. This makes V8 overwrite heap\ngarbage with a bit pattern. The RSS footprint (resident memory set) gets bigger\nbecause it continuously touches all heap pages and that makes them less likely\nto get swapped out by the operating system.</p>\n<p><code>number_of_native_contexts</code> The value of native_context is the number of the\ntop-level contexts currently active. Increase of this number over time indicates\na memory leak.</p>\n<p><code>number_of_detached_contexts</code> The value of detached_context is the number\nof contexts that were detached and not yet garbage collected. This number\nbeing non-zero indicates a potential memory leak.</p>\n<!-- eslint-skip -->\n<pre><code class=\"language-js\">{\n  total_heap_size: 7326976,\n  total_heap_size_executable: 4194304,\n  total_physical_size: 7326976,\n  total_available_size: 1152656,\n  used_heap_size: 3476208,\n  heap_size_limit: 1535115264,\n  malloced_memory: 16384,\n  peak_malloced_memory: 1127496,\n  does_zap_garbage: 0,\n  number_of_native_contexts: 1,\n  number_of_detached_contexts: 0\n}\n</code></pre>"
        },

docs-parser JSON output:

      {
        "name": "getHeapStatistics",
        "signature": "()",
        "description": "* `total_heap_size` number\n* `total_heap_size_executable` number\n* `total_physical_size` number\n* `total_available_size` number\n* `used_heap_size` number\n* `heap_size_limit` number\n* `malloced_memory` number\n* `peak_malloced_memory` number\n* `does_zap_garbage` number\n* `number_of_native_contexts` number\n* `number_of_detached_contexts` number\n\n`does_zap_garbage` is a 0/1 boolean, which signifies whether the `--zap_code_space` option is enabled or not. This makes V8 overwrite heap garbage with a bit pattern. The RSS footprint (resident memory set) gets bigger because it continuously touches all heap pages and that makes them less likely to get swapped out by the operating system.\n\n`number_of_native_contexts` The value of native_context is the number of the top-level contexts currently active. Increase of this number over time indicates a memory leak.\n\n`number_of_detached_contexts` The value of detached_context is the number of contexts that were detached and not yet garbage collected. This number being non-zero indicates a potential memory leak.\n\n<!-- eslint-skip -->",
        "parameters": [],
        "returns": {
          "collection": false,
          "type": "Object",
          "properties": [
            {
              "name": "total_heap_size",
              "description": "",
              "required": true,
              "additionalTags": [],
              "collection": false,
              "type": "number"
            },
            {
              "name": "total_heap_size_executable",
              "description": "",
              "required": true,
              "additionalTags": [],
              "collection": false,
              "type": "number"
            },
            {
              "name": "total_physical_size",
              "description": "",
              "required": true,
              "additionalTags": [],
              "collection": false,
              "type": "number"
            },
            {
              "name": "total_available_size",
              "description": "",
              "required": true,
              "additionalTags": [],
              "collection": false,
              "type": "number"
            },
            {
              "name": "used_heap_size",
              "description": "",
              "required": true,
              "additionalTags": [],
              "collection": false,
              "type": "number"
            },
            {
              "name": "heap_size_limit",
              "description": "",
              "required": true,
              "additionalTags": [],
              "collection": false,
              "type": "number"
            },
            {
              "name": "malloced_memory",
              "description": "",
              "required": true,
              "additionalTags": [],
              "collection": false,
              "type": "number"
            },
            {
              "name": "peak_malloced_memory",
              "description": "",
              "required": true,
              "additionalTags": [],
              "collection": false,
              "type": "number"
            },
            {
              "name": "does_zap_garbage",
              "description": "",
              "required": true,
              "additionalTags": [],
              "collection": false,
              "type": "number"
            },
            {
              "name": "number_of_native_contexts",
              "description": "",
              "required": true,
              "additionalTags": [],
              "collection": false,
              "type": "number"
            },
            {
              "name": "number_of_detached_contexts",
              "description": "",
              "required": true,
              "additionalTags": [],
              "collection": false,
              "type": "number"
            }
          ]
        },
        "additionalTags": []
      },

The latter output has a significantly larger amount of useful context extracted from the same Markdown.

Current Challenges and Potential Blockers

I would like to get a feeling for how folks feel about these potential technical/functional blockers.

  • Three elements of metadata that Node.js uses are currently missing from docs-parser: changes, introduced in, and stability.
    • Potential solution: I've talked with @MarshallOfSound and it seems that there's potential for adding an extensible metadata section to docs-parser.
  • docs-parser does not currently output individual, per-API JSON files.
    • Potential solution: this could be PR'ed or extracted in an additional step from the all-in-one file.
  • some elements of docs-parser are currently hardcoded to be electron-specific.
  • Bug in the multi-class mode which results in a parsing error in docs that have mutliple classes (@electronjs/docs-parser#27)
    • Potential solution: slated to be fixed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    discussIssues opened for discussions and feedbacks.docIssues and PRs related to the documentations.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions