MDAST

Markdown Abstract Syntax Tree.


MDAST discloses markdown as an abstract syntax tree. Abstract means not all information is stored in this tree and an exact replica of the original document cannot be re-created. Syntax Tree means syntax is present in the tree, thus an exact syntactic document can be re-created.

MDAST is a subset of unist, and implemented by remark.

This document may not be released. See releases for released documents. The latest released version is 2.2.0.

AST

Root

Root (Parent) houses all nodes.

interface Root <: Parent {
  type: "root";
}

Paragraph

Paragraph (Parent) represents a unit of discourse dealing with a particular point or idea.

interface Paragraph <: Parent {
  type: "paragraph";
}

For example, the following markdown:

Alpha bravo charlie.

Yields:

{
  "type": "paragraph",
  "children": [{
    "type": "text",
    "value": "Alpha bravo charlie."
  }]
}
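The interfaces in this document extend the unist base shapes. As a minimal sketch, assuming those shapes are a `type` on every node, `children` on parents, and `value` on text (the names `UnistNode`, `UnistParent`, `UnistText`, and the `paragraph` helper are hypothetical, chosen for illustration), the Paragraph example above could be built like this:

```typescript
// Assumed unist base shapes; they are not defined in this document.
interface UnistNode { type: string }
interface UnistParent extends UnistNode { children: UnistNode[] }
interface UnistText extends UnistNode { value: string }

interface TextNode extends UnistText { type: "text" }
interface Paragraph extends UnistParent { type: "paragraph" }

// Hypothetical helper: wrap a string in a Paragraph holding one TextNode.
function paragraph(value: string): Paragraph {
  const text: TextNode = { type: "text", value };
  return { type: "paragraph", children: [text] };
}

const tree = paragraph("Alpha bravo charlie.");
console.log(JSON.stringify(tree, null, 2));
```

Printing `tree` should give the same JSON as the example above.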

Blockquote

Blockquote (Parent) represents a quote.

interface Blockquote <: Parent {
  type: "blockquote";
}

For example, the following markdown:

> Alpha bravo charlie.

Yields:

{
  "type": "blockquote",
  "children": [{
    "type": "paragraph",
    "children": [{
      "type": "text",
      "value": "Alpha bravo charlie."
    }]
  }]
}

Heading

Heading (Parent) represents a section heading, just like in HTML, with a depth greater than or equal to 1 and lower than or equal to 6.

interface Heading <: Parent {
  type: "heading";
  depth: 1 <= uint32 <= 6;
}

For example, the following markdown:

# Alpha

Yields:

{
  "type": "heading",
  "depth": 1,
  "children": [{
    "type": "text",
    "value": "Alpha"
  }]
}
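To illustrate how depth relates to the source, here is a rough sketch that turns a single ATX heading line into a Heading node. It is deliberately simplified: it assumes no closing hashes and no inline markup, and `parseAtxHeading` is a hypothetical name, not part of any library.

```typescript
interface TextNode { type: "text"; value: string }
interface Heading { type: "heading"; depth: number; children: TextNode[] }

// Simplified: one to six leading hashes, whitespace, then the heading text.
function parseAtxHeading(line: string): Heading | null {
  const match = /^(#{1,6})[ \t]+(.+)$/.exec(line);
  if (match === null) return null;
  return {
    type: "heading",
    depth: match[1].length, // number of hashes, so always 1 to 6
    children: [{ type: "text", value: match[2].trim() }],
  };
}

console.log(parseAtxHeading("# Alpha"));       // a Heading with depth 1
console.log(parseAtxHeading("####### Alpha")); // null: seven hashes is not a heading
```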

Code

Code (Text) occurs at block level (see InlineCode for code spans). Code supports a language tag and an info string: when the line with the opening fence contains text, the first word following the fence is stored as the language tag and the rest of the line is stored as the info string. Both are null if missing.

interface Code <: Text {
  type: "code";
  lang: string | null;
  info: string | null;
}

For example, the following markdown:

    foo()

Yields:

{
  "type": "code",
  "lang": null,
  "info": null,
  "value": "foo()"
}
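The lang and info fields can be illustrated with a small sketch. It reads the description above as "first word after the fence is lang, the remainder is info", which is an interpretation rather than a confirmed algorithm, and `parseFenceLine` is a hypothetical helper.

```typescript
interface CodeMeta { lang: string | null; info: string | null }

function parseFenceLine(line: string): CodeMeta {
  // Drop the fence marker (three or more backticks or tildes), then trim.
  const rest = line.replace(/^\s*(?:`{3,}|~{3,})/, "").trim();
  if (rest === "") return { lang: null, info: null };
  const space = rest.search(/\s/);
  // A single word is the language tag; there is no info string.
  if (space === -1) return { lang: rest, info: null };
  return { lang: rest.slice(0, space), info: rest.slice(space).trim() };
}

console.log(parseFenceLine("~~~"));                // { lang: null, info: null }
console.log(parseFenceLine("~~~js"));              // { lang: "js", info: null }
console.log(parseFenceLine("~~~js highlight=2"));  // { lang: "js", info: "highlight=2" }
```

Indented code, as in the example above, has no fence line, so both fields stay null.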

InlineCode

InlineCode (Text) occurs inline (see Code for blocks). Inline code does not sport a lang attribute.

interface InlineCode <: Text {
  type: "inlineCode";
}

For example, the following markdown:

`foo()`

Yields:

{
  "type": "inlineCode",
  "value": "foo()"
}

YAML

YAML (Text) can occur at the start of a document, and contains embedded YAML data.

interface YAML <: Text {
  type: "yaml";
}

Note: YAML used to be available through the core of remark and thus is specified here. Support for it has now moved to remark-frontmatter, and the definition here may be removed in the future.

For example, the following markdown:

---
foo: bar
---

Yields:

{
  "type": "yaml",
  "value": "foo: bar"
}

HTML

HTML (Text) contains embedded HTML.

interface HTML <: Text {
  type: "html";
}

For example, the following markdown:

<div>

Yields:

{
  "type": "html",
  "value": "<div>"
}

List

List (Parent) contains ListItems. No other nodes may occur in lists.

The start property contains the starting number of the list when ordered: true; null otherwise.

When all list items have loose: false, the list’s loose property is also false. Otherwise, loose: true.

interface List <: Parent {
  type: "list";
  ordered: true | false;
  start: uint32 | null;
  loose: true | false;
}

For example, the following markdown:

1. [x] foo

Yields:

{
  "type": "list",
  "ordered": true,
  "start": 1,
  "loose": false,
  "children": [{
    "type": "listItem",
    "loose": false,
    "checked": true,
    "children": [{
      "type": "paragraph",
      "children": [{
        "type": "text",
        "value": "foo"
      }]
    }]
  }]
}
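The rule for a list's loose property can be written down directly. A minimal sketch follows; `ListItemLike` and `isListLoose` are hypothetical names, and only the field needed for the rule is modelled.

```typescript
// Only the field needed here; a full ListItem also has type, checked, and children.
interface ListItemLike { loose: boolean }

// The list is loose unless every item has loose: false.
function isListLoose(items: ListItemLike[]): boolean {
  return items.some((item) => item.loose);
}

console.log(isListLoose([{ loose: false }, { loose: false }])); // false
console.log(isListLoose([{ loose: false }, { loose: true }]));  // true
```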

ListItem

ListItem (Parent) is a child of a List.

Loose ListItems often contain more than one block-level element.

A checked property exists on ListItems, set to true (when checked), false (when unchecked), or null (when not containing a checkbox). See Task Lists on GitHub for information.

interface ListItem <: Parent {
  type: "listItem";
  loose: true | false;
  checked: true | false | null;
}

For an example, see the definition of List.
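The three states of checked can be sketched as follows. This assumes the item's content has already been separated from its list marker, and `parseCheckbox` is a hypothetical helper, not an API of remark.

```typescript
interface CheckboxResult { checked: boolean | null; rest: string }

function parseCheckbox(content: string): CheckboxResult {
  // A checkbox is "[ ]", "[x]", or "[X]" followed by whitespace.
  const match = /^\[([ xX])\][ \t]+/.exec(content);
  if (match === null) return { checked: null, rest: content };
  return { checked: match[1] !== " ", rest: content.slice(match[0].length) };
}

console.log(parseCheckbox("[x] foo")); // { checked: true, rest: "foo" }
console.log(parseCheckbox("[ ] foo")); // { checked: false, rest: "foo" }
console.log(parseCheckbox("foo"));     // { checked: null, rest: "foo" }
```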

Table

Table (Parent) represents tabular data, with alignment. Its children are TableRows, the first of which acts as a table header row.

table.align represents the alignment of columns.

interface Table <: Parent {
  type: "table";
  align: [alignType];
}
enum alignType {
  "left" | "right" | "center" | null;
}

For example, the following markdown:

| foo | bar |
| :-- | :-: |
| baz | qux |

Yields:

{
  "type": "table",
  "align": ["left", "center"],
  "children": [
    {
      "type": "tableRow",
      "children": [
        {
          "type": "tableCell",
          "children": [{
            "type": "text",
            "value": "foo"
          }]
        },
        {
          "type": "tableCell",
          "children": [{
            "type": "text",
            "value": "bar"
          }]
        }
      ]
    },
    {
      "type": "tableRow",
      "children": [
        {
          "type": "tableCell",
          "children": [{
            "type": "text",
            "value": "baz"
          }]
        },
        {
          "type": "tableCell",
          "children": [{
            "type": "text",
            "value": "qux"
          }]
        }
      ]
    }
  ]
}
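The align array corresponds to the delimiter row of the table. As a sketch, assuming the usual colon placement convention (leading colon for left, trailing colon for right, both for center), a hypothetical `parseAlignRow` could derive it like this:

```typescript
type AlignType = "left" | "right" | "center" | null;

function parseAlignRow(row: string): AlignType[] {
  return row
    .trim()
    .replace(/^\||\|$/g, "") // drop the outer pipes
    .split("|")
    .map((cell): AlignType => {
      const c = cell.trim();
      const left = c.startsWith(":");
      const right = c.endsWith(":");
      if (left && right) return "center";
      if (left) return "left";
      if (right) return "right";
      return null; // no colon: alignment not specified
    });
}

console.log(parseAlignRow("| :-- | :-: |")); // ["left", "center"]
console.log(parseAlignRow("| --- | --: |")); // [null, "right"]
```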

TableRow

TableRow (Parent) represents a row in a Table. Its children are always TableCells.

interface TableRow <: Parent {
  type: "tableRow";
}

For an example, see the definition of Table.

TableCell

TableCell (Parent) contains a single tabular field.

interface TableCell <: Parent {
  type: "tableCell";
}

For an example, see the definition of Table.

ThematicBreak

A ThematicBreak (Node) represents a break in content, often shown as a horizontal rule, or by two HTML section elements.

interface ThematicBreak <: Node {
  type: "thematicBreak";
}

For example, the following markdown:

***

Yields:

{
  "type": "thematicBreak"
}

Break

Break (Node) represents an explicit line break.

interface Break <: Node {
  type: "break";
}

For example, the following markdown (interpuncts represent spaces):

foo··
bar

Yields:

{
  "type": "paragraph",
  "children": [
    {
      "type": "text",
      "value": "foo"
    },
    {
      "type": "break"
    },
    {
      "type": "text",
      "value": "bar"
    }
  ]
}
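A rough sketch of how the children above could be produced from the source text. It only handles breaks written as two or more trailing spaces, as in the example, and `splitHardBreaks` is a hypothetical name.

```typescript
type Inline = { type: "text"; value: string } | { type: "break" };

function splitHardBreaks(source: string): Inline[] {
  // Two or more spaces before a newline mark an explicit line break.
  const parts = source.split(/ {2,}\n/);
  const out: Inline[] = [];
  parts.forEach((part, index) => {
    if (index > 0) out.push({ type: "break" });
    if (part !== "") out.push({ type: "text", value: part });
  });
  return out;
}

console.log(JSON.stringify(splitHardBreaks("foo  \nbar")));
```

For the input above this produces a text node, a break, and another text node, matching the children in the example.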

Emphasis

Emphasis (Parent) represents slight emphasis.

interface Emphasis <: Parent {
  type: "emphasis";
}

For example, the following markdown:

*alpha* _bravo_

Yields:

{
  "type": "paragraph",
  "children": [
    {
      "type": "emphasis",
      "children": [{
        "type": "text",
        "value": "alpha"
      }]
    },
    {
      "type": "text",
      "value": " "
    },
    {
      "type": "emphasis",
      "children": [{
        "type": "text",
        "value": "bravo"
      }]
    }
  ]
}

Strong

Strong (Parent) represents strong emphasis.

interface Strong <: Parent {
  type: "strong";
}

For example, the following markdown:

**alpha** __bravo__

Yields:

{
  "type": "paragraph",
  "children": [
    {
      "type": "strong",
      "children": [{
        "type": "text",
        "value": "alpha"
      }]
    },
    {
      "type": "text",
      "value": " "
    },
    {
      "type": "strong",
      "children": [{
        "type": "text",
        "value": "bravo"
      }]
    }
  ]
}

Delete

Delete (Parent) represents text ready for removal.

interface Delete <: Parent {
  type: "delete";
}

For example, the following markdown:

~~alpha~~

Yields:

{
  "type": "delete",
  "children": [{
    "type": "text",
    "value": "alpha"
  }]
}

Link

Link (Parent) represents the humble hyperlink.

interface Link <: Parent {
  type: "link";
  title: string | null;
  url: string;
}

For example, the following markdown:

[alpha](http://example.com "bravo")

Yields:

{
  "type": "link",
  "title": "bravo",
  "url": "http://example.com",
  "children": [{
    "type": "text",
    "value": "alpha"
  }]
}

Image

Image (Node) represents the figurative figure.

interface Image <: Node {
  type: "image";
  title: string | null;
  alt: string | null;
  url: string;
}

For example, the following markdown:

![alpha](http://example.com/favicon.ico "bravo")

Yields:

{
  "type": "image",
  "title": "bravo",
  "url": "http://example.com/favicon.ico",
  "alt": "alpha"
}

Footnote

Footnote (Parent) represents an inline marker, whose content relates to the document but is outside its flow.

interface Footnote <: Parent {
  type: "footnote";
}

For example, the following markdown:

[^alpha bravo]

Yields:

{
  "type": "footnote",
  "children": [{
    "type": "text",
    "value": "alpha bravo"
  }]
}

LinkReference

LinkReference (Parent) represents a humble hyperlink, its url and title defined somewhere else in the document by a Definition.

referenceType is needed to detect if a reference was meant as a reference ([foo][]) or just unescaped brackets ([foo]).

reference provides the original raw reference, if the reference differs from the identifier. This enables compilers, transformers, and linters to accurately reconstruct the original input.

interface LinkReference <: Parent {
  type: "linkReference";
  identifier: string;
  reference: string;
  referenceType: referenceType;
}
enum referenceType {
  "shortcut" | "collapsed" | "full";
}

For example, the following markdown:

[alpha][Bravo]

Yields:

{
  "type": "linkReference",
  "identifier": "bravo",
  "reference": "Bravo",
  "referenceType": "full",
  "children": [{
    "type": "text",
    "value": "alpha"
  }]
}
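The three reference types and the identifier/reference pair can be sketched as below. Two assumptions are made here: the identifier is the lowercased reference, which is inferred only from the example above (a real parser may normalise more, such as whitespace), and `parseReference` is a hypothetical helper that handles a lone reference without nested brackets.

```typescript
type ReferenceType = "shortcut" | "collapsed" | "full";

interface ReferenceInfo {
  identifier: string;
  reference: string;
  referenceType: ReferenceType;
}

function parseReference(source: string): ReferenceInfo | null {
  const match = /^\[([^\]]+)\](?:\[([^\]]*)\])?$/.exec(source);
  if (match === null) return null;
  const text = match[1];
  // undefined when there is no second bracket pair at all
  const label: string | undefined = match[2];
  const referenceType: ReferenceType =
    label === undefined ? "shortcut" : label === "" ? "collapsed" : "full";
  // Shortcut and collapsed references use the link text as their reference.
  const reference = label ? label : text;
  return { identifier: reference.toLowerCase(), reference, referenceType };
}

console.log(parseReference("[alpha][Bravo]")); // full, identifier "bravo", reference "Bravo"
console.log(parseReference("[foo][]"));        // collapsed
console.log(parseReference("[foo]"));          // shortcut
```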

ImageReference

ImageReference (Node) represents a figurative figure, its url and title defined somewhere else in the document by a Definition.

referenceType is needed to detect if a reference was meant as a reference (![foo][]) or just unescaped brackets (![foo]). See LinkReference for the definition of referenceType.

reference provides the original raw reference. See LinkReference for the definition of reference.

interface ImageReference <: Node {
  type: "imageReference";
  identifier: string;
  reference: string;
  referenceType: referenceType;
  alt: string | null;
}

For example, the following markdown:

![alpha][Bravo]

Yields:

{
  "type": "imageReference",
  "identifier": "bravo",
  "reference": "Bravo",
  "referenceType": "full",
  "alt": "alpha"
}

FootnoteReference

FootnoteReference (Node) is like Footnote, but its content is already outside the document’s flow: placed in a FootnoteDefinition.

interface FootnoteReference <: Node {
  type: "footnoteReference";
  identifier: string;
  reference: string;
}

For example, the following markdown:

[^Alpha]

Yields:

{
  "type": "footnoteReference",
  "identifier": "alpha",
  "reference": "Alpha"
}

Definition

Definition (Node) represents the definition (i.e., location and title) of a LinkReference or an ImageReference.

definition provides the original raw definition, if the definition differs from the identifier. This enables compilers, transformers, and linters to accurately reconstruct the original input.

interface Definition <: Node {
  type: "definition";
  identifier: string;
  definition: string;
  title: string | null;
  url: string;
}

For example, the following markdown:

[Alpha]: http://example.com

Yields:

{
  "type": "definition",
  "identifier": "alpha",
  "definition": "Alpha",
  "title": null,
  "url": "http://example.com"
}
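A Definition is what lets a LinkReference be turned into a plain Link. The sketch below shows one way to do that lookup by identifier; the interfaces are trimmed to the fields used, and `resolveLink` is a hypothetical helper rather than an existing utility.

```typescript
interface TextNode { type: "text"; value: string }
// Trimmed interfaces: the reference, referenceType, and definition fields are omitted.
interface Definition { type: "definition"; identifier: string; title: string | null; url: string }
interface LinkReference { type: "linkReference"; identifier: string; children: TextNode[] }
interface Link { type: "link"; title: string | null; url: string; children: TextNode[] }

function resolveLink(ref: LinkReference, definitions: Definition[]): Link | null {
  // Identifiers are already normalised, so a direct comparison is enough here.
  const def = definitions.find((d) => d.identifier === ref.identifier);
  if (def === undefined) return null;
  return { type: "link", title: def.title, url: def.url, children: ref.children };
}

const definitions: Definition[] = [
  { type: "definition", identifier: "alpha", title: null, url: "http://example.com" },
];
const reference: LinkReference = {
  type: "linkReference",
  identifier: "alpha",
  children: [{ type: "text", value: "Alpha" }],
};
console.log(resolveLink(reference, definitions));
```

Returning null for an unknown identifier leaves the decision about unresolved references to the caller.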

FootnoteDefinition

FootnoteDefinition (Parent) represents the definition (i.e., content) of a FootnoteReference.

definition provides the original raw definition. See Definition for the definition of definition.

interface FootnoteDefinition <: Parent {
  type: "footnoteDefinition";
  identifier: string;
  definition: string;
}

For example, the following markdown:

[^alpha]: bravo and charlie.

Yields:

{
  "type": "footnoteDefinition",
  "identifier": "alpha",
  "children": [{
    "type": "paragraph",
    "children": [{
      "type": "text",
      "value": "bravo and charlie."
    }]
  }]
}

TextNode

TextNode (Text) represents everything that is just text. Note that its type property is text, but it is different from Text.

interface TextNode <: Text {
  type: "text";
}

For example, the following markdown:

Alpha bravo charlie.

Yields:

{
  "type": "text",
  "value": "Alpha bravo charlie."
}
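Because all textual content ends up in nodes with a value, the plain text of any subtree can be recovered by walking it. A minimal sketch, where `MdastNode` is a loose stand-in type and `plainText` is a hypothetical helper:

```typescript
// Loose stand-in for any node in this document: text-like nodes have value,
// parents have children.
interface MdastNode { type: string; value?: string; children?: MdastNode[] }

function plainText(node: MdastNode): string {
  if (typeof node.value === "string") return node.value;
  if (node.children) return node.children.map((child) => plainText(child)).join("");
  return ""; // nodes such as break or thematicBreak carry no text
}

const sample: MdastNode = {
  type: "paragraph",
  children: [
    { type: "emphasis", children: [{ type: "text", value: "alpha" }] },
    { type: "text", value: " " },
    { type: "strong", children: [{ type: "text", value: "bravo" }] },
  ],
};
console.log(plainText(sample)); // "alpha bravo"
```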

List of Utilities

Related

Contribute

mdast is built by people just like you! Check out contribute.md for ways to get started.

This project has a Code of Conduct. By interacting with this repository, organisation, or community you agree to abide by its terms.

Want to chat with the community and contributors? Join us in Gitter!

Have an idea for a cool new utility or tool? That’s great! If you want feedback, help, or just to share it with the world you can do so by creating an issue in the syntax-tree/ideas repository!

Acknowledgments

The initial release of this project was authored by @wooorm.

Special thanks to @eush77 for their work, ideas, and incredibly valuable feedback!

Thanks to @anandthakker, @BarryThePenguin, @izumin5210, @jasonLaster, @justjake, @KyleAMathews, @Rokt33r, @rhysd, @Sarah-Seo, @sethvincent, and @simov for contributing commits since!

License

CC-BY-4.0 © Titus Wormer
