fix: codeblock without "<" consumes extra char #25

Merged: 3 commits, Sep 30, 2022
59 changes: 29 additions & 30 deletions README.md
@@ -7,48 +7,47 @@ well-formed; the _input_ (vimdoc) is secondary. The first step should always be
to try to fix the input (within reason) rather than insist on a grammar that
handles vimdoc's endless quirks.

Notes
-----
Overview
--------

- vimdoc format "spec":
- [:help help-writing](https://neovim.io/doc/user/helphelp.html#help-writing)
- https://github.com/nanotee/vimdoc-notes
- whitespace is intentionally captured in `(word)`, because it is often necessary to be
able to correctly layout vim help files (especially old/legacy).
- `(codeblock)` is contained by `(line)` because `>` can start a code block at the end of a line.
- `(column_heading)` is contained by `(line)` because `>` (to close
a `(codeblock)` can appear at the start of `(column_heading)`.
- `h1` ("Heading 1"): `======` followed by text and optional `*tags*`.
- `h2` ("Heading 2"): `------` followed by text and optional `*tags*`.
- `h3` ("Heading 3"): only UPPERCASE WORDS, followed by optional `*tags*`.
- whitespace is intentionally captured in all atoms, because it is often used
for "layout" and ascii art in legacy help files.
- `block` is the main top-level node which contains `line` nodes.
- ends at blank line(s) or a line starting with `<`.
- `line`:
- contains atoms (words, tags, taglinks, …)
- contains `codeblock` because `>` can start a codeblock at the end of a line.
- contains headings (`h1`, `h2`, `h3`) because `codeblock` terminated by
"implicit stop" (no terminating `<`) consumes blank lines, so `block` has
no way to end.
- contains `column_heading` because `<` (the `codeblock` terminating char)
can appear at the start of `column_heading`.
- `codeblock`:
- contains `line` nodes which do not contain `word` nodes, it's just the full
raw text line including whitespace. This is somewhat dictated by its
"preformatted" nature; parsing the contents would require loading a "child"
language (injection). See [#2](https://github.com/neovim/tree-sitter-vimdoc/issues/2).
- the terminating `<` (and any following whitespace) is discarded (anonymous).
- `h1` = "Heading 1": `======` followed by text and optional `*tags*`.
- `h2` = "Heading 2": `------` followed by text and optional `*tags*`.
- `h3` = "Heading 3": only UPPERCASE WORDS, followed by optional `*tags*`.

Known issues
------------

- `line_li` ("list item") is _experimental_. It doesn't support nesting yet and
it may not work well; you can treat it as a normal `line` for layout purposes.
- `codeblock` ">" must not be preceded only by tabs, a space char is required (" >").
See `:help lcs-tab` for example. Currently the grammar doesn't enforce this.
- `codeblock` terminated by an "implicit stop" (i.e. no terminating `<`)
consumes the first char of the terminating line, and continues the parent
`block`, preventing top-level forms like `h1`, `h2` from being recognized
until a blank line is encountered.
- `line` in a `codeblock` does not contain `word` atoms, it's just the full
raw text line including whitespace. This is somewhat dictated by its
"preformatted" nature; parsing the contents would require loading a "child"
language (injection). See [#2](https://github.com/vigoux/tree-sitter-vimdoc/issues/2).
- `line_li` ("list item") is experimental. It doesn't support nesting yet.
- Spec requires that `codeblock` delimiter ">" must be preceded by a space
(" >"), not a tab. But currently the grammar doesn't enforce this. Example:
`:help lcs-tab`.
- `url` doesn't handle _surrounding_ parens. E.g. `(https://example.com/#yay)` yields `word`
- `url` doesn't handle _nested_ parens. E.g. `(https://example.com/(foo)#yay)`
- Ideally `block_end` should consume the last block of the document _only_ if that
block is missing a trailing blank line or EOL ("\n").
- TODO: consider simply _not supporting_ docs without EOL?
- Ideally `line_noeol` should consume the last line of the document _only_ if
that line is missing EOL ("\n").
- TODO: consider simply _not supporting_ docs without EOL?

TODO
----

- `line_noeol` is a special-case to support documents that don't end in EOL.
Grammar could be a bit simpler if we just require EOL at end of document.
- `line_modeline` (only at EOF)
Grammar could be simpler if we require EOL at end of document.
- `line_modeline` ?
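
The termination rules in the notes above can be sketched outside the grammar. This is a minimal Python illustration, not the tree-sitter implementation: `split_codeblocks` is a hypothetical helper, and it assumes (per my reading of `:help help-writing`) that a codeblock opens on a trailing ` >`, closes on a line starting with `<`, and otherwise stops implicitly at the first line that starts in column 1.

```python
def split_codeblocks(lines):
    """Label each line as ("text", ...) or ("code", ...).

    Illustrative only: this is not the tree-sitter grammar, just the
    termination rules described in the README notes.
    """
    out = []
    in_code = False
    for line in lines:
        if in_code:
            if line.startswith("<"):
                # Explicit stop: discard "<" and any following whitespace.
                in_code = False
                line = line[1:].lstrip()
                if not line:
                    continue
            elif line and not line[0].isspace():
                # Implicit stop: a line starting in column 1 ends the block.
                # The whole line is kept; no character is consumed.
                in_code = False
            else:
                # Indented or blank lines stay inside the codeblock.
                out.append(("code", line))
                continue
        # Ordinary text. A trailing " >" (or a bare ">") opens a codeblock.
        if line == ">" or line.endswith(" >"):
            in_code = True
            line = line[:-1].rstrip()
            if not line:
                continue
        out.append(("text", line))
    return out


doc = [
    "Examples: >",
    "    :lua print(1)",
    "< LuaJIT: >",
    "    :lua =jit.version",
    "",
    "h1-headline *foo*",
]
result = split_codeblocks(doc)
```

In this sketch the `< LuaJIT: >` line both closes one codeblock and opens the next, and the implicitly terminating `h1-headline *foo*` line keeps its first character. As a simplification, the blank line before the implicit stop is kept as a code line.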
5 changes: 4 additions & 1 deletion corpus/arguments.txt
@@ -43,7 +43,10 @@ NOT an argument
(line
(argument
(word)
(ERROR))
(MISSING "}"))
(word)
(argument
(word))
(word)
(codespan
(word))
128 changes: 115 additions & 13 deletions corpus/codeblock.txt
@@ -28,7 +28,8 @@ block3:
(word))
(line
(codeblock
(line)))
(line))))
(block
(line
(word)))
(block
@@ -92,17 +93,28 @@ text
(word))))

================================================================================
codeblock with implicit stop (FIXME)
codeblock with implicit stop
================================================================================
>
line1
line2

-------------------------------
===============================
h1-headline *foo*
line1

line2

>
line1

-------------------------------
h1-headline *foo*
h2-headline *foo*

>
line1

H3 HEADLINE *foo*

--------------------------------------------------------------------------------

@@ -114,15 +126,35 @@ h1-headline *foo*
(line)
(line)))
(line
(word))
(h1
(word)
(tag
(word))))
Review comment (Member, Author):

`h1`/`h2`/`h3` are no longer top-level elements; they are now contained by `line`. This is necessary because an implicitly terminated `codeblock` (without `<`) consumes blank lines (codeblocks can contain blank and empty lines), so there is no way for `block` to know when to terminate and allow other top-level elements to start.

(line
(word)
(tag
(word))))
(h2
(word)
(tag
(word))))
(word)))
(block
(line
(word)))
(block
(line
(codeblock
(line)
(line)))
(line
(h2
(word)
(tag
(word)))))
(block
(line
(codeblock
(line)
(line)))
(line
(h3
(uppercase_name)
(tag
(word))))))
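
Because headings now sit under `block` > `line` in the expected trees, a consumer that scanned only the top-level children of `help_file` would no longer see them and would need to descend. A minimal sketch, using plain `(type, children)` tuples as a hypothetical stand-in for real parse-tree nodes; `find_nodes` is an illustrative helper and not part of tree-sitter or this grammar:

```python
def find_nodes(node, wanted):
    """Yield every node whose type is in `wanted`, depth-first."""
    kind, children = node
    if kind in wanted:
        yield node
    for child in children:
        yield from find_nodes(child, wanted)


# Hand-written stand-in for one block of the expected tree above:
# a codeblock with an implicit stop, followed by an h2 inside a line.
tree = ("help_file", [
    ("block", [
        ("line", [("codeblock", [("line", []), ("line", [])])]),
        ("line", [("h2", [("word", []), ("tag", [("word", [])])])]),
    ]),
])

top_level = [kind for kind, _ in tree[1]]
headings = [kind for kind, _ in find_nodes(tree, {"h1", "h2", "h3"})]
```

Here `top_level` contains only `block`, while the recursive walk still finds the `h2`.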

================================================================================
codeblock with empty lines
@@ -155,7 +187,9 @@ x
(line)
(line)
(line)
(line)))))
(line)))
(line
(word))))

================================================================================
tricky codeblock
@@ -166,7 +200,17 @@ tricky codeblock
< line3
<

Example: >

vim.spell.check()
-->
{
{'quik', 'bad', 4}
}
<

tricky

--------------------------------------------------------------------------------

(help_file
@@ -176,6 +220,16 @@ tricky
(line)
(line)
(line))))
(block
(line
(word)
(codeblock
(line)
(line)
(line)
(line)
(line)
(line))))
(block
(line
(word))))
@@ -243,3 +297,51 @@ To test for a non-empty string, use empty(): >
(word)
(codeblock
(line)))))

================================================================================
codeblock stop and start on same line
================================================================================
Examples: >
:lua vim.api.nvim_command('echo "Hello, Nvim!"')
< LuaJIT: >
:lua =jit.version
<
*:lua-heredoc*
:lua << [endmarker]
{script}

Example: >
lua << EOF
EOF
<

--------------------------------------------------------------------------------

(help_file
(block
(line
(word)
(codeblock
(line))))
(block
(line
(word)
(codeblock
(line))))
(block
(line
(tag
(word)))
(line
(word)
(word)
(word))
(line
(argument
(word))))
(block
(line
(word)
(codeblock
(line)
(line)))))
15 changes: 3 additions & 12 deletions corpus/codespan.txt
@@ -46,11 +46,10 @@ an error`.
(word))))

================================================================================
NOT a codespan
NOT codespan
================================================================================
*'* *'a* *`* *`a*
'{a-z} `{a-z} Jump to the mark.
*g'* *g'a* *g`* *g`a*
*'* *'a* *`* *`a*
*g'* *g'a* *g`* *g`a*
g'{mark} g`{mark}

--------------------------------------------------------------------------------
@@ -66,14 +65,6 @@ g'{mark} g`{mark}
(word))
(tag
(word)))
(ERROR)
(line
(argument
(word))
(word)
(word)
(word)
(word))
(line
(tag
(word))
50 changes: 29 additions & 21 deletions corpus/heading1_2.txt
@@ -15,20 +15,24 @@ Text2
--------------------------------------------------------------------------------

(help_file
(h1
(word)
(tag
(word)))
(block
(line
(h1
(word)
(tag
(word)))))
(block
(line
(word)))
(h2
(word)
(word)
(tag
(word))
(tag
(word)))
(block
(line
(h2
(word)
(word)
(tag
(word))
(tag
(word)))))
(block
(line
(word))))
@@ -50,19 +54,23 @@ Text
--------------------------------------------------------------------------------

(help_file
(h1
(tag
(word))
(word)
(word))
(block
(line
(h1
(tag
(word))
(word)
(word))))
(block
(line
(word)))
(h2
(tag
(word))
(word)
(word))
(block
(line
(h2
(tag
(word))
(word)
(word))))
(block
(line
(word))))