Split docutils only functionality into separate package #347

Closed
chrisjsewell opened this issue Apr 16, 2021 · 12 comments · Fixed by #456
Labels
discussion no fixed close condition

Comments

@chrisjsewell
Member

extracted from #342:

I was surprised to see docutils 0.17 add experimental Markdown support using Recommonmark

ah I completely missed the response on the email thread.

Ah well that's a different proposition: I literally have myst_parser/docutils_renderer.py and myst_parser/sphinx_renderer.py as a sub-class, so that it is certainly possible to use myst-parser with "docutils only" functionality.

If I had known previously about such an intention to ship Markdown in docutils, I would certainly have considered splitting the "docutils only" aspects into a separate "myst-docutils" package (and having that as a dependency here), which I think would probably be a better solution than just moving dependencies to extras.
Indeed, if there was an agreement in principle with the docutils guys to include something like myst-docutils in docutils I would look into this.

@cpitclaudel
Contributor

cpitclaudel commented Jul 18, 2021

Ah well that's a different proposition: I literally have myst_parser/docutils_renderer.py and myst_parser/sphinx_renderer.py as a sub-class, so that it is certainly possible to use myst-parser with "docutils only" functionality.

I think this would be great. The thing that's missing right now for convenient integration with docutils, I think (besides separating dependencies), is a version of sphinx_parser for docutils. It would allow this kind of code to work:

from docutils.core import publish_string
from myst_parser.sphinx_parser import MystParser
print(publish_string(source="{math}`e^{i\pi} = -1`", parser=MystParser(), writer_name='html5'))

(and with that integration into a variety of other docutils-based pipelines; my use case is integrating with Alectryon for Coq proofs.)

As currently written, this code crashes:

Traceback (most recent call last):
  File "minimyst.py", line 3, in <module>
    print(publish_string(source="{math}`a^2 + b^2 = c^2`", parser=MystParser(), writer_name='html5'))
  File "/home/clement/.local/lib/python3.8/site-packages/docutils/core.py", line 407, in publish_string
    output, pub = publish_programmatically(
  File "/home/clement/.local/lib/python3.8/site-packages/docutils/core.py", line 665, in publish_programmatically
    output = pub.publish(enable_exit_status=enable_exit_status)
  File "/home/clement/.local/lib/python3.8/site-packages/docutils/core.py", line 217, in publish
    self.document = self.reader.read(self.source, self.parser,
  File "/home/clement/.local/lib/python3.8/site-packages/docutils/readers/__init__.py", line 71, in read
    self.parse()
  File "/home/clement/.local/lib/python3.8/site-packages/docutils/readers/__init__.py", line 77, in parse
    self.parser.parse(self.input, document)
  File "/home/clement/.local/lib/python3.8/site-packages/myst_parser/sphinx_parser.py", line 55, in parse
    config = document.settings.env.myst_config
AttributeError: 'Values' object has no attribute 'env'

(in fact I see now that it's almost exactly the same example as in https://sourceforge.net/p/docutils/mailman/docutils-users/thread/rkv4nb%24139g%241%40ciao.gmane.io/#msg37118232)

This crash is due to parse taking an additional argument that docutils doesn't pass:

    def parse(self, inputstring: str, document: nodes.document, renderer: str = "sphinx") -> None:
        if renderer == "sphinx":
            config = document.settings.env.myst_config # Here, since docutils will call this function without specifying the renderer
        else:
            config = MdParserConfig()

It works fine if I use a subclass of MystParser that passes a dummy renderer string into super().parse instead of the default "sphinx":

from docutils.core import publish_string
from myst_parser.sphinx_parser import MystParser

class NonSphinxMystParser(MystParser):
    def parse(self, inputstring, document) -> None:
        return super().parse(inputstring, document, r"¯\_(ツ)_/¯")

print(publish_string(source="{math}`e^{i\pi} = -1`", parser=NonSphinxMystParser(), writer_name='html5').decode('utf-8'))

… but it's not clear whether that's the right approach (?) So separating things out in a way that makes this work would be very, very nice, especially if setting MyST options works from the usual docutils.conf, etc.
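
One alternative to the subclass workaround would be for the parser itself to check whether the settings object carries a Sphinx environment and fall back to defaults when it does not. The sketch below uses stand-in objects rather than the real docutils or myst-parser types: `FakeConfig` and `pick_config` are hypothetical names, and whether MyST should adopt anything like this is exactly the open question.

```python
from types import SimpleNamespace


class FakeConfig:
    """Stand-in for MyST's MdParserConfig (hypothetical, for illustration)."""

    def __init__(self, origin: str = "defaults") -> None:
        self.origin = origin


def pick_config(settings):
    """Return the Sphinx-provided config if present, else a default config.

    Plain docutils settings have no ``env`` attribute, so ``getattr`` with a
    default avoids the AttributeError shown in the traceback above.
    """
    env = getattr(settings, "env", None)
    if env is not None and hasattr(env, "myst_config"):
        return env.myst_config
    return FakeConfig()


# Settings as plain docutils would supply them: no ``env`` attribute.
docutils_settings = SimpleNamespace()
# Settings as Sphinx would supply them: ``env`` carries the MyST config.
sphinx_settings = SimpleNamespace(
    env=SimpleNamespace(myst_config=FakeConfig(origin="sphinx"))
)

print(pick_config(docutils_settings).origin)
print(pick_config(sphinx_settings).origin)
```

With this shape the caller never has to pass a renderer string, so docutils can call `parse(inputstring, document)` unchanged.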

Also nice would be some canonical way to register MyST's directives and roles: some function that clients can call to make the appropriate calls to docutils.directives.register_directive etc.
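
A rough sketch of what such a registration helper could look like follows. `register_myst_extras` is a hypothetical name, and the thread does not say which directives and roles would be registered, so the mappings are left to the caller. The docutils calls used by default are `docutils.parsers.rst.directives.register_directive` and `docutils.parsers.rst.roles.register_local_role`; they are imported lazily so the helper can also be tried with stub callables.

```python
from typing import Callable, Dict, Optional


def register_myst_extras(
    directive_classes: Dict[str, type],
    role_functions: Dict[str, Callable],
    register_directive: Optional[Callable] = None,
    register_role: Optional[Callable] = None,
) -> None:
    """Register every directive and role in one call (hypothetical helper)."""
    if register_directive is None or register_role is None:
        # Imported lazily so the helper can be exercised without docutils.
        from docutils.parsers.rst import directives, roles

        register_directive = register_directive or directives.register_directive
        register_role = register_role or roles.register_local_role
    for name, cls in directive_classes.items():
        register_directive(name, cls)
    for name, func in role_functions.items():
        register_role(name, func)


# Usage with stub registries instead of docutils' global ones.
seen_directives, seen_roles = {}, {}
register_myst_extras(
    {"my-directive": object},
    {"my-role": lambda *args, **kwargs: ([], [])},
    register_directive=seen_directives.__setitem__,
    register_role=seen_roles.__setitem__,
)
print(sorted(seen_directives), sorted(seen_roles))
```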

In #342 @astrojuanlu wrote:

It sounds like docutils could potentially use https://github.com/executablebooks/markdown-it-py for its Markdown support then?

and @choldgraf wrote

Yep, i think that is the answer.

But am I correct to think that this requires extra work to also parse MyST's config, recreate its custom roles/directives (?) and make a Reader that's compatible with docutils?

In any case it looks like with very small changes (and independently of integrating with the official docutils package) applications like Alectryon that already depend on docutils but not necessarily sphinx could add support for MyST, which would be great!

cpitclaudel added a commit to cpitclaudel/MyST-Parser that referenced this issue Jul 19, 2021
The global approach in 395 isn't really sustainable: it requires all-ways
cooperation between all projects that want to customize MathJax.  Additionally,
when processing a MyST document without Sphinx, the MathJax configuration
changes are not performed (part of executablebooks#347).  And, of course, this approach of
overriding the MathJax object causes issues down the line for projects that need
to customize MathJax (the setting in Sphinx isn't sufficient, see sphinx-doc/sphinx#9450)

The following two approaches would not cause these issues:

1. Add a custom script instead of touching the mathjax3_config variable;
   something like this, essentially:

   ```python
   app.add_js_file(None, priority=0, body="""
      var MathJax = window.MathJax || MathJax;
      MathJax.options = MathJax.options || {};
      MathJax.options.processHtmlClass = (MathJax.options.processHtmlClass || "")
      + "|math";
   """)
   ```

2. Don't touch MathJax_config at all; instead, add an explicit `mathjax_process`
   class on all math nodes, either by changing `docutils_renderer` (this PR) or by
   adding a Docutils transform to process all math nodes:

  ```python
  from docutils.nodes import math, math_block
  from docutils.transforms import Transform

  class ActivateMathJaxTransform(Transform):
      default_priority = 800

      @staticmethod
      def is_math(node):
          return isinstance(node, (math, math_block))

      def apply(self, **kwargs):
          for node in self.document.traverse(self.is_math):
              node.attributes.setdefault("classes", []).append("mathjax_process")
  ```

This PR isn't ready for merging; it's just to start a discussion.
@gmilde

gmilde commented Oct 16, 2021

Given the end-of-life for recommonmark, is there a chance for a Docutils-only version of the MyST parser that can be utilised by Docutils?

@chrisjsewell
Member Author

Heya, DocutilsRenderer is already a docutils-only renderer:

class DocutilsRenderer(RendererProtocol):

@cpitclaudel has kindly made a PR to allow for controlling the configuration via docutils: #426, that I'm just trying to find time to circle round to and finalise the tests.

The only sticking point perhaps is myst-parser's pinned dependency on sphinx, creating a cyclic dependency. I specifically added this because changes in docutils/sphinx kept breaking myst-parser for users, so I would have to think about how this could be achieved.

@chrisjsewell
Member Author

myst-parser v0.16.0 now introduces docutils-only functionality (https://myst-parser.readthedocs.io/en/latest/docutils.html) and a https://pypi.org/project/myst-docutils/ release pipeline, which includes no install dependencies on sphinx/docutils 😄

@gmilde, do you want me to open an issue on docutils, to move from recommonmark to myst-docutils?

@gmilde

gmilde commented Dec 16, 2021 via email

@chrisjsewell
Member Author

I see that 5 front-end tools shall be installed as well (but in the package download I didn't find the corresponding files).

They don't need separate files, they use python entry points:

MyST-Parser/setup.cfg

Lines 50 to 57 in 11fb239

[options.entry_points]
console_scripts =
    myst-anchors = myst_parser.cli:print_anchors
    myst-docutils-html = myst_parser.docutils_:cli_html
    myst-docutils-html5 = myst_parser.docutils_:cli_html5
    myst-docutils-latex = myst_parser.docutils_:cli_latex
    myst-docutils-xml = myst_parser.docutils_:cli_xml
    myst-docutils-pseudoxml = myst_parser.docutils_:cli_pseudoxml

I feel this is the more modern approach for including CLI tools in python package distributions.
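
To illustrate why no separate script files are needed: each `name = module:function` line is just a reference that the installer turns into a small launcher. A simplified sketch of how such a reference resolves to a callable is below; `resolve_entry_point` is a hypothetical helper (real installers do more, such as generating platform-specific wrappers), and `json:dumps` stands in for `myst_parser.docutils_:cli_html` so that it runs without myst-parser installed.

```python
import importlib


def resolve_entry_point(spec: str):
    """Resolve a ``module:attribute`` string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # Walk dotted attribute paths such as ``Class.method``.
    for part in attr_path.split("."):
        if part:
            obj = getattr(obj, part)
    return obj


# ``json:dumps`` is a stdlib stand-in for ``myst_parser.docutils_:cli_html``.
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```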

IMV, separate front end tools are not necessary

I felt this was easier for end users, and also in line with the separate rst2xxx.py front end tools that docutils already ships with

Feel free to open an enhancement ticket or start a thread

Yep will do then 👍

@chrisjsewell
Member Author

I feel this is the more modern approach for including CLI tools in python package distributions.

See https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/?highlight=console_scripts#scripts

Although setup() supports a scripts keyword for pointing to pre-made scripts to install, the recommended approach to achieve cross-platform compatibility is to use console_scripts entry points

@chrisjsewell
Member Author

a test strategy

Also, just to note, there are now separate test jobs for basic testing of myst-docutils against docutils 0.16, 0.17, and 0.18 (on top of the full test suite): https://github.com/executablebooks/MyST-Parser/runs/4546203130?check_suite_focus=true

@gmilde

gmilde commented Dec 20, 2021 via email

@chrisjsewell
Member Author

chrisjsewell commented Dec 20, 2021

Unfortunately, the approach of a separate front-end tool for every reader-parser-writer combination doesn't scale well.

Oh indeed, but also having to write the full parser path, etc, every time is not ideal.
Usually, if I am making a complex CLI I would use click.palletsprojects.com/

I suppose that docutils-cli.py --parser=myst_parser.docutils_ should do the trick.

Yes the parser is just a standard docutils parser.
Again here, I would suggest you use entry points to load parsers, rather than module paths
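
As a sketch of what entry-point-based parser loading might look like (this is not something docutils provides in this thread; the group name `docutils.parsers` and the helper names are assumptions for illustration), using only `importlib.metadata` from the standard library:

```python
from importlib.metadata import entry_points


def find_entry_points(group: str) -> list:
    """Return all installed entry points in ``group``."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8 / 3.9: dict of lists


def load_parser(name: str, group: str = "docutils.parsers"):
    """Load the parser registered under ``name`` (hypothetical group name)."""
    for ep in find_entry_points(group):
        if ep.name == name:
            return ep.load()
    raise LookupError(f"no parser named {name!r} in entry point group {group!r}")
```

A package would then advertise its parser under a short name in its own metadata, and a front end could accept something like `--parser=myst` without knowing the module path.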

If myst_parser.docutils_.Parser.settings_spec could provide an interface to the relevant myst configuration options, this would allow configuring reader

Yep this is already what happens, settings_spec already contains all the myst options, so when you use the CLI you get, e.g. (as shown in the drop down at https://myst-parser.readthedocs.io/en/latest/docutils.html)

Usage
=====
  myst-docutils-<writer> [options] [<source> [<destination>]]

Options
=======
General Docutils Options
------------------------
--title=TITLE           Specify the document title as metadata.
--generator, -g         Include a "Generated by Docutils" credit and link.
--no-generator          Do not include a generator credit.
--date, -d              Include the date at the end of the document (UTC).
--time, -t              Include the time & date (UTC).
--no-datestamp          Do not include a datestamp of any kind.
--source-link, -s       Include a "View document source" link.
--source-url=<URL>      Use <URL> for a source link; implies --source-link.
--no-source-link        Do not include a "View document source" link.
--toc-entry-backlinks   Link from section headers to TOC entries.  (default)
--toc-top-backlinks     Link from section headers to the top of the TOC.
--no-toc-backlinks      Disable backlinks to the table of contents.
--footnote-backlinks    Link from footnotes/citations to references. (default)
--no-footnote-backlinks
                        Disable backlinks from footnotes and citations.
--section-numbering     Enable section numbering by Docutils.  (default)
--no-section-numbering  Disable section numbering by Docutils.
--strip-comments        Remove comment elements from the document tree.
--leave-comments        Leave comment elements in the document tree. (default)
--strip-elements-with-class=<class>
                        Remove all elements with classes="<class>" from the
                        document tree. Warning: potentially dangerous; use
                        with caution. (Multiple-use option.)
--strip-class=<class>   Remove all classes="<class>" attributes from elements
                        in the document tree. Warning: potentially dangerous;
                        use with caution. (Multiple-use option.)
--report=<level>, -r <level>
                        Report system messages at or higher than <level>:
                        "info" or "1", "warning"/"2" (default), "error"/"3",
                        "severe"/"4", "none"/"5"
--verbose, -v           Report all system messages.  (Same as "--report=1".)
--quiet, -q             Report no system messages.  (Same as "--report=5".)
--halt=<level>          Halt execution at system messages at or above <level>.
                        Levels as in --report.  Default: 4 (severe).
--strict                Halt at the slightest problem.  Same as "--halt=info".
--exit-status=<level>   Enable a non-zero exit status for non-halting system
                        messages at or above <level>.  Default: 5 (disabled).
--debug                 Enable debug-level system messages and diagnostics.
--no-debug              Disable debug output.  (default)
--warnings=<file>       Send the output of system messages to <file>.
--traceback             Enable Python tracebacks when Docutils is halted.
--no-traceback          Disable Python tracebacks.  (default)
--input-encoding=<name[:handler]>, -i <name[:handler]>
                        Specify the encoding and optionally the error handler
                        of input text.  Default: <locale-dependent>:strict.
--input-encoding-error-handler=INPUT_ENCODING_ERROR_HANDLER
                        Specify the error handler for undecodable characters.
                        Choices: "strict" (default), "ignore", and "replace".
--output-encoding=<name[:handler]>, -o <name[:handler]>
                        Specify the text encoding and optionally the error
                        handler for output.  Default: UTF-8:strict.
--output-encoding-error-handler=OUTPUT_ENCODING_ERROR_HANDLER
                        Specify error handler for unencodable output
                        characters; "strict" (default), "ignore", "replace",
                        "xmlcharrefreplace", "backslashreplace".
--error-encoding=<name[:handler]>, -e <name[:handler]>
                        Specify text encoding and error handler for error
                        output.  Default: UTF-8:backslashreplace.
--error-encoding-error-handler=ERROR_ENCODING_ERROR_HANDLER
                        Specify the error handler for unencodable characters
                        in error output.  Default: backslashreplace.
--language=<name>, -l <name>
                        Specify the language (as BCP 47 language tag).
                        Default: en.
--record-dependencies=<file>
                        Write output file dependencies to <file>.
--config=<file>         Read configuration settings from <file>, if it exists.
--version, -V           Show this program's version number and exit.
--help, -h              Show this help message and exit.

reStructuredText Parser Options
-------------------------------
--pep-references        Recognize and link to standalone PEP references (like
                        "PEP 258").
--pep-base-url=<URL>    Base URL for PEP references (default
                        "http://www.python.org/dev/peps/").
--pep-file-url-template=<URL>
                        Template for PEP file part of URL. (default
                        "pep-%04d")
--rfc-references        Recognize and link to standalone RFC references (like
                        "RFC 822").
--rfc-base-url=<URL>    Base URL for RFC references (default
                        "http://tools.ietf.org/html/").
--tab-width=<width>     Set number of spaces for tab expansion (default 8).
--trim-footnote-reference-space
                        Remove spaces before footnote references.
--leave-footnote-reference-space
                        Leave spaces before footnote references.
--no-file-insertion     Disable directives that insert the contents of
                        external file ("include" & "raw"); replaced with a
                        "warning" system message.
--file-insertion-enabled
                        Enable directives that insert the contents of external
                        file ("include" & "raw").  Enabled by default.
--no-raw                Disable the "raw" directives; replaced with a
                        "warning" system message.
--raw-enabled           Enable the "raw" directive.  Enabled by default.
--syntax-highlight=<format>
                        Token name set for parsing code with Pygments: one of
                        "long", "short", or "none (no parsing)". Default is
                        "long".
--smart-quotes=<yes/no/alt>
                        Change straight quotation marks to typographic form:
                        one of "yes", "no", "alt[ernative]" (default "no").
--smartquotes-locales=<language:quotes[,language:quotes,...]>
                        Characters to use as "smart quotes" for <language>.
--word-level-inline-markup
                        Inline markup recognized at word boundaries only
                        (adjacent to punctuation or whitespace). Force
                        character-level inline markup recognition with "\ "
                        (backslash + space). Default.
--character-level-inline-markup
                        Inline markup recognized anywhere, regardless of
                        surrounding characters. Backslash-escapes must be used
                        to avoid unwanted markup recognition. Useful for East
                        Asian languages. Experimental.

MyST options
------------
--myst-commonmark-only=MYST_COMMONMARK_ONLY
                        Use strict CommonMark parser (type: bool, default:
                        False)
--myst-enable-extensions=MYST_ENABLE_EXTENSIONS
                        Enable extensions (type: comma-delimited, default:
                        'dollarmath')
--myst-linkify-fuzzy-links=MYST_LINKIFY_FUZZY_LINKS
                        linkify: recognise URLs without schema prefixes (type:
                        bool, default: True)
--myst-dmath-allow-labels=MYST_DMATH_ALLOW_LABELS
                        Parse `$$...$$ (label)` (type: bool, default: True)
--myst-dmath-allow-space=MYST_DMATH_ALLOW_SPACE
                        dollarmath: allow initial/final spaces in `$ ... $`
                        (type: bool, default: True)
--myst-dmath-allow-digits=MYST_DMATH_ALLOW_DIGITS
                        dollarmath: allow initial/final digits `1$ ...$2`
                        (type: bool, default: True)
--myst-dmath-double-inline=MYST_DMATH_DOUBLE_INLINE
                        dollarmath: parse inline `$$ ... $$` (type: bool,
                        default: False)
--myst-disable-syntax=MYST_DISABLE_SYNTAX
                        Disable syntax elements (type: comma-delimited,
                        default: '')
--myst-url-schemes=MYST_URL_SCHEMES
                        URL schemes to allow in links (type: comma-delimited,
                        default: 'http,https,mailto,ftp')
--myst-footnote-transition=MYST_FOOTNOTE_TRANSITION
                        Place a transition before any footnotes (type: bool,
                        default: True)
--myst-words-per-minute=MYST_WORDS_PER_MINUTE
                        For reading speed calculations (type: int, default:
                        200)
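
The "MyST options" group above follows a regular pattern (a `--myst-` prefix, an upper-case metavar, and a type/default note), which suggests the option list can be generated from the config fields. The sketch below shows one way that could be done; it is not MyST's actual implementation, and `DemoConfig` and `build_settings_spec` are hypothetical. It relies only on the docutils convention that `settings_spec` is a `(title, description, options)` tuple whose options are `(help, [option strings], {optparse kwargs})`.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:
    """Hypothetical subset of MyST options, for illustration only."""

    commonmark_only: bool = False
    words_per_minute: int = 200


def build_settings_spec(config_cls, prefix: str = "myst"):
    """Build a docutils-style settings_spec tuple from dataclass fields."""
    options = []
    for field in fields(config_cls):
        dest = f"{prefix}_{field.name}"
        flag = "--" + dest.replace("_", "-")
        type_name = getattr(field.type, "__name__", str(field.type))
        help_text = f"(type: {type_name}, default: {field.default!r})"
        options.append((help_text, [flag], {"dest": dest, "metavar": dest.upper()}))
    return ("MyST options", None, tuple(options))


title, _, options = build_settings_spec(DemoConfig)
for help_text, flags, kwargs in options:
    print(flags[0], help_text)
```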

@gmilde

gmilde commented Dec 23, 2021 via email

@chrisjsewell
Member Author

Now, docutils-cli.py --parser=myst should work just like myst-docutils-html5.

yep that sounds good, although I would note, I do like the "inherent" tab-completion you get in the terminal with the myst-docutils- commands:

$ myst-docutils-<tab>
myst-docutils-html       myst-docutils-html5      myst-docutils-latex      myst-docutils-pseudoxml  myst-docutils-xml

makes it very easy to use.

BTW: Why do the myst-docutils-… front-ends include rst parser settings?

myst-parser hooks into the RST directive/role parsing mechanisms. I seem to recall that some of these settings were required, and that it was breaking without them being present.

Feel free to open an enhancement ticket

Anyhow, I have now opened https://sourceforge.net/p/docutils/feature-requests/86/ and created #487 for any parallel discussion here, so any further conversation can move there, cheers
