Commit

Add a Sphinx role to link to GitHub files (#961)
* Add a new role to link to cpython files.

* Use the new :cpy-file: role in internals/parser.rst.

* Make the output a literal.

* Apply suggestions from code review

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>

* Apply suggestions from code review

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

* Remove tabs.

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
3 people authored Nov 14, 2022
1 parent 6b98942 commit f0c9151
Showing 2 changed files with 39 additions and 27 deletions.
22 changes: 17 additions & 5 deletions _extensions/custom_roles.py
@@ -8,23 +8,35 @@


 def setup(app):
+    # role to link to cpython files
+    app.add_role(
+        "cpy-file",
+        autolink("https://github.com/python/cpython/blob/main/{}"),
+    )
+    # role to link to cpython labels
     app.add_role(
         "gh-label",
-        autolink("https://github.com/python/cpython/labels/%s"),
+        autolink("https://github.com/python/cpython/labels/{}"),
     )
     # Parallel safety:
     # https://www.sphinx-doc.org/en/master/extdev/index.html#extension-metadata
     return {"parallel_read_safe": True, "parallel_write_safe": True}


 def autolink(pattern):
-    def role(name, rawtext, text, lineno, inliner, options={}, content=[]):
+    def role(name, rawtext, text, lineno, inliner, _options=None, _content=None):
+        """Combine literal + reference (unless the text is prefixed by a !)."""
         if " " in text:
-            url_text = urllib.parse.quote(f"{text}")
+            url_text = urllib.parse.quote(text)
         else:
             url_text = text
-        url = pattern % (url_text,)
-        node = nodes.reference(rawtext, text, refuri=url, **options)
+        url = pattern.format(url_text)
+        # don't create a reference if the text starts with !
+        if text.startswith('!'):
+            node = nodes.literal(rawtext, text[1:])
+        else:
+            node = nodes.reference(rawtext, '', nodes.literal(rawtext, text),
+                                   refuri=url, internal=False)
         return [node], []

     return role
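
For readers who want to check the new behaviour without a Sphinx build, here is a minimal standalone sketch of the text-to-URL logic in the updated `role` function. It uses only the standard library and returns plain tuples instead of docutils nodes. The helper name `resolve` and the two constant names are hypothetical, introduced only for this illustration; the URL patterns are the ones registered in `setup()` above.

```python
import urllib.parse

# URL patterns copied from the setup() function in this commit.
CPYTHON_FILE_PATTERN = "https://github.com/python/cpython/blob/main/{}"
LABEL_PATTERN = "https://github.com/python/cpython/labels/{}"


def resolve(pattern, text):
    """Hypothetical helper: return (display_text, url) for a role's text.

    Mirrors the committed logic: a leading "!" means "render a literal
    only, no link", so the url is None in that case.
    """
    if text.startswith("!"):
        return text[1:], None
    # As in the commit, the text is percent-encoded only if it has a space.
    url_text = urllib.parse.quote(text) if " " in text else text
    return text, pattern.format(url_text)


if __name__ == "__main__":
    print(resolve(CPYTHON_FILE_PATTERN, "Parser/parser.c"))
    print(resolve(CPYTHON_FILE_PATTERN, "!Makefile"))
    print(resolve(LABEL_PATTERN, "needs backport to 3.11"))
```

Unlike this sketch, the committed function always computes the URL and only then decides whether to wrap the literal in a reference node; the visible result is the same.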
44 changes: 22 additions & 22 deletions internals/parser.rst
@@ -518,17 +518,17 @@ Pegen
=====

Pegen is the parser generator used in CPython to produce the final PEG parser used by the interpreter. It is the
-program that can be used to read the python grammar located in :file:`Grammar/Python.gram` and produce the final C
+program that can be used to read the python grammar located in :cpy-file:`Grammar/python.gram` and produce the final C
parser. It contains the following pieces:

* A parser generator that can read a grammar file and produce a PEG parser written in Python or C that can parse
-said grammar. The generator is located at :file:`Tools/peg_generator/pegen`.
+said grammar. The generator is located at :cpy-file:`Tools/peg_generator/pegen`.
* A PEG meta-grammar that automatically generates a Python parser that is used for the parser generator itself
(this means that there are no manually-written parsers). The meta-grammar is
-located at :file:`Tools/peg_generator/pegen/metagrammar.gram`.
+located at :cpy-file:`Tools/peg_generator/pegen/metagrammar.gram`.
* A generated parser (using the parser generator) that can directly produce C and Python AST objects.

-The source code for Pegen lives at :file:`Tools/peg_generator/pegen` but normally all typical commands to interact
+The source code for Pegen lives at :cpy-file:`Tools/peg_generator/pegen` but normally all typical commands to interact
with the parser generator are executed from the main makefile.

How to regenerate the parser
@@ -539,18 +539,18 @@ parser (the one used by the interpreter) just execute: ::

make regen-pegen

-using the :file:`Makefile` in the main directory. If you are on Windows you can
+using the :cpy-file:`!Makefile` in the main directory. If you are on Windows you can
use the Visual Studio project files to regenerate the parser or to execute: ::

./PCbuild/build.bat --regen

-The generated parser file is located at :file:`Parser/parser.c`.
+The generated parser file is located at :cpy-file:`Parser/parser.c`.

How to regenerate the meta-parser
---------------------------------

The meta-grammar (the grammar that describes the grammar for the grammar files
-themselves) is located at :file:`Tools/peg_generator/pegen/metagrammar.gram`.
+themselves) is located at :cpy-file:`Tools/peg_generator/pegen/metagrammar.gram`.
Although it is very unlikely that you will ever need to modify it, if you make any modifications
to this file (in order to implement new Pegen features) you will need to regenerate
the meta-parser (the parser that parses the grammar files). To do so just execute: ::
@@ -570,7 +570,7 @@ Pegen has some special grammatical elements and rules:

* Strings with single quotes (') (e.g. ``'class'``) denote KEYWORDS.
* Strings with double quotes (") (e.g. ``"match"``) denote SOFT KEYWORDS.
-* Upper case names (e.g. ``NAME``) denote tokens in the :file:`Grammar/Tokens` file.
+* Upper case names (e.g. ``NAME``) denote tokens in the :cpy-file:`Grammar/Tokens` file.
* Rule names starting with ``invalid_`` are used for specialized syntax errors.

- These rules are NOT used in the first pass of the parser.
@@ -592,7 +592,7 @@ to handle things like indentation boundaries, some special keywords like ``ASYNC
interactive mode and much more. Some of these reasons are also there for historical purposes, and some
others are useful even today.

-The list of tokens (all uppercase names in the grammar) that you can use can be found in the :file:`Grammar/Tokens`
+The list of tokens (all uppercase names in the grammar) that you can use can be found in the :cpy-file:`Grammar/Tokens`
file. If you change this file to add new tokens, make sure to regenerate the files by executing: ::

make regen-token
@@ -601,7 +601,7 @@ If you are on Windows you can use the Visual Studio project files to regenerate

./PCbuild/build.bat --regen

-How tokens are generated and the rules governing this is completely up to the tokenizer (:file:`Parser/tokenizer.c`)
+How tokens are generated and the rules governing this is completely up to the tokenizer (:cpy-file:`Parser/tokenizer.c`)
and the parser just receives tokens from it.

Memoization
@@ -627,7 +627,7 @@ To know if a new rule needs memoization or not, benchmarking is required
(comparing execution times and memory usage of some considerably big files with
and without memoization). There is a very simple instrumentation API available
in the generated C parse code that allows to measure how much each rule uses
-memoization (check the :file:`Parser/pegen.c` file for more information) but it
+memoization (check the :cpy-file:`Parser/pegen.c` file for more information) but it
needs to be manually activated.

Automatic variables
@@ -777,7 +777,7 @@ two phases:
(see the :ref:`how PEG parsers work section <how-peg-parsers-work>` for more information).

You can find a collection of macros to raise specialized syntax errors in the
-:file:`Parser/pegen.h` header file. These macros allow also to report ranges for
+:cpy-file:`Parser/pegen.h` header file. These macros allow also to report ranges for
the custom errors that will be highlighted in the tracebacks that will be
displayed when the error is reported.

@@ -803,13 +803,13 @@ Generating AST objects
----------------------

The output of the C parser used by CPython that is generated by the
-:file:`Grammar/Python.gram` grammar file is a Python AST object (using C
+:cpy-file:`Grammar/python.gram` grammar file is a Python AST object (using C
structures). This means that the actions in the grammar file generate AST objects
when they succeed. Constructing these objects can be quite cumbersome (see
the :ref:`AST compiler section <compiler-ast-trees>` for more information
on how these objects are constructed and how they are used by the compiler) so
special helper functions are used. These functions are declared in the
-:file:`Parser/pegen.h` header file and defined in the :file:`Parser/action_helpers.c`
+:cpy-file:`Parser/pegen.h` header file and defined in the :cpy-file:`Parser/action_helpers.c`
file. These functions allow you to join AST sequences, get specific elements
from them or to do extra processing on the generated tree.

@@ -823,8 +823,8 @@ from them or to do extra processing on the generated tree.

As a general rule, if an action spawns multiple lines or requires something more
complicated than a single expression of C code, is normally better to create a
-custom helper in :file:`Parser/action_helpers.c` and expose it in the
-:file:`Parser/pegen.h` header file so it can be used from the grammar.
+custom helper in :cpy-file:`Parser/action_helpers.c` and expose it in the
+:cpy-file:`Parser/pegen.h` header file so it can be used from the grammar.

If the parsing succeeds, the parser **must** return a **valid** AST object.

@@ -833,14 +833,14 @@ Testing

There are three files that contain tests for the grammar and the parser:

-* ``Lib/test/test_grammar.py``.
-* ``Lib/test/test_syntax.py``.
-* ``Lib/test/test_exceptions.py``.
+* :cpy-file:`Lib/test/test_grammar.py`
+* :cpy-file:`Lib/test/test_syntax.py`
+* :cpy-file:`Lib/test/test_exceptions.py`

Check the contents of these files to know which is the best place to place new tests depending
on the nature of the new feature you are adding.

-Tests for the parser generator itself can be found in the :file:`Lib/test/test_peg_generator` directory.
+Tests for the parser generator itself can be found in the :cpy-file:`Lib/test/test_peg_generator` directory.


Debugging generated parsers
@@ -854,7 +854,7 @@ new rules to the grammar you cannot correctly compile and execute Python anymore
to debug when something goes wrong, especially when making experiments.

For this reason it is a good idea to experiment first by generating a Python parser. To do this, you can go to the
-:file:`Tools/peg_generator/` directory on the CPython repository and manually call the parser generator by executing:
+:cpy-file:`Tools/peg_generator/` directory on the CPython repository and manually call the parser generator by executing:

.. code-block:: shell
@@ -874,7 +874,7 @@ Verbose mode
------------

When Python is compiled in debug mode (by adding ``--with-pydebug`` when running the configure step in Linux or by
-adding ``-d`` when calling the :file:`PCbuild/build.bat` script in Windows), it is possible to activate a **very** verbose
+adding ``-d`` when calling the :cpy-file:`PCbuild/build.bat` script in Windows), it is possible to activate a **very** verbose
mode in the generated parser. This is very useful to debug the generated parser and to understand how it works, but it
can be a bit hard to understand at first.
