Skip to content

At the Mountains of Tech Debt: Refactor our API doc generation script to be readable #2109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 54 commits into from
Jun 15, 2024

Conversation

pushfoo
Copy link
Member

@pushfoo pushfoo commented Jun 14, 2024

TL;DR: I made our doc script legible to humans

Why?

When did we add update_quick_index.py?

The stars were different back then. So was Arcade.

There are traces of the era in deep git history. Friendly type annotations hadn't evolved yet. Instead, miserable type names slithered through the code base. Amid the oozing Sphinx reST syntax and slimy docstrings, those primitive ancestors of the annotation clung to safety between :param and their argument names.

"Sphinx is literally from another era." - @nfearnley

That was the world in which update_quick_index.py was first laid down. Both the stars and Arcade were smaller and simpler then.

What went wrong?

TL:DR: The script failed to scale with the project

It's understandable tech debt. We added a lot of modules and complexity, and it's creaking:

  • When you try to add all the new camera modules, it explodes into a shower of errors.
  • When all PRs reported as broken for months due to doc issues, one of the factors was this script

Debugging it's prior state is very painful. How painful? Plenty of long-term contributors have agreed the fie is deep in awful tech debt. To start with, here are the top 3:

Contributor Encounter Aftermath
@Cleptomania Tried writing reST Refuses to touch Sphinx again
@DigiDuncan #2088 (Rectangles) Repeatedly asks about manually writing all API doc in Markdown
@einarf Started writing a new parser Deleted rewrite attempt

What's fixed

TL;DR: Instead of guessing module names from a filename and directory, this PR refactors the script to use them as config

Before During This PR
image image API ref conf is:
  1. file name
  2. title
  3. a list of modules

Before column base image used under CC Attribution 3.0 Unported via Wikimedia Commons, originally by Nottsuo on DeviantArt.

In practice, declarations look like this:

API_FILE_TO_TITLE_AND_MODULES = {
    "types.rst": {
        "title": "Types",
        "use_declarations_in": [
            "arcade.types",
            "arcade.types.numbers",
            "arcade.types.vector_like",
            "arcade.types.color",
            "arcade.types.rect"
        ]
    },
    ...
}

How?

TL;DR: Untangle the script's current requirements, delete old cruft, and uninvert the file mapping

The Old Logic

Before, we had a Rube Goldberg machine reassembling which module items should go in via:

  • partial filenames
  • a few scattered dicts
  • some prepend logic

We also had a bunch of obsolete arcade.gl declarations we don't use since einarf documented that manually. Accounting for that allows for incremental upgrades. That's the key difference between this PR and prior attempts to tackle the issue. No matter how brave and monumentally skilled contributors are, we're still all human. That means total rewrites often fail for big projects.

The New Logic

TL;DR: You can now understand the config and control flow, but the regex is still mostly the same

After cutting out all the obsolete entries and special casing, the requirements simplify radically. It seems like we can get away with the following1 for each API page:

CONFIG = {
  # This will be written to doc/api_docs/api/
  "filename.rst" : {
    "title": "Page Title Here",
    "use_declarations_in": [
       # The declarations but not imports are scanned, exactly as before
       "arcade.submodule.to.regex",
       "arcade.another.submodule",
       "arcade.etc"
    ]
  }
}

In theory, you (yes, you!) can2 add modules to API pages in the exact same way. In practice, keep reading since there's work left to do as follow-up.

To get your spirits up, here's more good news: the parsing is now more configurable:

  1. The logic for ignoring LOG is now part of a regex via negative look-ahead
  2. Getting module declarations is only a matter of passing in a dict[str, re.Pattern]

Example:

declared = get_file_declarations(
  get_module_path('arcade.example.module'),
  {
    "class": CLASS_RE,
    "union": re.compile(r'YOUR_PATTERN_HERE'),
    ...
  })

Follow-up Work

Fix Duplicate Member Errors

I'm working on this locally and see some light that seems a bit like hope. It's a pre-requisite for getting things other than Camera2D into the doc.

Config Files?

Since config's just a dict...

  1. We can probably make this neater with the team's choice of:

    • dataclasses or typing.NamedTuple
    • the attrs package
    • Pydantic

Also, we can probably put things into JSON, TOML, or YAML...or anywhere else we want. I'm undecided on in-line decorators, but there's precedent for putting doc config in modules:

skip_extensions = arcade.resources._resource_list_skip_extensions

Investigate Better Parsing

Regex is kinda bad. I've looked at griffe, but it seems painfully slow for this among other issues. Maybe importing with inspect and pyglet's doc run detection will help? We'd have to force it since I don't think the script inherits the outer interpreter's sys changes, but we'll see.

Just use automod?

This is looking like an appealing option if we can get the duplicate member stuff sorted. This PR and later ones can be used as reference for that.

Footnotes

  1. This will probably change and evolve, but a structured format makes it more durable

  2. There's still more work to be done here due to some duplicate member errors with Sphinx. They seem like they shouldn't be possible, but then again, that's life.

pushfoo added 30 commits June 14, 2024 01:39
* Add a top-level docstring explaining the build loop

* Add docstrings explaining intended usage in build scripts for IDE users
* Split and comment dir and file mappings in process_directory

* Make print statement include variable names
* ROOT -> REPO_ROOT

* Add ARCADE_ROOT constant

* Centralize the API doc folder into API_DOC_DIR constant
* Add VirtualFile.include_file

* Replace usage of include_template in update_quick_index.py

* Remove include_template
* Move shared paths into a utility class in vfs.py

* Comment the sys.path.insert lines explaining why we have those

* Use imports in existing scripts
* Rename delete_glob

* Rephrase docstring to be clearer

* Remove extra str wrapping the shared API_DOC_DIR path
* Use future-style | annotations for paths

* Remove Union import
* Fix name in special casing rules

* Fix rendering indent

* Move the member up with the OpenGL commented out lines
pushfoo added 24 commits June 14, 2024 01:41
* Add exclusion helper callable class

* Make exclusion list constant cased

* Use filter + callables to remove indent level and simplify code
* Remove quick index file from signature of process_directory

* Use with block context manager in main()

* open(QUICK_INDEX_PATH, "a") inside process_directory
* Move main() call into if __name__ == "__main__" check

* Pass vfs into proces_directory

* Update process_directory docstring
* Use negative lookahead feature in TYPE_RE

* Remove the if check making regexes non-interchangeable
* Add passable expressions via member_expressions keyword arg

* Change return type to a dict instead of named tuple

* Replace usage in process_directory
* Remove named group validation and enforcement

* Update docstring
* Add simple get_module_path function

* Rename declarations function to get_file_declarations

* Update get_file_declarations docstring

* Comment out type expression in defaults for now since we can't render those yet

* Fix broken declarations out and comment out ambiguous ones
* Remove extra whitespace

* Comment fixes

* Path fixes for Vfs + update_quick_index.py
@pvcraven pvcraven merged commit 58bde7e into pythonarcade:development Jun 15, 2024
8 checks passed
@pushfoo pushfoo mentioned this pull request Jun 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants