Sphinx breaks transforms that depend on document.transformer.components, like the standard Filter transform #9632

Open
cpitclaudel opened this issue Sep 13, 2021 · 3 comments

cpitclaudel commented Sep 13, 2021

Describe the bug

Sphinx does not give transforms (nor post-transforms) a way to determine which writer will be used. This is because it reads the document with a DummyWriter to generate a doctree, and then it calls post_transforms with an empty list of self.document.transformer.components.

This breaks transforms that depend on components, including the standard docutils.transforms.components.Filter.

How to Reproduce

The standard Filter transform does this:

    def apply(self):
        pending = self.startnode
        component_type = pending.details['component'] # 'reader' or 'writer'
        format = pending.details['format']
        component = self.document.transformer.components[component_type]
        if component.supports(format):
            pending.replace_self(pending.details['nodes'])
        else:
            pending.parent.remove(pending)

This breaks with Sphinx: if the transform is added as a regular transform via note_pending, components[component_type] will be Sphinx's DummyWriter when component_type is "writer", and the call to supports will return the wrong result. If it is added as a post-transform, it throws an exception because components is empty.

Here is another example simplified from a separate project; it works with plain Docutils, but not with Sphinx:

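    from docutils import nodes
    from docutils.transforms import Transform

    class MyTransform(Transform):
        default_priority = 800  # arbitrary value, needed so the transform can be registered

        def apply(self):
            formats = set(self.document.transformer.components['writer'].supported)
            for node in self.document.traverse(some_pending_node_type):
                # nodes.raw takes (rawsource, text, ...), so the markup goes in `text`.
                if "html" in formats:
                    node.replace_self(nodes.raw("", "<em>Hello!</em>", format="html"))
                elif {'latex', 'xelatex', 'lualatex'} & formats:
                    node.replace_self(nodes.raw("", r"\emph{Hello!}", format="latex"))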

If added as a regular transform, components['writer'] will be Sphinx's DummyWriter and supported will be only {'html'}. If added as a post-transform, the code will throw an exception because components won't have a 'writer' key.

Expected behavior

Ideally, the Filter transform (and other similar transforms) should just work, which might require running transforms after the caching stage, with the correct set of components (I imagine DummyWriter is for caching purposes?).

If that's not possible, then maybe it's possible for post_transforms? At the moment post_transforms do not see any components at all.

If that's not possible, then it would be nice to have some (Sphinx-specific, unfortunately) way to determine the list of formats supported by the writer from a post-transform.

Python version

Python 3.8.10

Sphinx version

sphinx-build 3.5.4

@cpitclaudel

For future visitors, I should add that the "right" way to do this under Sphinx is to use document.settings.env.app.tags in a post-transform. Still, it would be much nicer if post-transforms were called with an appropriately set-up list of components.

Another note: if reading from cache, Sphinx actually calls post-transforms with document.transformer == None, not just document.transformer.components == [].
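The tags-based workaround mentioned above could look roughly like the sketch below. The helper name `sphinx_tag_enabled` is hypothetical, and the sketch assumes that Sphinx registers the builder's format name (for example "html" or "latex") as a tag, which is what makes the tags object useful here. It never touches document.transformer, so it is unaffected by the transformer being None when reading from cache.

```python
from types import SimpleNamespace


def sphinx_tag_enabled(document, tag):
    """Hypothetical helper: check a Sphinx tag from inside a transform.

    Returns True or False when Sphinx's tags are reachable through
    document.settings.env.app.tags, and None under plain Docutils,
    where settings has no `env` attribute.
    """
    env = getattr(document.settings, "env", None)
    app = getattr(env, "app", None)
    tags = getattr(app, "tags", None)
    if tags is None:
        return None  # not running under Sphinx
    return bool(tags.has(tag))


# Stand-in for a plain Docutils document (no Sphinx environment attached).
plain_document = SimpleNamespace(settings=SimpleNamespace())
print(sphinx_tag_enabled(plain_document, "html"))
```

Returning None instead of False for the non-Sphinx case lets the caller distinguish "Sphinx says no" from "not running under Sphinx" and fall back to components in the latter case.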

tk0miya commented Sep 19, 2021

You're right. Sphinx writes an intermediate doctree using DummyWriter for several purposes: caching, building cross-references, and so on. Hence, pending nodes that depend on the output format will not work as expected. We must admit that this is a restriction of Sphinx at the moment.

> If that's not possible, then maybe it's possible for post_transforms? At the moment post_transforms do not see any components at all.
> If that's not possible, then it would be nice to have some (Sphinx-specific, unfortunately) way to determine the list of formats supported by the writer from a post-transform.

Good point. In a post-transform, you can check which builder is in use and which format it produces. Please check self.document.settings.env.app.builder in your post-transform. (As a shortcut, you can use self.app.builder instead if your transform inherits from sphinx.transforms.SphinxTransform.)

I hope this will help your case.

Note: Sphinx wraps Docutils' writer component in a Builder. The writer is an internal component of the builder.

@cpitclaudel

> Please check self.document.settings.env.app.builder on post_transforms (As a shortcut, you can use self.app.builder instead if your transform inherits sphinx.transforms.SphinxTransform).

Thanks, but that's still Sphinx-specific, which still means that there's no way to write code compatible with Sphinx and Docutils without special-casing Sphinx.

> if your transform inherits sphinx.transforms.SphinxTransform

That also breaks compatibility, since SphinxTransform doesn't exist in Docutils

Is it not possible to set document.transformer.components to the same list as Docutils would? This would greatly improve compatibility.
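Until something like that exists, the special-casing discussed in this thread can at least be confined to one function. The following is only a sketch under stated assumptions: `output_formats` is a hypothetical name, it relies on the builder exposing a `format` attribute as suggested above, and it is duck-typed so that it does not import Sphinx or inherit from SphinxTransform.

```python
def output_formats(document):
    """Hypothetical helper: best-effort set of output formats for `document`."""
    # Under Sphinx, ask the builder first: the writer visible to regular
    # transforms may be a DummyWriter, and post-transforms may see no
    # components (or no transformer at all when reading from cache).
    env = getattr(getattr(document, "settings", None), "env", None)
    builder = getattr(getattr(env, "app", None), "builder", None)
    fmt = getattr(builder, "format", None)
    if fmt:
        return {fmt}

    # Under plain Docutils, ask the real writer component.
    transformer = getattr(document, "transformer", None)
    components = getattr(transformer, "components", None)
    writer = components.get("writer") if isinstance(components, dict) else None
    return set(getattr(writer, "supported", ()))
```

A transform could then compute `formats = output_formats(self.document)` and keep the rest of its logic unchanged. Under Sphinx the result holds only the builder's single format name, so a check such as `{'latex', 'xelatex', 'lualatex'} & formats` still behaves as intended for a LaTeX build.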
