Lower HTML memory usage by lowering the number of unnecessary spans #13522

Open
@bolshoytoster

Is your feature request related to a problem? Please describe.
I'm using the angr API reference, which is built with Sphinx. The linked page currently uses ~500MB of memory, mostly due to the sheer number of elements (yes, I know that page is much bigger than the average Sphinx page, at ~14MB of HTML).

I've noticed that a lot of this comes from the many <span class="pre">blah</span>s that are everywhere (they make up ~2MB of the raw HTML). A comment tells me that the spans are there to prevent code from line-wrapping, which is fair, but they're unnecessary if the code is just a single word (in fact, furo, the theme that angr uses, ignores these spans, so there they're completely useless).
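For anyone who wants to check how much of their own page is spent on these wrappers, here is a rough sketch. The function name `pre_span_overhead` and the sample string are made up for illustration; the regex only matches spans whose content has no nested tags, so treat the result as an estimate rather than an exact figure.

```python
import re

# Matches one <span class="pre"> wrapper around text with no nested tags.
PRE_SPAN = re.compile(r'<span class="pre">([^<]*)</span>')


def pre_span_overhead(html: str) -> tuple[int, int]:
    """Return (number of pre spans, characters spent on the wrapper markup)."""
    # Markup that would disappear if the wrapper were dropped.
    wrapper_len = len('<span class="pre">') + len('</span>')
    count = sum(1 for _ in PRE_SPAN.finditer(html))
    return count, count * wrapper_len


# Hypothetical sample; a real check would read the generated HTML file instead.
sample = '<code class="docutils literal notranslate"><span class="pre">blah</span></code>'
count, overhead = pre_span_overhead(sample)
print(count, overhead)
```

This only counts the raw markup. The memory cost in the browser comes from the extra element per span, which this sketch does not measure.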

Describe the solution you'd like
Sphinx shouldn't output these spans for a single word (i.e. when len(self.words_and_spaces.findall(encoded)) == 1). In that case, the pre class should be added to the parent element instead.

It's probably hard to do this well, since the translator is basically a string builder, but it could be something like replacing

for token in self.words_and_spaces.findall(encoded):
    if token.strip():
        # protect literal text from line wrapping
        self.body.append('<span class="pre">%s</span>' % token)
    elif token in {' ', '\n'}:
        # allow breaks at whitespace
        self.body.append(token)
    else:
        # protect runs of multiple spaces; the last one can wrap
        self.body.append('&#160;' * (len(token) - 1) + ' ')

with

tokens = self.words_and_spaces.findall(encoded)
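# Very fragile, will probably break: assumes self.body[-1] is the parent's
# opening tag and that it ends with '">' (i.e. with a class attribute)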
if len(tokens) == 1:
    self.body[-1] = f'{self.body[-1][:-2]} pre">'
for token in tokens:
    if token.strip():
        if len(tokens) == 1:
            self.body.append(token)
        else:
            # protect literal text from line wrapping
            self.body.append('<span class="pre">%s</span>' % token)
    elif token in {' ', '\n'}:
        # allow breaks at whitespace
        self.body.append(token)
    else:
        # protect runs of multiple spaces; the last one can wrap
        self.body.append('&#160;' * (len(token) - 1) + ' ')
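
To make the intended output easier to see outside of Sphinx, here is a standalone sketch of the same idea. It is not the translator's actual code: `render_literal` and its `classes` parameter are hypothetical, the parent is assumed to be a plain <code> element, and the regex is my assumption of what `words_and_spaces` matches (non-whitespace runs, runs of spaces, or a single newline). Instead of patching the previous string, it decides on the class list before the opening tag is written.

```python
import re

# Assumption: mirrors the translator's words_and_spaces tokenizer.
WORDS_AND_SPACES = re.compile(r'[^ \n]+| +|\n')


def render_literal(encoded: str, classes: list[str]) -> str:
    """Render already-escaped literal text, skipping the span for a single word."""
    tokens = WORDS_AND_SPACES.findall(encoded)
    # Exactly one token, and it is a word rather than whitespace.
    single_word = len(tokens) == 1 and bool(tokens[0].strip())
    if single_word:
        # Put the class on the parent instead of wrapping the word.
        classes = [*classes, 'pre']
    parts = ['<code class="%s">' % ' '.join(classes)]
    for token in tokens:
        if token.strip():
            if single_word:
                parts.append(token)
            else:
                # protect literal text from line wrapping
                parts.append('<span class="pre">%s</span>' % token)
        elif token in {' ', '\n'}:
            # allow breaks at whitespace
            parts.append(token)
        else:
            # protect runs of multiple spaces; the last one can wrap
            parts.append('&#160;' * (len(token) - 1) + ' ')
    parts.append('</code>')
    return ''.join(parts)


print(render_literal('blah', ['docutils', 'literal']))
print(render_literal('a  b', ['docutils', 'literal']))
```

Unlike the snippet above, this version also checks that the single token is not whitespace before moving the class to the parent. A real change would need the equivalent decision to happen where the opening tag is emitted, which is the part I'm unsure how to do cleanly.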

Describe alternatives you've considered
Just dealing with it and closing the tab when I'm not using it.
