Configuration hash in BuildInfo may be unstable #11777

Closed
@theOehrly

Describe the bug

The Sphinx HTML builder verifies that the configuration is unchanged by hashing all configuration values and storing this hash in a .buildinfo file. On the next run the current and the previous hash are compared to determine whether the configuration has changed. If a change is detected, all source files are marked as out of date and rebuilt.
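To make the mechanism concrete, here is a minimal sketch of the hash-and-compare idea described above. It is not Sphinx's actual code: the names `config_hash`, `config_changed`, and the file `buildinfo.txt` are made up for illustration, and the real .buildinfo file has its own layout.

```python
import hashlib
import tempfile
from pathlib import Path


def config_hash(config):
    # Hash the string representation of the configuration (illustrative only).
    return hashlib.md5(str(config).encode(), usedforsecurity=False).hexdigest()


def config_changed(config, info_file):
    """Compare the current hash with the stored one, then store the current one.

    Returns True when everything would have to be rebuilt.
    """
    current = config_hash(config)
    previous = info_file.read_text() if info_file.exists() else None
    info_file.write_text(current)
    return previous != current


with tempfile.TemporaryDirectory() as tmp:
    info = Path(tmp) / "buildinfo.txt"  # hypothetical stand-in for .buildinfo
    first_run = config_changed({"project": "test"}, info)      # no stored hash yet
    second_run = config_changed({"project": "test"}, info)     # same configuration
    third_run = config_changed({"project": "renamed"}, info)   # configuration edited
```

Only the first and third calls report a change; the second one sees an identical hash and would skip the rebuild.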

Given that the configuration is a Python file, a reference to a function can be used as a configuration value. Some Sphinx extensions make use of this and let the user define custom functions for specific tasks. One example of this is the Sphinx-Gallery extension.

Quoting one example from the Sphinx-Gallery documentation:

sphinx_gallery_conf = {
   ...
   'reset_modules': ('matplotlib', 'seaborn'),
}

Currently, Sphinx-Gallery natively supports resetting matplotlib and seaborn. However, you can also add your own custom function to this tuple

The Sphinx HTML builder hashes the string representation of the configuration value object:

return hashlib.md5(str(obj).encode(), usedforsecurity=False).hexdigest()

The string representation of a function is something like <function test at 0x0000020C4644D6C0>, where the memory address changes from run to run.

This means that when any extension uses a reference to a function as a configuration value, the documentation is completely rebuilt on every run.
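The effect can be shown without Sphinx. The sketch below reuses the hashing expression quoted above; `hash_of_str` and `make_func` are names invented for this example. Two function objects with the same name and body live at different addresses, so their string representations, and therefore their hashes, differ. Across separate interpreter runs the same thing typically happens to a single function.

```python
import hashlib


def hash_of_str(obj):
    # Same expression as the line quoted from Sphinx above.
    return hashlib.md5(str(obj).encode(), usedforsecurity=False).hexdigest()


def make_func():
    # Each call creates a new function object with the same name and body.
    def test():
        return
    return test


first = make_func()
second = make_func()

print(str(first))   # contains a memory address
print(str(second))  # same name, different address
print(hash_of_str(first) == hash_of_str(second))

# Plain values such as tuples of strings hash the same way every time.
print(hash_of_str(('matplotlib', 'seaborn')) == hash_of_str(('matplotlib', 'seaborn')))
```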

Issues caused by this:

  • The build time increases considerably on large projects (on the order of minutes with Matplotlib, for example).
  • Issue #4240 ("Breaks reproducibility in html builders") is likely caused by this too, since Scikit-Learn also uses Sphinx-Gallery (but I haven't verified that theory).

It is of course debatable whether this is a bug or not. But given that the configuration file is a Python file, it is not really unexpected that people want to pass a function object as a configuration value.

It is not reliably possible to hash the code within such a function, so Sphinx cannot determine whether the function itself has changed.

I see three potential options here:

  1. Exclude values that are a reference to a function when generating the hash
  2. Only hash the function name to get a stable hash
  3. Decide that this is unintended functionality and that modules like Sphinx-Gallery should adapt their approach instead

In my opinion, a user would not expect Sphinx to automatically recognize a change in a user-provided function, so option 1 or 2 would be acceptable.
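As a rough illustration of option 2, the sketch below builds a string representation in which functions are identified by module and qualified name instead of by memory address. `stable_repr` and `stable_hash` are hypothetical helpers written for this issue, not existing Sphinx functions, and this is only one possible approach. Objects other than functions, dicts, lists, and tuples fall back to `repr()`, so instances with a default repr would still leak an address.

```python
import hashlib
import types


def stable_repr(obj):
    """Hypothetical helper: like repr(), but functions are named, not addressed."""
    if isinstance(obj, types.FunctionType):
        return f"<function {obj.__module__}.{obj.__qualname__}>"
    if isinstance(obj, dict):
        body = ", ".join(
            f"{stable_repr(k)}: {stable_repr(v)}" for k, v in obj.items()
        )
        return "{" + body + "}"
    if isinstance(obj, (list, tuple)):
        body = ", ".join(stable_repr(v) for v in obj)
        return f"{type(obj).__name__}({body})"
    return repr(obj)


def stable_hash(obj):
    # Hash the address-free representation instead of str(obj).
    return hashlib.md5(stable_repr(obj).encode(), usedforsecurity=False).hexdigest()


def example_func():
    return


conf = {"reset_modules": (example_func,)}
print(stable_repr(conf))
print(stable_hash(conf))
```

Option 1 would amount to returning a constant placeholder for functions instead of their name. Either way, a change to the function body would go unnoticed, which matches the expectation stated above.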

How to Reproduce

pip install sphinx sphinx-gallery
sphinx-quickstart --sep --project test --author me -l en -r 0.0.1
mkdir examples
echo "" > examples/readme.txt

Add the following content to the end of the default conf.py in the source directory:

extensions = [
    'sphinx_gallery.gen_gallery',
]


def example_func():
    return


sphinx_gallery_conf = {
    'reset_modules': (example_func,),
}

Run make html twice and notice that all HTML files are rebuilt every time, even though nothing changed between runs. If you run with the -vv option (or higher), you will find [build target] did not match: build_info in the output.

Environment Information

Platform:              win32; (Windows-10-10.0.19044-SP0)
Python version:        3.11.3 (tags/v3.11.3:f3909b8, Apr  4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)])
Python implementation: CPython
Sphinx version:        7.2.6
Docutils version:      0.20.1
Jinja2 version:        3.1.2
Pygments version:      2.16.1

Sphinx extensions

extensions = [
    'sphinx_gallery.gen_gallery',
]

Additional context

No response
