Description
Describe the bug
The Sphinx HTML builder verifies that the configuration is unchanged by hashing all configuration values and storing this hash in a .buildinfo file. On the next run the current and the previous hash are compared to determine whether the configuration has changed. If a change is detected, all source files are marked as out of date and rebuilt.
Given that the configuration is a Python file, a reference to a function can be used as a configuration value. Some Sphinx extensions make use of this and let the user define custom functions for specific tasks. One example of this is the Sphinx-Gallery extension.
Quoting one example from the Sphinx-Gallery documentation:
sphinx_gallery_conf = {
    ...
    'reset_modules': ('matplotlib', 'seaborn'),
}
Currently, Sphinx-Gallery natively supports resetting matplotlib and seaborn. However, you can also add your own custom function to this tuple
The Sphinx HTML builder hashes the string representation of the configuration value object:
sphinx/sphinx/builders/html/__init__.py
Line 88 in 3596590
The string representation of a function is of course something like
<function test at 0x0000020C4644D6C0>
where the memory address changes on every run.
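The effect can be shown inside a single process. This is a minimal sketch, not Sphinx's actual code: `naive_config_hash` is a hypothetical, simplified stand-in for the builder's hashing, and `make_example_func` is a helper invented here to create two function objects with the same name and body.

```python
import hashlib


def naive_config_hash(value):
    """Hash the string representation of a config value (simplified stand-in)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def make_example_func():
    # Each call creates a new function object with the same name and code.
    def example_func():
        return
    return example_func


first = make_example_func()
second = make_example_func()

# Both objects are alive at the same time, so they live at different
# addresses, and the address is part of the string representation.
print(str(first))
print(str(second))

conf_a = {"reset_modules": (first,)}
conf_b = {"reset_modules": (second,)}

# Same name, same code, different address: the hashes do not match.
print(naive_config_hash(conf_a) == naive_config_hash(conf_b))  # False
```

Across separate runs the same thing happens whenever the interpreter places the function at a different address.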
This means that whenever an extension uses a reference to a function as a configuration value, the documentation is completely rebuilt on every run.
Issues caused by this:
- the build time increases considerably on large projects (on the order of minutes with Matplotlib, for example)
- "Breaks reproducibility in html builders" (#4240) is likely caused by this too, since Scikit-Learn also uses Sphinx-Gallery (but I haven't verified that theory)
It is of course debatable whether this is a bug or not. But given that the configuration file is a Python file, it is not really unexpected that people want to pass a function object as a configuration value.
It is not possible to reliably hash the code within such a function, so Sphinx cannot determine whether the function itself has changed.
I see three potential options here:
- Exclude values that are a reference to a function when generating the hash
- Only hash the function name to get a stable hash
- Decide that this is unintended functionality and that modules like Sphinx-Gallery should adapt their approach instead
In my opinion, a user would not expect Sphinx to automatically recognize a change in a user-provided function, so either option 1 or option 2 would be acceptable.
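Options 1 and 2 could look roughly like the sketch below. Everything in it is hypothetical and only illustrates the idea: `stable_form`, `stable_config_hash` and the `skip_functions` flag are names made up for this example and are not part of Sphinx.

```python
import hashlib
import types

_SKIP = object()  # marker for values that are left out of the hash (option 1)


def stable_form(value, skip_functions=False):
    """Convert a config value into a string without memory addresses."""
    if isinstance(value, (types.FunctionType, types.BuiltinFunctionType)):
        if skip_functions:
            return _SKIP  # option 1: ignore function references
        # option 2: identify the function by module and qualified name only
        return f"<function {value.__module__}.{value.__qualname__}>"
    if isinstance(value, dict):
        items = []
        for key, item in value.items():
            formed = stable_form(item, skip_functions)
            if formed is not _SKIP:
                items.append(f"{key!r}: {formed}")
        return "{" + ", ".join(sorted(items)) + "}"
    if isinstance(value, (list, tuple)):
        parts = [stable_form(item, skip_functions) for item in value]
        return "[" + ", ".join(p for p in parts if p is not _SKIP) + "]"
    return repr(value)


def stable_config_hash(config, skip_functions=False):
    """Hash the address-free form of a config value."""
    text = stable_form(config, skip_functions)
    if text is _SKIP:
        text = ""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_example_func():
    # Helper for the demo: returns a new function object on every call.
    def example_func():
        return
    return example_func


conf_a = {"reset_modules": (make_example_func(),)}
conf_b = {"reset_modules": (make_example_func(),)}

# The two function objects differ, but both variants give matching hashes.
print(stable_config_hash(conf_a) == stable_config_hash(conf_b))
print(stable_config_hash(conf_a, True) == stable_config_hash(conf_b, True))
```

With option 2 a renamed function still changes the hash, while with option 1 function references are invisible to the check. Neither variant notices an edit to the function body, which matches the limitation described above.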
How to Reproduce
pip install sphinx sphinx-gallery
sphinx-quickstart --sep --project test --author me -l en -r 0.0.1
mkdir examples
echo "" > examples/readme.txt
Add the following content to the end of the default conf.py in the source directory:
extensions = [
    'sphinx_gallery.gen_gallery',
]

def example_func():
    return

sphinx_gallery_conf = {
    'reset_modules': (example_func, ),
}
Run make html twice without making any changes in between and notice that all HTML files are rebuilt every time. If you run with the -vv option (or higher) you will find [build target] did not match: build_info in the output.
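The stored hash can also be compared by hand between two runs. The helper below is a rough sketch under two assumptions that are not stated above: that .buildinfo is a small text file of `key: value` lines (such as a `config:` entry) with `#` comment lines, and that with the `--sep` layout it ends up in build/html/.buildinfo. `read_build_info` and `config_changed` are hypothetical names.

```python
from pathlib import Path


def read_build_info(path):
    """Return the `key: value` entries of a .buildinfo file as a dict."""
    entries = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def config_changed(before_path, after_path):
    """Compare the `config` entry of two saved copies of .buildinfo."""
    before = read_build_info(before_path).get("config")
    after = read_build_info(after_path).get("config")
    return before != after
```

To use it, copy the file aside after the first make html, run make html again, and pass both paths to `config_changed`. With the conf.py above the result is expected to be `True` even though nothing was edited.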
Environment Information
Platform: win32; (Windows-10-10.0.19044-SP0)
Python version: 3.11.3 (tags/v3.11.3:f3909b8, Apr 4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)])
Python implementation: CPython
Sphinx version: 7.2.6
Docutils version: 0.20.1
Jinja2 version: 3.1.2
Pygments version: 2.16.1
Sphinx extensions
extensions = [
    'sphinx_gallery.gen_gallery',
]