Description
Make pytest-mpl tests agnostic to freetype changes
The pytest-mpl image comparisons are failing due to differences in font rendering caused by changes in freetype versions, which leads to false test failures. The affected test files that use mpl_image_compare are: test_metrics.py, test_preprocess.py, test_xai.py, and test_unsupervised.py (note: test_math_utils.py and test_strings.py do not use mpl_image_compare based on the code).
Possible solutions to make the tests more robust across environments include the implementations below. Please implement one or more of these solutions to address the problem.
Solution 1: Increasing tolerance values in mpl_image_compare decorators
To implement this, update the @pytest.mark.mpl_image_compare decorators in the affected test files to use higher tolerance values. This relaxes the pixel difference threshold during image comparisons, making tests less sensitive to minor rendering variations caused by freetype changes. Based on the provided test files, start by increasing tolerances incrementally (e.g., from current values like 15-18 to 30-50) and adjust as needed after re-running tests.
Here are the specific code changes for each relevant file:
- test_metrics.py:

      # Existing (example):
      @pytest.mark.mpl_image_compare(baseline_dir=BASELINE_DIR, tolerance=18)
      # Change to:
      @pytest.mark.mpl_image_compare(baseline_dir=BASELINE_DIR, tolerance=50)
      # Apply similarly to other decorators in the file, e.g., tolerance=15 -> tolerance=40

- test_preprocess.py:

      # Existing (example):
      @pytest.mark.mpl_image_compare(baseline_dir=BASELINE_DIR)
      # Change to (add tolerance if missing):
      @pytest.mark.mpl_image_compare(baseline_dir=BASELINE_DIR, tolerance=30)
      # Repeat for all mpl_image_compare in the file.

- test_xai.py:

      # Existing (example):
      @pytest.mark.mpl_image_compare(baseline_dir=BASELINE_DIR, tolerance=53)
      # Change to:
      @pytest.mark.mpl_image_compare(baseline_dir=BASELINE_DIR, tolerance=80)
      # Adjust others similarly, e.g., tolerance=54 -> 80, tolerance=11 -> 30.

- test_unsupervised.py:

      # Existing (example):
      @pytest.mark.mpl_image_compare(baseline_dir=BASELINE_DIR)
      # Change to:
      @pytest.mark.mpl_image_compare(baseline_dir=BASELINE_DIR, tolerance=30)

After updates, regenerate baseline images if necessary by running pytest --mpl-generate-path=tests/baseline_images and commit the new baselines.
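To get a feel for what a tolerance value means, here is a minimal, dependency-free sketch of a root-mean-square (RMS) pixel difference, which is, as far as I understand it, the quantity the tolerance is compared against. The function names rms_difference and within_tolerance are hypothetical and exist only for this illustration; the real comparison works on full image arrays.

```python
import math
from typing import Sequence


def rms_difference(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square difference between two equally sized pixel sequences."""
    if len(expected) != len(actual):
        raise ValueError("images must contain the same number of pixel values")
    if not expected:
        return 0.0
    squared = [(e - a) ** 2 for e, a in zip(expected, actual)]
    return math.sqrt(sum(squared) / len(squared))


def within_tolerance(expected: Sequence[float], actual: Sequence[float],
                     tolerance: float) -> bool:
    """Assumed pass rule for this sketch: the RMS must not exceed the tolerance."""
    return rms_difference(expected, actual) <= tolerance


# One pixel out of 100 differs by 100 grey levels: the RMS is only 10,
# because the error is averaged over the whole image.
baseline = [0] * 100
rendered = [100] + [0] * 99
print(rms_difference(baseline, rendered))
print(within_tolerance(baseline, rendered, tolerance=15))
```

Because the error is averaged over every pixel, small text regions that render differently contribute little to the RMS, which is why modest tolerance increases can absorb font changes while still catching plots that are substantially wrong.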
Solution 2: Setting a consistent font family in matplotlib rcParams, for example, 'DejaVu Sans'
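To ensure consistent font selection, create a pytest fixture that sets matplotlib.rcParams['font.family'] to a reliable, cross-platform font like 'DejaVu Sans' (which is bundled with Matplotlib). This removes differences caused by each system resolving the default family to a different font. It does not change how FreeType rasterizes glyphs, so it may reduce, but not fully eliminate, version-related pixel differences.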
Add this to tests/conftest.py (create if it doesn't exist):

    import matplotlib as mpl
    import pytest


    @pytest.fixture(autouse=True)
    def consistent_font():
        """Fixture to set a consistent font family for all tests."""
        original_font = mpl.rcParams['font.family']
        mpl.rcParams['font.family'] = 'DejaVu Sans'
        yield
        mpl.rcParams['font.family'] = original_font  # Restore after test

This fixture will automatically apply to all tests. If 'DejaVu Sans' isn't available in some environments, fall back to ['sans-serif'] or test with 'Liberation Sans' (as suggested in some Matplotlib discussions). Regenerate baselines after implementing: pytest --mpl-generate-path=tests/baseline_images.
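Whether a fallback is needed depends on which fonts Matplotlib can actually find in a given environment. The sketch below checks this, assuming that font_manager.findfont raises ValueError when fallback_to_default=False and no matching font exists. font_available is a hypothetical helper name, not something already in the repository.

```python
from matplotlib import font_manager


def font_available(family: str) -> bool:
    """Return True if Matplotlib can resolve `family` without falling back."""
    try:
        # With fallback disabled, a missing family raises instead of
        # silently returning the default font.
        font_manager.findfont(family, fallback_to_default=False)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for name in ("DejaVu Sans", "Liberation Sans"):
        print(name, font_available(name))
```

A conftest.py fixture could call this once and skip or fail early with a clear message, rather than letting image comparisons fail for a reason that is hard to diagnose.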
Solution 3: Switching to hash-based comparisons using --mpl-generate-hash-library
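Switch from pixel-based image comparisons to hash-based ones, which compute a hash of each generated image and compare it against a stored hash library. This is built into pytest-mpl. Note that a hash matches only when the image is identical, so hashes are more sensitive to FreeType and Matplotlib version changes than a tolerance-based comparison, not less. This is why the steps below encode both versions in the hash library filename: each environment is compared against hashes generated with the same versions.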
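It is worth checking how sensitive a hash actually is before relying on it. The sketch below assumes a SHA-256 digest over the raw image bytes, which is my understanding of how pytest-mpl hashes images. image_digest is a hypothetical name, and the byte strings stand in for real PNG data.

```python
import hashlib


def image_digest(data: bytes) -> str:
    """Hex SHA-256 digest of an image's bytes."""
    return hashlib.sha256(data).hexdigest()


# Two stand-in "images" that differ in a single byte.
baseline = bytes([0] * 64)
rendered = bytes([0] * 63 + [1])

print(image_digest(baseline) == image_digest(baseline))  # identical input, same digest
print(image_digest(baseline) == image_digest(rendered))  # one byte changed, digest differs
```

A hash comparison is therefore an exact-match check with no tolerance. It only helps with FreeType differences if each environment is compared against a hash library generated with the same Matplotlib and FreeType versions.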
Implementation steps:
- Install pytest-mpl, or ensure it is listed in pyproject.toml under [tool.hatch.envs.test.dependencies] (it's already there as "pytest-mpl==0.17.0").
- Create a hash library file, e.g., tests/hash_library.json. Generate it by running:

      pytest --mpl-generate-hash-library=tests/hash_library.json

  This runs tests and stores hashes instead of images.
- Update test decorators to use hash mode: in each mpl_image_compare decorator, add style='default' (to ensure consistent rendering) if not present. No other changes to the decorator itself are needed.
- In pyproject.toml, add pytest config to use hash mode by default:

      [tool.pytest.ini_options]
      mpl-hash-library = "tests/hash_library.json"  # Path to your hash file
      mpl-use-full-test-name = true  # Optional: Use full test name for hash keys

- To encode versions (as recommended for robustness), name the hash file with Matplotlib and FreeType versions, e.g., hash_library_mpl3.8_ft2.12.json. Query versions in code:

      import matplotlib as mpl
      from matplotlib import ft2font

      print(mpl.__version__)
      print(ft2font.__freetype_version__)

Run tests with --mpl to use hashes: pytest --mpl.
Commit the hash library file(s) to the repo. For CI (e.g., in test.yml), add --mpl to the python -m hatch run test:run command.
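The version-encoded filename from the steps above can be built programmatically so that local runs and CI pick the same file. This is a minimal sketch: hash_library_name and major_minor are hypothetical helper names, and the major.minor truncation simply mirrors the hash_library_mpl3.8_ft2.12.json example.

```python
def major_minor(version: str) -> str:
    """Keep only the first two components of a dotted version string."""
    return ".".join(version.split(".")[:2])


def hash_library_name(mpl_version: str, freetype_version: str) -> str:
    """Build a hash library filename that encodes both versions."""
    return (
        f"hash_library_mpl{major_minor(mpl_version)}"
        f"_ft{major_minor(freetype_version)}.json"
    )


# Example with explicit version strings; in a real run these would come from
# matplotlib.__version__ and matplotlib.ft2font.__freetype_version__.
print(hash_library_name("3.8.2", "2.12.1"))
```

The resulting name could then be passed to --mpl-hash-library in CI, with one committed hash file per supported version pair.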
Solution 4: Investigating how matplotlib handles similar issues in their test suite
Based on web searches and browsing Matplotlib's GitHub, here's a summary of how Matplotlib addresses freetype-related failures in its own image-comparison tests (which use Matplotlib's image_comparison decorator rather than pytest-mpl):
- Vendoring FreeType: Matplotlib includes its own internal (vendored) copy of FreeType in installations. This ensures consistent font rasterization across environments, as different system FreeType versions produce varying character shapes, leading to pixel differences in images. For tests, they explicitly use this vendored version to make comparisons reproducible. (Sources: Matplotlib dependencies docs, pytest-mpl docs, and developer tips).
- Font Configuration: They default to fonts like 'DejaVu Sans' (bundled with Matplotlib) and recommend clearing font caches or installing consistent fonts (e.g., Liberation Sans) in CI workflows to avoid discrepancies. Discussions highlight dropping Matplotlib's font cache (rm ~/.matplotlib/fontlist-*.json; on Linux the cache usually lives under ~/.cache/matplotlib instead) to force rebuilds with consistent fonts.
- Tolerance and Hashing in pytest-mpl: pytest-mpl offers a hybrid mode (hashes + images), and hash filenames can encode Matplotlib/FreeType versions (e.g., mpl3.5_ft2.6.json) to keep version-specific baselines. A rendering change then maps to a different hash file instead of a failure. Image comparisons use RMS (root mean square) residuals, with adjustable tolerances, rather than relying on high tolerances everywhere.
- CI Workflow Insights: Their GitHub Actions workflows (from .github/workflows/tests.yml) run tests across multiple OS/Python versions, but don't explicitly pin a system FreeType; instead, they rely on the vendored copy. Issues/discussions (e.g., on Discourse and the pytest-mpl repo) show they handle failures by standardizing FreeType in CI (e.g., allowing Matplotlib to install its internal FreeType) and avoiding system dependencies.
- Other Practices: For font consistency with LaTeX or across plots, they customize rcParams['font.family']. Tests fail on font weight/size differences, so they emphasize reproducible environments over high tolerances.
Overall, vendoring FreeType is their core strategy for freetype agnosticism, combined with version-specific baselines for robustness. For your repo (which can't easily vendor FreeType), solutions 2 (consistent fonts) or 3 (hashing) align closest with their approach. No open issues were found for recent freetype failures, suggesting their setup works well.