A recent refactor of the Cover block (to use an <img /> element rather than a CSS background, in order to enable srcsets) caused several visual regressions (e.g. #28242). Those regressions were addressed by a series of follow-up PRs (#28114, #28287, #28350, #28361, #28364, #28404).
That sequence of PRs (including some back-and-forth of alleged fixes and reverts) demonstrates that it's notoriously hard to get styling right across a matrix of possible configurations, such as different themes, frontend vs editor, and different block alignments. An attempt at a somewhat organized testing strategy (listing configurations that turned out especially fragile) was made at #28404 (comment).
In order to guard against similar regressions in the future, we should add some automated testing, as manual testing of these different configurations is tedious, and it's easy to miss something.
Unfortunately, it seems like we don't have an established mechanism for this kind of testing:
- Our snapshot testing focuses on block markup, i.e. content and attributes -- not on styling.
- We cannot simply compare CSS classes or styling, since they might change (as in the example given above) even though the rendered output looks the same.
For these reasons, I wonder if it's time to introduce a framework for visual diff testing. It could work at different levels (e.g. block level or page level), and either work akin to our existing snapshot testing (store "expected" images in git and compare against those), or compare against the visual output of master.
For the issue documented above, per-block testing seems the most promising. (Puppeteer supports taking screenshots at the DOM element level, so we're probably not very constrained on the implementation side.)
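For illustration, here's a minimal sketch of an element-level screenshot in Puppeteer (the selector and file name are just placeholders):

```js
// Capture only the Cover block's DOM element rather than the whole page.
// Assumes a Puppeteer `page` that already has a post with a Cover block rendered.
const coverBlock = await page.waitForSelector( '.wp-block-cover' );

// ElementHandle.screenshot() clips the capture to the element's bounding box.
await coverBlock.screenshot( { path: 'cover-block.png' } );
```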
As a first step, we could try taking a screenshot of the Cover block (with some sample background and content)
- across different themes
- both in the editor and the frontend
on each new PR, and compare them to the corresponding screenshots generated from the master branch (a rough sketch of this follows below).
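Roughly, such a test could look like the following. This is just a sketch, assuming the usual jest-puppeteer setup and helpers like `activateTheme`, `createNewPost`, and `insertBlock` from `@wordpress/e2e-test-utils`; the theme list, selectors, and file paths are placeholders:

```js
import { activateTheme, createNewPost, insertBlock } from '@wordpress/e2e-test-utils';

// Placeholder matrix; the real one should cover the fragile configurations
// collected in the #28404 comment (themes, alignments, etc.).
const themes = [ 'twentytwentyone', 'twentynineteen' ];

describe.each( themes )( 'Cover block visual snapshots (%s)', ( theme ) => {
	beforeAll( async () => {
		await activateTheme( theme );
	} );

	it( 'captures the block in the editor', async () => {
		await createNewPost();
		await insertBlock( 'Cover' );
		// TODO: set a sample background image and some inner content.
		const block = await page.waitForSelector( '[data-type="core/cover"]' );
		await block.screenshot( { path: `screenshots/${ theme }-editor-cover.png` } );
	} );

	it( 'captures the block on the frontend', async () => {
		// TODO: publish the post, visit the permalink, and screenshot the
		// rendered `.wp-block-cover` element there.
	} );
} );
```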
The notes linked above (#28404 (comment)) should serve as the benchmark: our automated tests should catch all the problematic cases listed there (i.e. they should fail if the relevant fixes are reverted).
A library like Resemble.js (which allows customizing the level of "accuracy" and highlighting the differences visually) might come in handy.
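For the comparison step, here's a rough sketch using Resemble.js's Node API (file paths and the mismatch threshold are placeholders and would need tuning):

```js
const fs = require( 'fs' );
const compareImages = require( 'resemblejs/compareImages' );

async function assertCoverUnchanged() {
	const data = await compareImages(
		fs.readFileSync( 'screenshots/master/cover-editor.png' ),
		fs.readFileSync( 'screenshots/pr/cover-editor.png' ),
		{
			// Highlight differing pixels in the generated diff image.
			output: { errorType: 'movement', transparency: 0.7 },
		}
	);

	// misMatchPercentage is the share of differing pixels; 0.1% is an
	// arbitrary placeholder threshold.
	if ( Number( data.misMatchPercentage ) > 0.1 ) {
		fs.writeFileSync( 'screenshots/diff/cover-editor.png', data.getBuffer() );
		throw new Error( `Cover block changed visually by ${ data.misMatchPercentage }%` );
	}
}
```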
Automattic's Calypso repo might have some prior art.