Description
Starting with pip 22.0, HTML parsing is done using html.parser instead of html5lib by default. Along with this, there's an additional check to ensure that a valid HTML 5 doctype declaration is present in the document.
If you're here from a warning/error from pip's output:
- Please reach out to the provider of the package index you're using and ask them to change the index pages to be valid HTML 5 documents (declaring doctype, having the correct structure etc).
- You may pass --use-deprecated=html5lib until pip 22.2 (i.e. start of Q3 2022), when this flag will be dropped. This will suppress the warning for now; however, you will no longer be able to pass this flag once pip 22.2 is released (and will need to fix the index pages to suppress the warning).
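For index providers, the sketch below shows one way a page that declares the doctype and has the expected structure could be generated. It is illustrative only: the function name `render_simple_index`, the project name, and the filename are hypothetical and are not part of pip or of any particular index software.

```python
from html import escape


def render_simple_index(project, filenames):
    """Return a minimal HTML 5 index page with one link per file.

    Hypothetical helper for illustration; real index software will
    differ in layout and in the attributes it emits.
    """
    links = "\n".join(
        f'    <a href="{escape(name, quote=True)}">{escape(name)}</a><br>'
        for name in filenames
    )
    return (
        "<!DOCTYPE html>\n"  # the doctype declaration pip checks for
        "<html>\n"
        "  <head>\n"
        f"    <title>Links for {escape(project)}</title>\n"
        "  </head>\n"
        "  <body>\n"
        f"{links}\n"
        "  </body>\n"
        "</html>\n"
    )


# Assumed example data, not a real package.
page = render_simple_index("example-pkg", ["example_pkg-1.0.tar.gz"])
print(page)
```

The important part for this change is the first line of the output: the page begins with the doctype before any other markup.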
This behaviour change is motivated by two major factors:
- html5lib is the reason that pip pulls in various other libraries, as part of its own dependency graph. Dropping html5lib and its dependencies from pip reduces the maintenance workload on pip's maintainers and helps reduce the size of pip's distributions.
- The Python standard library's html.parser is more than sufficient for parsing the pages that pip needs to parse (see https://pypi.org/simple/pip/ for example).
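To give a sense of why html.parser is enough for this job, here is a minimal sketch that collects the link targets from a simple index page using only the standard library. This is not pip's actual implementation; the class name `LinkCollector` and the sample page are assumptions made for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is not None:
            self.links.append(href)


# Assumed sample page in the shape of a simple index listing.
SAMPLE_PAGE = """<!DOCTYPE html>
<html>
  <body>
    <a href="pip-22.0.tar.gz">pip-22.0.tar.gz</a>
  </body>
</html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_PAGE)
collector.close()
print(collector.links)
```

Index pages of this kind are essentially a flat list of anchors, so a small event-based parser covers what is needed without a full HTML 5 tree builder.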
Barring major surprises, the flag to use html5lib was to be removed in 22.1. There were surprises:
- The initial implementation of the html.parser-based parsing enforced that the page contains a doctype, throwing an error if it did not. Turns out, many third-party package indexes did not include a <!doctype html> in their index pages.
- With pip 22.0.1, certain bugs in the fallback logic were fixed, for pages that did not include the doctype.
- With pip 22.0.2, a fallback to the legacy html5lib logic was introduced, for pages that don't start with <!doctype html> (case-insensitive), with a warning presented to the user.
- With pip 22.0.3, the fallback to the legacy html5lib logic has been removed and the strict error in the html.parser logic has been relaxed to be a warning.
- With pip 22.0.4, the warning has been removed. Users will no longer get a warning on an invalid or missing doctype. However, this should still be fixed since a future version of pip may start rejecting such pages (after a deprecation period of ~3-6 months).
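If you maintain an index and want to check your pages ahead of any future enforcement, a rough check along the lines described above might look like the following. It is a sketch, not pip's own logic: the function name `has_html5_doctype` is hypothetical, and tolerating a leading byte order mark or whitespace is an assumption of this example.

```python
def has_html5_doctype(page_text):
    """Return True if the page starts with <!doctype html>, ignoring case.

    Assumption for this sketch: a leading BOM or whitespace before the
    doctype is tolerated. pip's real check may be stricter or looser.
    """
    stripped = page_text.lstrip("\ufeff \t\r\n")
    return stripped.lower().startswith("<!doctype html>")


# A page with the doctype passes, regardless of case.
print(has_html5_doctype("<!DOCTYPE html>\n<html><body></body></html>"))
# A page without it does not.
print(has_html5_doctype("<html><body></body></html>"))
```

Running this against the pages your index serves is a quick way to spot the ones that would be affected.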