read_html docs inconsistent with behaviour if lxml not installed #30281

Open
@attack68

Code Sample, a copy-pastable example if possible

>>> dfs = pd.read_html('<table><tr><td>1</td></tr></table>')
ImportError: lxml not found, please install it
>>> dfs = pd.read_html('<table><tr><td>1</td></tr></table>', flavor='bs4')
[   0
0  1]

Problem description

The documentation explicitly states, both on the HTML parsing gotchas page and in the flavor argument docstring, that read_html falls back to 'bs4' + 'html5lib' if 'lxml' fails.
For me there is no fallback, just an ImportError, since I do not have lxml installed.
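
To confirm which parser libraries are actually importable in a given environment, something like the following sketch may help. The helper name `available_html_parsers` is hypothetical and not part of pandas; it only uses the standard library to check for the `lxml`, `bs4` and `html5lib` modules.

```python
import importlib.util


def available_html_parsers(modules=("lxml", "bs4", "html5lib")):
    """Return a dict mapping each module name to True if it can be imported.

    Hypothetical helper for diagnosing the environment; not a pandas API.
    """
    # find_spec returns None when a top-level module cannot be located.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in available_html_parsers().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If `lxml` is reported as missing while `bs4` and `html5lib` are installed, that matches the situation described in this report.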

Expected Output

Either the docs should be changed to require an explicit flavor argument, or the ImportError should be caught so that the documented fallback actually happens.
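
As a rough illustration of the "catch the ImportError" option, here is a user-side sketch of the fallback. It is not how pandas implements (or would implement) this internally; `read_with_flavor_fallback` is a hypothetical wrapper, and it assumes only that the reader it is given accepts a `flavor` keyword and raises ImportError when the parser for that flavor is missing, as `pd.read_html` does in the sample above.

```python
def read_with_flavor_fallback(reader, source, flavors=("lxml", "bs4")):
    """Call reader(source, flavor=f) for each flavor until one does not
    raise ImportError.

    Hypothetical workaround helper; `reader` is expected to be something
    like pandas.read_html.
    """
    errors = []
    for flavor in flavors:
        try:
            return reader(source, flavor=flavor)
        except ImportError as exc:
            # Parser for this flavor is not installed; remember why and move on.
            errors.append(f"{flavor}: {exc}")
    # Every flavor failed to import; report all of the reasons together.
    raise ImportError("no usable HTML parser; tried " + "; ".join(errors))
```

With pandas imported as pd, this could be called as `read_with_flavor_fallback(pd.read_html, '<table><tr><td>1</td></tr></table>')`, which would try 'lxml' first and then 'bs4'.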

Output of pd.show_versions()

pandas == 0.25.3

Metadata

Labels

Bug, IO HTML (read_html, to_html, Styler.apply, Styler.applymap)