Installation breaking due to lxml>=5.x #630

Open
@Abdullah0297445

Describe the bug
Installing newspaper4k via pip and then importing it fails with the error:

ImportError: lxml.html.clean module is now a separate project lxml_html_clean.

To Reproduce
Steps to reproduce the behavior, please post any code you used and the website you tried to parse/process:

  1. pip install newspaper4k
  2. Import the package (for example, from newspaper import Article) and see the following traceback:
[stderr] from newspaper import Article as NPArticle
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/__init__.py", line 17, in <module>
[stderr] from .api import (
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/api.py", line 8, in <module>
[stderr] from .article import Article
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/article.py", line 21, in <module>
[stderr] from . import network
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/network.py", line 15, in <module>
[stderr] from newspaper import parsers
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/parsers.py", line 18, in <module>
[stderr] import lxml.html.clean
[stderr] File "/usr/local/lib/python3.11/site-packages/lxml/html/clean.py", line 18, in <module>
[stderr] raise ImportError(
[stderr] ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
[stderr] Install lxml[html_clean] or lxml_html_clean directly.

Expected behavior
Installation via pip should've worked.

System information

  • OS: python3.11-slim in Docker
  • Python version [3.11]
  • newspaper4k [0.9.1]
  • lxml [5.1.0]

Workaround
If you are hitting this issue, a workaround for now is to add lxml[html_clean]==5.2.0 to your requirements.txt file.
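
To confirm the workaround took effect inside the container, a small check like the following may help. It is only a minimal sketch and not part of newspaper4k: the function name report_lxml_state is hypothetical, and it uses nothing but the standard library to look up installed versions and retry the import that fails in the traceback above.

```python
import importlib
from importlib import metadata


def report_lxml_state() -> dict:
    """Collect installed versions and retry the import from the traceback."""
    state = {}
    # Distribution names to look up; None means "not installed".
    for dist in ("lxml", "lxml_html_clean", "newspaper4k"):
        try:
            state[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            state[dist] = None

    # This is the import that newspaper/parsers.py performs.
    try:
        importlib.import_module("lxml.html.clean")
        state["import_error"] = None
    except ImportError as exc:
        state["import_error"] = str(exc)
    return state


if __name__ == "__main__":
    for key, value in report_lxml_state().items():
        print(f"{key}: {value}")
```

If import_error is still set after rebuilding the image, the extra was probably not installed into the environment that runs the application.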

Quickfix
To quickly fix the issue in this repo, for now we can edit this line in the pyproject.toml file and pin lxml to a version below 5.x:
https://github.com/AndyTheFactory/newspaper4k/blob/b5b20976bd320f89ffa25b8d4a7a94d190ee549a/pyproject.toml#L34C3-L34C15
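
As an alternative to pinning, the import could in principle be guarded so that it works with either layout. The following is only a sketch under assumptions: the helper name load_html_clean is hypothetical, it is not the maintainers' fix, and it assumes the standalone project is importable as lxml_html_clean, as the error message suggests.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_html_clean() -> Optional[ModuleType]:
    """Return a module providing the HTML cleaner, or None if unavailable."""
    # Prefer the standalone project, then fall back to the legacy location
    # bundled with older lxml releases.
    for name in ("lxml_html_clean", "lxml.html.clean"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None
```

Calling code would then check for None and raise a clearer error that tells the user to install lxml[html_clean].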

Labels: bug (Something isn't working)
