
Encoding 'text/html; charset="utf-8"' will cause LookupError: unknown encoding: 'b'"utf-8"'' #12

Closed
@andyfcx

Description

Have you searched if there is an existing issue for this?

  • I have searched the existing issues

Python version (python --version)

Python 3.11

Scrapling version (scrapling.__version__)

0.2.4

Dependencies version (pip3 freeze)

aiofiles 24.1.0 File support for asyncio.
anyio 4.6.2.post1 High level compatibility layer for multiple asynchronous event loop implementations
brotli 1.1.0 Python bindings for the Brotli compression library
browserforge 1.1.2 Intelligent browser header & fingerprint generator
camoufox 0.3.10 Wrapper around Playwright to help launch Camoufox
certifi 2024.8.30 Python package for providing Mozilla's CA Bundle.
charset-normalizer 3.4.0 The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
click 8.1.7 Composable command line interface toolkit
cssselect 1.2.0 cssselect parses CSS3 Selectors and translates them to XPath 1.0
cython 3.0.11 The Cython compiler for writing C extensions in the Python language.
filelock 3.16.1 A platform independent file lock.
greenlet 3.1.1 Lightweight in-process concurrent programming
h11 0.14.0 A pure-Python, bring-your-own-I/O implementation of HTTP/1.1
httpcore 1.0.7 A minimal low-level HTTP client.
httpx 0.27.2 The next generation HTTP client.
idna 3.10 Internationalized Domain Names in Applications (IDNA)
language-tags 1.2.0 This project is a Python version of the language-tags Javascript project.
lxml 5.3.0 Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.
markdown-it-py 3.0.0 Python port of markdown-it. Markdown parsing, done right!
mdurl 0.1.2 Markdown URL utilities
numpy 2.1.3 Fundamental package for array computing in Python
orjson 3.10.11 Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy
platformdirs 4.3.6 A small Python package for determining appropriate platform-specific dirs, e.g. a user data dir.
playwright 1.48.0 A high-level API to automate web browsers
pyee 12.0.0 A rough port of Node.js's EventEmitter to Python with a few tricks of its own
pygments 2.18.0 Pygments is a syntax highlighting package written in Python.
pyobjc-core 10.3.1 Python<->ObjC Interoperability Module
pyobjc-framework-cocoa 10.3.1 Wrappers for the Cocoa frameworks on macOS
pysocks 1.7.1 A Python SOCKS client module. See https://github.com/Anorov/PySocks for more information.
pyyaml 6.0.2 YAML parser and emitter for Python
rebrowser-playwright 1.48.100 A high-level API to automate web browsers
requests 2.32.3 Python HTTP for Humans.
requests-file 2.1.0 File transport adapter for Requests
rich 13.9.4 Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal
scrapling 0.2.4 Scrapling is a powerful, flexible, and high-performance web scraping library for Python. It
screeninfo 0.8.1 Fetch location and size of physical screens.
sniffio 1.3.1 Sniff out which async library your code is running under
tldextract 5.1.3 Accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions. You can optionally ...
tqdm 4.67.0 Fast, Extensible Progress Meter
typing-extensions 4.12.2 Backported and Experimental Type Hints for Python 3.8+
ua-parser 0.18.0 Python port of Browserscope's user agent parser
urllib3 2.2.3 HTTP library with thread-safe connection pooling, file post, and more.
w3lib 2.2.1 Library of web-related functions
zstandard 0.23.0 Zstandard bindings for Python

What's your operating system?

macOS 15.0 (24A335)

Are you using a separate virtual environment?

Yes

Expected behavior

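The charset extracted from the Content-Type header should be 'utf-8', without the surrounding double quotes.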

Actual behavior (Remember to use debug parameter)

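The extracted charset is '"utf-8"', with the double quotes still attached, which leads to LookupError: unknown encoding: 'b'"utf-8"''.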

Steps To Reproduce

  1. Set content_type = 'text/html; charset="utf-8"' (the charset value is wrapped in double quotes).
  2. Run the code in scrapling/engines/camo.py, lines 109-113, with that value.
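
To make the steps above easier to try outside the library, here is a minimal sketch of the problem and one possible fix. It is not the actual code from scrapling/engines/camo.py: `naive_charset` and `parse_charset` are hypothetical helpers written for this report, and the naive split is only an assumption about how the quotes might survive. The second helper relies on the standard library's `email.message.Message.get_content_charset()`, which unquotes parameter values.

```python
from email.message import Message


def naive_charset(content_type: str) -> str:
    # Hypothetical naive extraction (an assumption, not Scrapling's code):
    # everything after "charset=" is returned as-is, quotes included.
    return content_type.split("charset=")[-1].strip()


def parse_charset(content_type: str, default: str = "utf-8") -> str:
    # Let the stdlib header parser handle quoted parameter values.
    # get_content_charset() returns the charset lowercased and unquoted,
    # or None when the header carries no charset parameter.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


content_type = 'text/html; charset="utf-8"'
print(repr(naive_charset(content_type)))  # quotes are still part of the value
print(repr(parse_charset(content_type)))  # quotes are stripped
```

With the quoted header from step 1, the naive helper returns the value with its double quotes, matching the actual behavior reported above, while the stdlib-based helper returns the bare charset name. The `default` fallback for headers without a charset is also an assumption.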

Labels

bug (Something isn't working)
