`TypeError: argument 'html': 'BeautifulSoup' object cannot be cast as 'str'` when parsing a wiki site. #8

@W1ld3b34st

Notes: This appears to be a local error, since it occurs before any API call is made. I don't believe the specific model is relevant, but I'm listing the one I used in case it is.

Steps:

  1. Set up GLM 4.6 as the model.
  2. Use Prompt Engineering for the Project.
  3. Move to "Generate Entries"
  4. Attempt to Parse "https://highschooldxd.fandom.com/wiki/Sitri"
  5. (Hopefully) reproduce the error.

Console Output:

INFO - 2025-10-28 14:12:36,685 - uvicorn.access - h11_impl - 127.0.0.1:61387 - "GET /api/jobs/latest?project_id=high-school-dxd&task_name=process_project_entries HTTP/1.1" 200
INFO - 2025-10-28 14:12:36,687 - uvicorn.access - h11_impl - 127.0.0.1:61387 - "GET /api/projects/high-school-dxd HTTP/1.1" 200
INFO - 2025-10-28 14:12:36,789 - worker - worker - Worker: Submitting job 80706fa6-c0b7-4394-8956-2452f1adf492 (Task: process_project_entries, Active: 1, Limit: 1)
INFO - 2025-10-28 14:12:36,901 - uvicorn.access - h11_impl - 127.0.0.1:61387 - "GET /api/jobs/latest?project_id=high-school-dxd&task_name=process_project_entries HTTP/1.1" 200
INFO - 2025-10-28 14:12:36,902 - uvicorn.access - h11_impl - 127.0.0.1:61387 - "GET /api/projects/high-school-dxd HTTP/1.1" 200
INFO - 2025-10-28 14:12:36,904 - uvicorn.access - h11_impl - 127.0.0.1:61387 - "GET /api/projects/high-school-dxd/links?limit=50&offset=0 HTTP/1.1" 200
INFO - 2025-10-28 14:12:37,057 - httpx - _client - HTTP Request: GET https://highschooldxd.fandom.com/wiki/Sitri "HTTP/1.1 200 OK"
ERROR - 2025-10-28 14:12:37,213 - services.background_jobs - background_jobs - [80706fa6-c0b7-4394-8956-2452f1adf492] I/O phase error processing link 142ada57-674a-4eca-a693-0697afa3b146: argument 'html': 'BeautifulSoup' object cannot be cast as 'str'
Traceback (most recent call last):
  File "C:\SillyTavernAI\lorecard\server\src\services\background_jobs.py", line 1083, in _process_single_link_io
    else await scraper.get_content(link.url, type="markdown", clean=True)
  File "C:\SillyTavernAI\lorecard\server\src\services\scraper.py", line 143, in get_content
    return html_to_markdown(html)
  File "C:\SillyTavernAI\lorecard\server\src\services\scraper.py", line 112, in html_to_markdown
    return convert_to_markdown(soup).strip()
  File "C:\SillyTavernAI\lorecard\server\.venv\lib\site-packages\html_to_markdown\v1_compat.py", line 175, in convert_to_markdown
    return convert_v2(html, options, preprocessing)
  File "C:\SillyTavernAI\lorecard\server\.venv\lib\site-packages\html_to_markdown\api.py", line 110, in convert
    return cast("str", _rust.convert(html, rust_options))
TypeError: argument 'html': 'BeautifulSoup' object cannot be cast as 'str'
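
Possible workaround (a sketch, not a confirmed fix): the traceback shows `html_to_markdown` in `scraper.py` passing a `BeautifulSoup` object to `convert_to_markdown`, and the Rust-backed converter rejecting anything that is not a `str`. Serializing the parsed document to a string before conversion may avoid the error. The helper `to_html_string` and the `FakeSoup` stand-in below are hypothetical names used only for illustration; they are not part of lorecard or of the `html_to_markdown` package.

```python
def to_html_string(html_or_soup) -> str:
    """Return markup as a plain str, whatever form it arrived in.

    Assumption: parsed-document objects (such as BeautifulSoup) serialize
    their markup when passed to str().
    """
    if isinstance(html_or_soup, str):
        return html_or_soup
    if isinstance(html_or_soup, bytes):
        # Raw response bodies may arrive as bytes; decode them leniently.
        return html_or_soup.decode("utf-8", errors="replace")
    # Fall back to the object's own string serialization.
    return str(html_or_soup)


class FakeSoup:
    """Hypothetical stand-in for a parsed document, used only for this demo."""

    def __init__(self, markup: str) -> None:
        self._markup = markup

    def __str__(self) -> str:
        return self._markup


if __name__ == "__main__":
    soup = FakeSoup("<h1>Sitri</h1>")
    html = to_html_string(soup)
    print(type(html).__name__, html)
```

Under that assumption, the call at `scraper.py` line 112 would become something like `convert_to_markdown(to_html_string(soup)).strip()`. I have not verified this against the project's code.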
