[Bug]: Again with website has anti-bot detection #801

Open
@aidenpearce001

Description

crawl4ai version

0.5.0.post4

Expected Behavior

It should be able to scrape the data from the website.

Current Behavior

[ERROR]... × https://www.hifiboehm.de/de/produkt/sonos-sub-4-we... | Error:
× Unexpected error in _crawl_web at line 579 in _crawl_web (.venv/lib/python3.11/site-packages/crawl4ai/async_crawler_strategy.py):
Error: Failed on navigating ACS-GOTO:
Page.goto: Timeout 60000ms exceeded.
Call log:
- navigating to "https://www.hifiboehm.de/de/produkt/sonos-sub-4-weiss", waiting until "domcontentloaded"

Code context:
574 response = await page.goto(
575 url, wait_until=config.wait_until, timeout=config.page_timeout
576 )
577 redirected_url = page.url
578 except Error as e:
579 → raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}")
580
581 await self.execute_hook(
582 "after_goto", page, context=context, url=url, response=response, config=config
583 )
584

Is this reproducible?

Yes

Inputs Causing the Bug

- URL: https://www.hifiboehm.de/de/produkt/sonos-sub-4-weiss
- Settings used:
+ extra_args=["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox"]
+ headless=True
+ user_agent_mode="random"
+ magic=True

Steps to Reproduce

Code snippets

from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig


class Crawl4AIAdapter:
    def __init__(self, headless: bool = True, verbose: bool = True):
        # Set up browser configuration with extra args for stability.
        self.browser_config = BrowserConfig(
            headless=headless,
            verbose=verbose,
            extra_args=["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox"],
        )
        # Bypass the cache so every run fetches the page fresh.
        self.crawl_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
        # Create the crawler with a random mobile/Android user agent and magic mode.
        self.crawler = AsyncWebCrawler(
            config=self.browser_config,
            user_agent_mode="random",
            user_agent_generator_config={
                "device_type": "mobile",
                "os_type": "android"
            },
            magic=True,
        )
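
The traceback reads `wait_until` and `page_timeout` from the per-run config object, while the snippet above passes the user-agent and `magic` options straight to `AsyncWebCrawler`. As a minimal sketch for triage, the reported settings can be grouped by the config object that would consume them. The helper names `build_browser_kwargs` and `build_run_kwargs` are hypothetical, and the placement of the user-agent options on `BrowserConfig` and of `magic` on `CrawlerRunConfig` is an assumption that should be checked against the installed crawl4ai version.

```python
from typing import Any, Dict

# URL taken from the report.
REPORT_URL = "https://www.hifiboehm.de/de/produkt/sonos-sub-4-weiss"


def build_browser_kwargs(headless: bool = True, verbose: bool = True) -> Dict[str, Any]:
    """Keyword arguments intended for BrowserConfig.

    Assumption: the user-agent options are accepted here rather than
    on AsyncWebCrawler; the report does not confirm this.
    """
    return {
        "headless": headless,
        "verbose": verbose,
        "extra_args": ["--disable-gpu", "--disable-dev-shm-usage", "--no-sandbox"],
        "user_agent_mode": "random",
        "user_agent_generator_config": {"device_type": "mobile", "os_type": "android"},
    }


def build_run_kwargs(page_timeout_ms: int = 60000,
                     wait_until: str = "domcontentloaded") -> Dict[str, Any]:
    """Keyword arguments intended for CrawlerRunConfig.

    page_timeout and wait_until are the two fields the traceback reads
    from the run config; placing magic here is an assumption.
    The cache mode is left out because it needs the library's CacheMode enum.
    """
    return {
        "magic": True,
        "page_timeout": page_timeout_ms,
        "wait_until": wait_until,
    }
```

If the assumption holds, the dictionaries could be unpacked as `BrowserConfig(**build_browser_kwargs())` and `CrawlerRunConfig(cache_mode=CacheMode.BYPASS, **build_run_kwargs())`, which makes it easy to vary the timeout or wait condition while reproducing the failure.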

OS

Ubuntu 22.04

Python version

3.11

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

[ERROR]... × https://www.hifiboehm.de/de/produkt/sonos-sub-4-we... | Error:
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ × Unexpected error in _crawl_web at line 579 in _crawl_web (.venv/lib/python3.11/site- │
│ packages/crawl4ai/async_crawler_strategy.py): │
│ Error: Failed on navigating ACS-GOTO: │
│ Page.goto: Timeout 60000ms exceeded. │
│ Call log: │
│ - navigating to "https://www.hifiboehm.de/de/produkt/sonos-sub-4-weiss", waiting until "domcontentloaded" │
│ │
│ │
│ Code context: │
│ 574 response = await page.goto( │
│ 575 url, wait_until=config.wait_until, timeout=config.page_timeout │
│ 576 ) │
│ 577 redirected_url = page.url │
│ 578 except Error as e: │
│ 579 → raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}") │
│ 580 │
│ 581 await self.execute_hook( │
│ 582 "after_goto", page, context=context, url=url, response=response, config=config │
│ 583 ) │
│ 584 │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
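
For triage, the essential facts in the log above are the timeout value, the target URL, and the wait condition. A minimal sketch of extracting them with regular expressions follows; the function name `summarize_goto_timeout` is hypothetical, and the patterns are written only against the message format shown in this report, not against any documented log format.

```python
import re
from typing import Dict, Optional


def summarize_goto_timeout(log: str) -> Optional[Dict[str, object]]:
    """Extract timeout, URL, and wait condition from a Page.goto timeout log.

    Returns None when the log does not contain a Page.goto timeout line.
    """
    timeout = re.search(r"Page\.goto: Timeout (\d+)ms exceeded", log)
    if timeout is None:
        return None
    target = re.search(r'navigating to "([^"]+)", waiting until "([^"]+)"', log)
    return {
        "timeout_ms": int(timeout.group(1)),
        "url": target.group(1) if target else None,
        "wait_until": target.group(2) if target else None,
    }


# Sample text copied from the error log in this report.
sample = (
    "Page.goto: Timeout 60000ms exceeded.\n"
    "Call log:\n"
    '- navigating to "https://www.hifiboehm.de/de/produkt/sonos-sub-4-weiss", '
    'waiting until "domcontentloaded"\n'
)
summary = summarize_goto_timeout(sample)
print(summary)
```

The summary shows that the page never reached `domcontentloaded` within 60 seconds, which points at the navigation itself stalling rather than at content extraction.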

Metadata

Assignees

No one assigned

Labels

🐞 Bug (Something isn't working), 🩺 Needs Triage (Needs attention of maintainers)
