Description
I was wondering if you could help me with a recurring issue for which I can find no repeatable solution. Take this URL as an example: https://www.newcleo.com/. I have tried many combinations of wait_for and various js_code strategies, but I cannot access the actual page content. I don't see any significant anti-bot measures in Chrome, but I do notice that a .gif loading animation pops up for a split second before the page renders. If I run crawl4ai without a delay, I basically scrape this URL; if I add a delay, I see the error below.
import asyncio

from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler(always_by_pass_cache=True, verbose=True) as crawler:
        result = await crawler.arun(
            url="https://www.newcleo.com/",
            magic=True,
            headless=True,
            # delay_before_return_html=5.0
        )
        print(result.markdown)
    return None

if __name__ == "__main__":
    asyncio.run(main())
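For context, the wait_for idea I've been attempting boils down to polling until the page signals it is ready (e.g. the loading .gif disappearing) rather than sleeping for a fixed delay. A minimal, self-contained sketch of that polling pattern in plain asyncio (wait_until, page_ready, and finish_loading are hypothetical stand-ins here, not crawl4ai's API):

```python
import asyncio

async def wait_until(predicate, timeout=5.0, interval=0.1):
    # Poll an async predicate until it returns True or the timeout expires.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if await predicate():
            return True
        await asyncio.sleep(interval)
    return False

async def demo():
    state = {"ready": False}

    async def finish_loading():
        # Stands in for the loading animation finishing after ~0.3 s.
        await asyncio.sleep(0.3)
        state["ready"] = True

    async def page_ready():
        return state["ready"]

    asyncio.create_task(finish_loading())
    return await wait_until(page_ready, timeout=2.0)

result = asyncio.run(demo())
print(result)  # True once the simulated animation has finished
```

With crawl4ai itself I have been trying to express the same idea via wait_for / delay_before_return_html, but so far without a combination that works reliably on this site.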
Crawl4AI Error: This page is not fully supported. Possible reasons:
1. The page may have restrictions that prevent crawling.
2. The page might not be fully loaded.
Suggestions:
- Try calling the crawl function with these parameters: magic=True
- Set headless=False to visualize what's happening on the page.
If the issue persists, please check the page's structure and any potential anti-crawling measures.
Thanks for any help!