Closed
Description
Rotating proxies are a very useful feature when crawling multiple URLs, since they help avoid rate limits or even IP bans. However, there is a bug in the sample code under "document - advanced - proxy & security". The sample code in the "Rotating Proxies" section suggests:
from crawl4ai.async_configs import BrowserConfig
async def get_next_proxy():
    # Your proxy rotation logic here
    return {"server": "http://next.proxy.com:8080"}
browser_config = BrowserConfig()
async with AsyncWebCrawler(config=browser_config) as crawler:
    # Update proxy for each request
    for url in urls:
        proxy = await get_next_proxy()
        browser_config.proxy_config = proxy
        result = await crawler.arun(url=url, config=browser_config)
The config argument of crawler.arun should be a CrawlerRunConfig, which is not compatible with a BrowserConfig object.
Since CrawlerRunConfig has no proxy parameter, one quick fix for the above code could be:
import random
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig
browser_config = BrowserConfig()
results = []
# Update proxy for each request
for url in urls:
    proxy = await get_next_proxy()
    browser_config.proxy_config = proxy
    # Start a fresh crawler per URL so the updated proxy takes effect
    crawler = AsyncWebCrawler(config=browser_config)
    await crawler.start()
    result = await crawler.arun(url=url, config=CrawlerRunConfig())
    results.append(result)
    await crawler.close()
This works, but it may not be the best approach, since a new crawler is started and closed for every URL. It would be even better to add an update_proxy function to the BrowserConfig object, or to add a proxy parameter to the CrawlerRunConfig initialization.
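For reference, the get_next_proxy placeholder used in the snippets above could be filled in roughly like this. This is only a minimal sketch, assuming a fixed pool of proxies and plain round-robin rotation; PROXY_POOL and the example.com addresses are made-up placeholders and not part of crawl4ai.

```python
import asyncio
from itertools import cycle

# Hypothetical proxy pool (placeholder addresses); replace with real servers.
PROXY_POOL = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

# cycle() yields the pool entries in order and wraps around indefinitely.
_proxy_cycle = cycle(PROXY_POOL)


async def get_next_proxy():
    """Return the next proxy in round-robin order, in the dict shape used above."""
    return {"server": next(_proxy_cycle)}


async def demo():
    # Request one more proxy than the pool holds to show the wrap-around.
    return [await get_next_proxy() for _ in range(len(PROXY_POOL) + 1)]


if __name__ == "__main__":
    for proxy in asyncio.run(demo()):
        print(proxy["server"])
```

Round-robin is just the simplest choice here; random selection or skipping proxies that recently failed would fit behind the same function signature.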