
Bug with "rotating proxies" #460

Closed
@jiezi4ai

Description

Rotating proxies are a very useful feature when crawling many URLs, since they help avoid rate limits or even IP bans. However, there is a bug in the sample code under "Documentation > Advanced > Proxy & Security". The "Rotating Proxies" sample suggests:

```python
from crawl4ai.async_configs import BrowserConfig

async def get_next_proxy():
    # Your proxy rotation logic here
    return {"server": "http://next.proxy.com:8080"}

browser_config = BrowserConfig()
async with AsyncWebCrawler(config=browser_config) as crawler:
    # Update proxy for each request
    for url in urls:
        proxy = await get_next_proxy()
        browser_config.proxy_config = proxy
        result = await crawler.arun(url=url, config=browser_config)
```
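
The sample leaves `get_next_proxy` as a stub. A minimal round-robin version could look like the sketch below. It assumes a fixed, hard-coded pool of proxies (the `proxy*.example.com` servers and the `make_proxy_rotator` helper are hypothetical, not part of crawl4ai) and returns the same `{"server": ...}` dict shape the sample uses.

```python
import asyncio
import itertools


def make_proxy_rotator(servers):
    """Build an async get_next_proxy() that cycles through `servers` in order (hypothetical helper)."""
    if not servers:
        raise ValueError("need at least one proxy server")
    cycle = itertools.cycle(servers)  # wraps around to the first server after the last

    async def get_next_proxy():
        # next() runs without awaiting, so it is safe to call from
        # concurrent tasks that share one event loop.
        return {"server": next(cycle)}

    return get_next_proxy


# Hypothetical proxy endpoints; replace with a real pool.
get_next_proxy = make_proxy_rotator([
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
])


async def demo():
    for _ in range(3):
        print(await get_next_proxy())


if __name__ == "__main__":
    asyncio.run(demo())
```

Keeping the rotation state inside a closure means each call site can have its own independent rotator, which also makes the logic easy to test without starting a browser.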

The `config` argument of `crawler.arun` should be a `CrawlerRunConfig`, which is not compatible with a `BrowserConfig` object.
Since `CrawlerRunConfig` has no proxy parameter, one quick fix for the above code is to create a new crawler for each URL:

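```python
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

async def crawl_with_rotating_proxies(urls):
    browser_config = BrowserConfig()
    results = []
    # Start a new crawler for each URL so the updated proxy takes effect
    for url in urls:
        # get_next_proxy() is the rotation helper from the sample above
        browser_config.proxy_config = await get_next_proxy()
        crawler = AsyncWebCrawler(config=browser_config)
        await crawler.start()
        try:
            result = await crawler.arun(url=url, config=CrawlerRunConfig())
            results.append(result)
        finally:
            # Close the crawler even if the crawl raises
            await crawler.close()
    return results
```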

This works, but it may not be the best approach, since a new crawler is started for every URL. It would be better to add an `update_proxy` method to `BrowserConfig`, or to accept a proxy parameter when initializing `CrawlerRunConfig`.
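
Until one of those options exists, a possible middle ground is to start one crawler per proxy instead of one per URL. The sketch below shows only the grouping step in plain Python: `assign_proxies` is a hypothetical helper, proxies are represented by their server URL strings (so they can be dict keys), and URLs are spread over them round-robin.

```python
from itertools import cycle


def assign_proxies(urls, proxies):
    """Spread URLs over proxies round-robin and return {proxy: [urls, ...]}.

    Crawling each batch with a single crawler means the number of browser
    launches is at most len(proxies) instead of len(urls).
    """
    if not proxies:
        raise ValueError("need at least one proxy")
    batches = {proxy: [] for proxy in proxies}
    for url, proxy in zip(urls, cycle(proxies)):
        batches[proxy].append(url)
    # Drop proxies that received no URLs (possible when there are fewer URLs than proxies).
    return {proxy: batch for proxy, batch in batches.items() if batch}


if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    proxies = ["http://proxy1.example.com:8080", "http://proxy2.example.com:8080"]
    for proxy, batch in assign_proxies(urls, proxies).items():
        # With crawl4ai, one crawler would be started here for `proxy`
        # and reused for every URL in `batch` (see the note below).
        print(proxy, batch)
```

Each batch could then be crawled as in the workaround above, but with the crawler started once per batch, assuming `BrowserConfig` accepts `{"server": proxy}` through `proxy_config` as the attribute assignment in the workaround suggests. The trade-off is that consecutive requests within a batch share one IP, although each proxy still carries only about `1/len(proxies)` of the traffic.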

Metadata

Labels

🐞 Bug: Something isn't working