Closed
Labels
⚙ Done: Bug fix, enhancement, FR that's completed pending release
✨ Enhancement: Improvement on an existing feature
Description
crawl4ai version
0.6.1
Expected Behavior
arun_many should run the crawls in parallel, so the total crawl time should be lower than crawling the same URLs one at a time.
Current Behavior
The URLs are crawled sequentially, one after another.
Tested with stream mode both on and off.
The other examples show a log line that begins with 'PARALLEL'; it never appears in my runs.
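One way to check whether work really overlaps, independent of any log output, is to compare wall-clock time against the sum of the per-URL times. The sketch below is only an illustration with plain asyncio and does not use crawl4ai: `fake_crawl`, `run_sequential`, `run_concurrent`, `compare`, and the `max_concurrency` parameter are hypothetical names, and a sleep stands in for the real fetch. The semaphore mimics the idea of a concurrency cap, not the dispatcher's actual implementation.

```python
import asyncio
import time


async def fake_crawl(url: str, delay: float = 0.2) -> str:
    # Hypothetical stand-in for a page fetch: sleeps instead of doing network I/O.
    await asyncio.sleep(delay)
    return url


async def run_sequential(urls):
    # Await each crawl before starting the next one.
    start = time.perf_counter()
    results = [await fake_crawl(u) for u in urls]
    return results, time.perf_counter() - start


async def run_concurrent(urls, max_concurrency: int = 20):
    # Start all crawls at once, with a semaphore capping how many run together.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(u):
        async with sem:
            return await fake_crawl(u)

    start = time.perf_counter()
    results = await asyncio.gather(*(bounded(u) for u in urls))
    return list(results), time.perf_counter() - start


def compare(urls):
    # Return (sequential_seconds, concurrent_seconds) for the same URL list.
    _, seq = asyncio.run(run_sequential(urls))
    _, con = asyncio.run(run_concurrent(urls))
    return seq, con


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(5)]
    seq, con = compare(urls)
    print(f"sequential: {seq:.2f}s, concurrent: {con:.2f}s")
```

If the elapsed time of a real run grows roughly linearly with the number of URLs, as in the sequential case here, the crawls are not overlapping.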
Is this reproducible?
Yes
Code snippets
deep_seek_crawler_config = CrawlerRunConfig(
    cache_mode=CacheMode.BYPASS,
    stream=True,
    extraction_strategy=LLMExtractionStrategy(
        llm_config=LLMConfig(
            provider="deepseek/deepseek-chat",
            api_token=os.environ['DEEPSEEK_API_KEY'],
            base_url="https://api.deepseek.com",
        ),
        schema=OpenCall.model_json_schema(),
        extraction_type="schema",
        instruction="",
        #extra_args={"temperature": 0.1, "max_tokens": 1000},
        #chunk_token_threshold=1000,
    )
)
# controls how many crawlers run at once and how much memory to use from machine
dispatcher = MemoryAdaptiveDispatcher(
    memory_threshold_percent=95.0,
    check_interval=1.0,
    max_session_permit=20,
    monitor=CrawlerMonitor(enable_ui=False)
)
async with AsyncWebCrawler(config=browser_config) as crawler:
    logger.info("Starting the crawl...")
    async for result in await crawler.arun_many(
        urls=google_search,
        config=deep_seek_crawler_config,
        dispatcher=dispatcher
    ):
OS
macOS
Python version
3.11
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response