How to share and export consistent statistics across multiple crawlers? #966
Replies: 2 comments 3 replies
- @janbuchar: Hello, you can make a custom instance of `Statistics` and pass it to both crawlers. I'm not sure I understand what you want to do with `FinalStatistics`, though.
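The shared-instance idea can be sketched without crawlee itself. The stand-in class below (`SharedStats` and `simulate_run` are hypothetical, not crawlee's actual `Statistics` API) just shows why counters accumulate when two crawlers write into the same object:

```python
from dataclasses import dataclass


@dataclass
class SharedStats:
    """Hypothetical stand-in for a statistics object shared by several crawlers."""
    requests_finished: int = 0
    requests_failed: int = 0


def simulate_run(stats: SharedStats, finished: int) -> None:
    """Pretend one crawler run recorded `finished` successful requests."""
    stats.requests_finished += finished


stats = SharedStats()
simulate_run(stats, finished=1)  # first crawler (e.g. PlaywrightCrawler)
simulate_run(stats, finished=3)  # second crawler (e.g. BeautifulSoupCrawler)
print(stats.requests_finished)   # → 4, totals carry over between runs
```

This mirrors the logs later in the thread, where `requests_finished` is 1 after the first crawler and 4 after the second.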
- Thanks @janbuchar. Your answer works fine for me, but the runtime of the first scraper's execution is not added to the second scraper's statistics. Here are the logs:
[crawlee.statistics._statistics] INFO  Statistics
┌───────────────────────────────┬─────────┐
│ requests_finished             │ 0       │
│ requests_failed               │ 0       │
│ retry_histogram               │ [0]     │
│ request_avg_failed_duration   │ None    │
│ request_avg_finished_duration │ None    │
│ requests_finished_per_minute  │ 0       │
│ requests_failed_per_minute    │ 0       │
│ request_total_duration        │ 0.0     │
│ requests_total                │ 0       │
│ crawler_runtime               │ 0.02422 │
└───────────────────────────────┴─────────┘
[crawlee._autoscaling.autoscaled_pool] INFO  current_concurrency = 0; desired_concurrency = 2; cpu = 0.0; mem = 0.0; event_loop = 0.0; client_info = 0.0
[crawlee.crawlers._playwright._playwright_crawler] INFO  Navigating to ...
[crawlee.crawlers._playwright._playwright_crawler] INFO  --- Fetch cookies ---
[crawlee.crawlers._playwright._playwright_crawler] INFO  --- End of cookies ---
[crawlee._autoscaling.autoscaled_pool] INFO  Waiting for remaining tasks to finish
[crawlee.crawlers._playwright._playwright_crawler] INFO  Final request statistics:
┌───────────────────────────────┬───────────┐
│ requests_finished             │ 1         │
│ requests_failed               │ 0         │
│ retry_histogram               │ [1]       │
│ request_avg_failed_duration   │ None      │
│ request_avg_finished_duration │ 11.137499 │
│ requests_finished_per_minute  │ 4         │
│ requests_failed_per_minute    │ 0         │
│ request_total_duration        │ 11.137499 │
│ requests_total                │ 1         │
│ crawler_runtime               │ 14.854891 │
└───────────────────────────────┴───────────┘
[rich] INFO  Found 12 cookies
 >>>>> Here finished the execution of the first scraper (Playwright) <<<<<<
[rich] INFO  Fetching items...
[crawlee.statistics._statistics] INFO  Statistics
┌───────────────────────────────┬───────────┐
│ requests_finished             │ 1         │
│ requests_failed               │ 0         │
│ retry_histogram               │ [1]       │
│ request_avg_failed_duration   │ None      │
│ request_avg_finished_duration │ 11.137499 │
│ requests_finished_per_minute  │ 2543      │
│ requests_failed_per_minute    │ 0         │
│ request_total_duration        │ 11.137499 │
│ requests_total                │ 1         │
│ crawler_runtime               │ 0.023598  │
└───────────────────────────────┴───────────┘
[crawlee._autoscaling.autoscaled_pool] INFO  current_concurrency = 0; desired_concurrency = 2; cpu = 0; mem = 0; event_loop = 0.0; client_info = 0.0
[crawlee.crawlers._abstract_http._abstract_http_crawler] INFO - Page index: 0. Items remaining: 31 of 31
[crawlee.crawlers._abstract_http._abstract_http_crawler] INFO - Page index: 1. Items remaining: 11 of 31
[crawlee.crawlers._abstract_http._abstract_http_crawler] INFO - Page index: 2. Items remaining: 0 of 40
[crawlee.crawlers._abstract_http._abstract_http_crawler] INFO - All items already processed
[crawlee._autoscaling.autoscaled_pool] INFO  Waiting for remaining tasks to finish
[crawlee.crawlers._abstract_http._abstract_http_crawler] INFO  Final request statistics:
┌───────────────────────────────┬───────────┐
│ requests_finished             │ 4         │
│ requests_failed               │ 0         │
│ retry_histogram               │ [4]       │
│ request_avg_failed_duration   │ None      │
│ request_avg_finished_duration │ 3.891656  │
│ requests_finished_per_minute  │ 13        │
│ requests_failed_per_minute    │ 0         │
│ request_total_duration        │ 15.566624 │
│ requests_total                │ 4         │
│ crawler_runtime               │ 18.088441 │
└───────────────────────────────┴───────────┘
Could it be the way I'm running them? Here is a summary of the code:

```python
http_client = ...
my_stats = ...

crawler_1 = PlaywrightCrawler(statistics=my_stats)
# ... here I have stored the cookies in storage/dataset/cookies
await crawler_1.run([my_url])

crawler_2 = BeautifulSoupCrawler(statistics=my_stats)
# Here I add the cookies stored before.
await crawler_2.run([my_url])
```
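Note that in the logs the persisted counters do carry over (`requests_finished` starts at 1 in the second run and ends at 4); only `crawler_runtime` restarts, because it measures a single run. Assuming, as in current crawlee versions, that each `run()` call returns the final statistics for that run, a combined runtime can be summed manually. The dataclass below is a stand-in so the snippet runs without crawlee; the field names and values come from the log tables above:

```python
from dataclasses import dataclass


@dataclass
class FinalStatsStub:
    """Stand-in for the final statistics each run returns (fields from the log tables)."""
    requests_finished: int
    crawler_runtime: float  # seconds, as printed in the tables above


# Hypothetical per-run results matching the logs in this thread:
run_1 = FinalStatsStub(requests_finished=1, crawler_runtime=14.854891)
run_2 = FinalStatsStub(requests_finished=4, crawler_runtime=18.088441)

# requests_finished already accumulates (1 -> 4), but runtime is per run,
# so a combined figure has to be computed manually:
total_runtime = run_1.crawler_runtime + run_2.crawler_runtime
print(f"{total_runtime:.6f}")  # → 32.943332
```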
  
- Hi,
I want to use the same statistics for different crawlers. I have one HTTP client that I pass to two crawlers (PlaywrightCrawler and BeautifulSoupCrawler). However, when I execute these crawlers, I receive different statistics.
Additionally, I want to export these statistics into FinalStatistics and save them in a storage format (JSON or CSV). My goal is to manage multiple scrapers and save the statistics to analyze them.
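For the export half of the question, once the final statistics are available as a plain dict (for example via `dataclasses.asdict`, assuming `FinalStatistics` is a dataclass — worth verifying against your crawlee version), the stdlib `json` and `csv` modules cover both target formats. A sketch with a hypothetical stats dict whose keys mirror the log tables:

```python
import csv
import json
from pathlib import Path

# Hypothetical stats dict; keys mirror the fields printed in crawlee's statistics tables.
stats = {
    "requests_finished": 4,
    "requests_failed": 0,
    "request_total_duration": 15.566624,
    "crawler_runtime": 18.088441,
}

# JSON: one self-describing document per crawler run.
Path("stats.json").write_text(json.dumps(stats, indent=2))

# CSV: one row per run, handy when appending results from many scrapers.
with open("stats.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(stats))
    writer.writeheader()
    writer.writerow(stats)

print(json.loads(Path("stats.json").read_text())["requests_finished"])  # → 4
```

JSON preserves types (floats stay floats), while the CSV row format is convenient for appending one line per scraper run to a long-running results file.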