Lazy Crawler is a Python package that simplifies web scraping. Built on the Scrapy framework, it adds utilities for common data-extraction tasks, so you can set up and deploy scraping projects with less boilerplate.
- Simplified Setup: Streamlines the process of setting up and configuring web scraping projects.
- Predefined Library: Comes with a library of functions and utilities for common web scraping tasks, reducing the need for manual coding.
- Easy Data Extraction: Simplifies extracting and processing data from websites, allowing you to focus on analysis and insights.
- Versatile Utilities: Includes tools for finding emails, numbers, mentions, hashtags, links, and more.
- Flexible Data Storage: Provides a pipeline for storing data in various formats such as CSV, JSON, Google Sheets, and Excel.
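To make the "Versatile Utilities" item concrete, here is a minimal standalone sketch of that kind of extraction using only Python's standard `re` module. It does not use Lazy Crawler's own helpers, whose names are not shown in this README. The function `find_entities` and the patterns are hypothetical, and the patterns are deliberately simple.

```python
import re

# Deliberately simple patterns for illustration only; real-world email and
# URL matching has many edge cases that these do not cover.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")  # lookbehind skips the "@" inside emails
LINK_RE = re.compile(r"https?://[^\s\"'<>]+")


def find_entities(text):
    """Return emails, hashtags, mentions, and links found in text.

    Hypothetical helper, not part of Lazy Crawler's API.
    """
    return {
        "emails": EMAIL_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "links": LINK_RE.findall(text),
    }


sample = (
    "Contact info@example.com or ping @lazy about #scraping "
    "at https://example.com/docs"
)
result = find_entities(sample)
print(result)
```

In a spider, a function like this could be applied to text pulled from a response before yielding the item.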
To get started with Lazy Crawler:
- Install: Ensure Python and Scrapy are installed. Then, install Lazy Crawler via pip:
pip install lazy-crawler
- Create a Project: Create a Python file for your project (e.g., scrapy_example.py) and start coding.
Here's an example of how to use Lazy Crawler in a project:
import os

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from lazy_crawler.crawler.spiders.base_crawler import LazyBaseCrawler
from lazy_crawler.lib.user_agent import get_user_agent


class LazyCrawler(LazyBaseCrawler):
    name = "example"

    # Per-spider overrides of the project settings.
    custom_settings = {
        'DOWNLOAD_DELAY': 0.5,
        'CONCURRENT_REQUESTS': 32,
    }

    headers = get_user_agent('random')

    def start_requests(self):
        url = 'https://example.com'
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        # Extract the text of the page's <title> element.
        title = response.xpath('//title/text()').get()
        yield {'Title': title}


# Point Scrapy at Lazy Crawler's settings module, then load it explicitly:
# CrawlerProcess() called without arguments uses only Scrapy's defaults.
settings_file_path = 'lazy_crawler.crawler.settings'
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', settings_file_path)

process = CrawlerProcess(get_project_settings())
process.crawl(LazyCrawler)
process.start()  # Blocks until the crawl finishes.
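The spider above yields items but does not say where they are written. One option, sketched here, is Scrapy's built-in feed exports, configured through the `FEEDS` setting. This is not Lazy Crawler's own storage pipeline, which this README does not show. The helper `build_feed_settings` and the `titles` file name are hypothetical, and the `overwrite` option assumes a reasonably recent Scrapy release.

```python
def build_feed_settings(basename, formats=("json", "csv")):
    """Build a Scrapy FEEDS mapping that writes one output file per format.

    Hypothetical helper: `basename` and `formats` are illustrative names.
    """
    feeds = {}
    for fmt in formats:
        # Each key is an output path; the value holds that feed's options.
        feeds[f"{basename}.{fmt}"] = {"format": fmt, "overwrite": True}
    return {"FEEDS": feeds}


settings = build_feed_settings("titles")
print(settings)
```

The resulting dictionary can be merged into a spider's `custom_settings`, for example `custom_settings = {'DOWNLOAD_DELAY': 0.5, **build_feed_settings("titles")}`, so each crawl writes `titles.json` and `titles.csv`.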
For more information and examples of how to use Lazy Crawler, see the project documentation.
Lazy Crawler was created by Pradip P.
Lazy Crawler is released under the MIT License.