This project provides a flexible, customizable web scraping tool that uses a lexer and parser to process commands written in a custom script format (`.scrape` files). It lets users scrape data from websites with simple commands.
- Basic Scraping Commands: Extract data from websites and save it in different formats such as JSON, CSV, or XML.
- Customizable Options: Include user-agent strings, set delays, specify retry attempts, and use proxies or authentication headers.
- File-Based Command Input: Use `.scrape` files to define scraping tasks for better readability and reusability.
- Batch File Execution: Run the script seamlessly using a `.bat` file for ease of use.
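The exact `.scrape` syntax is defined by the project's own lexer and parser; the sketch below is only a rough illustration of that approach, tokenizing a hypothetical command line (the keywords, option names, and URL are invented for this example and are not the tool's actual grammar):

```python
import re

# Hypothetical command; the real .scrape grammar is defined by the project's parser.
EXAMPLE = 'scrape "https://example.com/products" save as json with delay 2 retries 3'

# Token specification: earlier patterns win when alternatives overlap.
TOKEN_SPEC = [
    ("STRING",  r'"[^"]*"'),       # quoted values such as URLs
    ("NUMBER",  r"\d+"),           # numeric option values (delay, retries, ...)
    ("KEYWORD", r"[A-Za-z_]\w*"),  # commands and option names
    ("SKIP",    r"\s+"),           # whitespace is discarded
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(line):
    """Yield (kind, value) pairs for a single command line."""
    pos = 0
    while pos < len(line):
        match = MASTER_RE.match(line, pos)
        if not match:
            raise SyntaxError(f"Unexpected character {line[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()
        pos = match.end()

if __name__ == "__main__":
    for kind, value in tokenize(EXAMPLE):
        print(f"{kind:8} {value}")
```

A parser would then consume these tokens to build a scraping task: the target URL, the output format (JSON, CSV, or XML), and options such as delay and retry count.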
Some advanced features tagged in the documentation, such as validation, filtering, and certain API-related options, are still in development and not yet functional. Please avoid using them, as they may lead to unexpected behavior.
The following advanced features are still in progress:
- `validate fields`
- `filter by`
- `using_api`
- `monitor`
- `parallel`
These features will be fully implemented in future updates.
To get started:

- Install Python (version 3.7 or higher). A quick version check is sketched after these steps.
- Install the dependencies: `pip install -r requirements.txt`
- Run the tool using `console.bat`.
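If it is not obvious which Python interpreter is on the PATH, a quick standard-library check like the one below (not part of the project itself) confirms the 3.7+ requirement before installing dependencies:

```python
import sys

# The tool targets Python 3.7 or higher; fail fast if the interpreter is older.
if sys.version_info < (3, 7):
    raise SystemExit("Python 3.7+ is required, found " + sys.version.split()[0])

print("Python version OK:", sys.version.split()[0])
```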