Welcome to the Py Web Scraper repository! This project is a web scraping tool written in Python. This README guides you through the prerequisites and explains how to clone and run the project.
Before cloning and running this project, ensure you have the following installed:
- Python: The project is written in Python. Ensure you have Python 3.x installed; you can download it from python.org.
- pip: The package installer for Python. If you installed Python from python.org, you likely already have pip; otherwise, follow the official pip installation instructions.
- Virtual Environment (Optional): It's good practice to run Python projects inside a virtual environment to manage dependencies; see the official `venv` documentation for details.
- ChromeDriver: This project uses ChromeDriver to interact with the Chrome web browser. Ensure the version of ChromeDriver matches your installed version of the Chrome browser.
  - Download ChromeDriver from the official ChromeDriver downloads page, then place the `chromedriver` executable in the root folder of this project.
  - Alternatively, add the path of the `chromedriver` binary to your system's `PATH` environment variable.
  - A quick way to verify your ChromeDriver setup is sketched just after this list.
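If you want to confirm that Chrome and ChromeDriver can find each other before running the project, a short check along the following lines can help. This is a minimal sketch that assumes Selenium 4 is among the project's dependencies; it is not code taken from this repository.

```python
# Minimal ChromeDriver smoke test (illustrative sketch, not part of the project).
# Assumes Selenium 4 is installed: pip install selenium
from pathlib import Path

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Prefer a chromedriver binary sitting in the current directory (the project root),
# otherwise fall back to whatever is on the system PATH.
local_driver = Path("chromedriver")  # "chromedriver.exe" on Windows
service = Service(executable_path=str(local_driver)) if local_driver.exists() else Service()

driver = webdriver.Chrome(service=service)
try:
    driver.get("https://example.com")
    print("Page title:", driver.title)
finally:
    driver.quit()
```

If the script prints a page title and exits cleanly, ChromeDriver and Chrome are compatible and correctly located.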
Follow these steps to get the project up and running:
- Clone the Repository: `git clone https://github.com/ynevet/py-web-scraper.git`
- Navigate to the Project Directory: `cd py-web-scraper`
- Set Up a Virtual Environment (Optional): `python3 -m venv venv`, then `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`)
- Install Required Packages: `pip install -r requirements.txt`. A quick sanity check of the installation is sketched just after this list.
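After installing the requirements, a few lines of Python can confirm the environment looks sane. The package names checked below (selenium, pandas) are assumptions for illustration; the authoritative list is in requirements.txt.

```python
# Environment sanity check (illustrative sketch; package names are assumptions,
# see requirements.txt for the real dependency list).
import importlib
import sys

# True when a virtual environment is active.
print("Virtual environment active:", sys.prefix != sys.base_prefix)

for name in ("selenium", "pandas"):  # assumed dependencies
    try:
        module = importlib.import_module(name)
        print(f"{name} {getattr(module, '__version__', '(version unknown)')} is installed")
    except ImportError:
        print(f"{name} is missing; re-run `pip install -r requirements.txt`")
```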
After setting up, you can run the main script: `python main.py`

Upon successful execution of the script, you should expect two generated files:
- A `CSV` file.
- A `parquet` file.

Both files will contain the scraped data.
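For context, writing the same table to both formats is typically a two-liner with pandas. The snippet below is only an illustration; the column names and output file names are made up and may not match what main.py actually produces, and the Parquet write needs pyarrow or fastparquet installed.

```python
# Illustration of producing CSV and Parquet outputs from scraped rows with pandas.
# Column names and file names are assumptions, not the project's actual schema.
import pandas as pd

rows = [
    {"title": "Example item", "url": "https://example.com/item/1"},
    {"title": "Another item", "url": "https://example.com/item/2"},
]

df = pd.DataFrame(rows)
df.to_csv("scraped_data.csv", index=False)
df.to_parquet("scraped_data.parquet", index=False)  # requires pyarrow or fastparquet
```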
Feel free to fork this repository and submit pull requests. If you encounter any issues or have suggestions, please open an issue in the repository.