A repository designed to help freshers grasp the basics of web scraping. This kit provides simple guides and example scripts to build a strong foundation.
This repository includes four essential Python scripts for web scraping:
- Web.py
  This script introduces the basics of web scraping. It captures data from a website and prints it to the terminal.
- WebDataToExcel.py
  This script extracts data from a website and saves it to an Excel sheet with two columns: Heading and Content.
- WebImgToFolder.py
  This script retrieves image source paths via web scraping and downloads the images, saving them to a specified folder.
- PaginatedDataSetToExcel.py
  This script scrapes data from a paginated site and saves it to an Excel sheet with seven separate columns, organized page by page.
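The idea behind Web.py and WebDataToExcel.py can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes each section of the target page is an h2 heading followed by p paragraphs, and the function names (fetch_html, extract_sections, scrape_to_excel) are hypothetical. Fetching is kept separate from parsing so the parsing step can be tried on a saved HTML string.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML, raising an error on HTTP failures."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_sections(html: str) -> pd.DataFrame:
    """Pair each <h2> heading with the text of the <p> siblings that follow it.

    Assumed (hypothetical) page layout: a section's content is every <p>
    sibling between its <h2> and the next <h2>.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for heading in soup.find_all("h2"):
        paragraphs = []
        for sibling in heading.find_next_siblings():
            if sibling.name == "h2":
                break  # the next section starts here
            if sibling.name == "p":
                paragraphs.append(sibling.get_text(strip=True))
        rows.append({"Heading": heading.get_text(strip=True),
                     "Content": " ".join(paragraphs)})
    return pd.DataFrame(rows, columns=["Heading", "Content"])


def scrape_to_excel(url: str, path: str) -> pd.DataFrame:
    """Fetch a page, print the extracted rows, and save them to an Excel file."""
    df = extract_sections(fetch_html(url))
    print(df)  # terminal output, in the spirit of Web.py
    df.to_excel(path, index=False)  # .xlsx output needs openpyxl installed
    return df
```

For example, scrape_to_excel("https://example.com/", "output.xlsx") would print the rows and write them to output.xlsx; the URL is a placeholder, and real pages usually need their own tag choices.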
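For WebImgToFolder.py, the key steps are collecting each img src, resolving relative paths against the page URL, and streaming each file to disk. A minimal sketch, assuming images are referenced through plain src attributes (lazy-loaded images often use other attributes such as data-src, which this ignores); the function names image_urls and download_images are hypothetical.

```python
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def image_urls(html: str, page_url: str) -> list[str]:
    """Return absolute http(s) URLs for every <img src=...>, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img", src=True):
        absolute = urljoin(page_url, img["src"])  # resolves "/a.png", "img/b.jpg", ...
        if urlparse(absolute).scheme in ("http", "https") and absolute not in urls:
            urls.append(absolute)  # skips inline data: URIs and repeated images
    return urls


def download_images(urls: list[str], folder: str) -> None:
    """Stream each image into `folder`, prefixing names with an index to avoid clashes."""
    os.makedirs(folder, exist_ok=True)
    for index, url in enumerate(urls):
        name = os.path.basename(urlparse(url).path) or "image"
        response = requests.get(url, timeout=10, stream=True)
        response.raise_for_status()
        with open(os.path.join(folder, f"{index}_{name}"), "wb") as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
```

Usage might look like page = "https://example.com/gallery/" followed by download_images(image_urls(requests.get(page, timeout=10).text, page), "images"), where the page URL and folder name are placeholders. Resolving with urljoin matters because many sites write src values relative to the page.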
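For PaginatedDataSetToExcel.py, scraping page by page usually comes down to a loop that builds each page URL, parses the rows on that page, and records the page number before everything is combined into one DataFrame. The sketch below is hypothetical: it assumes a ?page=N query parameter, data held in an HTML table with th headers, and that an empty page marks the end. It does not reproduce the seven columns of the actual script, and the names get_page, parse_rows, and scrape_pages are illustrative.

```python
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup


def get_page(url: str) -> str:
    """Default fetcher: download one page, raising an error on HTTP failures."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_rows(html: str, page: int) -> list[dict]:
    """Turn each <tr> that has <td> cells into a dict keyed by the table's <th> headers."""
    soup = BeautifulSoup(html, "html.parser")
    headers = [th.get_text(strip=True) for th in soup.find_all("th")]
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has no <td> cells
            row = dict(zip(headers, cells))
            row["Page"] = page  # records which page each row came from
            rows.append(row)
    return rows


def scrape_pages(base_url: str, max_pages: int = 50, delay: float = 1.0,
                 fetch=get_page) -> pd.DataFrame:
    """Visit base_url?page=1, ?page=2, ... until a page has no rows or max_pages is reached."""
    all_rows = []
    for page in range(1, max_pages + 1):
        rows = parse_rows(fetch(f"{base_url}?page={page}"), page)
        if not rows:
            break  # assumed stop condition: an empty page means we are past the end
        all_rows.extend(rows)
        time.sleep(delay)  # pause between requests to go easy on the server
    return pd.DataFrame(all_rows)
```

A call such as scrape_pages("https://example.com/items").to_excel("pages.xlsx", index=False) would then save all pages into one sheet (placeholder URL; openpyxl required). Passing the fetcher in as a parameter keeps the pagination logic testable against saved HTML without network access.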
- Clone the Repository
  git clone https://github.com/gayanukabulegoda/Web-Scraping-Starter-Kit.git
- Navigate to the Project Directory
  cd Web-Scraping-Starter-Kit
- Run the Scripts
  - For Web.py: python Web.py
  - For WebDataToExcel.py: python WebDataToExcel.py
  - For WebImgToFolder.py: python WebImgToFolder.py
  - For PaginatedDataSetToExcel.py: python PaginatedDataSetToExcel.py
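Ensure you have the required Python libraries installed. You can install them using pip:
pip install requests beautifulsoup4 pandas
Writing .xlsx files with pandas also requires an Excel engine. If pandas reports that openpyxl is missing when a script saves to Excel, install it with: pip install openpyxl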
This project is licensed under the MIT License. See the LICENSE file for details.
For any questions or inquiries, please contact me via LinkedIn.
© 2024 Gayanuka Bulegoda