Built with the following tools and technologies:
Python | Requests | BeautifulSoup | Pandas
This project is a command-line web scraping application built with Python. It lets users extract structured data from several websites through an interactive menu. The scraped data is displayed in the console and can optionally be saved to a CSV file for further analysis.
The application is designed to be easily extensible, allowing new scrapers for other websites to be added with minimal effort.
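One way such extensibility could be arranged is a small registry that maps menu labels to scraper functions, so adding a site means adding one decorated function. The sketch below is only an illustration under that assumption: `SCRAPERS`, `register`, and `scrape_example` are hypothetical names, not taken from this project's code.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a menu label to a scraper function.
# Each scraper is assumed to return a list of row dictionaries.
Scraper = Callable[[], List[dict]]
SCRAPERS: Dict[str, Scraper] = {}


def register(label: str) -> Callable[[Scraper], Scraper]:
    """Return a decorator that files a scraper under `label`."""
    def decorator(func: Scraper) -> Scraper:
        SCRAPERS[label] = func
        return func
    return decorator


@register("Example site")
def scrape_example() -> List[dict]:
    # Placeholder rows; a real scraper would fetch and parse a page here.
    return [{"title": "Example", "year": 2000}]
```

With this layout the menu can be built from `SCRAPERS.keys()`, and the selected scraper is run with `SCRAPERS[label]()`.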
- Interactive CLI: A user-friendly command-line interface to select a scraping target.
- Multiple Scrapers:
  - IMDb Top 250 Movies: Scrapes movie title, release year, duration, and IMDb rating.
  - Former Presidents of India: Scrapes the list of presidents from Wikipedia, including their name, lifespan, home state, and term of office.
- Data Export: Option to save the scraped data into a clean, well-formatted CSV file.
- Modular Design: The code is organized into modules for scraping, utilities, and the main application logic, promoting readability and maintainability.
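The scrape-then-export flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `SAMPLE_HTML` is an inline stand-in for a downloaded page with an assumed table layout and placeholder values, and `parse_table` is a hypothetical helper. The CSV is written to an in-memory buffer so the example runs without touching the network or the file system.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup

# Inline stand-in for a fetched page; the table structure is assumed.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Home state</th></tr>
  <tr><td>Person A</td><td>State X</td></tr>
  <tr><td>Person B</td><td>State Y</td></tr>
</table>
"""


def parse_table(html: str) -> pd.DataFrame:
    """Turn the first header row and the data rows into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(records, columns=headers)


df = parse_table(SAMPLE_HTML)

# Write CSV text to a buffer; pass a file path instead to save to disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

In a real run the HTML would come from `requests.get(url).text`, and `df.to_csv("output.csv", index=False)` would write the file; the file name here is only an example.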
- Python 3.7+
- The following Python libraries are required:
  - requests
  - beautifulsoup4
  - pandas
- Clone the repository (you'll need to set this up on a platform like GitHub):
git clone https://github.com/your-username/webscraper-app.git
- Navigate to the project directory:
cd webscraper-app
- It is recommended to create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
- Install the required packages from requirements.txt:
pip install -r requirements.txt
- Run the application from the root directory of the project:
python src/main.py
- The console will display a menu of available scraping options. Follow the on-screen prompts to choose a website to scrape, view the results, and optionally save them to a CSV file.
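The prompt flow described above might look something like the sketch below. The function names and prompt wording are assumptions made for illustration and may differ from what `src/main.py` actually does. The `read` parameter defaults to the built-in `input` and exists only so the functions can be exercised without a terminal.

```python
from typing import Callable, List


def choose_option(options: List[str], read: Callable[[str], str] = input) -> str:
    """Print a numbered menu and keep asking until a valid number is given."""
    for number, label in enumerate(options, start=1):
        print(f"{number}. {label}")
    while True:
        answer = read("Enter a number: ").strip()
        if answer.isdecimal() and 1 <= int(answer) <= len(options):
            return options[int(answer) - 1]
        print("Invalid choice, please try again.")


def wants_to_save(read: Callable[[str], str] = input) -> bool:
    """Ask whether the results should be written to a CSV file."""
    return read("Save to CSV? (y/n): ").strip().lower() in {"y", "yes"}
```

Called as `choose_option(["IMDb Top 250 Movies", "Former Presidents of India"])`, the first function returns the label the user picked, which the caller can then map to the matching scraper.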
webscraper-app/
├── src/
│ ├── __init__.py
│ ├── main.py # Main application entry point, handles user interaction
│ ├── scraper.py # Contains all the web scraping logic
│ └── utils.py # Utility functions (e.g., saving to CSV)
├── requirements.txt # Lists project dependencies
└── README.md # This file
This project is licensed under the MIT License. Consider creating a LICENSE file in your project root.
If you have any questions or feedback, feel free to reach out to me via my LinkedIn Profile.