A Streamlit-based web app that extracts article content, author name, and publication date from popular Indian news websites using BeautifulSoup and CSS selectors. Supported sites:
- The Hindu
- Economic Times
- Times of India
- Indian Express
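
The selector-based extraction described above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the `SELECTORS` mapping, the `extract_article` function, and every CSS selector in it are hypothetical, and each real site needs its own selectors, which change as the site's markup changes. The sample HTML is inlined so the example runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical selectors for illustration only; real pages need their own.
SELECTORS = {
    "content": "div.article-body p",
    "author": "a.author-name",
    "published": "time.publish-date",
}

# Made-up markup standing in for a downloaded article page.
SAMPLE_HTML = """
<html><body>
  <a class="author-name">Jane Doe</a>
  <time class="publish-date">2024-01-15</time>
  <div class="article-body">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
</body></html>
"""


def extract_article(html, selectors=SELECTORS):
    """Return content, author, and publication date found via CSS selectors."""
    soup = BeautifulSoup(html, "html.parser")

    def first_text(css):
        # select_one returns None when nothing matches the selector.
        node = soup.select_one(css)
        return node.get_text(strip=True) if node else None

    paragraphs = [p.get_text(strip=True) for p in soup.select(selectors["content"])]
    return {
        "content": "\n".join(paragraphs),
        "author": first_text(selectors["author"]),
        "published": first_text(selectors["published"]),
    }


article = extract_article(SAMPLE_HTML)
```

Returning `None` for a missing field, rather than raising, lets a caller show partial results when a site changes its layout and one selector stops matching.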

Tech stack:
- Frontend: Streamlit (for UI)
- Backend: Python, BeautifulSoup
- Web Scraping: requests, bs4
- Deployment: Runs locally or on Streamlit Cloud
git clone https://github.com/Gautam413/News-Extractor-Using-CSS-Selectors.git
cd News-Extractor-Using-CSS-Selectors
# On Windows
python -m venv venv
venv\Scripts\activate
# On Linux/macOS
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
streamlit run app.py
├── app.py            # Streamlit frontend app
├── scraper.py        # Web scraping logic for each news site
├── requirements.txt  # Python dependencies
├── README.md         # Project description
└── .gitignore        # Files to exclude from version control
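
Since `scraper.py` is described as holding scraping logic for each news site, one plausible way to choose the right logic is to dispatch on the URL's hostname. The sketch below is an assumption about how that could work, not a description of the actual file: `SITE_BY_DOMAIN`, `detect_site`, and the hostnames listed are all introduced here for illustration.

```python
from urllib.parse import urlparse

# Hypothetical domain-to-site mapping; the hostnames are assumptions.
SITE_BY_DOMAIN = {
    "thehindu.com": "The Hindu",
    "economictimes.indiatimes.com": "Economic Times",
    "timesofindia.indiatimes.com": "Times of India",
    "indianexpress.com": "Indian Express",
}


def detect_site(url):
    """Return the supported site name for a URL, or None if unsupported."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    for domain, site in SITE_BY_DOMAIN.items():
        # Match the domain itself or any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return site
    return None
```

A caller could use the returned name to look up that site's selectors and show an "unsupported site" message in the UI when the result is `None`.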
Feel free to open an issue.
This project is licensed under the MIT License.