This project is a Python-based web scraper that collects quotes from Quotes to Scrape and stores them in a SQLite database. The scraper is asynchronous and uses SQLAlchemy for database interaction. An optional FastAPI interface is provided to view the quotes via a web API.
- Asynchronous web scraping using httpx
- HTML parsing with BeautifulSoup
- Storage in SQLite via SQLAlchemy
- Configurable settings via .env and Pydantic
- Optional FastAPI endpoint for browsing quotes
- Ready for deployment via Docker
- Python 3.11+
- Pip packages as listed in requirements.txt
- Optional: Docker Desktop for containerized deployment
- Clone the repository

  ```bash
  git clone https://github.com/your-username/web_scraper.git
  cd web_scraper
  ```

- Create and activate a virtual environment

  ```bash
  python -m venv venv
  venv\Scripts\Activate.ps1   # Windows (PowerShell)
  source venv/bin/activate    # macOS/Linux
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Create a .env file
Create a file named .env in the project root with the following contents:

```env
BASE_URL=https://quotes.toscrape.com
DATABASE_URL=sqlite:///data.db
REQUEST_TIMEOUT=10
```

Explanation:
- BASE_URL – URL of the website to scrape
- DATABASE_URL – SQLAlchemy database URL
- REQUEST_TIMEOUT – HTTP request timeout in seconds
- Initialize the database

  ```bash
  python -m scripts.init_db
  ```

- Run the scraper

  ```bash
  python -m scripts.run_scraper
  ```

- The scraper will fetch quotes and store them in data.db
Option 1: Console Output
Use the demo script to print quotes in a formatted table:
```bash
python -m scripts.demo_output
```

Option 2: CSV File
Generate a CSV file for easy viewing:
```bash
python -m scripts.demo_csv
```

- The file quotes_demo.csv will contain all scraped quotes with authors and tags.
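An export of this kind needs only the standard library. This is a hypothetical sketch, not the actual scripts.demo_csv; the table and column names are assumptions:

```python
# Hypothetical CSV export using only the standard library; the real
# scripts/demo_csv module and the table/column names may differ.
import csv
import sqlite3


def export_quotes(db_path: str, csv_path: str) -> int:
    """Dump all rows of the quotes table to a CSV file; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT text, author, tags FROM quotes").fetchall()
    finally:
        conn.close()
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "author", "tags"])  # header row
        writer.writerows(rows)
    return len(rows)
```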
Option 3: FastAPI Endpoint (Optional)
If you want a web interface:

1. Modify the Dockerfile or run Uvicorn directly:

   ```bash
   uvicorn app.api.main:app --reload
   ```

2. Open the browser:
- API endpoint: http://127.0.0.1:8000/quotes
- Swagger UI: http://127.0.0.1:8000/docs
You can run the scraper in a container without installing Python or dependencies.
- Build the Docker image

  ```bash
  docker build -t web_scraper_demo .
  ```

- Run the container

  ```bash
  docker run --rm --env-file .env -v ${PWD}/data.db:/app/data.db web_scraper_demo
  ```

- --env-file .env passes environment variables to the container
- -v ${PWD}/data.db:/app/data.db ensures the SQLite database persists on the host
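For reference, a Dockerfile consistent with the commands above might look like this (a hypothetical sketch; the project's actual Dockerfile may differ):

```dockerfile
# Hypothetical Dockerfile sketch matching the build/run commands above;
# the project's actual Dockerfile may differ.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "-m", "scripts.run_scraper"]
```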
- Do not include secret keys or sensitive information in the Dockerfile.
- .env should be created locally and is ignored in Git via .gitignore.
- The project is intended for educational and demonstration purposes using public data.