Web Scraper for Quotes

This project is a Python-based web scraper that collects quotes from Quotes to Scrape and stores them in a SQLite database. The scraper is asynchronous and uses SQLAlchemy for database interaction. An optional FastAPI interface is provided to view the quotes via a web API.

Features

  • Asynchronous web scraping using httpx
  • HTML parsing with BeautifulSoup
  • Storage in SQLite via SQLAlchemy
  • Configurable settings via .env and Pydantic
  • Optional FastAPI endpoint for browsing quotes
  • Ready for deployment via Docker
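
To make the feature list concrete, the following is a minimal sketch of the fetch-and-parse step, assuming httpx's AsyncClient and BeautifulSoup with the built-in html.parser. The CSS selectors and module layout of the actual project may differ.

# scraper sketch: fetch pages of quotes.toscrape.com and parse each quote block.
# Illustration only; the repo's real scraper module and pagination handling may differ.
import asyncio

import httpx
from bs4 import BeautifulSoup

BASE_URL = "https://quotes.toscrape.com"

async def scrape_page(client: httpx.AsyncClient, page: int) -> list[dict]:
    # One GET per page; raise on HTTP errors so failures are visible.
    resp = await client.get(f"{BASE_URL}/page/{page}/", timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    quotes = []
    for block in soup.select("div.quote"):
        quotes.append({
            "text": block.select_one("span.text").get_text(strip=True),
            "author": block.select_one("small.author").get_text(strip=True),
            "tags": [t.get_text(strip=True) for t in block.select("a.tag")],
        })
    return quotes

async def main() -> None:
    async with httpx.AsyncClient() as client:
        for page in (1, 2):  # first two pages as a demo
            for quote in await scrape_page(client, page):
                print(quote["author"], "-", quote["text"][:60])

if __name__ == "__main__":
    asyncio.run(main())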

Requirements

  • Python 3.11+
  • Python packages listed in requirements.txt
  • Optional: Docker Desktop for containerized deployment

Setup

  1. Clone the repository

git clone https://github.com/your-username/web_scraper.git
cd web_scraper

  2. Create and activate a virtual environment

python -m venv venv

Windows PowerShell

venv\Scripts\Activate.ps1

Linux/macOS

source venv/bin/activate

  3. Install dependencies

pip install -r requirements.txt

  4. Create a .env file

Create a file named .env in the project root with the following contents:

BASE_URL=https://quotes.toscrape.com
DATABASE_URL=sqlite:///data.db
REQUEST_TIMEOUT=10

Explanation:

  • BASE_URL – URL of the website to scrape
  • DATABASE_URL – SQLAlchemy database URL
  • REQUEST_TIMEOUT – HTTP request timeout in seconds
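
These variables are typically loaded through a Pydantic settings class. The sketch below assumes the pydantic-settings package (Pydantic v2); the project's actual settings module may be organized differently.

# config sketch: load .env values into typed settings with pydantic-settings.
# Field names mirror the variables documented above; defaults match the example.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    base_url: str = "https://quotes.toscrape.com"
    database_url: str = "sqlite:///data.db"
    request_timeout: int = 10

settings = Settings()  # values from .env override the defaults above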

Running the Scraper

  1. Initialize the database

python -m scripts.init_db

  2. Run the scraper

python -m scripts.run_scraper

  • The scraper will fetch quotes and store them in data.db (see the model sketch below).
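
For reference, the database step usually boils down to a declarative model plus a create_all call. The sketch below is an assumption about what scripts.init_db does; the real table and column names may differ.

# init_db sketch: define a Quote model and create its table in SQLite.
# Column names and the tag representation are assumptions for illustration.
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Quote(Base):
    __tablename__ = "quotes"

    id = Column(Integer, primary_key=True)
    text = Column(Text, nullable=False)
    author = Column(String(100))
    tags = Column(String(255))  # e.g. comma-separated tag names

def init_db(database_url: str = "sqlite:///data.db") -> None:
    # create_all is idempotent: it only creates tables that do not exist yet.
    engine = create_engine(database_url)
    Base.metadata.create_all(engine)

if __name__ == "__main__":
    init_db()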

Viewing Data

Option 1: Console Output

Use the demo script to print quotes in a formatted table:

python -m scripts.demo_output
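
A console reader can be as small as the sketch below. It reads data.db directly with the standard-library sqlite3 module; the quotes table and column names are assumptions and may not match the project's schema exactly.

# demo_output sketch: print a small, aligned table of quotes from data.db.
import sqlite3

def print_quotes(db_path: str = "data.db", limit: int = 10) -> None:
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT author, text FROM quotes LIMIT ?", (limit,)
        ).fetchall()
    finally:
        con.close()
    for author, text in rows:
        print(f"{author:<25} | {text[:60]}")

if __name__ == "__main__":
    print_quotes()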

Option 2: CSV File

Generate a CSV file for easy viewing:

python -m scripts.demo_csv
  • The file quotes_demo.csv will contain all scraped quotes with authors and tags.
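
A CSV export along these lines would produce quotes_demo.csv; again, the table and column names are assumptions based on the description above.

# demo_csv sketch: dump all quotes to quotes_demo.csv with a header row.
import csv
import sqlite3

def export_csv(db_path: str = "data.db", out_path: str = "quotes_demo.csv") -> None:
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute("SELECT text, author, tags FROM quotes").fetchall()
    finally:
        con.close()
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "author", "tags"])
        writer.writerows(rows)

if __name__ == "__main__":
    export_csv()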

Option 3: FastAPI Endpoint (Optional)

If you want a web interface:

  1. Modify the Dockerfile, or run Uvicorn directly (a sketch of the app module follows below):

uvicorn app.api.main:app --reload

  2. Open http://127.0.0.1:8000 in your browser (Uvicorn's default address).
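
The app.api.main module referenced above could look roughly like the following sketch; the actual endpoint paths and response shape in this repository may differ.

# FastAPI sketch: expose stored quotes as JSON. The route name and query
# parameter are assumptions for illustration.
import sqlite3

from fastapi import FastAPI

app = FastAPI(title="Quotes API")

@app.get("/quotes")
def list_quotes(limit: int = 20) -> list[dict]:
    con = sqlite3.connect("data.db")
    con.row_factory = sqlite3.Row  # return rows as dict-like objects
    try:
        rows = con.execute(
            "SELECT text, author, tags FROM quotes LIMIT ?", (limit,)
        ).fetchall()
    finally:
        con.close()
    return [dict(row) for row in rows]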

Docker Deployment

You can run the scraper in a container without installing Python or dependencies.

  1. Build the Docker image (a sketch of a compatible Dockerfile follows this list)

docker build -t web_scraper_demo .

  2. Run the container

docker run --rm --env-file .env -v ${PWD}/data.db:/app/data.db web_scraper_demo

  • --env-file .env passes environment variables to the container
  • -v ${PWD}/data.db:/app/data.db ensures the SQLite database persists on the host
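
For orientation, a Dockerfile compatible with the commands above could look like this sketch; the repository's actual Dockerfile may differ (for example, it may start Uvicorn instead of the scraper).

# Dockerfile sketch (not necessarily the one shipped in this repo).
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "-m", "scripts.run_scraper"]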

Notes

  • Do not include secret keys or sensitive information in the Dockerfile.
  • .env should be created locally and is ignored in Git via .gitignore.
  • The project is intended for educational and demonstration purposes using public data.
