A comparative collection of web-scraping approaches using BeautifulSoup, Requests, aiohttp, threading, and Scrapy, including middleware integration with HasData API and messy-HTML parsing examples.

HasData/bs4-vs-scrapy


Web Scraping Experiments: BeautifulSoup vs Scrapy

This repository contains multiple web scraping examples using different Python tools and techniques. It compares BeautifulSoup and Scrapy side by side, along with advanced variations such as threading, asyncio, and middleware usage.

Table of Contents

  1. Project Structure
  2. Included Scrapers
  3. How to Run
  4. Notes

Project Structure


Beautiful_Soup_scrapers/
├── bs4_asyncio_aiohttp.py        # Scraper using asyncio + aiohttp + BeautifulSoup
├── bs4_requests_threads.py       # Scraper using threading + requests + BeautifulSoup
└── only_bs4_and_requests.py      # Simple scraper using requests + BeautifulSoup

Scrapy_scrapers/
├── badhtml_messy_project/        # Scrapy + BeautifulSoup project for messy HTML (badhtml.com)
├── quotes_project/               # Standard Scrapy project scraping quotes
└── quotes_project_hasdata/       # Scrapy project using HasData API middleware

banner.png                        # banner image
README.md

Included Scrapers

1. BeautifulSoup + Requests

  • only_bs4_and_requests.py
    Basic web scraping using requests and BeautifulSoup.

  • bs4_requests_threads.py
    Uses Python threading to run multiple requests concurrently.

  • bs4_asyncio_aiohttp.py
    Uses asyncio and aiohttp for asynchronous scraping.
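
The scripts above share the same core parsing step. Here is a minimal sketch of the requests + BeautifulSoup pattern behind only_bs4_and_requests.py; the selectors mirror quotes.toscrape.com's markup, and an inline HTML snippet stands in for a live response so the sketch runs offline (the actual script's structure may differ):

```python
# Minimal sketch of the requests + BeautifulSoup pattern.
# The inline HTML stands in for requests.get(url).text so the
# example runs without a network connection.
from bs4 import BeautifulSoup

html = """
<div class="quote">
  <span class="text">Quality is not an act, it is a habit.</span>
  <small class="author">Aristotle</small>
</div>
"""

def parse_quotes(page: str) -> list[dict]:
    # quotes.toscrape.com wraps each quote in div.quote with
    # span.text / small.author children
    soup = BeautifulSoup(page, "html.parser")
    return [
        {
            "text": q.select_one(".text").get_text(strip=True),
            "author": q.select_one(".author").get_text(strip=True),
        }
        for q in soup.select("div.quote")
    ]

quotes = parse_quotes(html)
print(quotes[0]["author"])  # Aristotle
```

In the live script, the page would come from `requests.get("https://quotes.toscrape.com/").text` instead of the inline string.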

2. Scrapy

  • quotes_project/
    Standard Scrapy project for scraping quotes from quotes.toscrape.com.

  • quotes_project_hasdata/
    Scrapy project enhanced with the HasData API as a downloader middleware.

  • badhtml_messy_project/quotes_project/
    Combines Scrapy with BeautifulSoup to handle the messy HTML on badhtml.com.

How to Run

BeautifulSoup scripts

python Beautiful_Soup_scrapers/only_bs4_and_requests.py
python Beautiful_Soup_scrapers/bs4_requests_threads.py
python Beautiful_Soup_scrapers/bs4_asyncio_aiohttp.py

Scrapy projects

Navigate to the project folder:

cd Scrapy_scrapers/quotes_project
scrapy crawl quotes -o quotes.json

For quotes_project_hasdata:

cd Scrapy_scrapers/quotes_project_hasdata
scrapy crawl quotes -o quotes_hasdata.json
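
Routing requests through the HasData API is typically wired up in the project's settings.py. The fragment below is a hypothetical sketch of that wiring: the middleware class path, its priority, and the API-key setting name are assumptions for illustration, not the project's actual names — check quotes_project_hasdata's settings and middlewares files for the real ones.

```python
# Hypothetical settings.py fragment. The middleware path and the
# HASDATA_API_KEY setting name are illustrative assumptions.
DOWNLOADER_MIDDLEWARES = {
    "quotes_project.middlewares.HasDataProxyMiddleware": 543,
}
HASDATA_API_KEY = "your-api-key-here"  # placeholder credential
```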

For badhtml_messy_project (Scrapy + BeautifulSoup):

cd Scrapy_scrapers/badhtml_messy_project/quotes_project
scrapy crawl badhtml -o badhtml.json

Notes

  • Each scraper saves its results as structured JSON.
  • The BeautifulSoup scripts are simpler and better suited to small projects.
  • The Scrapy projects scale better and support pipelines, middlewares, and asynchronous requests.
  • The advanced scripts (threading, asyncio) demonstrate how concurrency speeds up fetching many pages.
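
The asyncio speedup noted above comes from issuing all requests concurrently with asyncio.gather. The sketch below shows that fan-out pattern with the aiohttp call replaced by a stub coroutine so it runs offline; in bs4_asyncio_aiohttp.py itself, fetch would presumably await a request on an aiohttp ClientSession instead:

```python
# Sketch of the asyncio fan-out pattern. The network call is stubbed
# with asyncio.sleep so the example runs offline; a real fetch would
# use aiohttp and then hand the HTML to BeautifulSoup.
import asyncio

async def fetch(url: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for an aiohttp GET
    return f"<html>{url}</html>"

async def main(urls: list[str]) -> list[str]:
    # gather schedules every fetch at once, so total wall time is
    # roughly one request's latency rather than the sum of all of them
    return await asyncio.gather(*(fetch(u) for u in urls))

urls = [f"https://quotes.toscrape.com/page/{i}/" for i in range(1, 4)]
pages = asyncio.run(main(urls))
print(len(pages))  # 3
```

The threading variant achieves the same overlap with OS threads (e.g. a thread pool mapping a blocking requests call over the URL list), while asyncio does it in a single thread with cooperative scheduling.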

