Etsy Web Scraper

A Python-based web scraper built with Selenium that extracts listing and shop information from Etsy based on user-defined search terms and pagination limits.

Features

Extract comprehensive listing data including pricing, ratings, reviews, and shop information
Configurable search terms and page limits (up to 240 pages per term)
Built-in IP blocking mitigation
Shop name anonymization for privacy
CSV output with 16 data columns
Detailed progress tracking and summary reporting

Prerequisites

Python: 3.9+
Dependencies: See requirements.txt
WebDriver: Chrome/Firefox WebDriver (path configurable)

Installation

Clone the repository:

git clone https://github.com/bigdata5911/Etsy-Scraping.git
cd Etsy-Scraping

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Configuration

Before running the scraper, configure the following in scraper_options.py:

WebDriver Path: Set the path to your WebDriver executable
Search Terms: Define your list of search terms in the search_terms variable
Page Limit: Set the number of pages to scrape per search term (max: 240)

Usage

Run the main scraper:

python main_scraper.py

The scraper will:

Display progress for each page scraped
Provide summary statistics for each search term
Output results to raw_data.csv

File Structure

main_scraper.py - Main execution script
scraper_functions.py - Core scraping functions
scraper_options.py - Configuration settings
raw_data.csv - Example output data

Data Schema

The scraper extracts 16 columns of data per listing:

Column	Description	Type
`Title`	Listing title	String
`Shop_Name`	Anonymized shop identifier	Integer
`Is_Ad`	Advertisement flag (1/0)	Integer
`Star_Rating`	Shop's overall star rating	Float
`Num_Reviews`	Total shop reviews	Integer
`Price`	Item price	Float
`Is_Bestseller`	Bestseller tag flag (1/0)	Integer
`Num_Sales`	Total shop sales	Integer
`Num_Basket`	Items in customer baskets	Integer
`Description`	Listing description	String
`Days_to_Arrival`	Estimated delivery days	Integer
`Cost_Delivery`	Shipping cost	Float
`Returns_Accepted`	Return policy flag (1/0)	Integer
`Dispatched_From`	Origin country	String
`Num_Images`	Number of listing images	Integer
`Category`	Search term used	String

Performance

Current Speed: ~15,500 records in 24 hours
Rate Limiting: Built-in delays to prevent IP blocking
Memory Usage: Optimized for large datasets

Customization Options

Remove Shop Name Anonymization

To retain actual shop names, comment out the anonymization code in main_scraper.py.

Potential Enhancements

Performance: Implement headless browsing and concurrent processing
Additional Data: Sale status, personalization options, bulk availability
Review Analysis: Extract and analyze customer reviews
Advanced Filtering: Category-specific data extraction

Known Limitations

Scraping speed is conservative to avoid IP blocking
Maximum 240 pages per search term (Etsy limitation)
Delivery costs displayed in user's local currency
Requires active internet connection and WebDriver maintenance

Troubleshooting

Common Issues

WebDriver errors: Ensure WebDriver version matches your browser
IP blocking: Increase delays in scraper settings
Missing data: Check Etsy's page structure for changes

Contributing

Contributions are welcome! Please feel free to:

Submit bug reports and feature requests
Improve scraping efficiency
Add new data extraction capabilities
Enhance error handling

Resources

License

This project is provided as-is for educational and research purposes. Please ensure compliance with Etsy's Terms of Service and robots.txt when using this scraper.

Contact

For questions or support, please open an issue on GitHub.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Etsy Web Scraper

Features

Prerequisites

Installation

Configuration

Usage

File Structure

Data Schema

Performance

Customization Options

Remove Shop Name Anonymization

Potential Enhancements

Known Limitations

Troubleshooting

Common Issues

Contributing

Resources

License

Contact

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
.gitignore		.gitignore
License.txt		License.txt
README.md		README.md
main_scraper.py		main_scraper.py
raw_data.csv		raw_data.csv
requirements.txt		requirements.txt
scraper_functions.py		scraper_functions.py
scraper_options.py		scraper_options.py

License

bigdata5911/etsy-scrapper

Folders and files

Latest commit

History

Repository files navigation

Etsy Web Scraper

Features

Prerequisites

Installation

Configuration

Usage

File Structure

Data Schema

Performance

Customization Options

Remove Shop Name Anonymization

Potential Enhancements

Known Limitations

Troubleshooting

Common Issues

Contributing

Resources

License

Contact

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages