A production-ready boilerplate to collect Amazon product data and reviews using Python with safe-request logic, proxy rotation, and anti-bot handling. Ideal for researchers, analysts, and growth teams who need structured product, price, and review insights at scale.
For discussion, queries, and freelance work — reach out 👆
This repository provides a modular Python scaffold to scrape product details, pricing, availability, ratings, and paginated reviews from Amazon product and search pages. It includes browser and HTTP modes, rotating proxies, throttling, and storage adapters (CSV/JSON/SQLite). Built for analysts, SEOs, and growth teams who need reliable, reproducible data collection.
- Saves time by automating scraper setup and boilerplate.
- Scales across use cases: single products, search results, catalogs, and review archives.
- Safer by default, with anti-detect measures and proxy rotation built in.
| # | Feature | What it does |
|---|---------|--------------|
| 1 | Dual mode: HTTP + Headless | Choose requests + bs4 for speed or Playwright/Selenium for heavy pages |
| 2 | Proxy & Fingerprint Aids | Rotating proxies, randomized headers, backoff/retry (see the sketch below this table) |
| 3 | Product & Review Extractors | Parse title, price, images, ASIN, attributes, ratings, review text & stars |
| 4 | Pagination & Rate Control | Auto next-page detection with human-like delays |
| 5 | Storage Adapters | Save to CSV, JSON, or SQLite with schema migrations |
| 6 | CLI & Config | .env-driven settings, one-liner commands, job presets |
| 7 | Captcha & Block Handling | Detection hooks, fallbacks, and task resume |
| 8 | Modular Pipelines | Plug-in architecture for enrichers (exchange rates, categories) |
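A minimal sketch of the proxy rotation and backoff/retry logic from feature 2, assuming `requests` and a hand-rolled proxy pool; the proxy URLs, header pool, and function names below are illustrative, not the repo's actual API:

```python
# fetcher_sketch.py -- illustrative only; the repo's real fetcher module may differ.
import random
import time

import requests

# Hypothetical proxy and header pools; in the repo these would come from .env.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]

def fetch(url: str, max_retries: int = 4) -> str:
    """GET a page through a random proxy with randomized headers and exponential backoff."""
    for attempt in range(max_retries):
        proxy = random.choice(PROXIES)
        headers = {
            "User-Agent": random.choice(USER_AGENTS),
            "Accept-Language": "en-US,en;q=0.9",
        }
        try:
            resp = requests.get(
                url,
                headers=headers,
                proxies={"http": proxy, "https": proxy},
                timeout=15,
            )
            # Treat throttling and obvious block pages as retryable.
            if resp.status_code in (429, 503) or "captcha" in resp.text.lower():
                raise requests.HTTPError(f"blocked ({resp.status_code})")
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            # Exponential backoff with jitter: ~1s, 2s, 4s, 8s ...
            time.sleep((2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError(f"Failed to fetch {url} after {max_retries} attempts")
```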
- Competitor price monitoring for specific ASINs
- Review mining for sentiment analysis and VOC research
- Daily product catalog snapshots for marketplace analytics
- SEO research: SERP coverage, buy-box presence, and availability trends
Q: How to use Python to scrape Amazon?
A: Use either HTTP mode (`requests` + `BeautifulSoup`) for speed or headless mode (Playwright/Selenium) for dynamic pages. Configure rotating proxies and headers via `.env`, then run the provided CLI to fetch product pages or search results and export to CSV/JSON/SQLite with built-in parsers and rate limits.
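For instance, a bare-bones HTTP-mode fetch and parse might look like the sketch below; the CSS selectors and field names are assumptions about Amazon's current markup, which changes often, so treat them as placeholders:

```python
# http_mode_sketch.py -- minimal HTTP-mode example; selectors are illustrative guesses.
import requests
from bs4 import BeautifulSoup

def scrape_product(url: str) -> dict:
    headers = {"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US,en;q=0.9"}
    html = requests.get(url, headers=headers, timeout=15).text
    soup = BeautifulSoup(html, "html.parser")

    def text_or_none(selector: str):
        node = soup.select_one(selector)
        return node.get_text(strip=True) if node else None

    return {
        "title": text_or_none("#productTitle"),
        "price": text_or_none("span.a-price span.a-offscreen"),
        "rating": text_or_none("span.a-icon-alt"),
        "availability": text_or_none("#availability"),
    }

if __name__ == "__main__":
    # Placeholder ASIN; substitute a real product URL.
    print(scrape_product("https://www.amazon.com/dp/B0XXXXXXX"))
```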
Q: How to build an Amazon product data scraper with Python?
A: Start with structured modules: a fetcher (HTTP/headless), a parser (product + review schemas), a storage layer (CSV/JSON/SQLite), and a controller for retries and pagination. This repo scaffolds all of these with ready-made commands and configuration.
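As a rough illustration of that separation of concerns, the controller wires fetcher, parser, and storage together; the class names and layout below are hypothetical, not the repo's exact API:

```python
# architecture_sketch.py -- hypothetical wiring of fetcher, parser, storage, and controller.
import csv
from dataclasses import dataclass, asdict
from typing import List, Optional

@dataclass
class Product:
    asin: str
    title: Optional[str]
    price: Optional[str]
    rating: Optional[str]

class Fetcher:
    def get(self, url: str) -> str:
        raise NotImplementedError  # HTTP or headless implementation plugs in here

class Parser:
    def parse_product(self, html: str, asin: str) -> Product:
        raise NotImplementedError  # BeautifulSoup/Playwright parsing plugs in here

class CsvStorage:
    def __init__(self, path: str):
        self.path = path

    def save(self, rows: List[Product]) -> None:
        with open(self.path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["asin", "title", "price", "rating"])
            writer.writeheader()
            writer.writerows(asdict(r) for r in rows)

class Controller:
    """Coordinates the pipeline: fetch -> parse -> store, one ASIN at a time."""
    def __init__(self, fetcher: Fetcher, parser: Parser, storage: CsvStorage):
        self.fetcher, self.parser, self.storage = fetcher, parser, storage

    def run(self, asins: List[str]) -> None:
        rows = [
            self.parser.parse_product(self.fetcher.get(f"https://www.amazon.com/dp/{asin}"), asin)
            for asin in asins
        ]
        self.storage.save(rows)
```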
Q: How to scrape Amazon.com product data and reviews using Python?
A: Point the CLI to a product URL or a list of ASINs. The pipeline fetches the HTML, parses core fields (title, price, images, features), then iterates through review pages to capture ratings, text, date, and helpful votes, while respecting delays, proxies, and block detection. Export results with `--out products.csv` or `--out reviews.csv`.
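Conceptually, the review stage is pagination plus polite delays. Here is a simplified sketch; the review-page URL pattern and `data-hook` selectors are assumptions, and a production run would route requests through the proxy/backoff layer shown earlier:

```python
# reviews_sketch.py -- simplified review pagination; URL format and selectors are illustrative.
import random
import time

import requests
from bs4 import BeautifulSoup

def scrape_reviews(asin: str, pages: int = 10) -> list:
    reviews = []
    headers = {"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US,en;q=0.9"}
    for page in range(1, pages + 1):
        url = f"https://www.amazon.com/product-reviews/{asin}/?pageNumber={page}"
        html = requests.get(url, headers=headers, timeout=15).text
        soup = BeautifulSoup(html, "html.parser")
        cards = soup.select("div[data-hook='review']")
        if not cards:  # no more pages, or the request was blocked
            break
        for card in cards:
            star = card.select_one("i[data-hook='review-star-rating'] span.a-icon-alt")
            body = card.select_one("span[data-hook='review-body']")
            date = card.select_one("span[data-hook='review-date']")
            votes = card.select_one("span[data-hook='helpful-vote-statement']")
            reviews.append({
                "asin": asin,
                "stars": star.get_text(strip=True) if star else None,
                "text": body.get_text(strip=True) if body else None,
                "date": date.get_text(strip=True) if date else None,
                "helpful_votes": votes.get_text(strip=True) if votes else None,
            })
        time.sleep(random.uniform(2, 6))  # human-like delay between pages
    return reviews
```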
- Node.js or Python
- Git
- Docker (optional)
```bash
# Clone the repo
git clone https://github.com/yourusername/amazon-scraper-python.git
cd amazon-scraper-python

# Install dependencies
pip install -r requirements.txt
# or
npm install

# Setup environment
cp .env.example .env
# edit proxies, mode=HTTP|HEADLESS, delays, and output paths

# Run (examples)

# Single product (ASIN or URL)
python main.py scrape:product --asin B0XXXXXXX --out products.csv

# Reviews for a product
python main.py scrape:reviews --asin B0XXXXXXX --pages 10 --out reviews.csv

# Search results
python main.py scrape:search --q "wireless earbuds" --pages 3 --out listings.csv
```
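For reference, a `.env` might look roughly like the example below; the key names are illustrative assumptions, so treat `.env.example` in the repo as the source of truth:

```ini
# Illustrative .env -- key names are assumptions; see .env.example for the real ones.
MODE=HTTP                 # HTTP or HEADLESS
PROXIES=http://user:pass@proxy1:8000,http://user:pass@proxy2:8000
MIN_DELAY=2               # seconds between requests
MAX_DELAY=6
MAX_RETRIES=4
OUTPUT_DIR=./data
```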