# Auto E-commerce Scraper

A universal web scraper for e-commerce sites with automatic platform detection and intelligent data extraction.

## Table of Contents

- Features
- Architecture
- Installation
- Usage
- How It Works
- API Reference
- Examples
- Tested Sites
- Configuration
- Links
- Disclaimer
- Troubleshooting
## Features

- **Intelligent Container Detection** - Finds repeating product cards without configuration
- **HasData API Integration** - Optional API support for JavaScript rendering and advanced features
- **AI Extraction** - Extract structured product data using AI (via HasData)
- **Multiple Export Formats** - Export to CSV, JSON, and Excel
- **User-Friendly Interface** - Built with Streamlit for easy interaction
## Architecture

```
.
├── app.py               # Main application entry point
├── scraper/
│   ├── core.py          # AutoScraper class and main logic
│   ├── platforms.py     # Platform-specific scrapers (WooCommerce, Shopify)
│   └── utils.py         # Helper functions for container detection
└── ui/
    ├── layout.py        # Sidebar and results rendering
    └── export.py        # Data export functionality
```
## Installation

```bash
# Clone the repository
git clone https://github.com/hasdata/auto-ecommerce-scraper.git
cd auto-ecommerce-scraper

# Install dependencies
pip install -r requirements.txt
```

Dependencies (`requirements.txt`):

```
streamlit>=1.30.0
pandas>=2.0.0
beautifulsoup4>=4.12.0
requests>=2.31.0
openpyxl>=3.1.0
```

## Usage

Start the app:

```bash
streamlit run app.py
```

Then:
1. Enter a product listing page URL
2. Click "Start Scraping"
3. Review extracted data
4. Export to your preferred format
### Using the HasData API

For JavaScript-heavy sites or AI extraction:

1. Check "Use HasData API"
2. Enter your API key
3. Configure scraping options:
   - JS Rendering
   - Proxy settings
   - AI product extraction
   - Screenshot capture
   - Email/link extraction
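The same options can also be set programmatically when using `AutoScraper` directly. A sketch of a `scrape_config` using the option names from the Configuration section below (the values chosen here are purely illustrative):

```python
# Illustrative scrape_config covering the sidebar options above; option
# names match the Configuration table, values are example choices.
scrape_config = {
    "jsRendering": True,          # render JavaScript before extraction
    "proxyType": "residential",   # datacenter / residential / mobile
    "proxyCountry": "US",
    "screenshot": False,          # capture a page screenshot
    "extractEmails": False,       # harvest email addresses from the page
    "extractLinks": False,        # collect every link on the page
    "blockAds": True,             # strip advertisements
}
```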
## How It Works

### 1. Platform Detection

The scraper automatically identifies the platform:

- **WooCommerce** - Looks for `woocommerce`, `product_type_`, etc.
- **Shopify** - Detects `shopify`, `product-card`, etc.
- **Generic** - Falls back to container detection
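A minimal sketch of what such marker-based detection can look like — an illustration only, not the actual code in `scraper/platforms.py`:

```python
# Hypothetical detection heuristic: look for platform marker strings in the
# raw page source. The real implementation may check more signals.
def detect_platform(html: str) -> str:
    """Guess the e-commerce platform from marker strings in the HTML."""
    lowered = html.lower()
    # WooCommerce pages carry body classes like "woocommerce" and
    # "product_type_simple"
    if "woocommerce" in lowered or "product_type_" in lowered:
        return "woocommerce"
    # Shopify stores expose the global Shopify object and card classes
    if "shopify" in lowered or "product-card" in lowered:
        return "shopify"
    # Anything else falls through to generic container detection
    return "generic"
```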
### 2. Container Detection

The scraper scores potential product containers based on:

- Presence of images and links
- Text content length
- HTML structure complexity
- CSS class keywords
- Price indicators
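As an illustration, a scoring heuristic over those signals might look like this. The weights and thresholds here are invented; the real logic lives in `scraper/utils.py`:

```python
from bs4 import BeautifulSoup

# Illustrative scoring heuristic using the signals listed above. All weights
# are made-up; the project's actual scoring may differ.
PRICE_HINTS = ("price", "$", "€", "£")
CARD_KEYWORDS = ("product", "item", "card", "listing")

def score_container(tag) -> int:
    """Score a candidate product container: higher means more product-like."""
    score = 0
    if tag.find("img"):                       # presence of images
        score += 2
    if tag.find("a", href=True):              # presence of links
        score += 2
    text = tag.get_text(" ", strip=True)
    if 10 <= len(text) <= 500:                # sensible text length
        score += 1
    score += min(len(tag.find_all(True)), 10) // 5   # structure complexity
    classes = " ".join(tag.get("class", [])).lower()
    if any(k in classes for k in CARD_KEYWORDS):     # CSS class keywords
        score += 3
    if any(h in text.lower() for h in PRICE_HINTS):  # price indicators
        score += 2
    return score
```

The container group whose members score highest across their repeats would be picked as the product list.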
### 3. Data Extraction

From each container, the scraper extracts:

- Product titles
- Prices and original prices
- Product URLs
- Images
- Stock status
- Categories and tags
- SKUs
- Ratings and reviews
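To make the field list concrete, here is a simplified extraction of the most common fields from a single product card. The selectors are generic BeautifulSoup guesses, not the project's actual rules:

```python
from bs4 import BeautifulSoup

# Hypothetical extraction of the most common fields from one product card.
def extract_fields(card) -> dict:
    item = {}
    link = card.find("a", href=True)
    if link:
        item["url"] = link["href"]
        item["title"] = link.get_text(strip=True) or None
    img = card.find("img")
    if img:
        item["image"] = img.get("src")
    # Match any element whose class mentions "price"
    price = card.find(class_=lambda c: c and "price" in c.lower())
    if price:
        item["price"] = price.get_text(strip=True)
    return item

html = '<li><a href="/p/1">Widget</a><img src="w.jpg"/><span class="price">$19.99</span></li>'
card = BeautifulSoup(html, "html.parser").li
print(extract_fields(card))
# → {'url': '/p/1', 'title': 'Widget', 'image': 'w.jpg', 'price': '$19.99'}
```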
### 4. Field Cleanup

After extraction, the scraper cleans the result set:

- Groups similar fields
- Removes low-frequency selectors
- Creates human-readable field names
- Handles links and images properly
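An illustrative version of two of those cleanup steps — dropping low-frequency fields and renaming selector-derived keys. The threshold and naming rule are assumptions, not the project's actual behaviour:

```python
from collections import Counter

# Illustrative cleanup pass; threshold and naming rule are made up.
def clean_fields(items: list[dict], min_frequency: float = 0.6) -> list[dict]:
    """Drop fields that appear in fewer than min_frequency of the items."""
    counts = Counter(key for item in items for key in item)
    keep = {k for k, n in counts.items() if n / len(items) >= min_frequency}
    return [{k: v for k, v in item.items() if k in keep} for item in items]

def readable_name(selector: str) -> str:
    """Turn a CSS-ish selector into a human-readable field name."""
    return selector.split(".")[-1].replace("-", "_") or "field"

items = [
    {"title": "A", "price": "$1", "div.badge-new": "New"},  # stray field
    {"title": "B", "price": "$2"},
]
print(clean_fields(items))                  # rare "div.badge-new" is dropped
print(readable_name("div.product-title"))   # → product_title
```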
## API Reference

### AutoScraper

```python
from scraper.core import AutoScraper

# Initialize scraper
scraper = AutoScraper(
    url="https://example.com/products",
    api_key="your_hasdata_key",  # Optional
    scrape_config={
        "jsRendering": True,
        "proxyType": "datacenter",
        "blockAds": True
    }
)

# Scrape data
result = scraper.scrape(
    container_index=0,    # Select container group
    force_generic=False   # Force generic method
)

# Access results
print(result['platform'])  # 'woocommerce', 'shopify', or 'generic'
print(result['data'])      # List of extracted items
```

### Platform-Specific Scrapers

```python
from scraper.platforms import scrape_woocommerce, scrape_shopify

# For WooCommerce sites
result = scrape_woocommerce(scraper)

# For Shopify sites
result = scrape_shopify(scraper)
```

## Examples

### Basic Scraping

```python
scraper = AutoScraper("https://scrapeme.live/shop/")
result = scraper.scrape()

# result will contain:
# {
#     'platform': 'woocommerce',
#     'count': 48,
#     'data': [
#         {
#             'title': 'Product Name',
#             'price': '$19.99',
#             'url': 'https://...',
#             'image': 'https://...',
#             'stock_status': 'In Stock'
#         },
#         ...
#     ]
# }
```

### AI Extraction

```python
from scraper.core import UNIVERSAL_PRODUCT_RULES

scraper = AutoScraper(
    url="https://example.com/products",
    api_key="your_key",
    scrape_config={
        "jsRendering": True,
        "aiExtractRules": UNIVERSAL_PRODUCT_RULES
    }
)
result = scraper.scrape()
ai_data = scraper.get_hasdata_extras()['aiResponse']
```

## Tested Sites

- ✅ WooCommerce stores
- ✅ Shopify stores
- ✅ Custom e-commerce platforms
- ✅ Book stores (books.toscrape.com)
- ✅ Fashion marketplaces (Vinted)
- ✅ And many more...
## Configuration

### HasData API Options

| Option | Description | Default |
|---|---|---|
| `jsRendering` | Enable JavaScript rendering | `true` |
| `screenshot` | Capture page screenshot | `false` |
| `extractEmails` | Extract email addresses | `false` |
| `extractLinks` | Extract all links | `false` |
| `blockResources` | Block images/fonts/media | `true` |
| `blockAds` | Block advertisements | `true` |
| `proxyType` | Proxy type (`datacenter`/`residential`/`mobile`) | `datacenter` |
| `proxyCountry` | Proxy country code | `US` |
### Custom AI Extraction Rules

Define custom extraction schemas:

```python
custom_rules = {
    "products": {
        "type": "list",
        "description": "all products",
        "output": {
            "name": {"type": "string"},
            "price": {"type": "string"},
            "rating": {"type": "number"}
        }
    }
}
```

## Links

- How to Scrape E-Commerce in 2026
- HasData API Documentation
- Streamlit Documentation
- BeautifulSoup Documentation

## Disclaimer

This tool is for educational purposes only. Learn more about the legality of web scraping.
## Troubleshooting

**"Could not find repeating structures"**

- Try checking "Use generic method"
- Ensure the page has multiple similar items
- Try the HasData API for JS-rendered content

**"API Error"**

- Verify your HasData API key
- Check your API quota
- Ensure the URL is accessible

**Empty or incomplete data**

- Select a different container group
- Enable JS rendering
- Check if the site requires authentication
Made with ❤️ for the web scraping community

