
Auto E-commerce Scraper


A universal web scraper for e-commerce sites with automatic platform detection and intelligent data extraction.

Features

  • Automatic Platform Detection - Identifies WooCommerce and Shopify stores without manual setup
  • Intelligent Container Detection - Finds repeating product cards without configuration
  • HasData API Integration - Optional API support for JavaScript rendering and advanced features
  • AI Extraction - Extract structured product data using AI (via HasData)
  • Multiple Export Formats - Export to CSV, JSON, and Excel
  • User-Friendly Interface - Built with Streamlit for easy interaction

Architecture

.
├── app.py                  # Main application entry point
├── scraper/
│   ├── core.py            # AutoScraper class and main logic
│   ├── platforms.py       # Platform-specific scrapers (WooCommerce, Shopify)
│   └── utils.py           # Helper functions for container detection
└── ui/
    ├── layout.py          # Sidebar and results rendering
    └── export.py          # Data export functionality

📦 Installation

# Clone the repository
git clone https://github.com/hasdata/auto-ecommerce-scraper.git
cd auto-ecommerce-scraper

# Install dependencies
pip install -r requirements.txt

Requirements

streamlit>=1.30.0
pandas>=2.0.0
beautifulsoup4>=4.12.0
requests>=2.31.0
openpyxl>=3.1.0

Usage

Basic Usage

streamlit run app.py

Then:

  1. Enter a product listing page URL
  2. Click "Start Scraping"
  3. Review extracted data
  4. Export to your preferred format
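The export step can be sketched with pandas, which the project already depends on. The rows below are hypothetical and mirror the shape of `result['data']` shown in the Examples section; the actual export code lives in `ui/export.py`.

```python
import pandas as pd

# Hypothetical extracted rows, shaped like result['data']
rows = [
    {"title": "Product A", "price": "$19.99", "url": "https://example.com/a"},
    {"title": "Product B", "price": "$24.99", "url": "https://example.com/b"},
]

df = pd.DataFrame(rows)
df.to_csv("products.csv", index=False)         # CSV export
df.to_json("products.json", orient="records")  # JSON export (one object per row)
# df.to_excel("products.xlsx", index=False)    # Excel export (requires openpyxl)
```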

With HasData API

For JavaScript-heavy sites or AI extraction:

  1. Check "Use HasData API"
  2. Enter your API key
  3. Configure scraping options:
    • JS Rendering
    • Proxy settings
    • AI product extraction
    • Screenshot capture
    • Email/link extraction

How It Works

1. Platform Detection

The scraper automatically identifies the platform:

  • WooCommerce - Looks for woocommerce, product_type_, etc.
  • Shopify - Detects shopify, product-card, etc.
  • Generic - Falls back to container detection
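Marker-based detection like this can be sketched in a few lines. The function name and the exact markers are illustrative, not the implementation in `scraper/core.py`:

```python
# Minimal sketch of marker-based platform detection.
def detect_platform(html: str) -> str:
    html_lower = html.lower()
    # WooCommerce pages typically carry "woocommerce" body classes
    if "woocommerce" in html_lower or "product_type_" in html_lower:
        return "woocommerce"
    # Shopify themes commonly expose "shopify" assets or product-card classes
    if "shopify" in html_lower or "product-card" in html_lower:
        return "shopify"
    # Otherwise fall back to generic container detection
    return "generic"

print(detect_platform('<body class="woocommerce">...</body>'))  # woocommerce
```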

2. Container Detection (Generic Mode)

Scores potential product containers based on:

  • Presence of images and links
  • Text content length
  • HTML structure complexity
  • CSS class keywords
  • Price indicators
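A scoring heuristic over those signals might look like the sketch below. The weights and thresholds are assumptions for illustration; the real heuristics in `scraper/utils.py` may differ.

```python
import re
from bs4 import BeautifulSoup

PRICE_RE = re.compile(r"[$€£]\s*\d")
CARD_KEYWORDS = ("product", "item", "card", "listing")

def score_container(tag) -> int:
    score = 0
    score += 2 * len(tag.find_all("img"))                     # images present
    score += 2 * len(tag.find_all("a"))                       # links present
    score += min(len(tag.get_text(strip=True)) // 50, 3)      # text content length
    score += min(len(tag.find_all(True)) // 5, 3)             # structure complexity
    classes = " ".join(tag.get("class", []))
    score += sum(2 for kw in CARD_KEYWORDS if kw in classes)  # CSS class keywords
    if PRICE_RE.search(tag.get_text()):                       # price indicator
        score += 3
    return score

html = '<div class="product-card"><a href="/p/1"><img src="x.jpg"></a><span>$19.99</span></div>'
card = BeautifulSoup(html, "html.parser").div
print(score_container(card))
```

Candidates sharing the same parent and a high score would then be treated as one group of repeating product cards.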

3. Data Extraction

Extracts:

  • Product titles
  • Prices and original prices
  • Product URLs
  • Images
  • Stock status
  • Categories and tags
  • SKUs
  • Ratings and reviews
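Per-container extraction of these fields can be sketched with BeautifulSoup. The selectors below (`.price`, `h2`/`h3` for titles) are common-case assumptions, not the project's actual selectors:

```python
from bs4 import BeautifulSoup

def extract_item(container) -> dict:
    link = container.find("a", href=True)
    img = container.find("img")
    price = container.select_one(".price")
    title = container.find(["h2", "h3"]) or link
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
        "url": link["href"] if link else None,
        "image": img.get("src") if img else None,
    }

html = '<li><a href="/p/1"><h3>Widget</h3><img src="w.jpg"></a><span class="price">$9.50</span></li>'
item = extract_item(BeautifulSoup(html, "html.parser").li)
print(item)
```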

4. Data Cleaning

  • Groups similar fields
  • Removes low-frequency selectors
  • Creates human-readable field names
  • Handles links and images properly
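Low-frequency selector removal can be sketched as a frequency filter over the extracted rows; the 50% threshold here is an assumption, not the project's actual cutoff:

```python
from collections import Counter

def prune_rare_fields(rows: list[dict], min_ratio: float = 0.5) -> list[dict]:
    # Count how many rows each field appears in
    counts = Counter(key for row in rows for key in row)
    # Keep only fields present in at least min_ratio of the rows
    keep = {k for k, c in counts.items() if c / len(rows) >= min_ratio}
    return [{k: v for k, v in row.items() if k in keep} for row in rows]

rows = [
    {"title": "A", "price": "$1", "badge": "sale"},
    {"title": "B", "price": "$2"},
    {"title": "C", "price": "$3"},
]
print(prune_rare_fields(rows))  # 'badge' appears in only 1/3 of rows and is dropped
```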

API Reference

AutoScraper Class

from scraper.core import AutoScraper

# Initialize scraper
scraper = AutoScraper(
    url="https://example.com/products",
    api_key="your_hasdata_key",  # Optional
    scrape_config={
        "jsRendering": True,
        "proxyType": "datacenter",
        "blockAds": True
    }
)

# Scrape data
result = scraper.scrape(
    container_index=0,      # Select container group
    force_generic=False     # Force generic method
)

# Access results
print(result['platform'])   # 'woocommerce', 'shopify', or 'generic'
print(result['data'])       # List of extracted items

Platform-Specific Scrapers

from scraper.platforms import scrape_woocommerce, scrape_shopify

# For WooCommerce sites
result = scrape_woocommerce(scraper)

# For Shopify sites
result = scrape_shopify(scraper)

Examples

Scraping a WooCommerce Store

scraper = AutoScraper("https://scrapeme.live/shop/")
result = scraper.scrape()

# result will contain:
# {
#   'platform': 'woocommerce',
#   'count': 48,
#   'data': [
#     {
#       'title': 'Product Name',
#       'price': '$19.99',
#       'url': 'https://...',
#       'image': 'https://...',
#       'stock_status': 'In Stock'
#     },
#     ...
#   ]
# }

Using AI Extraction

from scraper.core import UNIVERSAL_PRODUCT_RULES

scraper = AutoScraper(
    url="https://example.com/products",
    api_key="your_key",
    scrape_config={
        "jsRendering": True,
        "aiExtractRules": UNIVERSAL_PRODUCT_RULES
    }
)

result = scraper.scrape()
ai_data = scraper.get_hasdata_extras()['aiResponse']

Tested Sites

  • ✅ WooCommerce stores
  • ✅ Shopify stores
  • ✅ Custom e-commerce platforms
  • ✅ Book stores (books.toscrape.com)
  • ✅ Fashion marketplaces (Vinted)
  • ✅ And many more...

Configuration

Scraping Options

| Option | Description | Default |
|---|---|---|
| jsRendering | Enable JavaScript rendering | true |
| screenshot | Capture page screenshot | false |
| extractEmails | Extract email addresses | false |
| extractLinks | Extract all links | false |
| blockResources | Block images/fonts/media | true |
| blockAds | Block advertisements | true |
| proxyType | Proxy type (datacenter/residential/mobile) | datacenter |
| proxyCountry | Proxy country code | US |
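These options map onto the `scrape_config` dict passed to `AutoScraper`. A sketch of the defaults from the table, with a plain-dict override pattern:

```python
# Default scraping options, as listed in the table above
default_config = {
    "jsRendering": True,
    "screenshot": False,
    "extractEmails": False,
    "extractLinks": False,
    "blockResources": True,
    "blockAds": True,
    "proxyType": "datacenter",
    "proxyCountry": "US",
}

# Override only what you need, e.g. a residential proxy plus screenshots
config = {**default_config, "proxyType": "residential", "screenshot": True}
print(config["proxyType"])  # residential
```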

AI Extraction Rules

Define custom extraction schemas:

custom_rules = {
    "products": {
        "type": "list",
        "description": "all products",
        "output": {
            "name": {"type": "string"},
            "price": {"type": "string"},
            "rating": {"type": "number"}
        }
    }
}


Disclaimer

This tool is for educational purposes only. Learn more about the legality of web scraping.

Troubleshooting

Common Issues

"Could not find repeating structures"

  • Try checking "Use generic method"
  • Ensure the page has multiple similar items
  • Try with HasData API for JS-rendered content

"API Error"

  • Verify your HasData API key
  • Check your API quota
  • Ensure the URL is accessible

Empty or incomplete data

  • Select a different container group
  • Enable JS rendering
  • Check if the site requires authentication

Made with ❤️ for the web scraping community
