🚀 WebTagContentExtractor v4.0

Advanced dual-mode web scraping tool with intelligent exhibition directory extraction

✨ NEW in v4.0

🎯 Enhanced Selenium Scraper with intelligent "Load More" button handling
🏢 Exhibition Directory Specialization optimized for InfoSecurity Europe, Milipol, etc.
🔄 Advanced Pagination with scroll detection and content stability checks
📊 Comprehensive Coverage targeting 95%+ exhibitor extraction
🎪 Preset Management with 25+ pre-configured exhibition scrapers
🚀 Executable Build - Ready-to-use .exe application

🛠️ Features

🔧 Dual Scraping Modes

⚡ Simple Mode: Fast scraping for static HTML sites (requests + BeautifulSoup)
🌐 Selenium Mode: Full JavaScript rendering for modern websites
🔄 Auto Mode: Automatically selects Simple or Selenium mode, whichever suits the target site
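
As an illustration of how an automatic mode choice could work, the sketch below measures how much visible text the static HTML already contains and falls back to Selenium when the page looks like an empty JavaScript shell. This is a minimal sketch under stated assumptions, not the tool's actual detection logic: `choose_mode`, the `min_text_chars` threshold, and the heuristic itself are hypothetical.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Count characters of text that sit outside <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def choose_mode(html, min_text_chars=200):
    """Return "simple" if the static HTML already carries content, else "selenium".

    Hypothetical heuristic: a page with very little visible text is assumed
    to render its content with JavaScript.
    """
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return "simple" if parser.text_chars >= min_text_chars else "selenium"
```

In practice the HTML passed to `choose_mode` would be the body of a plain HTTP response, so the check costs one request before a browser is started.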

🎯 Exhibition & Directory Support

Intelligent "Load More" handling with multiple selector fallbacks
Scroll-triggered content loading detection
Pagination automation for multi-page directories
25+ Pre-configured exhibition presets (InfoSecurity Europe, Milipol, Eurosatory, etc.)
Company name extraction with UI element filtering
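
The "Load More" fallback and content stability ideas listed above can be sketched without tying the code to a browser. The helper names, parameters, and default values here are assumptions for illustration, not the project's API. The functions take plain callables so the logic can be tried without Selenium installed.

```python
import time


def click_first_available(find, click, selectors):
    """Try each selector in order and click the first match.

    `find(selector)` returns a list of elements, `click(element)` clicks one.
    Returns the selector that matched, or None if no selector matched.
    """
    for selector in selectors:
        elements = find(selector)
        if elements:
            click(elements[0])
            return selector
    return None


def load_until_stable(count_items, load_more, stable_rounds=3, max_rounds=50, pause=0.0):
    """Call load_more() until the item count stops growing.

    Loading stops once the count has been unchanged for `stable_rounds`
    consecutive rounds, or after `max_rounds` rounds in total.
    Returns the final item count.
    """
    last_count = count_items()
    unchanged = 0
    for _ in range(max_rounds):
        load_more()
        time.sleep(pause)  # give the page time to append new items
        current = count_items()
        if current > last_count:
            last_count = current
            unchanged = 0
        else:
            unchanged += 1
            if unchanged >= stable_rounds:
                break
    return last_count
```

With Selenium, `find` could wrap `driver.find_elements(By.CSS_SELECTOR, selector)`, `click` could call `element.click()`, and a scroll-triggered page could use `driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` as its `load_more` step.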

🖥️ User Interface

Modern GUI with tkinter
Real-time progress tracking
Preset management system
CSV preview before export
One-click executable

📊 Export & Data Management

CSV export with timestamps
Duplicate removal
Data filtering and cleaning
Batch processing support
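
To make the cleaning and export steps concrete, here is a minimal sketch using only the standard library. The `UI_NOISE` list, the function names, and the CSV column names are hypothetical and do not describe the tool's real output format.

```python
import csv
from datetime import datetime

# Hypothetical UI labels that should not be mistaken for company names.
UI_NOISE = {"load more", "show more", "next", "previous", "view profile", "filter"}


def clean_names(raw_names):
    """Normalize whitespace, drop blanks and UI labels, remove duplicates.

    Duplicates are compared case-insensitively and the first spelling is kept.
    """
    seen = set()
    cleaned = []
    for raw in raw_names:
        name = " ".join(raw.split())
        key = name.casefold()
        if not name or key in UI_NOISE or key in seen:
            continue
        seen.add(key)
        cleaned.append(name)
    return cleaned


def timestamped_filename(prefix="exhibitors", now=None):
    """Build a file name such as exhibitors_20240102_030405.csv."""
    now = now or datetime.now()
    return f"{prefix}_{now:%Y%m%d_%H%M%S}.csv"


def export_csv(names, path, scraped_at=None):
    """Write one row per company, each stamped with the scrape time."""
    scraped_at = scraped_at or datetime.now()
    stamp = scraped_at.isoformat(timespec="seconds")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["company", "scraped_at"])
        for name in names:
            writer.writerow([name, stamp])
    return path
```

Running `clean_names` before `export_csv` keeps button labels and repeated entries out of the file, and the timestamp in the file name stops successive runs from overwriting each other.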

🚀 Quick Start

Option 1: Use the Executable (Windows)

Download the latest WebTagContentExtractor.exe from Releases
Run the executable - no installation required!
Load presets and start scraping

Option 2: Run from Source

# Clone the repository
git clone https://github.com/APMarzuki/WebTagContentExtractor.git
cd WebTagContentExtractor

# Install dependencies
pip install -r requirements.txt

# Run the application
python main_window.py


APMarzuki/WebTagContentExtractor

Folders and files

Latest commit

History

Repository files navigation

🚀 WebTagContentExtractor v4.0

✨ NEW in v4.0

🛠️ Features

🔧 Dual Scraping Modes

🎯 Exhibition & Directory Support

🖥️ User Interface

📊 Export & Data Management

🚀 Quick Start

Option 1: Use the Executable (Windows)

Option 2: Run from Source

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages