This Python-based program allows you to recursively crawl a given website and download files with specific extensions, preserving the original folder structure. It's designed for users who want to easily retrieve media, documents, or any specific file types from a domain without dealing with manual scraping or downloading.
- Just provide the starting URL, and the program handles the rest.
- Supports both GUI (via `.exe`) and CLI usage (via Python).
- Automatically discovers all linked pages under the provided URL.
- Extracts downloadable files from each visited page.
- Prevents "backward crawling" (optional): You can stop the crawler from visiting upper-level directories.
  - Example: Given `https://example.com/folder/subfolder/`, it won’t crawl `https://example.com/folder/` when backward crawling is disabled (see the scope-check sketch below).
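A minimal sketch of how such a scope check can work, using only the standard library (the `is_within_scope` helper is illustrative, not the project's actual API):

```python
from urllib.parse import urlparse

def is_within_scope(url: str, base_url: str) -> bool:
    """Return True if `url` lives at or below the base URL's path."""
    base, target = urlparse(base_url), urlparse(url)
    if target.netloc != base.netloc:
        return False  # a different host is always out of scope
    return target.path.startswith(base.path)

# With backward crawling disabled, the parent folder is skipped:
print(is_within_scope("https://example.com/folder/",
                      "https://example.com/folder/subfolder/"))   # False
print(is_within_scope("https://example.com/folder/subfolder/page.html",
                      "https://example.com/folder/subfolder/"))   # True
```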
- Choose which types of files to download (a filtering sketch follows this list):
  - Images (`.jpg`, `.jpeg`, `.png`, `.gif`, `.webp`, `.svg`, etc.)
  - Videos (`.mp4`, `.webm`, `.avi`, `.mov`, etc.)
  - Documents (`.pdf`, `.docx`, `.pptx`, `.xlsx`, etc.)
  - Audio files (`.mp3`, `.wav`, `.ogg`, etc.)
  - Or define your own custom extensions.
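One way this extension filtering might be implemented; the preset groups and the `wants_file` helper below are assumptions for illustration, not the program's actual names:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative preset groups; the real program may define different sets.
EXTENSION_PRESETS = {
    "images": {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"},
    "videos": {".mp4", ".webm", ".avi", ".mov"},
    "documents": {".pdf", ".docx", ".pptx", ".xlsx"},
    "audio": {".mp3", ".wav", ".ogg"},
}

def wants_file(url: str, allowed: set[str]) -> bool:
    """Compare the URL path's suffix against the chosen extensions."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in allowed

allowed = EXTENSION_PRESETS["images"] | {".ico"}  # a preset plus a custom extension
print(wants_file("https://example.com/a/b/photo.JPG", allowed))  # True
```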
- Files are saved in the same relative path as on the server (see the mapping sketch below).
  - Example: `https://example.com/construction-updates/admin/projects/2020/04/xxx-scaled.jpg` → `/example.com/construction-updates/admin/projects/2020/04/xxx-scaled.jpg`
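A small sketch of that URL-to-path mapping (the `local_path_for` helper is hypothetical):

```python
from pathlib import Path
from urllib.parse import urlparse

def local_path_for(url: str, root: Path = Path(".")) -> Path:
    """Mirror the server's directory layout under `root`."""
    parts = urlparse(url)
    return root / parts.netloc / parts.path.lstrip("/")

path = local_path_for("https://example.com/construction-updates/admin/projects/2020/04/xxx-scaled.jpg")
path.parent.mkdir(parents=True, exist_ok=True)  # recreate the folders before saving
print(path)  # example.com/construction-updates/admin/projects/2020/04/xxx-scaled.jpg
```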
- A detailed summary is displayed after the crawl is completed or stopped manually.
- Includes:
  - Total pages discovered
  - Total files downloaded
  - Number of each file type
  - Files that failed to download
  - Files larger than 10 MB
- Reports are stored in a SQLite database for filtering and future reference (a hypothetical schema sketch follows).
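As a rough sketch, such a report database could look like the following; the table name and columns are assumptions, not the program's actual schema:

```python
import sqlite3

conn = sqlite3.connect("crawl_report.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS downloads (
        url        TEXT PRIMARY KEY,
        local_path TEXT,
        file_type  TEXT,
        size_bytes INTEGER,
        status     TEXT          -- e.g. 'ok' or 'failed'
    )
""")
conn.commit()

# Filtering the stored report later, e.g. all files larger than 10 MB:
big_files = conn.execute(
    "SELECT url, size_bytes FROM downloads WHERE size_bytes > ?",
    (10 * 1024 * 1024,),
).fetchall()
conn.close()
```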
- A full graphical interface is now available.
- Select URL, choose file types, and start/stop crawling with buttons.
- See real-time status and progress in the window.
- You can clone the repository and modify the script as needed.
- Clean and well-organized codebase for easy customization.
- Download the `.exe` file from the Releases section.
- Run it (no installation required).
- Provide the URL, select file types, and click Start Download.
- Results will be saved locally with full directory structure.
- View reports in the built-in GUI or from the saved database.
⚠️ Make sure to allow the program through your antivirus/firewall if prompted.
```bash
git clone https://github.com/aiproje/WebHuntDownloader.git
cd WebHuntDownloader
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python main.py --gui
```

Adjust the filename if your entry point differs.
- The program obeys `robots.txt` by default (if implemented).
- Supports both depth-first and breadth-first crawling (configurable); see the frontier sketch after these notes.
- Handles both relative and absolute URLs.
- Skips already downloaded files using cache and logs.
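The configurable crawl order comes down to how the URL frontier is consumed. The sketch below is not taken from this codebase; it shows both orders with a `collections.deque`, a visited set for skipping repeats, and a standard-library `robots.txt` check:

```python
from collections import deque
from urllib import robotparser

start = "https://example.com/folder/subfolder/"
visited: set[str] = set()   # pages we've already handled
frontier = deque([start])

rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()

breadth_first = True  # flip to False for depth-first order

while frontier:
    # popleft() gives FIFO (breadth-first); pop() gives LIFO (depth-first)
    url = frontier.popleft() if breadth_first else frontier.pop()
    if url in visited or not rp.can_fetch("*", url):
        continue
    visited.add(url)
    # ... fetch the page, extract file and page links, then:
    # frontier.extend(discovered_page_links)
```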
Please open an issue in the Issues section.
Include:
- The URL you used
- Any error messages
- What you expected vs what happened
This project is open source and available under the MIT License.
We welcome all contributions! Feel free to fork the repo and submit pull requests for:
- Bug fixes
- New features
- UI/UX improvements
- Performance enhancements
Made with ❤️ by AIPROJE
