A high-performance tool to process image URLs from SQL dump files: it validates URLs, logs broken links, and downloads valid images efficiently.
- URL Extraction: Reads the SQL dump file and uses regex patterns to extract image URLs.
- Validation: Checks each URL by sending HTTP requests to see if the image exists and is valid.
- Download: Downloads valid images to a specified output folder using either async or threaded mode.
- Logging: Logs failed URLs to `failed_downloads.log` and `failed_downloads.csv`.
- Summary: Generates a summary of failed downloads in `failed_downloads_summary.txt`.
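The extraction step can be sketched as a streaming regex scan over the dump file. This is illustrative only — the actual pattern the tool uses may differ; the regex and function name here are assumptions:

```python
import re

# Illustrative pattern: http(s) URLs ending in a common image extension.
# The tool's real regex may be broader or stricter.
IMAGE_URL_RE = re.compile(
    r"https?://[^\s'\"\\,)\]]+\.(?:jpe?g|png|gif|webp)", re.IGNORECASE
)

def extract_image_urls(sql_dump_path):
    """Stream the SQL dump line by line and yield unique image URLs."""
    seen = set()
    with open(sql_dump_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            for url in IMAGE_URL_RE.findall(line):
                if url not in seen:
                    seen.add(url)
                    yield url
```

Streaming line by line keeps memory flat even for multi-gigabyte dumps.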
To get the image URLs from your database, you can run a query like:
```sql
SELECT media FROM chat_messages
WHERE media IS NOT NULL
  AND media <> ''
  AND media <> '[]';
```

Then export the results to a `.sql` file, which can be used as input for this tool.
- Extracts image URLs from large SQL dump files efficiently using threading
- Validates URLs and downloads valid images to a specified folder
- Logs failed URLs to `failed_downloads.log` and `failed_downloads.csv`
- Generates a summary of failed URLs by error type (`failed_downloads_summary.txt`)
- Async mode for maximum speed with many concurrent workers
- Threaded mode for better compatibility with large files or certain environments
- Resume capability for already downloaded images
- Progress tracking with `processed/total [elapsed<remaining, speed]`
- Configurable retry mechanisms and rate limiting
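The validation step boils down to classifying each HTTP response. A pure-function sketch of that logic is below; the category names mirror the summary output, but the size threshold and function name are assumptions, not the tool's exact implementation:

```python
def classify_response(status_code, content_type, content_length):
    """Map an HTTP response to a failure category, or "OK" if the URL
    points at a plausible image. Illustrative; thresholds are assumed."""
    if status_code == 404:
        return "404 Not Found"
    if status_code == 403:
        return "403 Forbidden"
    if status_code != 200:
        return f"HTTP {status_code}"
    if not content_type.startswith("image/"):
        return "Non-image content"
    if content_length is not None and content_length < 100:  # assumed minimum
        return "Suspiciously small file"
    return "OK"
```

Keeping classification separate from the network call makes it easy to unit-test without any HTTP traffic.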
- Python 3.7+
- pip package manager
```shell
pip install -r requirements.txt
# or individually
pip install requests tqdm urllib3 aiohttp aiofiles
```

```shell
# Async mode (recommended - fastest)
python db_url_checker.py database.sql

# Threaded mode
python db_url_checker.py database.sql --threaded

# Custom output folder and workers
python db_url_checker.py database.sql -o my_images -a 100 -t 20

# Threaded mode with custom workers
python db_url_checker.py database.sql --threaded -t 15 -o downloads
```

| Option | Description | Default |
|---|---|---|
| sql_file | Path to SQL dump file (required) | - |
| -o, --output | Output folder for downloaded images | downloaded_images |
| -a, --async-workers | Number of concurrent async workers | 50 |
| -t, --thread-workers | Number of concurrent thread workers | 10 |
| --threaded | Use threaded mode instead of async | False |
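The options table maps onto a straightforward `argparse` definition. This is a sketch reconstructing the interface from the table above, not the script's actual source:

```python
import argparse

def build_parser():
    """CLI mirroring the options table above (illustrative sketch)."""
    p = argparse.ArgumentParser(prog="db_url_checker.py")
    p.add_argument("sql_file", help="Path to SQL dump file")
    p.add_argument("-o", "--output", default="downloaded_images",
                   help="Output folder for downloaded images")
    p.add_argument("-a", "--async-workers", type=int, default=50,
                   help="Number of concurrent async workers")
    p.add_argument("-t", "--thread-workers", type=int, default=10,
                   help="Number of concurrent thread workers")
    p.add_argument("--threaded", action="store_true",
                   help="Use threaded mode instead of async")
    return p
```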
- `my_images/` or custom output folder: Successfully downloaded images
- `image_downloader_production.log`: Main execution log
- `failed_downloads.log`: Text log of failed URLs
- `failed_downloads.csv`: Structured CSV of failures
- `failed_downloads_summary.txt`: Error breakdown summary
`failed_downloads.log`:

```
[2024-01-15T10:30:45.123456] 404 Not Found (HTTP 404): https://example.com/missing.jpg
```
`failed_downloads.csv`:

```
timestamp,url,error_type,status_code,details
2024-01-15T10:30:45.123456,https://example.com/missing.jpg,404 Not Found,404,URL not found
```
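Rows in that shape can be produced with the standard-library `csv` module. A sketch (the helper name is hypothetical; the tool's internals may differ):

```python
import csv
import datetime
import os

def log_failure(csv_path, url, error_type, status_code, details):
    """Append one failure row in the CSV format shown above,
    writing the header on first use."""
    new_file = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if new_file:
            writer.writerow(["timestamp", "url", "error_type",
                             "status_code", "details"])
        writer.writerow([datetime.datetime.now().isoformat(),
                         url, error_type, status_code, details])
```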
`failed_downloads_summary.txt`:

```
FAILED DOWNLOADS SUMMARY
==============================
404 Not Found: 75 (50.0%)
Non-image content: 50 (33.3%)
Network Error: 20 (13.3%)
403 Forbidden: 5 (3.3%)
```
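A breakdown like that is a one-liner with `collections.Counter`. A sketch of how the summary could be rendered (function name assumed):

```python
from collections import Counter

def summarize_failures(error_types):
    """Render an error breakdown in the summary format shown above,
    most frequent category first."""
    counts = Counter(error_types)
    total = sum(counts.values())
    lines = ["FAILED DOWNLOADS SUMMARY", "=" * 30]
    for error, n in counts.most_common():
        lines.append(f"{error}: {n} ({n / total * 100:.1f}%)")
    return "\n".join(lines)
```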
| Scenario | Recommendation |
|---|---|
| Many small images | High async worker count (-a 100-200) |
| Large images | Threaded mode with moderate workers (--threaded -t 10-20) |
| Mixed content | Default async mode with 50 workers |
| Limited bandwidth | Reduce workers to avoid rate limiting |
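The worker-count knobs in the table come down to bounding in-flight concurrency. In async mode the usual pattern is a semaphore, sketched here in a generic form (not the tool's actual code):

```python
import asyncio

async def bounded_gather(coros, limit):
    """Run coroutines with at most `limit` in flight at once — the idea
    behind the -a/--async-workers setting."""
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))
```

Lowering `limit` trades throughput for gentler bandwidth use, which is why the table suggests fewer workers on constrained connections.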
- Progress displayed as `processed/total [elapsed<remaining, speed]`, e.g., `37032/40112 [22:25<01:20, 38.07image/s]`
- Completion summary: total URLs, successful/failed, success rate, time, and speed
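That display is the default `tqdm` bar format. A sketch of wrapping the download loop in a bar (the loop body and function names are hypothetical):

```python
from tqdm import tqdm

def download_all(urls, download_one):
    """Iterate over URLs with a tqdm bar, yielding the
    processed/total [elapsed<remaining, speed] display, and
    tally successes vs. failures for the completion summary."""
    ok = failed = 0
    for url in tqdm(urls, unit="image"):
        if download_one(url):
            ok += 1
        else:
            failed += 1
    return ok, failed
```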
- 404 Not Found
- 403 Forbidden
- Network errors (timeouts, DNS failures)
- Non-image content
- Other HTTP errors (4xx/5xx)
- Suspiciously small files
- Check log files for detailed errors
- Adjust worker counts for memory or bandwidth limitations