Tracer is a Python tool that follows a given URL, scrapes the images on each page along with their metadata (EXIF), and organizes them into a folder tree reflecting the link hierarchy. If a page contains further links, Tracer recursively follows them up to a specified depth (default is 5, configurable via command-line arguments).
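The traversal described above can be sketched as follows. This is a minimal, standard-library-only illustration and not Tracer's actual implementation: the names `LinkAndImageParser`, `folder_name`, and `trace` are hypothetical, the page fetcher is passed in as a function, and the sketch assumes the starting page counts as depth 0. The in-memory `SITE` dictionary stands in for real HTTP requests.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class LinkAndImageParser(HTMLParser):
    """Collects href targets of <a> tags and src targets of <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])


def folder_name(url):
    """Turn a URL into a single filesystem-friendly folder name (hypothetical scheme)."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    cleaned = "".join(c if c.isalnum() or c in "-." else "_" for c in raw)
    return cleaned.strip("_") or "root"


def trace(url, fetch, max_depth=5, depth=0, folder=PurePosixPath("output"), seen=None):
    """Visit url, then recurse into its links until max_depth is exceeded.

    Returns (folder, image_url) pairs. Each page gets a folder nested inside
    the folder of the page that linked to it. Assumption: the start page is depth 0.
    """
    seen = set() if seen is None else seen
    if depth > max_depth or url in seen:
        return []
    seen.add(url)  # avoid revisiting pages that link to each other

    parser = LinkAndImageParser()
    parser.feed(fetch(url))

    page_folder = folder / folder_name(url)
    found = [(page_folder, urljoin(url, src)) for src in parser.images]
    for href in parser.links:
        found += trace(urljoin(url, href), fetch, max_depth, depth + 1, page_folder, seen)
    return found


# A tiny in-memory "website" used instead of real network access.
SITE = {
    "https://example.com/": '<a href="/a">A</a><img src="/logo.png">',
    "https://example.com/a": '<a href="/b">B</a><img src="a.jpg">',
    "https://example.com/b": '<img src="b.jpg">',
}
results = trace("https://example.com/", SITE.__getitem__, max_depth=1)
for target_folder, image_url in results:
    print(target_folder, image_url)
```

With `max_depth=1` the page at `/b` is never visited, because it sits two links away from the start. For real pages, `fetch` could be something like `lambda u: requests.get(u, timeout=10).text`.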
- Recursive Traversal: Follows links up to a configurable maximum depth.
- Image Downloading: Downloads all images found on each page.
- Metadata Extraction: Saves image metadata (format, size, mode, and EXIF data if available) into a corresponding text file.
- Folder Organization: Creates a folder structure that mirrors the link hierarchy.
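To illustrate the metadata extraction step, here is a rough sketch of how format, size, mode, and EXIF tags might be read with Pillow and written to a text file. The function names, the report layout, and the `<image name>.txt` naming are assumptions made for this example; Tracer's real output format may differ.

```python
from pathlib import Path


def format_metadata(image_format, size, mode, exif):
    """Render image metadata as a plain-text report (layout is an assumption)."""
    lines = [
        f"Format: {image_format}",
        f"Size: {size[0]}x{size[1]}",
        f"Mode: {mode}",
    ]
    if exif:
        lines.append("EXIF:")
        lines += [f"  {name}: {value}" for name, value in sorted(exif.items())]
    else:
        lines.append("EXIF: none")
    return "\n".join(lines) + "\n"


def read_metadata(image_path):
    """Read format, size, mode and EXIF tags from an image using Pillow."""
    # Imported lazily so format_metadata can be used without Pillow installed.
    from PIL import ExifTags, Image

    with Image.open(image_path) as img:
        # Translate numeric EXIF tag ids to readable names where Pillow knows them.
        exif = {
            ExifTags.TAGS.get(tag_id, str(tag_id)): value
            for tag_id, value in img.getexif().items()
        }
        return img.format, img.size, img.mode, exif


def save_metadata(image_path):
    """Write the report next to the image, e.g. photo.jpg -> photo.jpg.txt."""
    text_path = Path(str(image_path) + ".txt")
    text_path.write_text(format_metadata(*read_metadata(image_path)), encoding="utf-8")
    return text_path


print(format_metadata("JPEG", (640, 480), "RGB", {"Model": "X100", "Make": "Acme"}))
```

Keeping the formatting separate from the Pillow call makes the report layout easy to check without needing a real image file.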
- Python 3.x
- requests
- beautifulsoup4
- Pillow
- Clone the Repository:
git clone <repository_url>
cd tracr
- Create a Virtual Environment: On Windows, open PowerShell and run:
python -m venv venv
- Activate the Virtual Environment:
  - Using PowerShell:
  If you encounter an execution policy error, open PowerShell as Administrator and run:
  Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
  Then activate the virtual environment:
  .\venv\Scripts\Activate.ps1
  - Using Command Prompt:
  venv\Scripts\activate
  Once activated, you should see (venv) prefixed on your command line.
- Install Dependencies: With the virtual environment activated, install all required packages using:
pip install -r requirements.txt
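After installing, a short script along these lines can confirm that the virtual environment is active and that the listed dependencies are importable. It is an optional sketch and not part of Tracer; the helper names are hypothetical. Note that beautifulsoup4 is imported as `bs4` and Pillow as `PIL`.

```python
import importlib.util
import sys

# pip package name -> module name used in import statements
REQUIRED_MODULES = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "Pillow": "PIL",
}


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose import module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("missing packages:", missing_packages() or "none")
```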
Run the Tracer tool with the following command:
python tracer.py <starting_url> --depth <max_depth> --output <output_folder>
- <starting_url>: The URL where the tracer begins.
- --depth <max_depth>: (Optional) The maximum depth to traverse. Default is 5.
- --output <output_folder>: (Optional) The folder where output will be stored. Default is output.
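The command-line options described above could be parsed with `argparse` roughly as follows. This is a sketch based only on the documented arguments and defaults; the help strings and the `build_parser` name are assumptions, and the actual parsing code in tracer.py may look different.

```python
import argparse


def build_parser():
    """Build a parser matching the documented arguments and defaults."""
    parser = argparse.ArgumentParser(
        prog="tracer.py",
        description="Follow links from a starting URL and collect images.",
    )
    parser.add_argument("starting_url", help="URL where the tracer begins")
    parser.add_argument(
        "--depth", type=int, default=5,
        help="maximum depth to traverse (default: 5)",
    )
    parser.add_argument(
        "--output", default="output",
        help="folder where output is stored (default: output)",
    )
    return parser


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that sys.argv is used.
args = build_parser().parse_args(
    ["https://example.com", "--depth", "5", "--output", "tracer_output"]
)
print(args.starting_url, args.depth, args.output)
```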
Example:
python tracer.py https://example.com --depth 5 --output tracer_output
To activate the virtual environment:
- PowerShell:
.\venv\Scripts\Activate.ps1
- Command Prompt:
venv\Scripts\activate
To deactivate the virtual environment, simply run:
deactivate