
grahamwaters/SceneSort_Clustering


📂 Trip Media Sorter by Visual Scene Similarity

Built with: Python · PyTorch · OpenAI CLIP · OpenCV · HDBSCAN


📌 About

This tool processes media files (images, RAW files, and videos) by extracting visual and temporal features, clustering the files into scenes, and organizing each scene into its own directory. It uses OpenAI's CLIP model for visual embeddings and supports both DBSCAN and HDBSCAN clustering.
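The final step, turning cluster labels into directories, can be sketched as follows. This is a hypothetical illustration, not the script's code: the folder names `scene_001`, `scene_002`, … and `unsorted` are assumptions, as is moving rather than copying files. It relies only on the fact that scikit-learn's DBSCAN and the `hdbscan` library both label noise points `-1`.

```python
import shutil
from collections import defaultdict
from pathlib import Path

NOISE_LABEL = -1  # DBSCAN and HDBSCAN both mark noise points with -1

def plan_scene_directories(paths, labels, output_dir):
    """Group files by cluster label into hypothetical scene_NNN folders."""
    if len(paths) != len(labels):
        raise ValueError("paths and labels must have the same length")
    plan = defaultdict(list)
    scene_numbers = {}  # cluster label -> scene number, in order of first appearance
    for path, label in zip(paths, labels):
        if label == NOISE_LABEL:
            folder = "unsorted"  # hypothetical name for unclustered files
        else:
            if label not in scene_numbers:
                scene_numbers[label] = len(scene_numbers) + 1
            folder = f"scene_{scene_numbers[label]:03d}"
        plan[Path(output_dir) / folder].append(Path(path))
    return dict(plan)

def apply_plan(plan, dry_run=True):
    """Print the planned moves (dry run) or actually move the files."""
    for folder, files in plan.items():
        for src in files:
            dest = folder / src.name
            if dry_run:
                print(f"[dry run] {src} -> {dest}")
            else:
                folder.mkdir(parents=True, exist_ok=True)
                shutil.move(str(src), str(dest))
```

Keeping the plan separate from the file operations is what makes a dry run cheap: the same plan can be printed first and applied later.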

✨ Features

  • 📸 Media Support: Handles images (JPEG, PNG, RAW) and common video formats.
  • 🔍 Hybrid Clustering: Combines visual (CLIP) and temporal features, weighted by visual_weight and temporal_weight.
  • ♻️ Resumable Processing: Resume an interrupted run from a progress file.
  • 🎞 Video Optimization: Stops keyframe extraction early once video_certainty_threshold is reached, saving time on long videos.
  • 🛡 Safety Checks: File size limits, duplicate detection, and disk space verification.
  • 🧪 Dry Run Option: Simulate file organization before actual execution.
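The README does not say exactly how visual_weight and temporal_weight are combined. A minimal sketch, assuming a weighted sum of the cosine distance between CLIP embeddings and a normalized gap between capture timestamps (in seconds). The function names and `time_scale_s` are hypothetical, and embeddings are plain Python lists here to keep the example dependency-free.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # assumption: treat zero vectors as unrelated
    return 1.0 - dot / norm

def hybrid_distance_matrix(embeddings, timestamps, visual_weight=0.7,
                           temporal_weight=0.3, time_scale_s=3600.0):
    """Pairwise distances mixing visual and temporal dissimilarity.

    time_scale_s is a hypothetical normalization: a gap of time_scale_s
    seconds or more counts as fully dissimilar in time.
    """
    n = len(embeddings)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            visual = cosine_distance(embeddings[i], embeddings[j])
            temporal = min(abs(timestamps[i] - timestamps[j]) / time_scale_s, 1.0)
            d = visual_weight * visual + temporal_weight * temporal
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist
```

A matrix like this can be passed to scikit-learn with `DBSCAN(eps=0.15, min_samples=3, metric="precomputed").fit_predict(numpy.asarray(D))`, or to `hdbscan.HDBSCAN(metric="precomputed")`. Under this scheme, eps bounds the combined distance, not the visual distance alone.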

🔧 Installation

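🛠 Requirements

  • Python 3 with pip
  • PyTorch and torchvision, OpenAI CLIP and its helpers (ftfy, regex, tqdm), OpenCV, Pillow, rawpy, scikit-learn, hdbscan, and PyYAML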

🔽 Install Dependencies

pip install torch torchvision
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
pip install opencv-python pillow rawpy scikit-learn hdbscan pyyaml
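A quick way to confirm that the installs succeeded, assuming the usual import names for these packages: opencv-python imports as `cv2`, pillow as `PIL`, scikit-learn as `sklearn`, pyyaml as `yaml`, and OpenAI's CLIP as `clip`. `check_modules` is a hypothetical helper, not part of the project.

```python
import importlib.util

# Import names for the packages installed above (several differ from pip names).
REQUIRED_MODULES = [
    "torch", "torchvision", "clip", "ftfy", "regex", "tqdm",
    "cv2", "PIL", "rawpy", "sklearn", "hdbscan", "yaml",
]

def check_modules(modules=REQUIRED_MODULES):
    """Return {module_name: found?} by locating each module without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

if __name__ == "__main__":
    missing = [name for name, found in check_modules().items() if not found]
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```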

🚀 Setup & Usage

📥 Clone the Repository

git clone https://github.com/grahamwaters/SceneSort_Clustering.git
cd SceneSort_Clustering

🌐 (Optional) Virtual Environment

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

⚙️ (Optional) Custom Configuration

Create a config.yaml file to override default settings:

clustering:
  algorithm: hdbscan
  eps: 0.15
  min_samples: 3
  visual_weight: 0.7
  temporal_weight: 0.3
processing:
  batch_size: 32
  max_workers: 4
  video_interval: 30
  max_file_size_mb: 500
safety:
  dry_run: false
  min_disk_space_gb: 5
  checksum_verify: true
video_certainty_threshold: 0.05
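The README does not list the built-in defaults or explain how a partial config.yaml is applied. A minimal sketch, assuming the parsed YAML is merged recursively over a defaults dictionary. `DEFAULTS` and `merge_config` are hypothetical, and the default values shown are placeholders, not the script's real defaults.

```python
import copy

# Hypothetical defaults for illustration only.
DEFAULTS = {
    "clustering": {"algorithm": "dbscan", "eps": 0.15, "min_samples": 3},
    "processing": {"batch_size": 32, "max_workers": 4},
    "safety": {"dry_run": False},
}

def merge_config(defaults, overrides):
    """Recursively overlay `overrides` onto a copy of `defaults`.

    Nested sections merge key by key, so a config.yaml that only sets
    clustering.eps keeps every other default in that section.
    """
    merged = copy.deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

With PyYAML, the overrides would come from `yaml.safe_load`, which returns `None` for an empty file. The `or {}` guard covers that case.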

▶️ Run the Script

python media_scene_sorter.py --input /path/to/input_directory --output /path/to/output_directory

🛠 Command-Line Options

  • --input (required): Input directory containing the media files.
  • --output (required): Output directory for the organized scenes.
  • --config: Path to a custom YAML configuration (default: config.yaml).
  • --dry-run: Simulate file organization without making changes.
  • --process-large-files: Process files that exceed the size limit (max_file_size_mb).
  • --eps: Override eps for DBSCAN clustering.
  • --resume: Resume from saved progress (progress.pkl).
  • --video-certainty-threshold: Set the early-termination threshold for video processing.
  • --images-first: Process images before videos.
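For reference, the documented options map naturally onto an `argparse` parser. This is a hypothetical reconstruction for checking flag names and types, not the parser from media_scene_sorter.py. The float types for `--eps` and `--video-certainty-threshold` are assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented command-line options."""
    p = argparse.ArgumentParser(description="Sort media into scenes by visual similarity.")
    p.add_argument("--input", required=True, help="Input directory containing media files")
    p.add_argument("--output", required=True, help="Output directory for organized scenes")
    p.add_argument("--config", default="config.yaml", help="Path to a YAML configuration")
    p.add_argument("--dry-run", action="store_true", help="Simulate without changing files")
    p.add_argument("--process-large-files", action="store_true",
                   help="Process files exceeding the size limit")
    p.add_argument("--eps", type=float, default=None, help="Override eps for DBSCAN")
    p.add_argument("--resume", action="store_true", help="Resume from progress.pkl")
    p.add_argument("--video-certainty-threshold", type=float, default=None,
                   help="Early-termination threshold for video processing")
    p.add_argument("--images-first", action="store_true", help="Process images before videos")
    return p
```

Defaulting `--eps` and `--video-certainty-threshold` to `None` lets a script tell "flag not given" apart from an explicit value, so only flags the user actually passes need to override config.yaml.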

📌 Example Usage

python media_scene_sorter.py \
  --input /media/input \
  --output /media/output \
  --process-large-files \
  --video-certainty-threshold 0.05
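The example passes `--process-large-files`, so files above the size limit are processed as well. A minimal sketch of the safety checks listed under Features (size limit, free disk space, checksum-based duplicate detection), assuming they correspond to the config keys max_file_size_mb, min_disk_space_gb, and checksum_verify. The function names, the 1024-based units, and the choice of SHA-256 are hypothetical.

```python
import hashlib
import os
import shutil

def within_size_limit(path, max_file_size_mb=500, process_large_files=False):
    """True if the file should be processed given the size limit."""
    if process_large_files:
        return True  # --process-large-files bypasses the limit
    return os.path.getsize(path) <= max_file_size_mb * 1024 * 1024

def has_enough_disk_space(directory, min_disk_space_gb=5):
    """True if the filesystem holding `directory` has enough free space."""
    return shutil.disk_usage(directory).free >= min_disk_space_gb * 1024 ** 3

def file_checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so large videos are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Files with matching checksums could then be treated as duplicates, for example by keeping a dictionary from checksum to the first path seen.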

📜 Logging & Debugging

  • 📂 General logs: Written to media_sorter.log and printed to the console.
  • ⚠️ Error logs: Critical errors are written to error.log.
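A minimal sketch of this logging layout using the standard `logging` module. It assumes that "critical errors" means records at ERROR level and above. `setup_logging` and the logger name `media_sorter` are hypothetical.

```python
import logging
import os

def setup_logging(log_dir="."):
    """Console and media_sorter.log get INFO and above; error.log gets ERROR and above."""
    logger = logging.getLogger("media_sorter")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output if the root logger is configured
    for old in list(logger.handlers):  # make repeated calls idempotent
        logger.removeHandler(old)
        old.close()

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = [
        (logging.StreamHandler(), logging.INFO),  # console (stderr)
        (logging.FileHandler(os.path.join(log_dir, "media_sorter.log")), logging.INFO),
        (logging.FileHandler(os.path.join(log_dir, "error.log")), logging.ERROR),
    ]
    for handler, level in handlers:
        handler.setLevel(level)
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```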

🔄 Resuming Interrupted Runs

If the process is interrupted, restart it with the --resume flag:

python media_scene_sorter.py --input /media/input --output /media/output --resume
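The format of progress.pkl is not documented. A minimal sketch of resumable processing, assuming the file stores the set of already-processed paths and is rewritten after each file. `load_progress`, `save_progress`, and `process_all` are hypothetical names.

```python
import os
import pickle
import tempfile

PROGRESS_FILE = "progress.pkl"

def load_progress(path=PROGRESS_FILE):
    """Return the set of already-processed paths (empty if there is no checkpoint)."""
    if not os.path.exists(path):
        return set()
    with open(path, "rb") as fh:
        return pickle.load(fh)

def save_progress(done, path=PROGRESS_FILE):
    """Dump to a temp file in the same directory, then rename it over the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as fh:
        pickle.dump(done, fh)
    os.replace(tmp, path)  # atomic on POSIX filesystems

def process_all(files, process, path=PROGRESS_FILE):
    """Run `process` on each file not finished in an earlier run."""
    done = load_progress(path)
    for f in files:
        if f in done:
            continue  # already handled before the interruption
        process(f)
        done.add(f)
        save_progress(done, path)
    return done
```

Writing to a temporary file first means an interruption during the save leaves the previous checkpoint intact. Only unpickle checkpoints you wrote yourself, because `pickle.load` can execute arbitrary code from a crafted file.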

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.


🚀 Happy Sorting! 🎥📸
