This repository contains utility tools for preparing and managing datasets in machine learning projects.
# Create a virtual environment (recommended)
python -m venv venv
# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install the required dependencies
pip install -r requirements.txt

This tool allows you to batch rename image files in a folder with a consistent naming pattern.
python rename.py /path/to/images/folder --name custom-name

- `folder`: Path to the folder containing images
- `--name` or `-n`: Base name for renamed images (default: "image")
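
The renaming behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the actual contents of rename.py: the extension set, the helper names `plan_renames` and `rename_images`, the two-digit zero padding (inferred from the `dog-01.jpg` example), and the choice to skip existing targets are all assumptions.

```python
from pathlib import Path

# Assumed set of extensions; the real script may recognise a different set.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def plan_renames(folder, base_name="image"):
    """Return (old_path, new_path) pairs such as dog-01.jpg, dog-02.jpg."""
    folder = Path(folder)
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    # Pad to at least two digits so names sort correctly (01, 02, ..., 10).
    width = max(2, len(str(len(images))))
    return [
        (path, folder / f"{base_name}-{index:0{width}d}{path.suffix.lower()}")
        for index, path in enumerate(images, start=1)
    ]


def rename_images(folder, base_name="image"):
    """Apply the planned renames, skipping any target that already exists."""
    renamed = []
    for old_path, new_path in plan_renames(folder, base_name):
        if old_path == new_path or new_path.exists():
            continue  # hypothetical safety rule: never overwrite a file
        old_path.rename(new_path)
        renamed.append(new_path.name)
    return renamed
```

Separating the planning step from the renaming step makes it easy to print the planned names first and confirm them before any file is touched.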
# Rename all images in the 'dog_photos' folder to 'dog-01.jpg', 'dog-02.jpg', etc.
python rename.py ./dog_photos --name dog

This tool converts CSV files to Excel format, which can be useful for data preparation in machine learning workflows.
python csv_to_excel.py /path/to/data.csv --output /path/to/output.xlsx

- `csv_file`: Path to the input CSV file
- `--output` or `-o`: Path to the output Excel file (optional, defaults to the same name with a .xlsx extension)
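
As a rough sketch of the conversion and the default output path described above, assuming the script relies on pandas with openpyxl as the Excel writer (both are listed as dependencies). The function names here are hypothetical and may not match csv_to_excel.py.

```python
from pathlib import Path


def default_output_path(csv_file):
    """Same location and name as the CSV, but with a .xlsx extension."""
    return Path(csv_file).with_suffix(".xlsx")


def csv_to_excel(csv_file, output=None):
    """Read a CSV file and write it out as an Excel workbook."""
    # Imported here so the path helper above works even without pandas.
    import pandas as pd

    output_path = Path(output) if output else default_output_path(csv_file)
    df = pd.read_csv(csv_file)
    # index=False keeps the pandas row index out of the sheet (an assumption
    # about the desired output); writing .xlsx requires openpyxl.
    df.to_excel(output_path, index=False)
    return output_path
```

For example, `default_output_path("./data/train_data.csv")` yields `data/train_data.xlsx`, which matches the default described for `--output`.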
# Convert train_data.csv to Excel format
python csv_to_excel.py ./data/train_data.csv
# Convert with specific output path
python csv_to_excel.py ./data/train_data.csv --output ./processed/train_data.xlsx

This all-in-one tool helps you create and manage image datasets with custom property labels. It can automatically detect properties using a lightweight CLIP model or allow manual annotation.
# Manual annotation mode
python image_labeler.py /path/to/images/folder --output output.xlsx --fields field1 field2 field3
# Automatic annotation mode
python image_labeler.py /path/to/images/folder --output output.xlsx --fields field1 field2 field3 --auto

- `folder`: Path to the folder containing images
- `--output` or `-o`: Path to the output Excel file
- `--fields` or `-f`: Custom fields to add as columns (e.g., "wears_belt" "has_collar")
- `--auto` or `-a`: Use AI to automatically detect properties
- `--update` or `-u`: Update the existing sheet instead of creating a new one
- `--batch` or `-b`: Batch size for processing images (default: 10)

Each field is annotated with one of the following values:

- 1: Property exists in the image
- -1: Property doesn't exist in the image
- 0: Uncertain or not yet annotated (default for manual mode)
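
The value scheme and batch size above can be illustrated with a short sketch. It is not taken from image_labeler.py: the `Label` enum, the `filename` column name, and the helpers `blank_rows` and `batches` are hypothetical names chosen for illustration.

```python
from enum import IntEnum


class Label(IntEnum):
    """Annotation values used in each field column."""
    ABSENT = -1     # property doesn't exist in the image
    UNCERTAIN = 0   # uncertain or not yet annotated
    PRESENT = 1     # property exists in the image


def blank_rows(filenames, fields):
    """Build one row per image with every field set to 0 (manual-mode default)."""
    return [
        {"filename": name, **{field: int(Label.UNCERTAIN) for field in fields}}
        for name in filenames
    ]


def batches(items, batch_size=10):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

A list of such rows can be passed to `pandas.DataFrame` and written to Excel. A smaller batch size trades speed for lower memory use when many images are processed.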
# Create a new annotation sheet for manual labeling
python image_labeler.py ./dog_photos --output dog_annotations.xlsx --fields wears_belt has_collar color breed
# Automatically analyze images for specific properties
python image_labeler.py ./dog_photos --output dog_annotations.xlsx --fields wears_belt has_collar --auto
# Update existing annotation sheet with new images
python image_labeler.py ./new_dog_photos --output dog_annotations.xlsx --update
# Analyze a large dataset with smaller batch size
python image_labeler.py ./large_image_collection --output annotations.xlsx --fields has_person is_outdoor --auto --batch 5

Requirements:

- Python 3.6 or higher
- pandas
- openpyxl
- Pillow
- torch
- transformers
See requirements.txt for specific version requirements.
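
To confirm the environment is ready before running the tools, a check along these lines may help. It is a sketch, not part of this repository, and `missing_packages` is a hypothetical helper name. Note that Pillow is imported as `PIL`.

```python
import sys
from importlib.util import find_spec

# Package name as installed with pip -> module name used in import statements.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "openpyxl": "openpyxl",
    "Pillow": "PIL",
    "torch": "torch",
    "transformers": "transformers",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the names of required packages that cannot be imported."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("Python 3.6 or higher is required")
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed")
```

This only checks that each package can be found; it does not verify the versions pinned in requirements.txt.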