This repository provides a workflow for augmenting and randomizing images together with their corresponding labels (annotations created with labelImg). The aim is to prepare a well-structured dataset for training with the TensorFlow Object Detection API.
Create the following organized directory structure:
.
├───annotations
│   └───xml
├───data
│   └───split
├───images
│   ├───train
│   └───test
├───sample_images
├───sample_labels
│   AnnotationAugmentation.py
│   Augmentation.py
│   Labeling.py
│   Randomize.py
│   Readme.md
│   Renaming.py
│   TrainTestSplit.py
│   XMLtoCSV.py
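The folders in this layout can also be created from a short script instead of by hand. The following is a minimal sketch using only the standard library; the helper name `create_layout` is hypothetical and is not one of the scripts in this repository.

```python
from pathlib import Path

# Folder names are taken from the tree above.
FOLDERS = [
    "annotations/xml",
    "data/split",
    "images/train",
    "images/test",
    "sample_images",
    "sample_labels",
]


def create_layout(root="."):
    """Create every folder of the layout under `root`.

    Folders that already exist are left untouched, so the function
    can be called repeatedly without harm.
    """
    root = Path(root)
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_layout()` from the repository root would create all six folders next to the scripts.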
- Prepare Sample Images and Labels:
  - Place raw images in the sample_images folder.
  - Use labelImg to label the sample images and save the annotations in the sample_labels folder.
- Image Augmentation:
  - Apply image augmentations using the script Augmentation.py.
- Label Augmentation:
  - Apply label augmentations using the script AnnotationAugmentation.py.
- Consolidation:
  - Copy the contents of the sample_images folder to the images folder.
  - Copy the contents of the sample_labels folder to the annotations/xml folder.
- Randomization:
  - Shuffle the images and labels within the images and annotations/xml folders using Randomize.py.
- Update Annotations:
  - Update the image and label directories recorded in the XML files by running Labeling.py.
- XML to CSV Conversion:
  - Convert all XML files to a single CSV using XMLtoCSV.py.
- Train-Test Split:
  - Create train and test subfolders within the images directory.
  - Split the dataset into train and test images using TrainTestSplit.py.
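Of the steps above, label augmentation is the one whose mechanics are least obvious: whenever an image is transformed geometrically, its bounding boxes have to be transformed in the same way. The sketch below illustrates this for a single case. It assumes that a horizontal flip is among the augmentations and that the labels are Pascal VOC XML as written by labelImg; the contents of AnnotationAugmentation.py are not shown here, so the function name and the sample annotation are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout that labelImg writes.
SAMPLE = """
<annotation>
  <filename>0.jpg</filename>
  <size><width>100</width><height>80</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>40</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>
"""


def flip_boxes_horizontally(xml_text):
    """Return the annotation with every box mirrored left to right.

    Assumes the image itself was flipped horizontally, so only the
    x coordinates change; y coordinates are left as they are.
    """
    root = ET.fromstring(xml_text)
    width = int(root.find("size/width").text)
    for box in root.iter("bndbox"):
        xmin = int(box.find("xmin").text)
        xmax = int(box.find("xmax").text)
        # After mirroring, the old right edge becomes the new left edge.
        box.find("xmin").text = str(width - xmax)
        box.find("xmax").text = str(width - xmin)
    return ET.tostring(root, encoding="unicode")


flipped = ET.fromstring(flip_boxes_horizontally(SAMPLE))
print(flipped.find("object/bndbox/xmin").text)  # 60
print(flipped.find("object/bndbox/xmax").text)  # 90
```

Other geometric augmentations (vertical flips, rotations, crops) would need their own coordinate mapping, while purely photometric ones such as brightness changes leave the boxes unchanged.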
Once the workflow has run:
- The images directory is organized into train and test subdirectories, holding the training and testing images respectively.
- The data directory holds a split subdirectory with two CSV files, one with the labels for the train images and one with the labels for the test images.
- All images and labels are sequentially named, and the directories recorded in the labels are updated as part of the workflow.
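The conversion and split that produce these two CSV files can be sketched with the standard library alone. This is an illustration, not the code in XMLtoCSV.py or TrainTestSplit.py: the column layout, the 80/20 ratio, the fixed seed, and all function names are assumptions.

```python
import csv
import random
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed column layout: one row per labelled object.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def xml_to_rows(xml_dir):
    """Collect one row per object from every Pascal VOC XML file in `xml_dir`."""
    rows = []
    for xml_file in sorted(Path(xml_dir).glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        filename = root.find("filename").text
        width = int(root.find("size/width").text)
        height = int(root.find("size/height").text)
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            rows.append([
                filename, width, height, obj.find("name").text,
                int(box.find("xmin").text), int(box.find("ymin").text),
                int(box.find("xmax").text), int(box.find("ymax").text),
            ])
    return rows


def split_rows(rows, test_fraction=0.2, seed=0):
    """Split rows by image so all boxes of one image land in the same subset."""
    filenames = sorted({row[0] for row in rows})
    random.Random(seed).shuffle(filenames)
    n_test = int(len(filenames) * test_fraction)
    test_names = set(filenames[:n_test])
    train = [row for row in rows if row[0] not in test_names]
    test = [row for row in rows if row[0] in test_names]
    return train, test


def write_csv(rows, csv_path):
    """Write the header followed by the given rows."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
```

Under these assumptions, passing the result of `xml_to_rows("annotations/xml")` to `split_rows` and writing each half with `write_csv` into data/split would yield the two label files. The images named in each half would then be moved into images/train or images/test.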
By following this workflow, you can augment, randomize, and prepare your dataset for use with the TensorFlow Object Detection API.