This 2-in-1 tool analyzes nearly any image-based vision model by testing it in a synthetic world derived from your raw training dataset, while never letting that world become too random for the model to handle. Use the analysis results to visually diagnose your model, or expand your training/validation set with high-quality synthetic mutant images paired with matching mutant ground-truths. A single realism hyperparameter controls how mutated or realistic the fake images should be.
Some synthetically generated samples at realism=0, 0.5 and 1 respectively:
Model "Hard Mode" fake images (r=0):
Reasonable but not boring (r=0.5):
Very reasonable according to your model (r=1):
"What brightness changes my model's output?"
"Is my model rotation invariant?"
At minimum, the only Python you need to provide is a predict(image) -> labels
function used to get model outputs, and this works for ANY vision model and MOST label types! If you allow geometric, pixel-shifting image augmentations (ex. rotation) and your labels are based on pixel position, transforming labels to match the augmented images is also supported (ex. rotating segmentation label shapes to match the augmented image's rotation) via the Albumentations library.
In slightly more detail it:
- Discovers boundaries of randomization (How much can I rotate or blur my training set without affecting model output?)
- Plots how your model reacts to key randomizations at different intensities (What brightness level affects model output across 50% of training samples?)
- Uses failure boundaries to generate "infinite" "un/realistic" datasets that can be used for further training or as an infinite validation set!
- AutoML-ish selection of image randomization parameters to safely expand any model's training dataset, synthetically and cheaply
- Auto-generate synthetic training data that's as realistic or as bizarre as your model can tolerate, controlled by a single hyperparameter.
- Quickly analyze how your model degrades under many common real-world augmentations (blur, motion, rotation...)
- Your model is in Python or can be system called from Python
- You're using a deep vision model; there are some really nice differentiable algorithms that learn the best data augmentation strategy end-to-end, so you may be better served by those deep ML solutions
- You don't care about how brightness, blur, scale-variance etc. affect your model
- Your model can't be called efficiently from Python
- Clone this Git repo to your machine
pip install -r requirements.txt
to install dependencies (Python >= 3.6 must be installed).
- Consider running
python example_basic.py
to make sure the lib works correctly.
- Open example_basic.py as a starting template.
Simply create an ImageAugmenter instance and provide your predict(image_filename) -> labels
model wrapper function. This function is typically a shallow lambda wrapper around an ML or OpenCV predict(x)
model function that also loads the input image in the model's preferred format.
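A toy sketch of such a wrapper (a trivial brightness classifier stands in for a real model here, and OpenCV is only used for image loading; both are illustrative assumptions, not requirements of this library):

import cv2

def my_predict(image_filename):
    # Load the image however your real model prefers; grayscale keeps this toy simple.
    img = cv2.imread(image_filename, cv2.IMREAD_GRAYSCALE)
    # Stand-in "model": classify by mean intensity. Replace this body with real inference.
    return "bright" if img.mean() > 127 else "dark"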
You can also skip providing a vision model here, which will default to a reasonable basic model that detects when randomization bounds for your dataset are going too far, though it's generally much better to use your own model. This may be useful if you want to create very generic-yet-reasonable synthetics from your datasets, independent of your model.
You may also want to specify the 2nd constructor argument, diff_error(augmented_labels, original_labels), which returns an error value from 0 to 1, with 0 signifying both labels are identical and 1 meaning they are as mislabeled as possible. The default behavior treats any string-ified difference between them as error=1 and string equality as error=0. For boolean or categorical labels this typically needs no modification, whereas a custom lambda computing IoU area overlap percentage would be useful for bounding boxes, etc. The labels' type only has meaning to you (unless you use the label_format option), so define it in whatever way lets you easily implement this function.
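As an illustration, a diff_error for single bounding boxes might return 1 minus IoU. This is a sketch that assumes each label is a single box stored as [x_min, y_min, x_max, y_max]; adapt it to however you actually store labels:

def iou_diff_error(augmented_labels, original_labels):
    # Assumes each label is one box: [x_min, y_min, x_max, y_max].
    ax1, ay1, ax2, ay2 = augmented_labels
    bx1, by1, bx2, by2 = original_labels
    # Intersection rectangle.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    return 1.0 - iou  # 0 = perfect overlap, 1 = no overlap at all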
If your labels contain positional data (bounding boxes, segmentation), you'll need to take care to transform them as well (ex. rotate a label's segment shapes when a synthetic image is generated by rotating a training image). This is handled under the hood by Albumentations, so check their documentation to see which label_format you'll need to pass to ImageAugmenter's constructor; that choice also dictates the shape of the labels you pass to several functions (TODO: link to example code that defines handling COCO segmentation labels).
This step provides clear visibility into what intensity of each augmentation degrades your model's performance, and it also becomes the basis for sane bounds during random data generation. Running the lines below will take a long time depending on options and dataset size, but progress is saved to analysis.json as it proceeds, for resuming later or for use in data generation:
from augment import ImageAugmenter
img_aug = ImageAugmenter(on_off_predictor, diff_error=confidence_aware_diff_error)
img_aug.search_randomization_boundries(training_img_filenames, training_labels)
search_randomization_boundries offers plenty of customization (see the API section) to focus on only the augmentations your model needs to be resilient to, as well as tuning for accuracy vs. search speed. analysis.json
is used as a resume-cache, so delete this file if you ever want to run search_randomization_boundries
from scratch.
With the analysis.json file derived, you can now render and open analytics/index.html to view how your model reacts to every augmentation and which intensities start to give your model grief:
img_aug.render_boundries(html_dir="analytics")
After search_randomization_boundries completes, you have everything you need to call synthesize_more in a loop forever (or until you run out of disk space):
while True:
    synthetic_img_filenames, synthetic_labels, _, _ = img_aug.synthesize_more(
        training_img_filenames, training_labels)
Your training_img_filenames will be traversed in a loop until count=len(training_img_filenames) synthetic images are cloned to disk, with each synthetic clone having between min_random_augmentations=3 and max_random_augmentations=6 randomizations applied, but only up to intensities that individually did not affect model output substantially. If you want to risk exposing your model to more extreme training data (very bright images, etc.), the realism value controls how far the random augmentation ranges are pushed: realism=0.5 generates sensible ranges starting where small model differences first appear, while realism=0 pushes values right to the edge where the model starts to fail 100% of the time (ex. exactly too bright for the model). realism < 0 should be used sparingly as it may worsen performance on realistic cases, but it could force your model to generalize better. realism near 1 creates very little variation and is best avoided unless you really need very strict realism. realism can also be overruled on a per-augmentation basis via set_augmentation_intensity, letting you manually set narrower or more unrealistic bounds where needed, as shown in the sketch below.
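For example, a sketch that first narrows one augmentation by hand and then generates a riskier batch (the augmentation name and multiplier value are purely illustrative; the parameters themselves come from the API section below):

# Keep brightness changes tamer than what boundary search would otherwise allow.
img_aug.set_augmentation_intensity(
    augmentations="Brighten",
    multiplier=0.5,
    only_shrink_max=True,
)

# Then generate a batch that pushes the remaining augmentations close to the
# model's failure boundaries.
synthetic_img_filenames, synthetic_labels, _, _ = img_aug.synthesize_more(
    training_img_filenames,
    training_labels,
    realism=0,
)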
To see how your current or future model (after retraining with synthetic data, of course!) is performing against any batch of synthetic data, you can run the ImageAugmenter convenience function evaluate:
print(img_aug.evaluate(synthetic_img_filenames, synthetic_labels))
Other than pip dependencies, everything you need will always and forever live in the single augment.py file, so after prototyping you can easily paste it into any existing project and use it as a library.
ImageAugmenter(my_predict, diff_error, augmentations=ALL_AUGMENTATIONS, label_format=None) class constructor
The main class for grid searching over a training dataset with a model to determine random augmentation limits that the model can tolerate, and store the results.
Returns a JSON blob representing the raw results of each augmentation feature, the same as the contents of analysis.json.
- my_predict: A function that takes an absolute image filename and runs inference against it, returning prediction labels (the labels' type only has meaning to you, as long as it's string-ifiable).
- diff_error: (Optional) Custom error/cost function in the range [0, 1.0], where 0 means the original unaugmented image labels and the augmented image output labels match perfectly and 1 means the labels match as little as possible. Default behavior stringifies both raw and augmented labels and assumes zero error only if the strings are strictly equal, otherwise 1.
- augmentations: (Optional) List of augmentation string types to apply for all downstream operations; defaults to all supported augmentations, which are mostly 1-to-1 with those provided in the Albumentations library. You can pick and choose individually if you know your model won't be able to handle certain augmentation types or you want to prototype with a smaller/faster feature set. Available types are:
  - Brighten: Increase brightness
  - Contrast: Increase global contrast
  - Darken: Decrease brightness
  - Decontrast: Decrease global contrast
  - Desaturate: Decrease global image saturation
  - Dehue: Shift global image hue negative
  - Hue: Shift global image hue positive
  - LessBlue: Reduce contribution of the blue channel
  - LessGreen: Reduce contribution of the green channel
  - LessRed: Reduce contribution of the red channel
  - MoreBlue: Increase contribution of the blue channel
  - MoreGreen: Increase contribution of the green channel
  - MoreRed: Increase contribution of the red channel
  - Saturate: Increase global image saturation
  - ElasticTransform: Randomly stretch the image in 3D
  - GaussianBlur: Increase uniform blur
  - MotionBlur: Increase camera motion blur effect
  - SafeRotate: Rotate the image
  - Sharpen: Sharpen the image
  - Downscale: Lossily reduce image resolution
  - MultiplicitiveNoise: Add random noise to the image
  - PixelDropout: Percentage of random pixels to blacken
  - RandomSizedCrop: Crop an ever smaller random rectangle from the image and stretch it to fill the original frame
  - Superpixels: Randomly transplant patches of the image elsewhere
- label_format: (Optional) Make dataset synthesis label-type aware so that, for example, bounding boxes are geometrically transformed to match the augmentations applied to the image. This option is passed through as-is to the Albumentations library, which figures out the label transformations; reference their documentation for supported label formats. COCO bounding box example:
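(my_predict and my_diff_error below are placeholders for your own wrapper functions, and the dict shape is an assumption modeled on Albumentations' COCO bbox conventions rather than an API confirmed above:)

from augment import ImageAugmenter

# Hypothetical COCO bounding-box setup: COCO boxes are [x_min, y_min, width, height]
# with class names carried in a separate label field.
img_aug = ImageAugmenter(
    my_predict,
    my_diff_error,
    label_format={"format": "coco", "label_fields": ["class_labels"]},
)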
search_randomization_boundries(training_img_filenames: list[str], training_labels, step_size_percent: float=0.05)
The main method that examines all the training sample images passed in, usually everything you've got. Because it takes a long time to run, it stores intermediate and final results to analysis.json, which you can delete manually to find boundaries from scratch (ex. you collected more training data and want to re-run). You should generally feed in as many training_img_filenames as possible to strengthen boundary-search confidence and ensure future generated data isn't too unrealistic. On the other hand, you may want to limit to ~50000 maximally diverse training samples so analysis completes faster, but only if time is a constraint for you.
All other class methods assume you've run this and have already computed boundary state.
- training_img_filenames: List of image filenames that will later be passed to my_predict
- training_labels: List of ground-truth labels (type agnostic) that match 1-to-1 with training_img_filenames
- step_size_percent: How big of a step to take when finding an augmentation feature's limit; the default of 0.05 means each trial will increase augmentation intensity by 5% until my_predict starts to differ significantly in its output. Lower values take longer for the one-time cost of running search_randomization_boundries but yield more accurate augmentation limit boundaries for data generation and graphing, so going down to ~1% step-size granularity can sometimes be worth the investment, as in the example below.
- analytics_cache: The filename to use to store search cache calculations, defaults to analytics.json
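For example, based on the parameters above, a finer-grained (and slower) search might look like:

# 1% intensity steps: slower one-time analysis, more precise boundaries.
img_aug.search_randomization_boundries(
    training_img_filenames,
    training_labels,
    step_size_percent=0.01,
)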
set_augmentation_intensity(self, augmentations: str | list[str] = ALL_AUGMENTATIONS, multiplier: float = 1.0, min: int = None, max: int = None, only_shrink_max: bool = False):
Sets the min/max [0, 1] scalar intensities applied for an augmentation (or list of augmentations). You can view the webpage output from calling render_boundries to see which intensities affect your model and how, and then set this intelligently (see the example after the parameter list).
- multiplier: (Optional) Use a single scalar to boost or penalize the default min/max learned intensity boundaries. Prefer this over setting min/max since it's a single value and it still leverages the good intensity ranges found for your model + dataset.
- min: (Optional) The minimum effect intensity to apply for augmentations when generating synthetics. Default: the intensity at which my_predict begins to show >= 2% label differences / error.
- max: (Optional) The maximum effect intensity to apply for augmentations when generating synthetics. Default: the intensity at which my_predict begins to show >= 50% label differences / error.
- augmentations: (Optional) A list or single string of augmentation name(s) to apply the min and/or max intensity to. Default: ALL_AUGMENTATIONS
- only_shrink_max: (Optional) Whether this call should only shrink the range of possible intensity rather than accidentally expand it. Use this if you want to make sure you don't accidentally set a higher intensity than what your model can actually handle. Default: False, meaning affected augmentations will strictly use your min/max values if set.
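For instance, using only the parameters documented above (the specific values are illustrative), you could clamp motion blur to a hand-picked window:

# Clamp MotionBlur to a manually chosen intensity window, never widening the learned range.
img_aug.set_augmentation_intensity(
    augmentations="MotionBlur",
    min=0.1,
    max=0.4,
    only_shrink_max=True,
)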
Sets the probability of an augmentation being applied. weight is relative to other augmentations, which typically have a weight of 1 (uniform distribution), so a value of 2 doubles the odds of selection relative to others while 0.5 cuts them in half.
- augmentations: Single str or list of str of augmentation names to apply this weight to.
- weight: How often this augmentation should be applied [0, inf]; a value of 1 is uniform, 2 doubles the selection likelihood, etc.
Render the results of search_randomization_boundries to HTML for easy visualization of how your model performs against varying degrees of augmentation.
- html_dir: (Optional) The directory to write output HTML and image files to, defaults to the "analytics" relative directory.
synthesize_more(self, organic_img_filenames: list[str], organic_labels: list = None, realism=0.5, count=None, count_by="per_call", min_random_augmentations=3, max_random_augmentations=6, min_predicted_diff_error=0, max_predicted_diff_error=1, image_namer = verbose_synthetic_namer, log_file="", output_dir="generated", preview_html="__preview.html")
Generate synthetic training/validation samples based on some input set and only use as much randomization as realism demands. Optionally generates a __preview.html file that previews all images in the generated output folder.
Returns a quad-tuple of (generated image filenames, generated image labels, generated image's organic source file, generated image's organic label). If count_by="in_dir" is set, the return values will reflect ALL synthesized images ever generated in output_dir, which is useful for checkpointing. A combined usage example is shown further below.
- organic_img_filenames: Original (presumably real-world) training images from which to synthesize new datasets; each image will be used in equal quantity.
- organic_labels: (Optional) List of ground-truth (presumably real-world) training labels that map 1-to-1 with organic_img_filenames. Required if my_predict is specified, otherwise labels are derived under the hood.
- realism: (Optional) A float between [-∞, 1] to control generated images' realism based on what your model could handle during boundary search. A value of 1 means steering clear of the more intense random values your model has trouble with, while a value of zero pushes to the very limit of what your model can tolerate. Negative values push your current model well into failure territory but may be useful for generating synthetic training data that helps your model generalize after retraining.
- count: (Optional) Number of synthetic images to generate; the default of None signifies using len(organic_img_filenames).
- count_by: (Optional) Whether the count parameter should match "in_dir" (in-directory total image count, checkpoint-friendly) or "per_call" (absolute per-call generation count). Defaults to "per_call".
- min_random_augmentations: (Optional) Randomly pick at least this many augmentations to apply. Default: 3
- max_random_augmentations: (Optional) Randomly pick at most this many augmentations to apply. Default: 6
- min_predicted_diff_error: (Optional) The minimum diff error between my_predict run on the original image vs. the augmented image. Set this if you only want to keep generated images that your model fails on, to force it to focus on the outliers it misses. Defaults to zero so all synthesized data is kept.
- max_predicted_diff_error: (Optional) The maximum diff error between my_predict run on the original image vs. the augmented image. Set this if you want to filter out images that differ too wildly from the original image. Useful for auto-removing images that end up, for example, too bright for any model to process; such images can potentially weaken the synthetic dataset for training purposes or make validation on synthetics appear artificially poor. Defaults to 1 so all synthesized data is kept.
- image_namer: (Optional) Function that returns the relative image name based on: (raw input relative image path, matching label, uid, applied Albumentations transform summary)
- log_file: (Optional) The filename of the JSON log of what has been synthesized, useful for synthetic-generation checkpointing. Defaults to the output directory name with ".json" appended; set to None to avoid writing this file at all, however count_by="in_dir" will then be disabled.
- output_dir: (Optional) The folder to save images and __preview.html to.
- preview_html: (Optional) The name of the HTML file that will summarize synthetic images in output_dir, defaults to __preview.html. Set to None to disable summarization.
Returned summary object shape:
{
  "count": int,
  "generated": [{
    "synth_filename": str,
    "synth_label": any,
    "origin_filename": str,
    "origin_label": any
  }]
}
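Putting a few of these options together (the values are illustrative; all parameter names come from the signature above):

# Top up the "generated" folder to 10,000 total synthetic images, keeping only
# samples the current model gets at least slightly wrong.
synth_files, synth_labels, origin_files, origin_labels = img_aug.synthesize_more(
    training_img_filenames,
    training_labels,
    realism=0.5,
    count=10000,
    count_by="in_dir",
    min_predicted_diff_error=0.05,
    output_dir="generated",
)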
Runs my_predict against all img_filenames, which are typically generated by synthesize_more along with matching synthetic truth labels, and compares the model's output labels against those truth labels. This is convenient for testing synthetic data against different versions of your model, presumably your model before and after training on the synthetic dataset. It can also be used to compare real-world vs. synthetic model performance, as in the example below. If your new model performs poorly on a synthetic batch it was trained on, it suggests your realism hyperparameter may be too low and you're randomizing training data to the point of mangling it for even the best model (ex. so much extra brightness the image is pure white). If your model performs extremely well on a synthetic batch it was trained on while retaining real-world accuracy, consider lowering realism to handle more real-world edge cases by training on even stranger synthetic samples. Also consider setting max_predicted_diff_error to < 1 to task your model with filtering out overly unrealistic synthetic samples.
- img_filenames: List of string filenames to use in evaluation.
- img_labels: The 1-to-1 matching labels of img_filenames.
Returns an object containing:
{
  "avg_diff_error": Average diff error across all evaluated samples,
  "output_differs_count": Count of outputs that differed from the label significantly,
  "differing_output_errs": All output errors
}
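For instance, to compare how the same model fares on real-world data vs. a synthetic batch (validation_img_filenames and validation_labels are placeholders for a held-out real-world set):

# Real-world vs. synthetic performance for the same wrapped model.
print("real-world:", img_aug.evaluate(validation_img_filenames, validation_labels))
print("synthetic: ", img_aug.evaluate(synthetic_img_filenames, synthetic_labels))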
Get a summary of all images generated so far, useful for conditionally calling downstream methods only if needed or iterating over all historically generated image locations and label metadata.
- log_file: (Optional) The filename of the JSON log of what has been synthesized, useful for synthetic-generation checkpointing. Defaults to the output directory name with ".json" appended; set to None to avoid writing this file at all, however count_by="in_dir" will then be disabled.
- output_dir: (Optional) The folder to save images and __preview.html to.
Returned summary object shape:
{
  "count": int,
  "generated": [{
    "synth_filename": str,
    "synth_label": any,
    "origin_filename": str,
    "origin_label": any
  }]
}