GeoSet is a Python framework for creating 2D geometrical datasets.
It aims to support a variety of 2D geometrical datasets that go beyond simple polygon generation.
The target audience is mainly machine learning researchers and enthusiasts, since the framework is extensible and generates ready-to-use datasets.
2D shapes such as lines, open triangles, squares, and ellipses
Augmentation via GLSL Shaders
Parallel/Threaded generation
Numpy .npz output (with train/test split)
Thumbnails sample generation
- python 3.x
- cv2
- moderngl
- PIL
- progress
- colour
By default, the framework generates unfilled grayscale images (black edges on a white background) of any size.
The generation of the geometries occurs procedurally and randomly (as bias-free as possible) according to the specified parameters.
Finally, the framework enables augmentations such as blur, noise, etc., to avoid a sterile dataset.
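As a rough illustration of the idea described above (not GeoSet's actual implementation), the sketch below draws one unfilled square with black edges on a white grayscale canvas at a random position, then adds Gaussian noise as a simple augmentation. It uses only NumPy, and the function names `draw_random_square` and `add_noise` are hypothetical, not part of the framework.

```python
import numpy as np


def draw_random_square(image_size=(28, 28), rng=None):
    """Return a white uint8 canvas with one black, unfilled square outline."""
    if rng is None:
        rng = np.random.default_rng()
    height, width = image_size
    canvas = np.full((height, width), 255, dtype=np.uint8)

    # Pick a side length and a top-left corner so the square fits the canvas.
    side = int(rng.integers(4, min(height, width) - 1))
    top = int(rng.integers(0, height - side))
    left = int(rng.integers(0, width - side))
    bottom, right = top + side, left + side

    # Draw the four edges only; the interior stays white (no fill).
    canvas[top, left:right + 1] = 0
    canvas[bottom, left:right + 1] = 0
    canvas[top:bottom + 1, left] = 0
    canvas[top:bottom + 1, right] = 0
    return canvas


def add_noise(image, sigma=10.0, rng=None):
    """Add Gaussian noise and clip back to the valid 0..255 range."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


rng = np.random.default_rng(seed=0)
clean = draw_random_square((28, 28), rng)
noisy = add_noise(clean, sigma=10.0, rng=rng)
```

Passing a seeded generator keeps the output reproducible, which helps when comparing augmentation settings.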
Below you can find a sample dataset generated by the framework (more here):
- Add examples
- Finalize unit tests
- Add code comments and inline license
- Make it easier to extend the procedural generation
- Make it available on pip
First, create a dataset by extending the Dataset class and overriding the _generate_image
method, as in simple_dataset.py.
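The exact signature of `_generate_image` is not shown in this README, so the sketch below only illustrates the override pattern. It uses a small stand-in base class rather than GeoSet's real `Dataset`; the `category` argument, the `categories` attribute, and the nested-list image format are all assumptions made for the example.

```python
import random


class StandInDataset:
    """Hypothetical stand-in for GeoSet's Dataset base class (illustration only)."""

    categories = ()

    def __init__(self, samples_per_category, image_size):
        self.samples_per_category = samples_per_category
        self.image_size = image_size
        self.samples = []

    def _generate_image(self, category):
        # Subclasses are expected to override this hook.
        raise NotImplementedError

    def generate(self):
        # Call the hook once per sample and keep (image, label) pairs.
        for label, category in enumerate(self.categories):
            for _ in range(self.samples_per_category):
                self.samples.append((self._generate_image(category), label))


class LinesDataset(StandInDataset):
    """Example subclass: each image holds one horizontal or vertical line."""

    categories = ("horizontal_line", "vertical_line")

    def _generate_image(self, category):
        width, height = self.image_size
        # White canvas as nested lists: 255 = background, 0 = edge.
        image = [[255] * width for _ in range(height)]
        if category == "horizontal_line":
            row = random.randrange(height)
            image[row] = [0] * width
        else:
            col = random.randrange(width)
            for pixels in image:
                pixels[col] = 0
        return image


dataset = LinesDataset(samples_per_category=3, image_size=(8, 8))
dataset.generate()
```

For the real hook signature and return type, refer to simple_dataset.py.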
After that, you only need to run the dataset, as in:
# here you are instantiating the dataset
dataset = SimpleDataset(samples_per_category=10, image_size=(28, 28), destination="output", save_images=False)
# the line below will generate the dataset in-memory
# this operation might take a while depending on your settings
# a visual progress bar will present the progress
dataset.generate()
# after the dataset is generated you might choose to save the npz file of it
dataset.save_npz(test_size=.3, file_name='%(name)s_dataset')
# finally if you wish you might choose to save a sample of the thumbnails to publish in your paper
dataset.save_thumbnails(examples_per_category=10, scale=1.0, file_name='%(name)s_thumbnails')
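The saved `.npz` archive can be read back with `numpy.load`. The array names written by `save_npz` are not documented here, so the sketch below builds its own small archive with assumed keys (`x_train`, `y_train`, `x_test`, `y_test`) purely to show the round trip. On a real GeoSet file, inspect `archive.files` to see the actual names.

```python
import os
import tempfile

import numpy as np

# Fake data standing in for generated images and their category labels.
rng = np.random.default_rng(seed=0)
images = rng.integers(0, 256, size=(10, 28, 28), dtype=np.uint8)
labels = np.arange(10) % 2

# Shuffle, then hold out 30% of the samples, mirroring test_size=.3.
order = rng.permutation(len(images))
n_test = int(round(len(images) * 0.3))
test_idx, train_idx = order[:n_test], order[n_test:]

# Write the archive with the assumed (hypothetical) key names.
path = os.path.join(tempfile.mkdtemp(), "example_dataset.npz")
np.savez(
    path,
    x_train=images[train_idx], y_train=labels[train_idx],
    x_test=images[test_idx], y_test=labels[test_idx],
)

# Read it back; arrays are loaded when they are accessed by key.
with np.load(path) as archive:
    keys = sorted(archive.files)
    x_train, x_test = archive["x_train"], archive["x_test"]
```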
More examples can be found here
If you intend to publish a paper, don't forget to cite the dataset:
@misc{silvestrim2021geoset,
title = {GeoSet},
author = {Ghesla Silvestrim, Filipe},
year = {2021},
howpublished = {\url{https://github.com/fsilvestrim/geoset}},
}