This is a small project for quickly creating 2D multiclass datasets for machine learning applications.
You can install it using pip:
pip install svg_sampler
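To try the package without drawing anything first, a small input file can be written by hand. The sketch below uses only the standard library and creates an SVG with two filled, closed paths. The file name, the shapes, and the colours are made up for illustration, and it is an assumption that shapes sharing a fill colour end up in the same class.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Two filled, closed paths with different fill colours.
# Assumption: svg_sampler derives the class of a shape from its fill.
SVG_TEXT = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
  <path d="M 10 10 L 50 10 L 50 50 L 10 50 Z" fill="#ff0000"/>
  <path d="M 40 40 L 90 40 L 90 90 L 40 90 Z" fill="#0000ff"/>
</svg>
"""

# Hypothetical file name inside a fresh temporary directory.
path_to_file = str(Path(tempfile.mkdtemp()) / "two_squares.svg")
Path(path_to_file).write_text(SVG_TEXT, encoding="utf-8")

# Sanity check: the file parses as XML and contains two <path> elements.
root = ET.parse(path_to_file).getroot()
paths = root.findall("{http://www.w3.org/2000/svg}path")
print(len(paths), "paths written to", path_to_file)
```

The two squares overlap between (40, 40) and (50, 50), so the file is also useful for comparing overlap handling. In SVG, later elements are drawn on top of earlier ones, so the blue square is the upper shape when rendered.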
Then you can use the function sample_from_svg to sample data points from the filled paths/objects of an SVG file.
from svg_sampler import sample_from_svg
from matplotlib import pyplot as plt
# path_to_file is the path to your SVG file
X, y = sample_from_svg(path_to_file, 5000, normalize=True,
                       sample_setting="based_on_area", overlap_mode="upper_only")
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

sample_from_svg(path, total_samples, sample_setting="equal_over_classes", overlap_mode="all", normalize=False, *, seed)

Parameters:
path (str): Path to the SVG file.
total_samples (int): Total number of points to sample.
sample_setting (str): Sampling mode. Options:
  "equal_over_classes" – Union shapes by fill and sample equally.
  "equal_over_shapes" – Sample equally from each shape.
  "based_on_area" – Allocate samples proportional to the shape's area.
overlap_mode (str): Overlap handling mode. Options:
  "all" – Sample from all overlapping shapes.
  "upper_only" – Only sample from the top (upper) shape in overlapping regions.
normalize (bool): If True, normalize the sampled coordinates per axis to [0, 1].
seed (int): Seed for random number generation.

Returns:
X (ndarray): Array of shape (N, 2) with (x, y) coordinates.
y (ndarray): Numeric class labels for each sample point.
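The sampling settings and the normalize flag can be illustrated with a short standard-library sketch. It is not the package's implementation: allocate_samples and normalize_per_axis are hypothetical helpers, the largest-remainder rounding is one possible way to make the counts add up, and min-max scaling over the sampled points is an assumed reading of "normalize per axis to [0, 1]".

```python
def allocate_samples(areas, total_samples, sample_setting):
    """Split a sample budget across shapes (hypothetical helper, not part of svg_sampler)."""
    n = len(areas)
    if sample_setting == "equal_over_shapes":
        weights = [1.0 / n] * n
    elif sample_setting == "based_on_area":
        total_area = sum(areas)
        weights = [a / total_area for a in areas]
    else:
        raise ValueError(f"unsupported sample_setting in this sketch: {sample_setting!r}")

    # Largest-remainder rounding: floor every share, then hand the
    # leftover samples to the shapes with the largest fractional parts.
    raw = [w * total_samples for w in weights]
    counts = [int(r) for r in raw]
    leftover = total_samples - sum(counts)
    by_remainder = sorted(range(n), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def normalize_per_axis(points):
    """Min-max scale x and y independently to [0, 1] (assumed meaning of normalize=True)."""
    def scale(values):
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant axis has no spread; map it to 0.0 to avoid dividing by zero.
        return [(v - lo) / span if span else 0.0 for v in values]

    xs = scale([p[0] for p in points])
    ys = scale([p[1] for p in points])
    return list(zip(xs, ys))


# Two shapes with areas 1600 and 2500 sharing 5000 samples.
area_counts = allocate_samples([1600.0, 2500.0], 5000, "based_on_area")
equal_counts = allocate_samples([1600.0, 2500.0], 5000, "equal_over_shapes")
scaled = normalize_per_axis([(10.0, 40.0), (50.0, 90.0), (30.0, 65.0)])
print(area_counts, equal_counts, scaled)
```

With "based_on_area" the larger shape receives more points, while "equal_over_shapes" ignores size, so small shapes end up sampled more densely. "equal_over_classes" is left out here because it first merges shapes by fill, which needs actual geometry.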
Requirements:
numpy
shapely
svgpathtools
matplotlib