Generation and evaluation of synthetic time series datasets (also, augmentations, visualizations, a collection of popular datasets)

AlexanderVNikitin/tsgm

Time Series Generative Modeling (TSGM)

Create and evaluate synthetic time series datasets effortlessly

Badges: Open in Colab | PyPI version | unit-tests | Python 3.8+ | codecov | arXiv

Get Started • Tutorials • Augmentations • Generators • Metrics • Datasets • Contributing • Citing

🧩 Get Started

TSGM is an open-source framework for synthetic time series dataset generation and evaluation.

The framework can be used for creating synthetic datasets (see 🔨 Generators), augmenting time series data (see 🎨 Augmentations), evaluating synthetic data with respect to consistency, privacy, downstream performance, and more (see 📈 Metrics), and working with common time series datasets (TSGM provides easy access to more than 140 datasets; see 💾 Datasets).

We provide:

  • Documentation with a complete overview of the implemented methods,
  • Tutorials that describe practical use-cases of the framework.

Install TSGM

pip install tsgm

M1 and M2 chips:

To install tsgm on Apple M1 and M2 chips:

# Install tensorflow
conda install -c conda-forge tensorflow=2.9.1

# Install tsgm without dependencies
pip install tsgm --no-deps

# Install rest of the dependencies (separately here for clarity)
conda install tensorflow-probability scipy antropy statsmodels dtaidistance networkx optuna prettytable seaborn scikit-learn yfinance tqdm
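
After installing, it can be useful to confirm which versions actually ended up in the environment, especially after the `--no-deps` route above. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of TSGM.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Packages named in the install instructions above.
    for name in ("tsgm", "tensorflow", "tensorflow-probability"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

A `None` result for any of these packages suggests that the corresponding install step was skipped or targeted a different environment.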

Train your generative model

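import tsgm
from tensorflow import keras  # assumes the Keras bundled with TensorFlow

# ... Define hyperparameters: seq_len, feature_dim, latent_dim, N_EPOCHS ...
# dataset is a tensor of shape n_samples x seq_len x feature_dim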

# Zoo contains several prebuilt architectures: we choose a conditional GAN architecture
architecture = tsgm.models.architectures.zoo["cgan_base_c4_l1"](
    seq_len=seq_len, feat_dim=feature_dim,
    latent_dim=latent_dim, output_dim=0)
discriminator, generator = architecture.discriminator, architecture.generator

# Initialize GAN object with selected discriminator and generator
gan = tsgm.models.cgan.GAN(
    discriminator=discriminator, generator=generator, latent_dim=latent_dim
)
gan.compile(
    d_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    g_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    loss_fn=keras.losses.BinaryCrossentropy(from_logits=True),
)
gan.fit(dataset, epochs=N_EPOCHS)

# Generate 100 synthetic samples
result = gan.generate(100)
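
The example above leaves `dataset` undefined. As a rough illustration of the expected layout (n_samples x seq_len x feature_dim), here is a pure-Python sketch that builds a toy dataset of random sine waves scaled to [0, 1]. The function `make_sine_dataset` is hypothetical and not a TSGM API. It returns nested lists, which you would probably convert to an array or tensor before passing them to `gan.fit`.

```python
import math
import random


def make_sine_dataset(n_samples, seq_len, feature_dim, seed=0):
    """Return nested lists shaped [n_samples][seq_len][feature_dim], values in [0, 1]."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        # One random frequency and phase per feature channel.
        freqs = [rng.uniform(1.0, 5.0) for _ in range(feature_dim)]
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(feature_dim)]
        sample = [
            [0.5 * (1.0 + math.sin(2.0 * math.pi * f * t / seq_len + p))
             for f, p in zip(freqs, phases)]
            for t in range(seq_len)
        ]
        dataset.append(sample)
    return dataset


# Illustrative hyperparameter values only (assumptions, not library defaults).
toy = make_sine_dataset(n_samples=8, seq_len=64, feature_dim=2)
print(len(toy), len(toy[0]), len(toy[0][0]))
```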

⚓ Tutorials

For more examples, see our tutorials.

🎨 Augmentations

TSGM provides a number of time series augmentations.

| Augmentation | Class in TSGM | Reference |
| --- | --- | --- |
| Gaussian Noise / Jittering | tsgm.augmentations.GaussianNoise | - |
| Slice-And-Shuffle | tsgm.augmentations.SliceAndShuffle | - |
| Shuffle Features | tsgm.augmentations.Shuffle | - |
| Magnitude Warping | tsgm.augmentations.MagnitudeWarping | Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks |
| Window Warping | tsgm.augmentations.WindowWarping | Data Augmentation for Time Series Classification using Convolutional Neural Networks |
| DTW Barycentric Averaging | tsgm.augmentations.DTWBarycentricAveraging | A global averaging method for dynamic time warping, with applications to clustering. |
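
To make the first two entries of the table concrete, here is a minimal pure-Python sketch of jittering and slice-and-shuffle applied to a single series stored as a list of time steps, each a list of feature values. It illustrates the general techniques only. It is not the implementation behind `tsgm.augmentations.GaussianNoise` or `tsgm.augmentations.SliceAndShuffle`, and the function names and parameters (`jitter`, `slice_and_shuffle`, `sigma`, `n_segments`) are assumptions chosen for this example.

```python
import random


def jitter(series, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to every value."""
    rng = rng or random.Random()
    return [[x + rng.gauss(0.0, sigma) for x in step] for step in series]


def slice_and_shuffle(series, n_segments=3, rng=None):
    """Cut the series into contiguous segments and concatenate them in random order."""
    rng = rng or random.Random()
    if not 1 <= n_segments <= len(series):
        raise ValueError("n_segments must be between 1 and len(series)")
    # Choose n_segments - 1 distinct interior cut points.
    cuts = sorted(rng.sample(range(1, len(series)), n_segments - 1))
    bounds = [0] + cuts + [len(series)]
    segments = [series[a:b] for a, b in zip(bounds, bounds[1:])]
    rng.shuffle(segments)
    return [step for segment in segments for step in segment]


series = [[float(t)] for t in range(10)]
print(jitter(series, sigma=0.1, rng=random.Random(0))[:3])
print(slice_and_shuffle(series, n_segments=3, rng=random.Random(0)))
```

Jittering preserves the temporal order and perturbs values, whereas slice-and-shuffle preserves values and perturbs the order of segments, so the two stress different invariances of a downstream model.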

🔨 Generators

TSGM implements several generative models for synthetic time series data.

| Method | Link to docs | Type | Notes |
| --- | --- | --- | --- |
| Structural Time Series | sts.STS | Data-driven | Great for modeling time series when prior knowledge is available (e.g., trend or seasonality). |
| GAN | GAN | Data-driven | A generic implementation of GAN for time series generation. It can be customized with architectures for generators and discriminators. |
| WaveGAN | GAN | Data-driven | WaveGAN is the model for audio synthesis proposed in Adversarial Audio Synthesis. To use WaveGAN, set use_wgan=True when initializing the GAN class and use the zoo["wavegan"] architecture from the model zoo. |
| ConditionalGAN | ConditionalGAN | Data-driven | A generic implementation of conditional GAN. It supports both scalar and temporal conditioning. |
| BetaVAE | BetaVAE | Data-driven | A generic implementation of Beta VAE for time series. The loss function is customized to work well with multi-dimensional time series. |
| cBetaVAE | cBetaVAE | Data-driven | Conditional version of BetaVAE. It supports temporal and scalar conditioning. |
| TimeGAN | TimeGAN | Data-driven | TSGM implementation of TimeGAN from the paper. |
| SineConstSimulator | SineConstSimulator | Simulator-based | Simulator-based synthetic signal that switches between constant and periodic functions. |
| Lotka Volterra | LotkaVolterraSimulator | Simulator-based | Simulator of the Lotka-Volterra (predator-prey) equations. |
| PdM Simulator | PdMSimulator | Simulator-based | Simulator of predictive maintenance with multiple pieces of equipment, from the paper. |
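
As an illustration of what a simulator-based generator does, the sketch below integrates the classical Lotka-Volterra predator-prey equations, dx/dt = alpha*x - beta*x*y and dy/dt = delta*x*y - gamma*y, with a fourth-order Runge-Kutta step. This is a standalone example and not the code behind `LotkaVolterraSimulator`. The function name, parameter names, and default values are assumptions made for this sketch.

```python
def lotka_volterra(x0, y0, alpha=1.1, beta=0.4, delta=0.1, gamma=0.4,
                   dt=0.01, n_steps=1000):
    """Return prey and predator trajectories (two lists of length n_steps + 1)."""

    def deriv(x, y):
        # Prey grows at rate alpha and is eaten at rate beta*y;
        # predators grow at rate delta*x and die at rate gamma.
        return alpha * x - beta * x * y, delta * x * y - gamma * y

    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(n_steps):
        # Classical RK4 update for the coupled system.
        k1x, k1y = deriv(x, y)
        k2x, k2y = deriv(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
        k3x, k3y = deriv(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
        k4x, k4y = deriv(x + dt * k3x, y + dt * k3y)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        y += dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        xs.append(x)
        ys.append(y)
    return xs, ys


prey, predators = lotka_volterra(x0=10.0, y0=5.0)
print(len(prey), min(prey) > 0.0, min(predators) > 0.0)
```

Sampling different initial populations or rate parameters yields a family of two-channel time series, which is the sense in which such a simulator can serve as a synthetic dataset generator.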

📈 Metrics

TSGM implements many metrics for synthetic time series evaluation. See Section 3 of our paper for more details on the evaluation of synthetic time series.

| Metric | Link to docs | Type | Notes |
| --- | --- | --- | --- |
| Distance in the space of summary statistics | tsgm.metrics.DistanceMetric | Distance | Calculates a set of summary statistics in the original and synthetic data, and measures the distance between those. |
| Maximum Mean Discrepancy (MMD) | tsgm.metrics.MMDMetric | Distance | Calculates MMD between real and synthetic samples. |
| Discriminative Score | tsgm.metrics.DiscriminativeMetric | Distance | Measures the discriminative performance of a model in distinguishing between synthetic and real datasets. |
| Demographic Parity Score | tsgm.metrics.DemographicParityMetric | Fairness | Assesses the difference in the distributions of a target variable among different groups in two datasets. Refer to this paper to learn more. |
| Predictive Parity Score | tsgm.metrics.PredictiveParityMetric | Fairness | Assesses the discrepancy in the predictive performance of a model among different groups in two datasets. Refer to this paper to learn more. |
| Privacy Membership Inference Attack Score | tsgm.metrics.PrivacyMembershipInferenceMetric | Privacy | Measures the possibility of membership inference attacks. |
| Spectral Entropy | tsgm.metrics.EntropyMetric | Diversity | Calculates the spectral entropy of a dataset or tensor as a sum of individual entropies. |
| Shannon Entropy | tsgm.metrics.ShannonEntropyMetric | Diversity | Shannon entropy calculated over the labels of a dataset. |
| Pairwise Distance | tsgm.metrics.PairwiseDistanceMetric | Diversity | Measures pairwise distances in a set of time series. |
| Downstream Effectiveness | tsgm.metrics.DownstreamPerformanceMetric | Downstream Effectiveness | Evaluates the performance of a model on a downstream task. It returns performance gains achieved with the addition of synthetic data. |
| Qualitative Evaluation | tsgm.utils.visualization | Qualitative | Various tools for visual assessment of a generated dataset. |
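
To give a feel for one of the distance metrics, here is a minimal pure-Python sketch of the biased (V-statistic) estimate of squared Maximum Mean Discrepancy with an RBF kernel, applied to flattened time series. It is not the implementation of `tsgm.metrics.MMDMetric`. The kernel choice, the `bandwidth` parameter, and the function names are assumptions made for this example.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    return math.exp(-_sq_dist(a, b) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two sets of flat vectors."""

    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


real = [[0.0, 0.1, 0.2], [0.1, 0.2, 0.3]]
close = [[0.0, 0.1, 0.25], [0.1, 0.2, 0.35]]
far = [[3.0, 3.1, 3.2], [3.1, 3.2, 3.3]]
print(mmd_squared(real, close) < mmd_squared(real, far))
```

Values near zero suggest the two sample sets are hard to tell apart under the chosen kernel, while larger values indicate a distribution shift. The result depends on the bandwidth, so it is usually worth checking several.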

💾 Datasets

| Dataset | API | Description |
| --- | --- | --- |
| UCR Dataset | tsgm.utils.UCRDataManager | https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/ |
| Mauna Loa | tsgm.utils.get_mauna_loa() | https://gml.noaa.gov/ccgg/trends/data.html |
| EEG & Eye state | tsgm.utils.get_eeg() | https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State |
| Power consumption dataset | tsgm.utils.get_power_consumption() | https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption |
| Stock data | tsgm.utils.get_stock_data(ticker_name) | Gets historical stock data from YFinance |
| COVID-19 over the US | tsgm.utils.get_covid_19() | Covid-19 distribution over the US |
| Energy Data (UCI) | tsgm.utils.get_energy_data() | https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction |
| MNIST as time series | tsgm.utils.get_mnist_data() | https://en.wikipedia.org/wiki/MNIST_database |
| Samples from GPs | tsgm.utils.get_gp_samples_data() | https://en.wikipedia.org/wiki/Gaussian_process |
| Physionet 2012 | tsgm.utils.get_physionet2012() | https://archive.physionet.org/pn3/challenge/2012/ |
| Synchronized Brainwave Dataset | tsgm.utils.get_synchronized_brainwave_dataset() | https://www.kaggle.com/datasets/berkeley-biosense/synchronized-brainwave-dataset |

TSGM provides an API for convenient access to many time series datasets (currently more than 140). The comprehensive list of datasets is available in the documentation.

🛠️ Contributing

We appreciate all contributions. To learn more, please check CONTRIBUTING.md.

For contributors

git clone https://github.com/AlexanderVNikitin/tsgm
cd tsgm
pip install -e .

Run tests:

python -m pytest

To check static typing:

mypy

💻 CLI

We provide two CLIs for convenient synthetic data generation:

  • tsgm-gd generates data from a stored sample,
  • tsgm-eval evaluates the generated time series.

Use tsgm-gd --help or tsgm-eval --help for documentation.

Citing

If you find this repo useful, please consider citing our paper:

@article{
  nikitin2023tsgm,
  title={TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series},
  author={Nikitin, Alexander and Iannucci, Letizia and Kaski, Samuel},
  journal={arXiv preprint arXiv:2305.11567},
  year={2023}
}

License

Apache License 2.0