Please help us to improve our benchmark: hitsz-ids#82
Benchmarks aim to measure the performance of the library:
- Performance: processing time, model training time, sampling rate...
- Memory consumption
- Others, like cache hit rate...
We now provide a simple benchmark of our CTGAN implementation against the original one: fit both with a large random dataset and compare their memory consumption.
```bash
# Clone and install the latest version
# You can also use our latest image: docker pull idsteam/sdgx:latest
git clone https://github.com/hitsz-ids/synthetic-data-generator.git
cd synthetic-data-generator && pip install -e ./

# Set up the benchmark
cd benchmarks
pip install -r requirements.txt
```
Generate a dataset with `python generate_dataset.py`; you can use `python generate_dataset.py --help` to see the usage.
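For illustration, a random dataset for this kind of benchmark can be built in a few lines. This is a hypothetical sketch, not the contents of the actual `generate_dataset.py` (the `generate_dataset` function name and the column naming are assumptions):

```python
# Hypothetical sketch of random-dataset generation; the real
# generate_dataset.py may use different column types and options.
import numpy as np
import pandas as pd

def generate_dataset(rows: int, cols: int, seed: int = 0) -> pd.DataFrame:
    """Return a rows x cols DataFrame of random floats."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        rng.random((rows, cols)),
        columns=[f"col_{i}" for i in range(cols)],
    )

df = generate_dataset(1000, 50)
df.to_csv("benchmark_dataset.csv", index=False)
print(df.shape)  # (1000, 50)
```

Scaling `rows` up to 1,000,000 reproduces the dataset size used in the results below.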
We use `memory_profiler` to benchmark our implementation:

```bash
mprof run python ./sdgx_ctgan.py
```

Plot the results with `mprof plot`, or save the plot with `mprof plot --output=sdgx_ctgan.png`.
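If you prefer measuring inside Python instead of via the `mprof` CLI, `memory_profiler` also exposes a programmatic API. A minimal sketch, where `fit_model` is a stand-in workload for the real training script:

```python
# memory_usage() samples the process's memory while a callable runs
# and returns the samples in MiB.
from memory_profiler import memory_usage

def fit_model():
    # Stand-in workload: allocate a large list to make the peak visible.
    data = [0.0] * (10 * 1024 * 1024)
    return sum(data)

samples = memory_usage((fit_model, (), {}), interval=0.1)
print(f"peak memory: {max(samples):.1f} MiB")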
To profile the original implementation, install `ctgan` and run its benchmark script the same way:

```bash
pip install ctgan
mprof run python ./sdv_ctgan.py
```

Plot the results with `mprof plot`, or save the plot with `mprof plot --output=sdv_ctgan.png`.
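To compare the two runs numerically rather than by eyeballing the plots, you can parse the `.dat` files that `mprof run` writes; their memory samples are lines of the form `MEM <MiB> <timestamp>`. A small sketch (the sample data below is made up for demonstration):

```python
# Extract the peak memory (in MiB) from an mprof .dat file.
def peak_memory(dat_path: str) -> float:
    peak = 0.0
    with open(dat_path) as fh:
        for line in fh:
            if line.startswith("MEM"):
                peak = max(peak, float(line.split()[1]))
    return peak

# Demonstration with a tiny fabricated .dat file.
sample = "CMDLINE python ./sdgx_ctgan.py\nMEM 100.0 0.0\nMEM 250.5 0.1\n"
with open("sample.dat", "w") as fh:
    fh.write(sample)
print(peak_memory("sample.dat"))  # 250.5
```

Running it on the real `mprof` output files lets you report the two peaks side by side.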
With default settings, our implementation can fit a 1,000,000 x 50 dataset on a machine with 32GB of memory (nearly 20GB usable), while the original implementation needed more than 20GB of memory and crashed during training.