Course project in Deep generative modelling (Fall 2022) | DTU
Heterogeneous data contains different types of features (continuous, ordinal, nominal, etc.). Deep generative models, such as Variational Autoencoders (VAEs), can struggle to learn the underlying probability distribution of each feature type. Common choices are the Gaussian distribution for continuous features, the categorical distribution for nominal features, the Bernoulli distribution for binary features, etc.
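For illustration only (this is not the project's code), the idea of matching each feature type to its own likelihood can be sketched in plain Python; all function names, feature names, and parameter values below are made up for the example:

```python
import math

def gaussian_log_prob(x, mu, sigma):
    # log N(x | mu, sigma^2), used for continuous features
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def bernoulli_log_prob(x, p):
    # log Bern(x | p), used for binary features (x is 0 or 1)
    return x * math.log(p) + (1 - x) * math.log(1 - p)

def categorical_log_prob(x, probs):
    # log Cat(x | probs), used for nominal features (x is the class index)
    return math.log(probs[x])

# A record with one feature of each type: the log-likelihood factorises
# over features, each term using the distribution matching its type.
record = {"income": 1.3, "subscribed": 1, "job": 2}
log_lik = (
    gaussian_log_prob(record["income"], mu=1.0, sigma=0.5)
    + bernoulli_log_prob(record["subscribed"], p=0.7)
    + categorical_log_prob(record["job"], probs=[0.2, 0.3, 0.5])
)
```

In a VAE this factorisation would sit in the decoder: one output head per feature, each parameterising the likelihood appropriate for that feature's type.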
These probability distributions belong to the exponential family. All distributions in the exponential family can be written in the same form,

$$p(x \mid \eta) = h(x)\,\exp\!\left(\eta^\top T(x) - A(\eta)\right),$$

parameterised by their natural parameters $\eta$, where $T(x)$ are the sufficient statistics, $h(x)$ is the base measure, and $A(\eta)$ is the log-partition function. For example:

| | Gaussian | Categorical |
|---|---|---|
| $h(x)$ | $1/\sqrt{2\pi}$ | $1$ |
| $\eta$ | $\left(\mu/\sigma^2,\ -1/(2\sigma^2)\right)$ | $(\log \pi_1, \dots, \log \pi_K)$ |
| $T(x)$ | $(x,\ x^2)$ | $(\mathbb{1}[x=1], \dots, \mathbb{1}[x=K])$ |
| $A(\eta)$ | $\mu^2/(2\sigma^2) + \log\sigma$ | $0$ |

where $\mu$ and $\sigma^2$ are the Gaussian mean and variance, and $\pi_1, \dots, \pi_K$ are the class probabilities.
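As a sanity check, plugging the Gaussian's natural parameters into the exponential-family form recovers the usual Gaussian density. A minimal sketch, assuming the standard parameterisation $h(x)=1/\sqrt{2\pi}$, $\eta=(\mu/\sigma^2,\,-1/(2\sigma^2))$, $T(x)=(x,\,x^2)$, $A(\eta)=\mu^2/(2\sigma^2)+\log\sigma$:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # The familiar N(mu, sigma^2) density.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def gaussian_expfam_pdf(x, mu, sigma):
    # The same density written as h(x) * exp(eta . T(x) - A(eta)).
    h = 1.0 / math.sqrt(2 * math.pi)
    eta = (mu / sigma ** 2, -1.0 / (2 * sigma ** 2))
    T = (x, x ** 2)
    A = mu ** 2 / (2 * sigma ** 2) + math.log(sigma)
    return h * math.exp(eta[0] * T[0] + eta[1] * T[1] - A)
```

Expanding the exponent, $\mu x/\sigma^2 - x^2/(2\sigma^2) - \mu^2/(2\sigma^2) - \log\sigma$, collects into $-(x-\mu)^2/(2\sigma^2)$ with the $1/\sigma$ normaliser, so both functions agree pointwise.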
We benchmark the models on four UCI datasets: Avocado Sales, Bank Marketing, Boston Housing, and Energy Efficiency.
Create conda environment
conda create --name DGM python=3.9
conda activate DGM
pip install -r requirements.txt
Or set up the environment on HPC
module load python3/3.9.6
module load cuda/11.7
python3 -m venv DGM
source DGM/bin/activate
pip3 install -r requirements.txt
To train and test on, for instance, the bank dataset, run the following command or submit the shell script submit.sh:
python main.py --seed 3407 --device cuda --write --mode "traintest" --experiment "bank" --dataset "bank" --scale "normalize" --max_epochs 500 --max_patience 100 --prior "vampPrior" --beta 0.01
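To repeat the same configuration over all four datasets, a small loop can help. Note this is a sketch: only the `bank` identifier is confirmed above, so the other dataset names (`avocado`, `boston`, `energy`) are assumptions that should be checked against `main.py`; the commands are collected and printed rather than executed (a dry run):

```shell
# Dry run: build the command for each dataset and print it.
# "avocado", "boston", "energy" are assumed identifiers; only "bank" is confirmed.
CMDS=()
for DATASET in avocado bank boston energy; do
  CMDS+=("python main.py --seed 3407 --device cuda --write --mode traintest --experiment $DATASET --dataset $DATASET --scale normalize --max_epochs 500 --max_patience 100 --prior vampPrior --beta 0.01")
done
printf '%s\n' "${CMDS[@]}"
```

To actually launch the runs, replace the `printf` with `eval "$CMD"` inside the loop, or paste the printed commands individually.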
Trained models are available at the following link