This project aims to build a conditional variational autoencoder (CVAE) that generates arbitrary handwritten letters and digits from keyboard input. Trained on the EMNIST dataset, the CVAE encodes handwritten letters and digits into a latent vector space. Imaginary letters and digits are then generated by random sampling or interpolation in that space.
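In a CVAE, generation means feeding the decoder a latent vector together with the class label. The NumPy sketch below shows how a decoder input of shape [6 + 62] could be assembled for random sampling and for linear interpolation. It assumes 62 one-hot classes (EMNIST ByClass) and the 6-dimensional latent space listed below; the helper names are hypothetical, and the decoder itself is left out.

```python
import numpy as np

NUM_CLASSES = 62  # assumed: EMNIST ByClass (10 digits + 26 uppercase + 26 lowercase)
LATENT_DIM = 6    # latent dimension listed for the model below

def one_hot(label: int, num_classes: int = NUM_CLASSES) -> np.ndarray:
    """Return a one-hot vector for a class index."""
    v = np.zeros(num_classes, dtype=np.float32)
    v[label] = 1.0
    return v

def sample_decoder_input(label: int, rng: np.random.Generator) -> np.ndarray:
    """Random sampling: draw z ~ N(0, I) and append the one-hot label."""
    z = rng.standard_normal(LATENT_DIM).astype(np.float32)
    return np.concatenate([z, one_hot(label)])

def interpolate_decoder_inputs(z_a, z_b, label: int, steps: int = 5) -> np.ndarray:
    """Linear interpolation between two latent vectors under a fixed label."""
    ts = np.linspace(0.0, 1.0, steps, dtype=np.float32)
    cond = one_hot(label)
    return np.stack([np.concatenate([(1 - t) * z_a + t * z_b, cond]) for t in ts])
```

Each row would then be passed through the trained decoder to obtain a 784-pixel image. Keeping the label fixed while moving through the latent space is what lets the same character be drawn in different styles.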
- Loss: binary crossentropy
- Optimizer: Adam
- Latent dimension: 6
- Image normalization: [0, 1]
- Last activation function of the decoder: sigmoid
- Convolutional CVAE layers (encoder // decoder): [784,62]-[784]-[(28,28,1)]-[(14,14,16)]-[(7,7,32)]-[1568]-[64]-[6] // [6,62]-[64]-[1568]-[(7,7,32)]-[(14,14,32)]-[(28,28,16)]-[(28,28,1)]-[784]
- Multi-layer CVAE layers (encoder // decoder): [784,62]-[256]-[128]-[6] // [6,62]-[128]-[256]-[784]
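The list above names only binary cross-entropy, but a VAE objective normally also includes a KL-divergence term that pulls the approximate posterior toward a standard normal prior. Assuming that standard formulation (the project's exact weighting is not stated), the per-sample loss can be sketched in NumPy as follows. Pixels are scaled to [0, 1] so that they are valid targets for a sigmoid output; the function names are hypothetical.

```python
import numpy as np

def normalize_01(pixels: np.ndarray) -> np.ndarray:
    """Map uint8 pixels in [0, 255] to floats in [0, 1] (matches a sigmoid output)."""
    return pixels.astype(np.float32) / 255.0

def cvae_bce_loss(x, x_hat, mu, log_var, eps=1e-7):
    """Per-sample loss: summed binary cross-entropy plus KL(q(z|x,y) || N(0, I)).

    x, x_hat: arrays of shape (batch, 784) with values in [0, 1]
    mu, log_var: encoder outputs of shape (batch, latent_dim)
    """
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    bce = -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat), axis=1)
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    return bce + kl
```

The KL term is zero only when the encoder outputs mu = 0 and log_var = 0, which is why sampling z from N(0, I) at generation time lands in regions the decoder has seen.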
A command-line letter/digit generator based on the ldg_v3 Conv-CVAE model (details below). It simply loads the Conv-CVAE model and its best weights to produce results.
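Before generating, a command-line tool of this kind has to turn the typed string into class labels for the decoder. The sketch below assumes the EMNIST ByClass ordering (digits 0-9, then A-Z, then a-z, for 62 classes); the function names are hypothetical and are not taken from the project's code.

```python
import numpy as np

def char_to_label(ch: str) -> int:
    """Map one character to an EMNIST ByClass index (assumed order: 0-9, A-Z, a-z)."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return 10 + ord(ch) - ord("A")
    if "a" <= ch <= "z":
        return 36 + ord(ch) - ord("a")
    raise ValueError(f"unsupported character: {ch!r}")

def text_to_one_hot(text: str, num_classes: int = 62) -> np.ndarray:
    """Return a (len(text), num_classes) one-hot matrix, one row per character."""
    labels = np.asarray([char_to_label(c) for c in text], dtype=int)
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out
```

Each row of the result would be paired with a sampled latent vector and decoded, and the decoded 28x28 images placed side by side to render the string.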
- Label inputs to both the encoder and the decoder
- Loss: MSE
- Optimizer: Adam
- Latent dimension: 10
- Image normalization: [-1, 1]
- Last activation function of the decoder: tanh
- Convolutional CVAE layers (encoder // decoder): [784,62]-[784]-[(28,28,1)]-[(28,28,16)]-[(28,28,32)]-[(14,14,64)]-[12544]-[128]-[10] // [10,62]-[128]-[12544]-[(14,14,64)]-[(28,28,32)]-[(28,28,16)]-[(28,28,1)]-[784]
- Multi-layer CVAE layers (encoder // decoder): [784,62]-[512]-[256]-[10] // [10,62]-[256]-[512]-[784]
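With a tanh output layer, images have to be scaled to [-1, 1] rather than [0, 1], and the reconstruction term becomes a squared error. The NumPy sketch below shows that scaling, its inverse for display, and an MSE-plus-KL loss, again assuming the standard VAE KL term. The squared error is summed over pixels here; a mean over the 784 pixels differs by a constant factor, which changes the weight of the reconstruction term relative to the KL term. Function names are hypothetical.

```python
import numpy as np

def normalize_pm1(pixels: np.ndarray) -> np.ndarray:
    """Map uint8 pixels in [0, 255] to floats in [-1, 1] (matches a tanh output)."""
    return pixels.astype(np.float32) / 127.5 - 1.0

def denormalize_pm1(x: np.ndarray) -> np.ndarray:
    """Map decoder outputs in [-1, 1] back to uint8 pixels for display."""
    return np.clip(np.rint((x + 1.0) * 127.5), 0, 255).astype(np.uint8)

def cvae_mse_loss(x, x_hat, mu, log_var):
    """Per-sample loss: summed squared error plus KL(q(z|x,y) || N(0, I))."""
    mse = np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    return mse + kl
```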
A command-line letter/digit generator based on the ldg_v2 Conv-CVAE model (details below). It simply loads the Conv-CVAE model and its best weights to produce results.
- Label inputs to both the encoder and the decoder
Initial convolutional conditional variational autoencoder model.
- Label inputs only to the decoder
- Training/test reconstructions were satisfactory, but generating images for a specific input string was somewhat difficult.
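The difference between conditioning only the decoder and conditioning both networks comes down to what is concatenated to each input. A common explanation for the generation difficulty noted above is that an encoder which never sees the label must also encode class identity in z, so a randomly sampled z can conflict with the requested label. The sketch below shows the two input layouts; the sizes (784 pixels, 62 classes, a 10-dimensional latent) and the function name are illustrative assumptions.

```python
import numpy as np

def build_inputs(image_flat, label_one_hot, z, condition_encoder: bool):
    """Assemble encoder and decoder inputs for a CVAE.

    condition_encoder=True: the label is concatenated to both inputs.
    condition_encoder=False: the label goes only to the decoder.
    """
    if condition_encoder:
        enc_in = np.concatenate([image_flat, label_one_hot])
    else:
        enc_in = image_flat
    dec_in = np.concatenate([z, label_one_hot])  # the decoder is always conditioned
    return enc_in, dec_in
```

With the label on both sides, the encoder can leave class information to the condition and use z mainly for style, which is what makes label-driven generation controllable.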
While the model architecture seems fine, the Stanford Dogs dataset may not be suitable for training a VAE.