ImageTransformer

This notebook shows a basic implementation of a transformer (decoder) architecture for image generation in TensorFlow 2.

It demonstrates how to use a transformer decoder to learn a generative representation of the MNIST dataset and perform autoregressive image reconstruction.

MNIST dataset examples:

MNIST examples

To reduce the number of color values, we perform color quantization, i.e. we run k-means clustering to obtain 8 color clusters and thus shrink the color palette.

Quantized examples:

MNIST quantized
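
The quantization step could be sketched as follows, assuming scikit-learn for the k-means part (the notebook may implement it differently):

```python
import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans

(x_train, _), _ = tf.keras.datasets.mnist.load_data()

# Fit k-means on a subsample of gray values to find 8 color clusters.
pixels = x_train[:1000].reshape(-1, 1).astype(np.float32)
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pixels)

# Replace every pixel with the index of its nearest cluster center,
# turning each image into a grid of discrete color tokens in {0, ..., 7}.
x_quantized = kmeans.predict(
    x_train.reshape(-1, 1).astype(np.float32)
).reshape(x_train.shape)
```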

Afterwards we serialize the images into linear sequences of length 784 (28 × 28) per image, which can be fed into the model just like token sequences in NLP.
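
Continuing the sketch above, serialization is just a row-wise reshape; the input/target shift shown here is the standard autoregressive training setup and is an assumption about the notebook's details:

```python
# Flatten the quantized 28x28 grids row by row into length-784 sequences.
sequences = x_quantized.reshape(-1, 28 * 28)

# For autoregressive training the model predicts token t from tokens < t,
# so inputs and targets are the same sequence shifted by one position.
inputs, targets = sequences[:, :-1], sequences[:, 1:]
```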

See the notebook for an in-depth explanation of the model.
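
For orientation, a single decoder block with causal self-attention might look as follows in Keras; this is a minimal sketch assuming TensorFlow ≥ 2.10 (for `use_causal_mask`), not the notebook's exact architecture:

```python
import tensorflow as tf

class DecoderBlock(tf.keras.layers.Layer):
    """One transformer decoder block: masked self-attention + feed-forward."""

    def __init__(self, d_model=128, num_heads=4, d_ff=512):
        super().__init__()
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(d_ff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization()
        self.norm2 = tf.keras.layers.LayerNormalization()

    def call(self, x):
        # Causal mask: position t may only attend to positions <= t.
        attn_out = self.attn(x, x, use_causal_mask=True)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ffn(x))
```

Token and positional embeddings feed a stack of such blocks, and a final dense layer with 8 outputs predicts the next color token.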

Results

We perform image reconstruction, i.e. we take MNIST images, remove the bottom half of each image, quantize the remainder, and let our model reconstruct the missing part autoregressively. Afterwards we revert the quantization to obtain a newly generated MNIST image; a sketch of this sampling loop follows the examples below. Compare the outputs to the inputs to see that the model does not memorize its inputs but creates new images.

Input data:

MNIST input

Bottom half removed:

MNIST top half

Generated output:

model output
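
A hypothetical version of the sampling loop, assuming `model` maps a batch of token sequences to per-position logits over the 8 color clusters (the names here are illustrative, not the notebook's):

```python
import numpy as np

def complete_image(model, top_half_tokens, total_len=784):
    """Autoregressively extend the first 392 tokens to a full 28x28 image."""
    seq = list(top_half_tokens)
    while len(seq) < total_len:
        # Predict logits for the next token from everything generated so far.
        logits = model.predict(np.array([seq]), verbose=0)[0, -1]
        seq.append(int(np.argmax(logits)))  # greedy; sampling also works
    return np.array(seq).reshape(28, 28)

# Reverting the quantization maps each token back to its k-means center:
# image = kmeans.cluster_centers_[tokens].reshape(28, 28)
```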

Further reading

Keywords

Autoregressive Image Generation, MNIST, Transformers, Transformer Decoder, ImageGPT, Generative Methods, Generative Loss, Deep Learning, Machine Learning, TensorFlow 2