This DeepTrackAI repository contains a dataset of transformed MNIST digits paired with natural-language descriptions.
Images are generated using the notebook create_transformed_images.ipynb, which applies randomized color, rotation/flip, and contrast transformations to the original MNIST digits and saves both the transformed images and corresponding text prompts.
The dataset is designed for benchmarking text-to-image alignment and multimodal learning tasks.
- Number of images: 60,000 (one transformed version of each image in the MNIST training set)
- Image size: 28 × 28 pixels
- Image format: 8-bit RGB PNG
- Metadata: JSON file mapping filenames to natural-language transformation descriptions
- Transformations included:
  - Digit colorization (e.g., “in red on a blue background”)
  - Rotations (±90°, 180°)
  - Horizontal mirror / vertical flip
  - Auto-contrast normalization
Base dataset
- Title: The MNIST Database of Handwritten Digits
- Authors: Yann LeCun, Corinna Cortes, Christopher J.C. Burges
- Source: Official MNIST website (http://yann.lecun.com/exdb/mnist/)
- Reference article: LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11): 2278–2324 (1998). DOI: 10.1109/5.726791
- License: Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
- Note: The original MNIST website is no longer actively maintained. An archived version is available, and the dataset is widely mirrored (e.g., on Kaggle).
Derivative dataset
- Title: Text-to-Image Transformed MNIST Dataset
- Authors: Carlo Manzo
- Source: This repository
- License: Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
If you use this dataset in your research, please follow the licensing requirements and properly attribute the original authors.
/text2image_dataset
└── train/
    ├── images/                    # Transformed digit images
    │   ├── 0_000000.png
    │   ├── 0_000001.png
    │   └── ...
    └── image_descriptions.json    # Mapping of filenames to natural-language transformation descriptions
Each filename in images/ begins with the digit label (0–9), followed by a sequential numerical identifier.
The JSON file provides the corresponding text description for each transformed image.
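The layout above can be read with the Python standard library alone. The following is a minimal sketch, not code from the repository: the helper names (`label_from_filename`, `load_descriptions`, `iter_samples`) are hypothetical, and it assumes that image_descriptions.json is a flat JSON object whose keys are image filenames such as 0_000000.png and whose values are the description strings.

```python
import json
from pathlib import Path


def label_from_filename(filename):
    """Return the digit label encoded before the first underscore."""
    stem = Path(filename).stem              # "0_000001.png" -> "0_000001"
    label_text, _, _ = stem.partition("_")  # "0_000001" -> "0"
    return int(label_text)


def load_descriptions(json_path):
    """Load the filename -> description mapping.

    Assumption: the file is a flat JSON object keyed by image filename.
    """
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def iter_samples(root):
    """Yield (image_path, label, description) for each entry in the JSON file.

    `root` is the directory holding images/ and image_descriptions.json,
    for example text2image_dataset/train.
    """
    root = Path(root)
    descriptions = load_descriptions(root / "image_descriptions.json")
    for filename, description in descriptions.items():
        yield root / "images" / filename, label_from_filename(filename), description
```

Iterating over `iter_samples("train")` from the repository root would then give one (path, label, description) triple per image. If the JSON keys turn out to omit the file extension or to include a directory prefix, the path construction needs to be adjusted accordingly.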
git clone https://github.com/DeepTrackAI/text2image_dataset
cd text2image_dataset
Run the notebook to generate transformed images and their descriptions:
jupyter notebook create_transformed_images.ipynb
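The notebook is the authoritative implementation. For orientation only, the kind of pipeline listed under "Transformations included" can be sketched with Pillow as below. Everything specific here is an assumption rather than the notebook's actual code: the function name `transform_digit`, the four-color palette, the set of geometry choices, and the exact wording of the generated description are all hypothetical.

```python
import random

from PIL import Image, ImageOps

# Hypothetical palette; the notebook's actual colors may differ.
COLORS = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def transform_digit(gray, rng):
    """Colorize, rotate or flip, and auto-contrast a grayscale ("L") digit image.

    Returns the transformed RGB image and an illustrative text description.
    """
    # Pick two distinct colors for the digit strokes and the background.
    fg_name, bg_name = rng.sample(sorted(COLORS), 2)
    # MNIST digits are bright strokes on black: map black -> background, white -> digit.
    img = ImageOps.colorize(gray, black=COLORS[bg_name], white=COLORS[fg_name])
    parts = [f"in {fg_name} on a {bg_name} background"]

    # Apply at most one geometric transformation.
    geometry = rng.choice(["none", "rot90", "rot-90", "rot180", "mirror", "flip"])
    if geometry == "rot90":
        img = img.rotate(90)            # Pillow rotates counter-clockwise
        parts.append("rotated by 90 degrees")
    elif geometry == "rot-90":
        img = img.rotate(-90)
        parts.append("rotated by -90 degrees")
    elif geometry == "rot180":
        img = img.rotate(180)
        parts.append("rotated by 180 degrees")
    elif geometry == "mirror":
        img = ImageOps.mirror(img)      # left-right mirror
        parts.append("mirrored horizontally")
    elif geometry == "flip":
        img = ImageOps.flip(img)        # top-bottom flip
        parts.append("flipped vertically")

    # Stretch each color channel to the full 0-255 range.
    img = ImageOps.autocontrast(img)
    return img, ", ".join(parts)
```

Passing a seeded generator, for example `transform_digit(Image.open("digit.png").convert("L"), random.Random(0))` with a hypothetical input file, keeps the random choices reproducible across runs.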
When using this dataset or the code, please cite the original MNIST dataset, the reference article, and the text-to-image transformation repository.
LeCun Y, Cortes C, Burges CJC. The MNIST Database of Handwritten Digits. Retrieved from http://yann.lecun.com/exdb/mnist/
@misc{lecun1998mnist,
title = {The MNIST Database of Handwritten Digits},
author = {LeCun, Yann and Cortes, Corinna and Burges, Christopher J.C.},
year = {1998},
howpublished = {\url{http://yann.lecun.com/exdb/mnist/}}
}
LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278–2324 (1998). DOI: 10.1109/5.726791
@article{lecun1998gradient,
title = {Gradient-based learning applied to document recognition},
author = {LeCun, Yann and Bottou, L{\'e}on and Bengio, Yoshua and Haffner, Patrick},
journal = {Proceedings of the IEEE},
volume = {86},
number = {11},
pages = {2278--2324},
year = {1998},
publisher = {IEEE},
doi = {10.1109/5.726791}
}
Carlo Manzo. Text-to-Image Transformed MNIST Dataset. GitHub (2025). https://github.com/DeepTrackAI/text2image_dataset
@misc{text2image2025,
author = {Carlo Manzo},
title = {Text-to-Image Transformed MNIST Dataset},
year = {2025},
howpublished = {\url{https://github.com/DeepTrackAI/text2image_dataset}}
}
As a derivative dataset, this repository is shared under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) License, following the original licensing terms.