Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
- Walkthrough: Install and run.
- IPython notebook: Get a hands-on experience.
- Overview: How all parts of T2T code are connected.
- New Problem: Train T2T models on your data.
- New Model: Create your own T2T model.
Below we list a number of tasks that can be solved with T2T when you train the appropriate model on the appropriate problem. For each task we give the problem and model and suggest a hyperparameter setting that we know works well in our setup. We usually run either on Cloud TPUs or on 8-GPU machines; you might need to modify the hyperparameters if you run on a different setup.
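Each task below follows the same pattern: pick a `--problem`, a `--model`, and an `--hparams_set`, generate the data with `t2t-datagen`, and train with `t2t-trainer`. Here is a minimal sketch of that workflow; the directory paths are placeholders and exact flag names may differ slightly between T2T versions, so check `t2t-trainer --help` on your install:

```bash
# Pick a problem/model/hparams combination from the sections below.
PROBLEM=translate_ende_wmt32k
MODEL=transformer
HPARAMS=transformer_base

DATA_DIR=$HOME/t2t_data            # where generated data is stored (placeholder)
TMP_DIR=/tmp/t2t_datagen           # scratch space for downloads (placeholder)
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Download and preprocess the data for the chosen problem.
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# Train the model.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR
```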
For image classification, we have a number of standard data-sets:
- ImageNet (a large data-set): `--problem=image_imagenet`, or one of the re-scaled versions (`image_imagenet224`, `image_imagenet64`, `image_imagenet32`)
- CIFAR-10: `--problem=image_cifar10` (or `--problem=image_cifar10_plain` to turn off data augmentation)
- CIFAR-100: `--problem=image_cifar100`
- MNIST: `--problem=image_mnist`
For ImageNet, we suggest using ResNet or Xception, i.e.,
`--model=resnet --hparams_set=resnet_50` or
`--model=xception --hparams_set=xception_base`.
ResNet should get to above 76% top-1 accuracy on ImageNet.
For CIFAR and MNIST, we suggest trying the shake-shake model:
`--model=shake_shake --hparams_set=shakeshake_big`.
This setting, trained for `--train_steps=700000`, should yield
close to 97% accuracy on CIFAR-10.
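For example, the CIFAR-10 shake-shake setting above plugs into the training command like this (a sketch, assuming the data was already generated with `t2t-datagen` and that the placeholder paths from the walkthrough are used):

```bash
t2t-trainer \
  --data_dir=$HOME/t2t_data \
  --problem=image_cifar10 \
  --model=shake_shake \
  --hparams_set=shakeshake_big \
  --train_steps=700000 \
  --output_dir=$HOME/t2t_train/cifar10_shakeshake
```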
For language modeling, we have these data-sets in T2T:
- PTB (a small data-set): `--problem=languagemodel_ptb10k` for word-level modeling and `--problem=languagemodel_ptb_characters` for character-level modeling.
- LM1B (a billion-word corpus): `--problem=languagemodel_lm1b32k` for subword-level modeling and `--problem=languagemodel_lm1b_characters` for character-level modeling.
We suggest starting with `--model=transformer` on this task and using
`--hparams_set=transformer_small` for PTB and
`--hparams_set=transformer_base` for LM1B.
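For example, a word-level PTB run would look roughly like this (placeholder paths; data generated beforehand with `t2t-datagen`):

```bash
t2t-trainer \
  --data_dir=$HOME/t2t_data \
  --problem=languagemodel_ptb10k \
  --model=transformer \
  --hparams_set=transformer_small \
  --output_dir=$HOME/t2t_train/ptb_lm
```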
For the task of recognizing the sentiment of a sentence, use
- the IMDB data-set: `--problem=sentiment_imdb`

We suggest using `--model=transformer_encoder` here; since it is
a small data-set, try `--hparams_set=transformer_tiny` and train for
a few steps (e.g., `--train_steps=2000`).
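Put together, a quick IMDB sentiment run might look like this (a sketch with placeholder paths, assuming the data is already generated):

```bash
t2t-trainer \
  --data_dir=$HOME/t2t_data \
  --problem=sentiment_imdb \
  --model=transformer_encoder \
  --hparams_set=transformer_tiny \
  --train_steps=2000 \
  --output_dir=$HOME/t2t_train/imdb_sentiment
```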
For speech-to-text, we have these data-sets in T2T:
- Librispeech (English speech to text): `--problem=librispeech` for the whole set and `--problem=librispeech_clean` for a smaller but nicely filtered part.
For summarizing longer text into a shorter one, we have this data-set:
- CNN/DailyMail articles summarized into a few sentences: `--problem=summarize_cnn_dailymail32k`

We suggest using `--model=transformer` and
`--hparams_set=transformer_prepend` for this task.
This yields good ROUGE scores.
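A corresponding training command might look like this (a sketch with placeholder paths; data generated beforehand):

```bash
t2t-trainer \
  --data_dir=$HOME/t2t_data \
  --problem=summarize_cnn_dailymail32k \
  --model=transformer \
  --hparams_set=transformer_prepend \
  --output_dir=$HOME/t2t_train/cnndm_summarize
```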
There are a number of translation data-sets in T2T:
- English-German: `--problem=translate_ende_wmt32k`
- English-French: `--problem=translate_enfr_wmt32k`
- English-Czech: `--problem=translate_encs_wmt32k`
- English-Chinese: `--problem=translate_enzh_wmt32k`
- English-Vietnamese: `--problem=translate_envi_iwslt32k`
- English-Spanish: `--problem=translate_enes_wmt32k`

You can get translations in the other direction by appending `_rev` to
the problem name, e.g., for German-English use
`--problem=translate_ende_wmt32k_rev`.
For all translation problems, we suggest trying the Transformer model:
`--model=transformer`. At first it is best to try the base setting,
`--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps
this should reach a BLEU score of about 28 on the English-German data-set,
which is close to state-of-the-art. If training on a single GPU, try the
`--hparams_set=transformer_base_single_gpu` setting. For very good results
or larger data-sets (e.g., for English-French), try the big model
with `--hparams_set=transformer_big`.
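For example, a single-GPU English-German run would look roughly like this (placeholder paths; a sketch rather than a tuned recipe):

```bash
t2t-trainer \
  --data_dir=$HOME/t2t_data \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu \
  --output_dir=$HOME/t2t_train/ende_transformer
```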