TPU Tutorial

Advantages of Using TPUs

Efficiency: a TPU v3-8 (the smallest TPU v3 configuration, with 8 cores and 16 GB of memory per core) is faster than 8 V100 GPUs with the same total amount of memory, as shown in https://github.com/allenai/tpu_pretrain.
Convenience: unlike the GPUs shared at school, you don't need to compete with other lab members for compute resources, so you can start an experiment at any time.
Free: if you are a member of the Google TPU Research Cloud, using TPUs is completely free.

Disadvantages of Using TPUs

TPUs are not easy to use with TF 1.X or PyTorch; they are only easy to use with TF 2.X. I don't yet know how easy they are to use with JAX/Flax.

How to use TPUs for free:

I know of five feasible ways:

Apply for Google TPU Research Cloud membership (I don't know how difficult it is to be accepted) (https://sites.research.google/trc/).
(Highly recommended) Participate in a competition organized by Google. I participated in a Kaggle competition organized by Google in December 2019. After the competition, I could use TPUs for free in my own research; I can still use them now, and I can run up to 5 TPU v3 devices at the same time.
Use the free TPUs in Colab or Kaggle notebooks.
Use the $300 Google Cloud free trial (if you are a new Google Cloud customer).
Apply for Google Cloud Research Credits ($1000 in GCP credits, only for doctoral students).

Code example

You can use TPUs from TF, PyTorch, or JAX/Flax. I prefer TF 2.X, and I show some useful code examples for TF 2.X and PyTorch. If you use TPUs on Google Cloud Platform, please read the Cloud TPU documentation first. I highly recommend using the newly released TPU Virtual Machine (TPU VM).
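A minimal sketch of connecting to a TPU from TF 2.X, assuming a release that provides tf.distribute.TPUStrategy (TF 2.3 or later). On a TPU VM the resolver is created with tpu="local". The helper name connect_to_tpu is hypothetical, and the fallback to the default strategy when TPUClusterResolver raises ValueError (which I assume is what happens when no TPU is configured) is my own addition so that the same script also runs on a CPU-only machine.

```python
import tensorflow as tf


def connect_to_tpu(tpu=None):
    """Return a TPUStrategy if a TPU can be located, else the default strategy.

    tpu: a TPU name or grpc:// address, "local" on a TPU VM, or None to let
    TPUClusterResolver look for one in the environment.
    """
    try:
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu)
    except ValueError:
        # Assumption: no TPU could be located (e.g. a CPU-only machine),
        # so fall back to the default single-device strategy.
        return tf.distribute.get_strategy()
    # Connect to the TPU workers and initialize the TPU system before
    # building the strategy.
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)


if __name__ == "__main__":
    strategy = connect_to_tpu()  # pass tpu="local" on a TPU VM
    print("Replicas:", strategy.num_replicas_in_sync)
```

On a TPU v3-8 the strategy should report 8 replicas (one per core); on CPU it reports 1.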

TF 2.X

When I started using TPUs in the summer of 2019, they could only be used from TF 1.X, and I simply tried to run the fine-tuning code from the official BERT/XLNet repositories. I found TPUs really hard to use in TF 1.X. Fortunately, TPU support for TF 2.X was released at the end of 2019, which made TPUs much easier to use. Because TF 2.X had just been released and was still unstable, I only used it in the Kaggle competition, not in my research. This year, I see that Google Research has implemented many research papers in TF 2.X, so I think now is the right time to use TF 2.X for research.

In TF 2.X, the simplest way to use a TPU is to call model.fit() for training. You can also write your own custom training loop. Here are some useful examples:
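Both training styles can be sketched as below on toy random data; GLOBAL_BATCH_SIZE, NUM_FEATURES, and all helper names are hypothetical. Each function takes a strategy, so the same code runs under a TPUStrategy on a TPU or under tf.distribute.get_strategy() on CPU. In the custom loop, strategy.run executes the step on every replica, and strategy.reduce sums the per-replica losses.

```python
import tensorflow as tf

GLOBAL_BATCH_SIZE = 64  # hypothetical; keep it divisible by the number of TPU cores
NUM_FEATURES = 16       # hypothetical toy input size


def make_dataset(num_examples=256):
    """Toy dataset with static shapes: random features and binary labels."""
    x = tf.random.normal((num_examples, NUM_FEATURES))
    y = tf.random.uniform((num_examples,), maxval=2, dtype=tf.int32)
    # drop_remainder=True gives every batch exactly GLOBAL_BATCH_SIZE rows,
    # so the batch dimension is static, as XLA compilation on TPU requires.
    ds = tf.data.Dataset.from_tensor_slices((x, y))
    return ds.batch(GLOBAL_BATCH_SIZE, drop_remainder=True)


def build_model():
    inputs = tf.keras.Input(shape=(NUM_FEATURES,))
    hidden = tf.keras.layers.Dense(32, activation="relu")(inputs)
    logits = tf.keras.layers.Dense(2)(hidden)
    return tf.keras.Model(inputs, logits)


def train_with_fit(strategy, dataset, epochs=1):
    # Create the model and optimizer inside strategy.scope() so the variables
    # are replicated across the strategy's devices (the TPU cores under TPUStrategy).
    with strategy.scope():
        model = build_model()
        model.compile(
            optimizer="adam",
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=["accuracy"],
        )
    return model.fit(dataset, epochs=epochs, verbose=0)


def train_custom_loop(strategy, dataset, num_steps):
    with strategy.scope():
        model = build_model()
        optimizer = tf.keras.optimizers.Adam()
        loss_obj = tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True, reduction="none")
    dist_iter = iter(strategy.experimental_distribute_dataset(dataset))

    def step_fn(x, y):
        with tf.GradientTape() as tape:
            per_example_loss = loss_obj(y, model(x, training=True))
            # Divide by the GLOBAL batch size so that summing the replicas'
            # gradients gives the gradient of the global mean loss.
            loss = tf.nn.compute_average_loss(
                per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

    @tf.function
    def train_step(iterator):
        x, y = next(iterator)
        per_replica_loss = strategy.run(step_fn, args=(x, y))
        # Summing the per-replica losses yields the global mean loss.
        return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_loss, axis=None)

    return [float(train_step(dist_iter)) for _ in range(num_steps)]
```

model.fit() handles distribution for you; the custom loop is more code but gives full control over each step.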

When using TPUs in TF 2.X, keep in mind that:

Every input sequence should have the same length.
You should not use tf.keras.layers.Embedding; refer to HuggingFace TFBertEmbeddings instead.
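Both caveats can be sketched together, assuming token ids are already available as Python lists. pad_to_fixed_length pads or truncates every sequence to one max_len and returns an attention mask. WordEmbedding is a simplified layer loosely modeled on HuggingFace TFBertEmbeddings, which creates its table with add_weight and looks it up with tf.gather. Both names are hypothetical, and this is not HuggingFace's code.

```python
import numpy as np
import tensorflow as tf


def pad_to_fixed_length(sequences, max_len, pad_id=0):
    """Pad or truncate token-id lists to max_len; return ids and attention mask."""
    ids = np.full((len(sequences), max_len), pad_id, dtype=np.int32)
    mask = np.zeros((len(sequences), max_len), dtype=np.int32)
    for i, seq in enumerate(sequences):
        seq = list(seq)[:max_len]  # truncate sequences that are too long
        ids[i, : len(seq)] = seq
        mask[i, : len(seq)] = 1    # 1 marks real tokens, 0 marks padding
    return ids, mask


class WordEmbedding(tf.keras.layers.Layer):
    """Embedding table created with add_weight and read with tf.gather
    (a simplified sketch in the spirit of TFBertEmbeddings)."""

    def __init__(self, vocab_size, hidden_size, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size

    def build(self, input_shape):
        self.weight = self.add_weight(
            name="weight",
            shape=(self.vocab_size, self.hidden_size),
            initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02),
        )
        super().build(input_shape)

    def call(self, input_ids):
        # Row lookup: output shape is input_ids.shape + (hidden_size,).
        return tf.gather(self.weight, input_ids)
```

For example, pad_to_fixed_length([[5, 6, 7], [1, 2, 3, 4, 5, 6]], max_len=4) returns ids [[5, 6, 7, 0], [1, 2, 3, 4]] with mask [[1, 1, 1, 0], [1, 1, 1, 1]].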

Some Google Research papers that have used TPUs in TF 2.X:

PyTorch

You can use TPUs through the torch_xla package or through HuggingFace accelerate:

torch_xla: used by all HuggingFace PyTorch examples without the no_trainer suffix, such as run_glue.py.
accelerate: used by all HuggingFace PyTorch examples with the no_trainer suffix, such as run_glue_no_trainer.py.

In PyTorch, too, every input sequence should have the same length.
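A minimal sketch combining fixed-length batches with a raw torch_xla training step. A collate function pads or truncates every sequence to one length, and drop_last=True fixes the batch size, so every step has the same tensor shapes; with torch_xla, a new shape generally means a new graph compilation. MAX_LEN, PAD_ID, VOCAB, and the helper and model names are hypothetical. xm.xla_device() and xm.optimizer_step() are torch_xla calls; the CPU fallback when torch_xla is not installed is my own addition.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

try:
    import torch_xla.core.xla_model as xm  # present only where torch_xla is installed
except ImportError:
    xm = None  # fall back to CPU so the sketch still runs

MAX_LEN = 8   # hypothetical fixed sequence length
PAD_ID = 0    # hypothetical padding token id
VOCAB = 100   # hypothetical vocabulary size


def collate_fixed_length(batch):
    """Pad or truncate every sequence to MAX_LEN so all batches share one shape."""
    ids = torch.full((len(batch), MAX_LEN), PAD_ID, dtype=torch.long)
    labels = torch.tensor([label for _, label in batch], dtype=torch.long)
    for i, (seq, _) in enumerate(batch):
        seq = list(seq)[:MAX_LEN]
        ids[i, : len(seq)] = torch.tensor(seq, dtype=torch.long)
    return ids, labels


class MeanPoolClassifier(nn.Module):
    """Toy model: average the token embeddings, then classify."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 16, padding_idx=PAD_ID)
        self.head = nn.Linear(16, 2)

    def forward(self, ids):
        return self.head(self.embed(ids).mean(dim=1))


def train(examples, batch_size=4):
    device = xm.xla_device() if xm is not None else torch.device("cpu")
    # drop_last=True keeps the batch dimension fixed as well.
    loader = DataLoader(examples, batch_size=batch_size, shuffle=True,
                        drop_last=True, collate_fn=collate_fixed_length)
    model = MeanPoolClassifier().to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for ids, labels in loader:
        ids, labels = ids.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(ids), labels)
        loss.backward()
        if xm is not None:
            # barrier=True also calls xm.mark_step(), which cuts and executes
            # the lazily recorded XLA graph for this step.
            xm.optimizer_step(optimizer, barrier=True)
        else:
            optimizer.step()
        losses.append(loss.item())
    return losses
```

Here examples is a list of (token_ids, label) pairs; with 8 examples and batch_size=4 the loop runs exactly 2 steps.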

JAX/Flax

I am studying it now. Many researchers at Google and HuggingFace use it, and HuggingFace also provides some examples for it.
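A minimal sketch in plain JAX (Flax is omitted for brevity) of a jit-compiled training step. jax.jit compiles through XLA for whichever backend is available, so the same code runs on CPU and, assuming the TPU build of JAX is installed on a TPU VM, on TPU. The toy linear-regression data, the learning rate, and the names sgd_step and fit are hypothetical.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    """Mean squared error of a linear model."""
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)


@jax.jit  # traced once, then compiled by XLA for the available backend
def sgd_step(params, x, y):
    grads = jax.grad(loss_fn)(params, x, y)
    # Plain SGD with a fixed, hypothetical learning rate of 0.1.
    return jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)


def fit(num_steps=200):
    x = jax.random.normal(jax.random.PRNGKey(0), (64, 3))
    y = x @ jnp.array([1.0, -2.0, 0.5])  # noise-free toy targets
    params = {"w": jnp.zeros(3), "b": jnp.array(0.0)}
    initial_loss = float(loss_fn(params, x, y))
    for _ in range(num_steps):
        params = sgd_step(params, x, y)
    return initial_loss, float(loss_fn(params, x, y))


if __name__ == "__main__":
    print(jax.devices())  # lists TPU devices on a TPU VM, otherwise e.g. the CPU
    print(fit())
```

Because the inputs keep the same shapes on every call, sgd_step is compiled only once, which is the same static-shape point made for TF and PyTorch above.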

guozhiyu/tpu_tutorial

Folders and files

Latest commit

History

Repository files navigation

TPU Tutorial

Advantage of Using TPU

Disadvantage of Using TPU

How to use it freely:

Code example

TF 2.X

Pytorch

JAX/Flax

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages