guozhiyu/tpu_tutorial

TPU Tutorial

Advantage of Using TPU

  • Efficiency: A TPU v3-8 (the smallest TPU configuration, with 8 cores and 16 GB of memory per core) is faster than 8 V100 GPUs with the same amount of memory, as shown in https://github.com/allenai/tpu_pretrain.
  • Convenience: Unlike the shared GPUs at school, you don't have to compete with other members for computation resources, so you can start an experiment at any time.
  • Free: If you have Google TPU Research Cloud membership, it is completely free.

Disadvantage of Using TPU

  • Not easy to use with TF 1.X or PyTorch; it is only easy with TF 2.X. I don't know about JAX/Flax.

How to use it for free:

I have found five feasible ways:

  • Apply for Google TPU Research Cloud membership (I don't know how difficult it is to get) (https://sites.research.google/trc/).
  • (Highly recommended) Participate in a competition organized by Google. (I participated in a Kaggle competition organized by Google in Dec 2019. Since that competition, I have been able to use TPUs for free in my own research. I can still use them now, and I can run up to 5 TPU v3 devices at the same time.)
  • Use the free TPU in a Colab/Kaggle notebook.
  • Use the Google Cloud $300 free trial (if you are a new Google Cloud customer).
  • Apply for Google Cloud Research Credits ($1000 in GCP credits, only for doctoral students).

Code example

You can use a TPU with TF, PyTorch, or JAX/Flax. I prefer TF 2.X, so I will show some useful code examples for TF 2.X and PyTorch. If you use a TPU on Google Cloud Platform, please read the Cloud TPU documentation first. It is highly recommended to use the newly released TPU Virtual Machine.

TF 2.X

I started using TPUs in the summer of 2019. At that time TPUs were only usable from TF 1.X, and I just tried to run the fine-tuning code in the official BERT/XLNet repositories. I found it really hard to use a TPU with TF 1.X. Fortunately, TPU support for TF 2.X was released at the end of 2019, and using a TPU became easy. Because TF 2.X had only just been released, it was unstable, so I used it only in the Kaggle competition and not in my research. This year, I find that Google Research has implemented many research papers in TF 2.X, so I think now is the right time to use TF 2.X for research.

In TF 2.X, the simplest way to use a TPU is to train with model.fit(). You can also write your own custom training loop. Here are some useful examples:
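As a minimal sketch of the model.fit() route (not taken from the original tutorial), the snippet below connects to a TPU, builds a Keras model inside the strategy scope, and trains it. The helper names `get_strategy` and `build_model`, the toy model, and the random data are all hypothetical and exist only for illustration. The fallback to the default strategy is an assumption added so the sketch also runs on a machine without a TPU.

```python
import tensorflow as tf


def get_strategy():
    """Return a TPUStrategy if a TPU can be found, else the default strategy."""
    try:
        # With no arguments the resolver looks up the TPU from the environment
        # (this is how it is usually called in Colab/Kaggle). On a Cloud TPU VM
        # you would typically pass tpu="local" instead.
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    except ValueError:
        # No TPU was found: fall back to the default (CPU/GPU) strategy.
        return tf.distribute.get_strategy()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)


def build_model():
    """Hypothetical toy regression model, used only for illustration."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


strategy = get_strategy()

# Variables must be created inside the strategy scope so that they are
# placed on (and replicated across) the TPU cores.
with strategy.scope():
    model = build_model()

# Random toy data. drop_remainder=True keeps every batch the same shape,
# which avoids recompilation for a smaller final batch.
features = tf.random.normal((64, 4))
labels = tf.random.normal((64, 1))
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.batch(16, drop_remainder=True)

history = model.fit(dataset, epochs=1, verbose=0)
print("replicas:", strategy.num_replicas_in_sync)
```

The same `strategy.scope()` pattern applies to a real model: only model and optimizer creation need to happen inside the scope, while `model.fit()` can be called outside it.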

When using TPU in TF 2.X, please keep in mind that:

Some papers of Google Research that have used TPU in TF 2.X:

PyTorch

You can use a TPU through the torch_xla package or through accelerate:

Also, in PyTorch, every input sequence should have the same length.
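The reason is that XLA compiles a graph for each distinct input shape, so batches with varying sequence lengths trigger repeated recompilation. One way to get fixed-length inputs is to pad (and truncate) every sequence to a single maximum length. The helper below is a plain-Python sketch of that idea. The name `pad_to_fixed_length` and the default `pad_id=0` are assumptions for illustration, and in practice a tokenizer's own padding option would usually do this job.

```python
from typing import List, Tuple


def pad_to_fixed_length(
    batch: List[List[int]], max_length: int, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad or truncate every token-id sequence to exactly max_length.

    Returns the padded sequences and matching attention masks
    (1 for real tokens, 0 for padding).
    """
    padded, masks = [], []
    for ids in batch:
        ids = ids[:max_length]            # truncate sequences that are too long
        n_pad = max_length - len(ids)     # number of padding tokens needed
        padded.append(ids + [pad_id] * n_pad)
        masks.append([1] * len(ids) + [0] * n_pad)
    return padded, masks


if __name__ == "__main__":
    ids, mask = pad_to_fixed_length([[5, 6, 7], [8]], max_length=4)
    print(ids)   # [[5, 6, 7, 0], [8, 0, 0, 0]]
    print(mask)  # [[1, 1, 1, 0], [1, 0, 0, 0]]
```

The padded lists can then be turned into tensors with `torch.tensor(ids)`; every batch has the same shape, so the compiled graph is reused.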

JAX/Flax

I am studying it now. I find that many researchers at Google and HuggingFace are using it. HuggingFace also has some examples for it.
