2022-09-06-generalist-agent.md

---
layout: prediction_post
published: false
title: The Illustrated Generalist Agent (Gato)
---

Could you train a single machine learning model to learn hundreds of tasks spanning text, computer vision, video games, and robot control? In this post and video, we go over DeepMind's Gato, which does exactly that with a model that is simpler and smaller than you may think. It's a GPT-like model that learns over 600 tasks, and it opens the door to World Scope 4 as discussed in the Experience Grounds Language video.



Figure 1 from the paper

Figure 2 from the paper

## Modalities map


GPT

BERT

GAN

CLIP

DallE / Stable Diffusion

Gato

Gato sequences

Figure 3 from the paper

Table 1 from the paper - datasets

Figure 4 from the paper

## Performance and results


Figure 5 from the paper (shown at thresholds 0, 50, and 100)

Gato vs. expert scores

## Tokenization


Text tokenization

Image tokenization

Text + Image tokenization

Text + Image tokenization - image captioning

Discrete values


Text + Image tokenization - image captioning
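The figures above show how discrete inputs (e.g. Atari button presses) can share the transformer's vocabulary: the paper maps them to token ids placed directly after the 32,000-entry text vocabulary. A minimal sketch of that mapping (the function names are mine, for illustration):

```python
TEXT_VOCAB = 32000   # size of the SentencePiece text vocabulary
NUM_BINS = 1024      # extra token ids reserved for discrete/continuous values

def tokenize_discrete(value: int) -> int:
    """Map a discrete input (e.g. an Atari button index) to a token id
    placed directly after the text vocabulary."""
    if not 0 <= value < NUM_BINS:
        raise ValueError(f"discrete value {value} out of range")
    return TEXT_VOCAB + value

def detokenize_discrete(token: int) -> int:
    """Inverse mapping, used when decoding predicted action tokens."""
    return token - TEXT_VOCAB
```

Because every modality ends up as integer ids in one shared vocabulary, the same transformer can attend over text, image patches, and controller inputs in a single sequence.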



## Timesteps & episodes


Image + controller

Image + controller

Image + controller vector sequence
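The frames above show how each timestep's observations and actions are serialized and concatenated in time order. A hedged sketch of flattening an episode into one training sequence — the separator token id and helper name here are assumptions for illustration:

```python
# Separator token id is an assumption for illustration; the paper inserts a
# separator between a timestep's observation tokens and its action tokens.
SEPARATOR = 33024

def flatten_episode(timesteps):
    """Concatenate timesteps in time order; each contributes its observation
    tokens (image patches first, then other observations), a separator, and
    finally its action tokens."""
    sequence = []
    for obs_tokens, action_tokens in timesteps:
        sequence.extend(obs_tokens)
        sequence.append(SEPARATOR)
        sequence.extend(action_tokens)
    return sequence
```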

Continuous values
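For continuous values (e.g. joint torques or velocities), the paper mu-law compands them into [-1, 1], discretizes into 1024 uniform bins, and shifts the resulting bin ids past the text vocabulary. A minimal sketch of that scheme (the constants follow the paper; the helper names are mine):

```python
import math

MU = 100.0          # mu-law parameter (paper's value)
M_SCALE = 256.0     # mu-law scale constant (paper's value)
NUM_BINS = 1024     # uniform bins after companding
TEXT_VOCAB = 32000  # bin ids are shifted past the text vocabulary

def mu_law(x: float) -> float:
    """Compand x so that large magnitudes are compressed toward [-1, 1]."""
    return math.copysign(
        math.log(abs(x) * MU + 1.0) / math.log(M_SCALE * MU + 1.0), x
    )

def tokenize_continuous(x: float) -> int:
    """Mu-law encode, clip to [-1, 1], bucket into NUM_BINS, then offset."""
    v = max(-1.0, min(1.0, mu_law(x)))
    bucket = min(int((v + 1.0) / 2.0 * NUM_BINS), NUM_BINS - 1)
    return TEXT_VOCAB + bucket
```

The companding step matters because raw sensor values span very different scales across environments; mu-law encoding keeps small values distinguishable while still bounding large ones.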

## Native and non-native modalities

[Translating ]


## Expert sequences