2022-09-06-generalist-agent.md

---
layout: prediction_post
published: false
title: The Illustrated Generalist Agent (Gato)
---

Could you train a single machine learning model to learn hundreds of tasks spanning text, computer vision, video games, and robot control? In this post and video, we go over DeepMind's Gato, which does exactly that with a model that is simpler and smaller than you may think. It's a GPT-like model that learns over 600 tasks, and it opens the door to World Scope 4 as discussed in the Experience Grounds Language video.



Figure 1 from the paper

Figure 2 from the paper

## Modalities map


GPT

BERT

GAN

CLIP

DallE / Stable Diffusion

Gato

Gato sequences

Figure 3 from the paper

Table 1 from the paper - datasets

Figure 4 from the paper

## Performance and results


Figure 5 from the paper (shown at thresholds 0, 50, and 100)

Gato vs. expert scores

## Tokenization


Text tokenization

Image tokenization

Text + Image tokenization

Text + Image tokenization - image captioning

Discrete values


Text + Image tokenization - image captioning
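The figures above show how discrete inputs (e.g. Atari button presses) can share the transformer's vocabulary: the paper maps them to token ids placed directly after the 32,000-entry text vocabulary. A minimal sketch of that mapping (the function names are mine, for illustration):

```python
TEXT_VOCAB = 32000   # size of the SentencePiece text vocabulary
NUM_BINS = 1024      # extra token ids reserved for discrete/continuous values

def tokenize_discrete(value: int) -> int:
    """Map a discrete input (e.g. an Atari button index) to a token id
    placed directly after the text vocabulary."""
    if not 0 <= value < NUM_BINS:
        raise ValueError(f"discrete value {value} out of range")
    return TEXT_VOCAB + value

def detokenize_discrete(token: int) -> int:
    """Inverse mapping, used when decoding predicted action tokens."""
    return token - TEXT_VOCAB
```

Because every modality ends up as integer ids in one shared vocabulary, the same transformer can attend over text, image patches, and controller inputs in a single sequence.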



## Timesteps & episodes


Image + controller

Image + controller

Image + controller vector sequence
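The frames above show how each timestep's observations and actions are serialized and concatenated in time order. A hedged sketch of flattening an episode into one training sequence — the separator token id and helper name here are assumptions for illustration:

```python
# Separator token id is an assumption for illustration; the paper inserts a
# separator between a timestep's observation tokens and its action tokens.
SEPARATOR = 33024

def flatten_episode(timesteps):
    """Concatenate timesteps in time order; each contributes its observation
    tokens (image patches first, then other observations), a separator, and
    finally its action tokens."""
    sequence = []
    for obs_tokens, action_tokens in timesteps:
        sequence.extend(obs_tokens)
        sequence.append(SEPARATOR)
        sequence.extend(action_tokens)
    return sequence
```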

Continuous values
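For continuous values (e.g. joint torques or velocities), the paper mu-law compands them into [-1, 1], discretizes into 1024 uniform bins, and shifts the resulting bin ids past the text vocabulary. A minimal sketch of that scheme (the constants follow the paper; the helper names are mine):

```python
import math

MU = 100.0          # mu-law parameter (paper's value)
M_SCALE = 256.0     # mu-law scale constant (paper's value)
NUM_BINS = 1024     # uniform bins after companding
TEXT_VOCAB = 32000  # bin ids are shifted past the text vocabulary

def mu_law(x: float) -> float:
    """Compand x so that large magnitudes are compressed toward [-1, 1]."""
    return math.copysign(
        math.log(abs(x) * MU + 1.0) / math.log(M_SCALE * MU + 1.0), x
    )

def tokenize_continuous(x: float) -> int:
    """Mu-law encode, clip to [-1, 1], bucket into NUM_BINS, then offset."""
    v = max(-1.0, min(1.0, mu_law(x)))
    bucket = min(int((v + 1.0) / 2.0 * NUM_BINS), NUM_BINS - 1)
    return TEXT_VOCAB + bucket
```

The companding step matters because raw sensor values span very different scales across environments; mu-law encoding keeps small values distinguishable while still bounding large ones.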

## Native and non-native modalities

[Translating ]


## Expert sequences