|
| 1 | +--- |
| 2 | +layout: prediction_post |
| 3 | +published: False |
| 4 | +title: The Illustrated Generalist Agent (Gato) |
| 5 | +--- |
| 6 | + |
| 7 | + |
| 8 | +Could you train a single machine learning model to perform hundreds of tasks spanning text, computer vision, video games, and robot control? In this post and video, we go over DeepMind’s Gato, which does this with a model that is simpler and smaller than you might think. It’s a GPT-like model that learns over 600 tasks. It opens the door to World Scope 4, as discussed in the Experience Grounds Language video. |
| 9 | + |
| 10 | + |
| 11 | +<div class="img-div" markdown="0"> |
| 12 | + <img src="/images/gato/.png" /> |
| 13 | + <br /> |
| 14 | + |
| 15 | +</div> |
| 16 | + |
| 17 | + |
| 18 | +<div class="img-div" markdown="0"> |
| 19 | + <img src="/images/gato/gato-paper-figure-1.png" /> |
| 20 | + <br /> |
| 21 | + Figure 1 from the paper |
| 22 | +</div> |
| 23 | + |
| 24 | + |
| 25 | +<div class="img-div" markdown="0"> |
| 26 | + <img src="/images/gato/gato-paper-figure-2.png" /> |
| 27 | + <br /> |
| 28 | + Figure 2 from the paper |
| 29 | +</div> |
| 30 | + |
| 31 | + |
| 32 | +# Modalities map |
| 33 | + |
| 34 | + |
| 35 | +<div class="img-div" markdown="0"> |
| 36 | + <img src="/images/gato/GPT-modalities.png" /> |
| 37 | + <br /> |
| 38 | +GPT |
| 39 | +</div> |
| 40 | + |
| 41 | + |
| 42 | + |
| 43 | +<div class="img-div" markdown="0"> |
| 44 | + <img src="/images/gato/bert - modalities.png" /> |
| 45 | + <br /> |
| 46 | + BERT |
| 47 | +</div> |
| 48 | + |
| 49 | + |
| 50 | +<div class="img-div" markdown="0"> |
| 51 | + <img src="/images/gato/GAN - modalities.png" /> |
| 52 | + <br /> |
| 53 | + GAN |
| 54 | +</div> |
| 55 | + |
| 56 | +<div class="img-div" markdown="0"> |
| 57 | + <img src="/images/gato/clip modalities.png" /> |
| 58 | + <br /> |
| 59 | + CLIP |
| 60 | +</div> |
| 61 | + |
| 62 | +<div class="img-div" markdown="0"> |
| 63 | + <img src="/images/gato/Dalle stable diffusion image gen modalities.png" /> |
| 64 | + <br /> |
| 65 | + DALL·E / Stable Diffusion |
| 66 | +</div> |
| 67 | + |
| 68 | + |
| 69 | +<div class="img-div" markdown="0"> |
| 70 | + <img src="/images/gato/gato modalities.png" /> |
| 71 | + <br /> |
| 72 | + Gato |
| 73 | +</div> |
| 74 | + |
| 75 | + |
| 76 | +<div class="img-div" markdown="0"> |
| 77 | + <img src="/images/gato/gato-modalities-sequences.png" /> |
| 78 | + <br /> |
| 79 | + Gato sequences |
| 80 | +</div> |
| 81 | + |
| 82 | + |
| 83 | + |
| 84 | + |
| 85 | +<div class="img-div" markdown="0"> |
| 86 | + <img src="/images/gato/gato-modalities-sequences.png" /> |
| 87 | + <br /> |
| 88 | + Figure 3 from the paper |
| 89 | +</div> |
| 90 | + |
| 91 | + |
| 92 | + |
| 93 | + |
| 94 | +<div class="img-div" markdown="0"> |
| 95 | + <img src="/images/gato/table-1-datasets-gato.png" /> |
| 96 | + <br /> |
| 97 | + Table 1 from the paper - datasets |
| 98 | +</div> |
| 99 | + |
| 100 | + |
| 101 | +<div class="img-div" markdown="0"> |
| 102 | + <img src="/images/gato/gato-paper-figure-4.png" /> |
| 103 | + <br /> |
| 104 | + Figure 4 from the paper |
| 105 | +</div> |
| 106 | + |
| 107 | + |
| 108 | + |
| 109 | +## Performance and results |
| 110 | + |
| 111 | +<div class="img-div" markdown="0"> |
| 112 | + <img src="/images/gato/gato-paper-figure-5.png" /> |
| 113 | + <br /> |
| 114 | + Figure 5 from the paper |
| 115 | +</div> |
| 116 | + |
| 117 | + |
| 118 | +<div class="img-div" markdown="0"> |
| 119 | + <img src="/images/gato/figure-5-explainer-1-at-0.png" /> |
| 120 | + <br /> |
| 121 | + Figure 5 from the paper, at a 0% expert-score threshold |
| 122 | +</div> |
| 123 | + |
| 124 | + |
| 125 | + |
| 126 | +<div class="img-div" markdown="0"> |
| 127 | + <img src="/images/gato/figure-5-explainer-2-at-50.png" /> |
| 128 | + <br /> |
| 129 | + Figure 5 from the paper, at a 50% expert-score threshold |
| 130 | +</div> |
| 131 | + |
| 132 | +<div class="img-div" markdown="0"> |
| 133 | + <img src="/images/gato/figure-5-explainer-3-at-100.png" /> |
| 134 | + <br /> |
| 135 | + Figure 5 from the paper, at a 100% expert-score threshold |
| 136 | +</div> |
| 137 | + |
| 138 | +<div class="img-div" markdown="0"> |
| 139 | + <img src="/images/gato/experts-vs-gato-scores.png" /> |
| 140 | + <br /> |
| 141 | + Gato vs. expert scores |
| 142 | +</div> |
| 143 | + |
| 144 | + |
| 145 | +## Tokenization |
| 146 | + |
| 147 | + |
| 148 | +<div class="img-div" markdown="0"> |
| 149 | + <img src="/images/gato/text-tokens.png" /> |
| 150 | + <br /> |
| 151 | + Text tokenization |
| 152 | +</div> |
| 153 | + |
| 154 | + |
| 155 | +<div class="img-div" markdown="0"> |
| 156 | + <img src="/images/gato/image-tokens.png" /> |
| 157 | + <br /> |
| 158 | + Image tokenization |
| 159 | +</div> |
| 160 | + |
| 161 | + |
| 162 | + |
| 163 | +<div class="img-div" markdown="0"> |
| 164 | + <img src="/images/gato/text-plus-images.png" /> |
| 165 | + <br /> |
| 166 | + Text + Image tokenization |
| 167 | +</div> |
| 168 | + |
| 169 | + |
| 170 | + |
| 171 | +<div class="img-div" markdown="0"> |
| 172 | + <img src="/images/gato/image-captioning.png" /> |
| 173 | + <br /> |
| 174 | + Text + Image tokenization - image captioning |
| 175 | +</div> |
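
To make the image tokenization step concrete, here is a minimal sketch of cutting an image into non-overlapping patches in raster order (left to right, top to bottom). The 16x16 patch size follows my reading of the Gato paper; the function name `image_to_patches` and the toy image are illustrative assumptions, and any per-patch normalization or embedding step is left out.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patches in raster order.

    Hypothetical helper for illustration; not taken from the Gato codebase.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("height and width must be multiples of the patch size")
    rows, cols = h // patch, w // patch
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    grid = image.reshape(rows, patch, cols, patch, c).transpose(0, 2, 1, 3, 4)
    # Flatten the patch grid row by row, so patch k sits at (k // cols, k % cols).
    return grid.reshape(rows * cols, patch, patch, c)


# Toy 32x48 RGB image: 2 rows x 3 columns of patches.
image = np.arange(32 * 48 * 3, dtype=np.float32).reshape(32, 48, 3)
patches = image_to_patches(image)
print(patches.shape)
```

Each patch would then be turned into one vector of the input sequence, in the same order the patches appear here.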
| 176 | + |
| 177 | + |
| 178 | +## Discrete values |
| 179 | + |
| 180 | +<div class="img-div" markdown="0"> |
| 181 | + <img src="/images/gato/text-images-discrete-inputs.png" /> |
| 182 | + <br /> |
| 183 | + Text + image + discrete input tokenization |
| 184 | +</div> |
| 185 | + |
| 186 | +<div class="img-div" markdown="0"> |
| 187 | + <img src="/images/gato/discrete-actions.png" /> |
| 188 | + <br /> |
| 189 | + |
| 190 | +</div> |
| 191 | + |
| 192 | + |
| 193 | +<div class="img-div" markdown="0"> |
| 194 | + <img src="/images/gato/actions-embeddings.png" /> |
| 195 | + <br /> |
| 196 | + |
| 197 | +</div> |
| 198 | + |
| 199 | + |
| 200 | + |
| 201 | +<div class="img-div" markdown="0"> |
| 202 | + <img src="/images/gato/hadoken-sequence.png" /> |
| 203 | + <br /> |
| 204 | + |
| 205 | +</div> |
| 206 | + |
| 207 | + |
| 208 | + |
| 209 | + |
| 210 | +# Timesteps & episodes |
| 211 | + |
| 212 | +<div class="img-div" markdown="0"> |
| 213 | + <img src="/images/gato/atari-image-action.png" /> |
| 214 | + <br /> |
| 215 | + Image + controller |
| 216 | +</div> |
| 217 | + |
| 218 | + |
| 219 | +<div class="img-div" markdown="0"> |
| 220 | + <img src="/images/gato/image-action-timesteps.png" /> |
| 221 | + <br /> |
| 222 | + Image + controller |
| 223 | +</div> |
| 224 | + |
| 225 | + |
| 226 | + |
| 227 | +<div class="img-div" markdown="0"> |
| 228 | + <img src="/images/gato/.png" /> |
| 229 | + <br /> |
| 230 | + Image + controller vector sequence |
| 231 | +</div> |
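
As a rough sketch of how timesteps might be laid out in one sequence: each timestep contributes its observation tokens, then a separator, then its action tokens, and an episode is the concatenation of its timesteps. The separator id `-1`, the function name `flatten_episode`, and the integer ids below are hypothetical placeholders. In Gato itself, image observations enter as patch embeddings rather than integer ids, so treat this as a simplification.

```python
from typing import List, Sequence, Tuple

SEPARATOR = -1  # hypothetical placeholder id for the observation/action separator

Timestep = Tuple[Sequence[int], Sequence[int]]  # (observation tokens, action tokens)


def flatten_episode(timesteps: Sequence[Timestep], separator: int = SEPARATOR) -> List[int]:
    """Concatenate timesteps as: observation tokens, separator, action tokens."""
    sequence: List[int] = []
    for observation, action in timesteps:
        sequence.extend(observation)
        sequence.append(separator)
        sequence.extend(action)
    return sequence


# Two toy timesteps: three observation tokens and one action token each.
episode = [([11, 12, 13], [7]), ([14, 15, 16], [3])]
tokens = flatten_episode(episode)
print(tokens)  # [11, 12, 13, -1, 7, 14, 15, 16, -1, 3]
```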
| 232 | + |
| 233 | + |
| 234 | +## Continuous values |
| 235 | + |
| 236 | + |
| 237 | + |
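
For continuous values such as joint torques, one way to picture the tokenization is: squash the value with mu-law encoding into [-1, 1], cut that range into uniform bins, then shift the bin index past the text vocabulary. The sketch below assumes the parameters as I recall them from the Gato paper (mu = 100, M = 256, 1024 bins, offset 32000); verify them against the paper before relying on them. The function names are illustrative.

```python
import numpy as np


def mu_law_encode(x, mu: float = 100.0, m: float = 256.0) -> np.ndarray:
    """Mu-law companding: sign(x) * log(|x| * mu + 1) / log(m * mu + 1)."""
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * np.log(np.abs(x) * mu + 1.0) / np.log(m * mu + 1.0)


def tokenize_continuous(x, bins: int = 1024, offset: int = 32000) -> np.ndarray:
    """Map continuous values to integer token ids in [offset, offset + bins)."""
    squashed = np.clip(mu_law_encode(x), -1.0, 1.0)
    # Uniform bins over [-1, 1]; the top edge (+1.0) is folded into the last bin.
    index = np.floor((squashed + 1.0) / 2.0 * bins).astype(np.int64)
    index = np.minimum(index, bins - 1)
    return index + offset


tokens = tokenize_continuous([-256.0, 0.0, 256.0])
print(tokens)  # [32000 32512 33023]
```

The offset keeps these ids from colliding with text token ids, assuming a text vocabulary of 32000 subwords.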
| 238 | +## Native and non-native modalities |
| 239 | + |
| 240 | + |
| 241 | +[Translating ] |
| 242 | + |
| 243 | +<div class="img-div" markdown="0"> |
| 244 | + <img src="/images/gato/.png" /> |
| 245 | + <br /> |
| 246 | + Expert sequences |
| 247 | +</div> |
| 248 | + |
| 249 | + |
| 250 | + |
| 251 | + |
| 252 | +<div class="img-div" markdown="0"> |
| 253 | + <img src="/images/gato/.png" /> |
| 254 | + <br /> |
| 255 | + |
| 256 | +</div> |