- [Dual PatchNorm](https://arxiv.org/abs/2302.01327), by
  Manoj Kumar, Mostafa Dehghani, Neil Houlsby.
- [Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design](https://arxiv.org/abs/2305.13035), by
  Ibrahim Alabdulmohsin*, Xiaohua Zhai*, Alexander Kolesnikov, Lucas Beyer*.
- (partial) [Scaling Vision Transformers to 22 Billion Parameters](https://arxiv.org/abs/2302.05442), by
  Mostafa Dehghani*, Josip Djolonga*, Basil Mustafa*, Piotr Padlewski*, Jonathan Heek*, *wow many middle authors*, Neil Houlsby*.
- (partial) [Finite Scalar Quantization: VQ-VAE Made Simple](https://arxiv.org/abs/2309.15505), by
  Fabian Mentzer, David Minnen, Eirikur Agustsson, Michael Tschannen.
### Multimodal research
- [Sigmoid Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343), by
  Xiaohua Zhai*, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer*\
  Resources: [colab and models](https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP_demo.ipynb), code TODO.
- [A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision](https://arxiv.org/abs/2303.17376), by
  Lucas Beyer*, Bo Wan*, Gagan Madan*, Filip Pavetic*, Andreas Steiner*, Alexander Kolesnikov, André Susano Pinto, Emanuele Bugliarello, Xiao Wang, Qihang Yu, Liang-Chieh Chen, Xiaohua Zhai*.
- [Image Captioners Are Scalable Vision Learners Too](https://arxiv.org/abs/2306.07915), by
  Michael Tschannen*, Manoj Kumar*, Andreas Steiner*, Xiaohua Zhai, Neil Houlsby, Lucas Beyer*.
- [Three Towers: Flexible Contrastive Learning with Pretrained Image Models](https://arxiv.org/abs/2305.16999), by
  Jannik Kossen, Mark Collier, Basil Mustafa, Xiao Wang, Xiaohua Zhai, Lucas Beyer, Andreas Steiner, Jesse Berent, Rodolphe Jenatton, Efi Kokiopoulou.
- (partial) [PaLI: A Jointly-Scaled Multilingual Language-Image Model](https://arxiv.org/abs/2209.06794), by
  Xi Chen, Xiao Wang, Soravit Changpinyo, *wow so many middle authors*, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut.
- (partial) [PaLI-3 Vision Language Models: Smaller, Faster, Stronger](https://arxiv.org/abs/2310.09199), by
  Xi Chen, Xiao Wang, Lucas Beyer, Alexander Kolesnikov, Jialin Wu, Paul Voigtlaender, Basil Mustafa, Sebastian Goodman, Ibrahim Alabdulmohsin, Piotr Padlewski, Daniel Salz, Xi Xiong, Daniel Vlasic, Filip Pavetic, Keran Rong, Tianli Yu, Daniel Keysers, Xiaohua Zhai, Radu Soricut.
### Knowledge distillation

- [Knowledge distillation: A good teacher is patient and consistent](https://arxiv.org/abs/2106.05237), by
  Lucas Beyer*, Xiaohua Zhai*, Amélie Royer*, Larisa Markeeva*, Rohan Anil, Alexander Kolesnikov*.

### Training

- [Tuning computer vision models with task rewards](https://arxiv.org/abs/2302.08242), by
  André Susano Pinto*, Alexander Kolesnikov*, Yuge Shi, Lucas Beyer, Xiaohua Zhai.
- (partial) [VeLO: Training Versatile Learned Optimizers by Scaling Up](https://arxiv.org/abs/2211.09760), by
  Luke Metz, James Harrison, C. Daniel Freeman, Amil Merchant, Lucas Beyer, James Bradbury, Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Sohl-Dickstein.
### Misc
details, but generally speaking, running on a GPU machine involves calling
`python -m COMMAND` while running on TPUs, including multi-host, involves
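For reference, a multi-host TPU launch typically takes a shape like the following (a sketch only, not this repo's verbatim instructions; the VM name, zone, and entry-point are placeholder assumptions, and `run_tpu.sh` refers to the repo's TPU launcher script):

```shell
# Hypothetical TPU VM name and zone -- substitute your own.
NAME=my-tpu-vm
ZONE=europe-west4-a

# Run the same command on every host of the pod slice at once,
# so all workers join the multi-host training job together.
gcloud compute tpus tpu-vm ssh "$NAME" --zone="$ZONE" --worker=all \
  --command "bash big_vision/run_tpu.sh big_vision.train"
```

The `--worker=all` flag is what makes this multi-host: each host runs an identical process and they coordinate through JAX's distributed runtime.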
In the past, we recommended writing checkpoints to a Google Cloud bucket. With the latest update, this is very slow because of technical issues with the checkpointing format.
We are working on a solution, but in the meantime, we have updated our instructions to write checkpoints to a local folder on the TPU machine. Don't forget to copy useful checkpoints elsewhere after training.
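For example, copying the local workdir into a bucket of your own after training might look like this (a sketch; the workdir path and bucket name are placeholders, not values from this repo):

```shell
# Placeholder paths -- substitute your actual workdir and bucket.
WORKDIR=/tmp/big_vision/workdir
BUCKET=gs://my-bucket/big_vision-checkpoints

# Recursively copy checkpoints off the TPU VM before deleting it;
# -m parallelizes the transfer across files.
gsutil -m cp -r "$WORKDIR" "$BUCKET/"
```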
## Run the train script on TPU VMs
To train your own big_vision models on a large dataset,
e.g. `imagenet2012` ([prepare the TFDS dataset](https://www.tensorflow.org/datasets/catalog/imagenet2012)),