Language Vision Model (LVM)
Classic Paper List
-
ResNet
-
Transformer
-
BERT
- It serves as a model pre-trained on large text corpora.
- It uses WordPiece tokenization.
- Unlike the machine-translation setting, where the decoder only uses context from a single direction, the pre-trained model can use bidirectional context.
-
ViT
-
MAE
-
Swin Transformer
-
CLIP
-
GPT
-
DALL-E
-
- It combines ideas from BERT (language features) and ViT (visual features).
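The WordPiece tokenization mentioned in the list above can be illustrated with a small sketch. This is a simplified greedy longest-match-first splitter, not the reference implementation from any library; the function name `wordpiece_tokenize`, the toy vocabulary, and the `[UNK]` fallback token are assumptions chosen for illustration. Continuation pieces are marked with the conventional `##` prefix.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into sub-word pieces by greedy longest-match-first lookup.

    `vocab` is a set of known pieces; pieces that continue a word
    carry a "##" prefix. Both are illustrative assumptions.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (hypothetical, far smaller than a real one).
toy_vocab = {"un", "##aff", "##able", "play", "##ing", "##ed"}

print(wordpiece_tokenize("unaffable", toy_vocab))
print(wordpiece_tokenize("playing", toy_vocab))
print(wordpiece_tokenize("xyz", toy_vocab))
```

Real tokenizers also handle lowercasing, punctuation splitting, and a maximum word length, all of which this sketch leaves out.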
This blog is converted from language-vision-model.ipynb
Written on April 1, 2023