Language Vision Model (LVM)

Classic Paper List

  • AlexNet

  • ResNet

  • Transformer

  • BERT

    • It is pre-trained on a large text corpus and then reused for downstream language tasks.
    • It tokenizes text with WordPiece.
    • Unlike a left-to-right model such as the decoder used in machine translation, which sees context from one direction only, the pre-trained model can use bidirectional context.
  • ViT

  • MAE

  • Swin Transformer

  • CLIP

  • GPT

  • DALL-E

  • ViLT

    • It combines ideas from BERT (language features) and ViT (visual features).
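
The BERT and ViLT notes above can be made concrete with a small sketch: text is split into WordPiece-style subwords (the BERT side), an image is cut into fixed-size patches (the ViT side), and both are placed in one input sequence, which is roughly how ViLT feeds a single Transformer. Everything here is an illustrative assumption rather than any library's API: the vocabulary `TOY_VOCAB`, the helper names `wordpiece_tokenize` and `split_into_patches`, the 4x4 toy image, and the patch size of 2 are all made up for the example. A real model would also map every token and patch to an embedding vector, which is omitted.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces.

    Pieces that continue a word carry the "##" prefix, as in WordPiece vocabularies.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word becomes the unknown token.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def split_into_patches(image, patch):
    """Cut a 2D list (H x W) into flattened, non-overlapping patch x patch tiles."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(tile)
    return patches


# Hypothetical toy vocabulary (assumption, not a real BERT vocabulary).
TOY_VOCAB = {"a", "cat", "un", "##afford", "##able"}

# Language side: special tokens plus subword pieces.
text_tokens = ["[CLS]"]
for word in "a cat unaffordable".split():
    text_tokens += wordpiece_tokenize(word, TOY_VOCAB)
text_tokens.append("[SEP]")

# Vision side: a 4x4 toy "image" cut into 2x2 patches.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
image_patches = split_into_patches(toy_image, patch=2)

# One combined sequence: text elements first, then image patches.
sequence = [("text", t) for t in text_tokens] + [("image", p) for p in image_patches]

print(text_tokens)
print(len(image_patches), "patches,", len(sequence), "sequence elements")
```

The modality tag on each element stands in for whatever signal the model uses to tell text from image inputs; it is kept as a plain string here only to make the combined sequence easy to inspect.
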
This blog is converted from language-vision-model.ipynb
Written on April 1, 2023