0:00 Introduction
5:54 Transformers: Prediction back in input space
11:12 Prediction in Latent Space
22:25 Stable Diffusion and Latent Space
29:17 Vision Transformer (ViT)
44:57 Swin Transformer
50:12 ViT’s positional encoding may not be good!
51:38 I-JEPA
1:09:26 Discussion on how to improve I-JEPA