ViT Transformers and CLIP Part 1

The best reference for studying ViT Transformers and CLIP is Chapter 26 of the free textbook Foundations of Computer Vision.

Please note that this video overlaps with the previous YouTube videos on Transformer attention mechanisms. That said, this video introduces some additional details.