Introduction to BLIP: Making Text Generation Models Multimodal
Published Feb 10, 2025
============================================================
Our live courses starting from January 2025 (taught by IITians and MIT PhDs): vizuara.ai/spit/
============================================================
In this video, you will learn about BLIP-2 (Bootstrapping Language-Image Pre-training).
In particular, we will learn about the following topics:
(1) How can we make text generation models multimodal?
(2) What is BLIP-2: Bootstrapping Language-Image Pre-training
(3) The BLIP-2 architecture
(4) How does BLIP-2 work?
(5) Running a BLIP-2 model in Python (see the code sketch after this list)
(6) Advancements after BLIP-2
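Since the Colab link below is truncated, here is a minimal sketch of topic (5) using the Hugging Face transformers API. The checkpoint name Salesforce/blip2-opt-2.7b, the sample image URL, and the prompts are illustrative assumptions, not necessarily what the video's notebook uses.

```python
# Minimal BLIP-2 sketch with Hugging Face transformers.
# Assumptions: the Salesforce/blip2-opt-2.7b checkpoint and a sample
# COCO image URL; the video's actual notebook may differ.
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=dtype
).to(device)

# Load a sample image (two cats on a couch, from the COCO val set).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Image captioning: pass only the image; the frozen LLM generates a caption.
inputs = processor(images=image, return_tensors="pt").to(device, dtype)
out = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(out, skip_special_tokens=True)[0].strip())

# Visual question answering: add a text prompt alongside the image.
question = "Question: how many cats are there? Answer:"
inputs = processor(images=image, text=question, return_tensors="pt").to(device, dtype)
out = model.generate(**inputs, max_new_tokens=10)
print(processor.batch_decode(out, skip_special_tokens=True)[0].strip())
```

The checkpoint weighs several gigabytes, so in Colab a GPU runtime with float16 is the practical way to run it.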
Links:
Original BLIP-2 paper: arxiv.org/pdf/...
Code file: colab.research...
============================================================
(1) Hands on LLM playlist link (this video belongs to this playlist): • Hands on Large Languag...
(2) LLM from scratch playlist link: • Building LLMs from scr...
(3) Build Neural Networks from scratch playlist link: • Building Neural Networ...
(4) ML Teach by Doing playlist link: • Machine Learning: Teac...
Register for our live course starting from January 2025: vizuara.ai/spit/
============================================================
✉️ Join our FREE Newsletter: www.vizuaranew...
Great lecture, your voice and screen elements are in sync. Thank you for sharing this stuff.
Hi @vizuara, can you also please cover an overview of distillation and mixture of experts? It would be good to understand how the layers in those differ from a standard decoder-only GPT model. So far I am loving all your videos. You have nailed the explanation better than any other videos available on YouTube. Thank you so much for sharing the knowledge.
Wow, another amazing lecture!
Can you please do a video on DeepSeek from scratch?