OpenAI CLIP - Connecting Text and Images | Paper Explained
- Published May 28, 2024
- ❤️ Become The AI Epiphany Patreon ❤️ ► / theaiepiphany
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
In this video, I cover the CLIP paper - Learning Transferable Visual Models from Natural Language Supervision.
You'll learn about:
✔️ How the contrastive learning behind CLIP works
✔️ All the nitty-gritty details behind the paper
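As a taste of the first point: CLIP's contrastive objective is a symmetric cross-entropy over the cosine-similarity matrix of a batch of image and text embeddings, where matching pairs sit on the diagonal. A minimal numpy sketch (the fixed temperature here is a stand-in; the paper learns it as a parameter):

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Cosine-similarity logits between every image and every text in the batch
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature
    n = logits.shape[0]
    labels = np.arange(n)  # the i-th image matches the i-th text (diagonal)

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), labels].mean()

    # Symmetric loss: image->text over rows, text->image over columns
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

A correctly aligned batch should score a much lower loss than a misaligned one, which is exactly the signal the contrastive objective provides.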
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
✅ The Bitter Lesson by Sutton: www.incompleteideas.net/IncIde...
✅ CLIP paper: cdn.openai.com/papers/Learnin...
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⌚️ Timetable:
00:00 OpenAI's CLIP
02:10 Detailed explanation of the method
06:00 Comparison with SimCLR
12:55 How does the zero-shot part work
20:45 WIT dataset
21:30 Why this method, hint efficiency
28:35 Zero-shot - generalizing to new tasks
31:30 Prompt programming and ensembling
34:00 Zero-shot performance
36:20 Few-shot comparison with best baselines
38:20 How good is the zero-shot classifier?
40:45 Compute error correlation
41:20 Quality of CLIP's embedding space
43:05 Robustness to distribution shift
49:10 Limitations (MNIST failure)
50:30 A short recap
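The prompt-ensembling step (31:30) averages the text embeddings of many prompt templates per class ("a photo of a {}.", "a drawing of a {}.", ...) into a single classifier weight. A minimal sketch, where `encode_text` is a hypothetical stand-in for CLIP's text encoder:

```python
import numpy as np

def ensemble_class_embedding(encode_text, templates, class_name):
    # encode_text: stand-in for CLIP's text encoder (hypothetical signature)
    embs = np.stack([encode_text(t.format(class_name)) for t in templates])
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)  # unit vectors
    mean = embs.mean(axis=0)          # average over prompt templates
    return mean / np.linalg.norm(mean)  # renormalize the ensemble weight
```

The paper reports that this kind of ensembling gives a noticeable zero-shot accuracy boost over a single prompt.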
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💰 BECOME A PATREON OF THE AI EPIPHANY ❤️
If these videos, GitHub projects, and blogs help you,
consider helping me out by supporting me on Patreon!
The AI Epiphany ► / theaiepiphany
One-time donation:
www.paypal.com/paypalme/theai...
Much love! ❤️
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
💡 The AI Epiphany is a channel dedicated to simplifying the field of AI using creative visualizations and, in general, a stronger focus on geometric and visual intuition rather than algebraic and numerical "intuition".
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
👋 CONNECT WITH ME ON SOCIAL
LinkedIn ► / aleksagordic
Twitter ► / gordic_aleksa
Instagram ► / aiepiphany
Facebook ► / aiepiphany
👨👩👧👦 JOIN OUR DISCORD COMMUNITY:
Discord ► / discord
📢 SUBSCRIBE TO MY MONTHLY AI NEWSLETTER:
Substack ► aiepiphany.substack.com/
💻 FOLLOW ME ON GITHUB FOR COOL PROJECTS:
GitHub ► github.com/gordicaleksa
📚 FOLLOW ME ON MEDIUM:
Medium ► / gordicaleksa
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#clip #openai #nlpsupervision
I love that OpenAI is pushing towards these more general methods in computer vision as well.
Unsupervised learning is about to become super mainstream in CV.
In fact, they call it natural language "supervised" learning (Section 2.1).
Yes, it is a kind of parameterized classification: the embedding vectors from the text input act as the final fully connected layer of the image classification network.
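To illustrate the point in the comment above: the class text embeddings play the role of the weight matrix of a linear classifier applied to the image embedding. A minimal sketch with made-up shapes (embedding dimension and class count are arbitrary here):

```python
import numpy as np

def zero_shot_classify(image_emb, class_text_embs):
    # class_text_embs: (C, D) text embeddings, one per class name/prompt.
    # They act as the weight matrix of a linear classifier over image features.
    W = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb)
    scores = W @ v               # cosine similarity to each class
    return int(np.argmax(scores))  # predicted class index
```

Swapping in a new set of class names just swaps the "weight matrix", which is what makes the classifier zero-shot.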
One of my favorite projects I've seen in a long time. Thanks for covering it.
It's awesome I agree!
Geez man do you ever take a holiday lol. I appreciate all your videos.
I do! As a matter of fact now! 😂 But since it's this fantastic pandemic+winter combo I'm making the most out of it.
Thanks!!
very well presented - greatly helped me improve my understanding. Thank you very much.
Great job! thanks!
Thank you for also giving the background on ConVIRT
Hi. Much of the value of a NN depends on the signal we give it. In this example, as you said (and very clever of the authors), by treating it as a classification task we give the net a signal that some text belongs to some image and not to other images. Do that across millions of examples and the net learns something like intelligence.
How does the fine-tuning actually work here? They don't talk about it in the paper, right?
what is the app you use to annotate these papers btw?
Thanks for the video. I was wondering if you would also cover visual SLAM / visual odometry as well. Thanks
Hey! You mean like the more classical CV algos or you have something more concrete in mind?
@@TheAIEpiphany I meant deep SLAM, such as SuperPoint or SuperGlue
Yes I agree, hope you have some time to review some vSLAM methods that use DL, especially feature extraction and depth estimation with monocular cameras
@@abudawood_phd Yes would be really interesting! SLAM is a whole other beast
What is actually the main contribution of the CLIP paper? Data augmentation, zero shot, and using large datasets?
Thanks for this nice explanation. Let me please ask the following question: at 05:40 you mentioned that the CLIP paper was heavily inspired by the ConVIRT paper, but the ConVIRT approach is only mentioned twice and does not appear in the references. Did they intentionally not reference it?
Apologies. I found the reference in the introduction. My Ctrl+F could not find "ConVIRT" because it was split across a line break in the long paragraph.
I love you
Aleksa really looks like the Khal in the GoT series XD
And this Khal is reaaaaaly smart!!