Since this video was released, it looks like the image rescaling assumptions of the CLIP model being used have changed. In the existing code in this video's notebook, by the time the image is fed to the processor() function its values have already been scaled to 0-1. Unfortunately this breaks some newer CLIP assumptions and everything will fail, so you should pass big_patches*255. into the processor() call for things to work correctly.
@jamesbriggs Can you pin this and/or update the Pinecone article?
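A minimal sketch of the fix described in the comment above, assuming big_patches is a float tensor of patches already scaled to 0-1 as in the notebook; the random stand-in tensor, prompt, and model checkpoint here are illustrative rather than the notebook's exact code:

```python
import torch
from transformers import CLIPProcessor, CLIPModel

model_id = "openai/clip-vit-base-patch32"
processor = CLIPProcessor.from_pretrained(model_id)
model = CLIPModel.from_pretrained(model_id)

# stand-in for the notebook's big_patches: 4 patches, 3 channels, 256x256,
# with pixel values already scaled to 0-1
big_patches = torch.rand(4, 3, 256, 256)

inputs = processor(
    text=["a fluffy cat"],
    images=big_patches * 255.,  # undo the 0-1 scaling before preprocessing
    return_tensors="pt",
    padding=True,
)
with torch.no_grad():
    outputs = model(**inputs)
scores = outputs.logits_per_image  # one similarity score per patch
```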
Hey James! As someone with 0 coding experience in Computer Vision and new to OpenAI's clip, I found this video incredibly valuable. Thank you so much!
glad it was helpful!
What an incredibly approachable breakdown of a very complicated topic- thank you!
Thanks!
Hi! Love the content. I was just wondering: since we are passing patches of images to the CLIP visual encoder (and each patch has its own dimensions), does that mean we have to resize each patch so that it fits the input dimensions of the CLIP visual encoder? :) Looking forward to your reply
Using your code line for line, I'm having trouble with this: no matter what prompt I use, my output image looks exactly the same with regard to localization and the dimming of patches based on score. It looks like I'm only seeing the most frequently visited patches rather than those with the highest CLIP score. Any ideas?
I might be a bit late to the party, but it seems that the major issue is that the variable "runs" is initialized with torch.ones instead of torch.zeros. The localization is still not as good as the one in the video though...
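A minimal sketch of the suggested fix, with made-up sizes and random scores standing in for real CLIP similarities; the variable names mirror the notebook but the loop itself is only illustrative:

```python
import torch

H, W = 12, 12          # score-map size, assumed for illustration
window, stride = 6, 1  # sliding-window parameters as in the video

scores = torch.zeros(H, W)
runs = torch.zeros(H, W)  # was torch.ones in the notebook

for y in range(0, H - window + 1, stride):
    for x in range(0, W - window + 1, stride):
        clip_score = torch.rand(1).item()  # stand-in for a real CLIP score
        scores[y:y + window, x:x + window] += clip_score
        runs[y:y + window, x:x + window] += 1

# average score per position; clamp guards any cells a window never covered
avg_scores = scores / runs.clamp(min=1)
```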
I have been trying to figure out how to change it from one class and one instance to one class and many instances, but I can't seem to figure out how to do it. What should I do?
Hi James, thanks for the fantastic tutorial. How do you think this would work for e.g. drawing bounding boxes around multiple books on a bookshelf. They are next to each other, and so the image patches will all probably correspond to "book" but which individual book is not clear. Would making the patches smaller improve things? Any ideas how to address this use case would be much appreciated. Cheers!
Really amazing, the advances in AI. Thanks for showing the hybrid approach to "object detection" using text👍
Glad you liked it, I'm always impressed with how quick things are moving in AI, it's fascinating
Hi James, how did you create the dataset? Did you need to annotate the images or convert them to YOLO or COCO format before forming the dataset? Would love to hear more. Thanks, Ankush Singal
What models and technology would one use to "scan" a directory of images and then output text describing what the model found in each image?
Thanks for your efforts James!
Amazing video!!
thanks a ton!
Hello James, the colab link is not available anymore in your pinecone article
Might I know the total runtime if we put this into production mode?
thanks for the video, amazing work !!!
Hello James! I have a use case for CLIP. I think. If it works. I am not a computer programmer and have never used Colab, but I have a few months to learn if learning all that can be done in that amount of time. I also have about 30k-40k photos that I would like to tag every day in the summer - tagged either blue shirt or white shirt (sports). Every tutorial I have seen uses a dataset that is located online. Can I direct CLIP to my local server to perform object detection? Do the photos need to be in any particular format for optimum results? Well, let me back up. Can you direct me to a resource that will give me the background I need to be able to follow along with you in these videos? After that, I should be able to ask more relevant questions. Thank you for the videos!
Thank you! Can you get the vectors right out of CLIP without supplying a prompt? So that you get embeddings for every patch and can then derive what is being detected?
You can get embeddings, but they come after the CLIP encoder stage. The image patches are what are fed into the model and aren't very easily interpretable - it's the CLIP encoding layers that encode 'meaning' into them.
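A minimal sketch of what this reply describes, using Hugging Face transformers: image embeddings can be pulled from CLIP with no text prompt via get_image_features, and those post-encoder vectors are what carry the 'meaning'. The blank placeholder image and checkpoint choice are assumptions for illustration:

```python
import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model_id = "openai/clip-vit-base-patch32"
processor = CLIPProcessor.from_pretrained(model_id)
model = CLIPModel.from_pretrained(model_id)

patch = Image.new("RGB", (256, 256))  # placeholder for a real image patch
inputs = processor(images=patch, return_tensors="pt")
with torch.no_grad():
    emb = model.get_image_features(**inputs)  # shape (1, 512) for ViT-B/32

# emb can now be compared against text embeddings, or indexed in a
# vector database, without ever supplying a prompt up front
```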
Great tutorial ! 👏
Does anyone know why [IndexError: list index out of range] appears when trying to detect more than 2 objects? For example: detect(["cat eye", "butterfly", "cat ear"], img, window=6, stride=1)
Excellent explanation, kudos James!
glad it helped!
Hello, thank you so much for creating this video, it is quite easy to follow for a beginner like me☺. I was also wondering if CLIP can connect images to text instead of text to images.
Yes 100% - after you process the images and text with CLIP it just outputs vectors, and with vector search it doesn't matter whether those were produced from text or images, see here:
ua-cam.com/video/fGwH2YoQkDM/v-deo.html
Hope that helps!
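A minimal sketch of that point, assuming the Hugging Face CLIP checkpoint: text and image embeddings land in the same vector space, so an image can be used as the query against candidate captions just as easily as the other way around. The placeholder image and captions are invented for illustration:

```python
import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model_id = "openai/clip-vit-base-patch32"
processor = CLIPProcessor.from_pretrained(model_id)
model = CLIPModel.from_pretrained(model_id)

image = Image.new("RGB", (224, 224))  # placeholder for a real photo
captions = ["a dog in a park", "a plate of food", "a city skyline"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )

# cosine similarity between the image vector and each caption vector;
# the highest-scoring caption is the text CLIP "connects" to the image
sims = torch.nn.functional.cosine_similarity(img_emb, txt_emb)
best_caption = captions[int(sims.argmax())]
```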
Fascinating!
thanks as always Shaheer!
Thanks
The code does not work for forming a bounding box around the localized object.