10L - Self-supervised learning in computer vision

  • Published 16 Dec 2024

COMMENTS • 63

  • @khushpatelmd
    @khushpatelmd 3 years ago +17

    One of the best lectures on SSL ever. Thank you, Alfredo and Ishan, for making this available to everyone.

    • @alfcnz
      @alfcnz 3 years ago +2

      🥳🥳🥳

  • @dpetrini_
    @dpetrini_ 1 year ago +1

    I am sure students from all over the world thank you a lot.

  • @AIwithAniket
    @AIwithAniket 3 years ago +2

    Awesome lecture covering all the different methods of unsupervised learning! Thank you for making these videos public.

    • @alfcnz
      @alfcnz 3 years ago +1

      💪🏻💪🏻💪🏻

  • @AdityaSanjivKanadeees
    @AdityaSanjivKanadeees 3 years ago

    I have a question regarding Barlow Twins. Q1: For a batch of B samples, the output of the projector networks will be BxD. We have two such projections A and B. We know that rank(AxB)
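
    For reference, a minimal sketch (assuming PyTorch; the function name is made up) of the DxD cross-correlation matrix that Barlow Twins computes from the two BxD projector outputs the question refers to; being a product of BxD matrices, its rank can be at most min(B, D):

    ```python
    import torch

    def cross_correlation(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
        """D x D cross-correlation matrix of two B x D projector outputs."""
        B = z_a.shape[0]
        z_a = (z_a - z_a.mean(0)) / z_a.std(0)  # standardize each dimension over the batch
        z_b = (z_b - z_b.mean(0)) / z_b.std(0)
        return z_a.T @ z_b / B  # D x D; the Barlow Twins loss pushes this toward the identity

    # toy check with B = 8 samples and D = 32 projector dimensions
    c = cross_correlation(torch.randn(8, 32), torch.randn(8, 32))
    print(c.shape, torch.linalg.matrix_rank(c))  # torch.Size([32, 32]); rank at most min(B, D)
    ```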

  • @jetnew_sg
    @jetnew_sg 3 years ago +1

    Thanks for this! Can't wait to see how the best of all worlds can be combined for SSL!

    • @alfcnz
      @alfcnz 3 years ago

      🔥🔥🔥

  • @buoyrina9669
    @buoyrina9669 2 years ago

    Thanks, Ishan. This is excellent.

    • @alfcnz
      @alfcnz 2 years ago

      🥳🥳🥳

  • @SY-me5rk
    @SY-me5rk 3 years ago

    Excellent presentation. Thanks

  • @IdiotDeveloper
    @IdiotDeveloper 3 years ago

    A really informative lecture on self-supervised learning.

  • @aristoi
    @aristoi 3 years ago

    Thanks for this. Really terrific content.

    • @alfcnz
      @alfcnz 3 years ago

      🥳🥳🥳

  • @sami9323
    @sami9323 1 year ago

    This is excellent, thank you!

    • @alfcnz
      @alfcnz 1 year ago

      You're very welcome! 😇😇😇

  • @QuanNguyen-oq6lm
    @QuanNguyen-oq6lm 3 years ago

    This is actually what my research is focusing on; hopefully I can finish it in time to apply for a PhD at NYU and join you, Alfredo.

    • @alfcnz
      @alfcnz 3 years ago

      😍😍😍

  • @saeednuman
    @saeednuman 3 years ago

    Very informative as usual. Thank you @Alfredo

    • @alfcnz
      @alfcnz 3 years ago

      🤓🤓🤓

  • @NS-te8jx
    @NS-te8jx 2 years ago +1

    Every two minutes he refers to the work of others, and his own, as a reference. I was shocked and overwhelmed by the number of papers referenced in just this one lecture, haha.

    • @alfcnz
      @alfcnz 2 years ago

      😅😅😅

    • @NS-te8jx
      @NS-te8jx 2 years ago

      @@alfcnz I enjoyed this session, very good content. Thanks for organizing it.

  • @filippograzioli3641
    @filippograzioli3641 3 years ago

    Beautiful lecture! Thanks :)

    • @alfcnz
      @alfcnz 3 years ago

      You're welcome 😇😇😇

  • @ashishgor2163
    @ashishgor2163 2 years ago

    Thank you so much sir

    • @alfcnz
      @alfcnz 2 years ago

      You're welcome 🤗🤗🤗

  • @prakharthapak4229
    @prakharthapak4229 3 years ago +1

    Basically an informative video :-)

  • @tchlux
    @tchlux 3 years ago +5

    People really need to stop using linear classifiers to gauge the “correctness” of representations learned at different layers!!
    Use something like a Silhouette score, or anything that measures *local* consistency of the representation (could also use a k-fold Delaunay interpolant approximation if you’re attached to things being locally linear).
    Neural networks (ReLU) capture linearly separable subsets of data at each layer, which means that even the layer two steps before the output could have a highly nonlinear representation of the data that is easily transformed with the right set of selections. You won’t succumb to this problem if you measure local continuity of a representation with respect to your target output instead of using a global linear approximation.
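
    A minimal sketch of the two kinds of probe being contrasted here, assuming scikit-learn; Z, y, and probe are illustrative names, not anything from the lecture:

    ```python
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import silhouette_score

    def probe(Z, y):
        """Z: (N, D) frozen embeddings; y: (N,) downstream labels."""
        # global linear probe (train/test split omitted for brevity)
        linear_acc = LogisticRegression(max_iter=1000).fit(Z, y).score(Z, y)
        # local consistency: treat each label as a cluster; range [-1, 1], higher is better
        local = silhouette_score(Z, y)
        return linear_acc, local
    ```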

    • @imisra_
      @imisra_ 3 years ago +4

      Thanks for the comment! I agree that evaluating representations with linear classifiers is not sufficient. Like you suggest, there are many different ways to evaluate them, and each of them tests different aspects of representations. Depending on the comparison/final application, the methodology for evaluating them will change.

    • @khushpatelmd
      @khushpatelmd 3 years ago

      How would you implement it? Do you have a use case in mind? I understand the rationale, but I don't understand how to use something like a silhouette score here.

    • @tchlux
      @tchlux 3 years ago +2

      @@khushpatelmd Great questions. A simplified example: consider a binary classification problem where the model outputs a single number (the truth is either 0 or 1). Suppose we want to evaluate the amount of information captured by an embedding relative to this downstream prediction task.
      Option 1: We could measure the mean squared error of a best-fit linear function over the embedded data. In effect, this measures how "linearly separable" our embedded data is for this classification problem.
      Option 2: We compute the average distance to the nearest point of a different class (for all points) minus the average distance to the nearest point of the same class. (This is similar to the concept behind silhouette scores, which answer the question "how near is each point to its own cluster relative to other clusters?")
      Now imagine that the embedding has the data placed perfectly in a separated "three stripe" pattern, where the left stripe is all 0's, the middle stripe is all 1's, and the right stripe is all 0's. The pure linear evaluation (option 1) will tell us that the embedding yields about 66% accuracy (not so good). However, a nearness approach (option 2) would tell us that the embedding is very good and that all nearest points are in the same class (distance to other class - distance to same class >> 0). Realistically, option 2 is correct here, because there is a very simple 2-hidden-node MLP that can *perfectly* capture the binary classification problem given this embedding.
      I realize that some people might say, "well, option 2 is irrelevant if you always know you're going to use a linear last layer." But that's beside the point. In general we are trying to evaluate how representative the newly learned geometry is for downstream tasks. Restricting ourselves to only linearly-good geometries for evaluation is unnecessary and can be misleading. In the end, most people care about how difficult it would be to take an embedding and train a new, accurate model given that embedding. I assume few people will arbitrarily restrict themselves to linear models in practice.
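
      A runnable version of this toy example, assuming NumPy, SciPy, and scikit-learn (a sketch only; the stripe layout, sample counts, and probes below are made up for illustration):

      ```python
      import numpy as np
      from scipy.spatial.distance import cdist
      from sklearn.linear_model import LogisticRegression
      from sklearn.neighbors import KNeighborsClassifier

      rng = np.random.default_rng(0)

      # a 1-D "embedding" laid out in three separated stripes: class 0 | class 1 | class 0
      z = np.concatenate([rng.uniform(0, 1, 200),
                          rng.uniform(2, 3, 200),
                          rng.uniform(4, 5, 200)])
      y = np.array([0] * 200 + [1] * 200 + [0] * 200)
      Z = z.reshape(-1, 1)

      # Option 1: global linear probe -- a single threshold can only carve off one stripe (~66%)
      print(LogisticRegression(max_iter=1000).fit(Z, y).score(Z, y))

      # Option 2: local consistency -- the nearest points are always of the same class
      D = cdist(Z, Z)
      np.fill_diagonal(D, np.inf)
      same = np.where(y[:, None] == y[None, :], D, np.inf).min(axis=1)
      diff = np.where(y[:, None] != y[None, :], D, np.inf).min(axis=1)
      print((diff - same).mean())                           # >> 0: locally consistent
      print(KNeighborsClassifier(5).fit(Z, y).score(Z, y))  # a 5-NN probe scores ~1.0
      ```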

    • @khushpatelmd
      @khushpatelmd 3 years ago

      @@tchlux Thanks a lot, Thomas. You explained this so clearly.

  • @bisheshworneupane7996
    @bisheshworneupane7996 3 years ago

    Great Video

    • @alfcnz
      @alfcnz 3 years ago

      😎😎😎

  • @harrypotter1155
    @harrypotter1155 3 years ago

    Hi, are you planning to add subtitles or enable automatic captions?

    • @alfcnz
      @alfcnz 3 years ago +1

      Automatic captions should be enabled by default. I'll check later whether and why this is not working. Thanks for the feedback. 🙏🏻🙏🏻🙏🏻

    • @alfcnz
      @alfcnz 3 years ago

      I'm in touch with the YouTube support team. They have identified the issue and are currently working on it. I'll let you know when there is an update.
      Thank you for your patience. 😇😇😇

    • @harrypotter1155
      @harrypotter1155 3 years ago

      @@alfcnz THANK YOU VERY MUCH!! I really appreciate the lengths you're going to just to make sure the auto captions are on 😭 Once again, thank you very much!

    • @alfcnz
      @alfcnz 3 years ago

      😇😇😇

    • @alfcnz
      @alfcnz 3 years ago

      They replied and… I'm losing my patience. YouTube support is not cooperating. I'm escalating this soon.
      I'm not sure what part of “feed the audio stream to your speech-to-text model” is hard to comprehend.

  • @НиколайНовичков-е1э

    Thank you, Alfredo :) It will be very difficult for me to read all the materials referenced in the video in just one week. )

    • @alfcnz
      @alfcnz 3 years ago +1

      I'm now aiming at two videos per week.
      Haha, sorry 😅😅😅

    • @НиколайНовичков-е1э
      @НиколайНовичков-е1э 3 years ago

      So it will be very, very, very difficult for me to read everything, but I will try. Thank you for the videos, Alfredo :)

  • @XX-vu5jo
    @XX-vu5jo 2 years ago

    Wait for our CVPR paper that will solve the memory problem. We hope it will be accepted.

    • @alfcnz
      @alfcnz 2 years ago

      🤞🏻🤞🏻🤞🏻

  • @hoseinhashemi3680
    @hoseinhashemi3680 3 years ago

    I was wondering about something. In contrastive learning, if one uses a self-attention transformer encoder within the batch dimension before feeding the representations to the contrastive loss, will it ruin the objective of contrastive learning? I am asking because the transformer encoder over the batch will basically reweight the representation of each sample according to the dot-product similarity between samples.
    Thank you for the wonderful introduction, by the way.

    • @alfcnz
      @alfcnz 3 years ago

      Why would you want to use a transformer “within the batch dimension” (whatever this means)?
      Can you clarify what you're trying to do? 🤔🤔🤔

    • @hoseinhashemi3680
      @hoseinhashemi3680 3 years ago

      @@alfcnz I sent you an email. Thanks!

    • @alfcnz
      @alfcnz 3 years ago

      I don't have the bandwidth to reply to emails, I'm sorry. I haven't checked them in a few months now, I think.

  • @ChuanChihChou
    @ChuanChihChou 3 years ago

    So I was watching the "Scaling machine learning on graphs" @Scale talk the other day, in which they used the contrastive method with massive parallelism and negative sampling to prevent trivial-solution collapse:
    fb.watch/v/1pqXNP5au/
    After this lecture, I now wonder whether we can use the other options in the arsenal (clustering, distillation, and redundancy reduction) instead. Has anyone at Facebook tried those for graph embedding training yet?

    • @alfcnz
      @alfcnz 3 years ago +1

      Yup, we can indeed use the other techniques, where the positive pairs are defined by the adjacency matrix (connectivity defined by the graph). For the question about whether FB has tried these, I'll let Adam reply. (Let me ping him.)

    • @alerer1
      @alerer1 3 years ago +1

      Hi Chuan-Chih, thanks for watching my talk! I don't know of anyone at Facebook who has applied these unsupervised methods to the problem of learning node features for graphs. The graph embedding problem is a little different from learning unsupervised image features, so I don't immediately see how these methods would apply, but I wouldn't be surprised if there were a way!
      In the type of unsupervised learning described in this talk, you are learning a function f that converts a high-dimensional feature vector x_i into a low-dimensional semantic feature z_i. In the graph embedding setting, the nodes don't have input features - you *learn the input features* in order to approximate the adjacency matrix. There are probably ways to apply these methods if you think of the one-hot edge list (aka each node's row of the adjacency matrix) as the features, but I haven't thought about it.
      Maybe a better place to start is the graph neural network setting, where nodes *do* have input features and you're learning a function f that combines the features over the graph neighborhood to predict some supervised labels. I haven't seen any work on unsupervised graph neural networks, but there probably is some, and some of these same approaches may work well!
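
      A minimal sketch of the idea mentioned above of letting the graph define the positive pairs (assuming PyTorch; edge_index uses the common (2, E) edge-list layout, and all names here are illustrative, not an existing API):

      ```python
      import torch

      def neighbor_views(x: torch.Tensor, edge_index: torch.Tensor, batch_size: int = 256):
          """Sample positive pairs from graph edges: a node and one of its neighbours
          play the role that two augmentations of one image play in image SSL.
          x: (N, F) node features; edge_index: (2, E) edges."""
          idx = torch.randint(edge_index.shape[1], (batch_size,))
          src, dst = edge_index[0, idx], edge_index[1, idx]
          return x[src], x[dst]  # two (batch_size, F) "views" of neighbouring nodes

      # The two batches can then be fed to any joint-embedding objective from the lecture
      # (contrastive, clustering, distillation, or redundancy reduction) in place of
      # augmented image pairs.
      ```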

  • @charchitsharma8902
    @charchitsharma8902 3 years ago

    Thanks Alf :)

    • @alfcnz
      @alfcnz 3 years ago +1

      You're welcome 😺😺😺

  • @hedu5303
    @hedu5303 3 years ago

    Awesome!

    • @alfcnz
      @alfcnz 3 years ago

      😻😻😻

  • @dexlee7277
    @dexlee7277 3 years ago

    That bear, he knows everything.

    • @alfcnz
      @alfcnz 3 years ago +1

      Indeed he does. He's been present at all my lessons! 🐻🐻🐻