These lectures are gold! Thank you so much for putting them online! :-)
Love that you ask people to raise hands "if you understand". It really shows the will to teach is there.
Very comprehensive and clear! Thanks for sharing this video with us.
Great lecture! Now I can understand it. Many thanks from South Korea
Thanks a lot, sir, for making this video. I loved the way you explain each and every step of the proof in an easy way. Again, thank you, @Kilian Weinberger sir.
Professor Kilian, I really wish I get to meet you someday. I can't express how much I appreciate you and value these lectures.
This is the best explanation of the convergence proof on YouTube at the moment.
Great lecture! Thank you Dr. Weinberger!
Amazing explanation!!
It was amazing, Professor. Really helpful.
Currently taking 4780, and I still come home and watch your videos!
great lectures. great teacher
Amazing lectures!!
brilliant lecture indeed!
Great Teacher! I'd never heard of any of this a week ago and I'm able to keep up at each step. Danke schön, Prof. Weinberger. Is it possible to make the placement exam available? Thank you.
This proof is beautiful!
I'm really hoping you still view the comments on these videos.
Is there any way to know what the programming projects involved? The assignments and lecture notes are obviously incredibly useful, but I don't feel confident without doing any coding. I would appreciate it so much if the programming projects, or at least descriptions of them, were made available.
Is it possible to detect divergence of the algorithm during learning, i.e. the case when the data is not linearly separable? Can we infer gamma from the data to check whether we exceeded 1/gamma^2 updates?
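One wrinkle with this idea: gamma is defined through the (unknown) best separating hyperplane, so it can't be computed before training. A common workaround is to cap the number of updates and treat hitting the cap as a hint that the data may not be linearly separable. A minimal sketch (the cap value and the toy data below are my own, not from the lecture):

```python
import numpy as np

def perceptron(X, y, max_updates=10000):
    """Perceptron with an update cap. If the data is separable with
    margin gamma and all ||x|| <= 1, convergence needs at most
    1/gamma^2 updates, so hitting a generous cap suggests the data
    may not be linearly separable."""
    w = np.zeros(X.shape[1])
    updates = 0
    while updates < max_updates:
        made_mistake = False
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:      # misclassified (or on the plane)
                w += yi * xi            # the perceptron update
                updates += 1
                made_mistake = True
        if not made_mistake:
            return w, updates, True     # converged: a full clean pass
    return w, updates, False            # cap hit: possibly not separable

# Toy separable data: label is the sign of the first coordinate.
X = np.array([[1.0, 0.2], [0.8, -0.1], [-0.9, 0.3], [-1.0, -0.2]])
y = np.array([1, 1, -1, -1])
w, n_updates, converged = perceptron(X, y)
```

If you did happen to know a lower bound on gamma (with inputs scaled so ||x|| <= 1), you could set the cap to 1/gamma^2 exactly, and exceeding it would prove non-separability rather than merely suggest it.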
Does anyone know which 5 inequalities the professor is talking about?
I'm really enjoying your lectures, professor. Is there any way I can access the projects?
At 8:38, why do we rescale w star? Can we not just leave it with a norm of 1?
The intuition (for me) is that wTw* and wTw both grow (at least and at most) linearly in the number of updates M, but wTw* is linear in w while wTw is quadratic in w.
32:39 what are the other 4 inequalities that everyone should know?
I am watching these lectures and wondering if there will be any moment on the data science journey where the matrices will be self-adjoint.
Thank you sir
Where is the Valentine's poem of the proof??!?!?!
17:54 "The HOLY GRAIL weight vector that we know actually separates the data"
love the last story haha
Did anyone manage to find the projects or anything related to this class?
Why didn't you write the second constraint as wT(w+yx) instead of (w+yx)T(w+yx)? I'm confused.
I understand up to the point that w^tw* increases by at least gamma and w^tw increases by at most 1 but I do not understand how this proves that w necessarily converges to w*, could someone help me out please?
I think he means that if the second condition is true, then the only way the inner product of w and w* increases is if they align themselves better than before (cos theta increases), so w is indeed moving towards w*.
Really enjoying your lectures, thank you very much. Do you plan to put this course (along with projects) on Coursera or any other online platform?
Cornell offers an online version through their eCornell program. ecornell.cornell.edu/
@@kilianweinberger698 Thank you very much. I will have a look.
If our data is sparse, won't scaling it into a circle of radius 1 shift it to a dense distribution and cause problems?
There's one thing I couldn't get: why is gamma defined from the "best" hyperplane? If M is bounded by 1/gamma² and gamma could be arbitrarily close to zero (if you picked the worst possible hyperplane, for instance), then the proof would be spoiled.
Oh okay, I get it: finding other, bigger bounds for M says nothing about the lowest bound you found.
How is y^2xTx smaller than one when we know that y^2 is equal to 1? If the xTx term is less than one but positive, doesn't the whole term become greater than 1, breaking the inequality?
y^2 = 1
xTx <= 1, so y^2xTx <= 1
@@consumidorbrasileiro222 Oh yeah, I missed the fact that we're raising the whole term to a power, and if xTx is less than 1 the whole term will be less than 1 as well.
Professor, or anyone, please tell me why we need to consider the effect of the update on
w transpose w star, and
w transpose w.
Please reply!
wTw* increasing means that they are getting more similar. But there is another case, in which w is just being scaled up. By showing that wTw grows by at most 1 per update, we show that w is not simply being scaled, so w and w* must actually be getting more similar.
@@XoOnannoOoX Whoa! Thank you so much 💯
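For anyone following this thread, here is how the two bounds combine; a sketch of the standard argument from the lecture, assuming ||x_i|| <= 1, ||w*|| = 1, and w initialized to the zero vector:

```latex
% After M updates (each adds y_i x_i for a misclassified point):
w^\top w^* \ge M\gamma \quad \text{(grows by at least } \gamma \text{ per update)}
\\
w^\top w \le M \quad \text{(grows by at most } 1 \text{ per update)}
\\
% Cauchy--Schwarz squeezes M between the two bounds:
M\gamma \;\le\; w^\top w^* \;\le\; \|w\|\,\|w^*\| \;=\; \|w\| \;\le\; \sqrt{M}
\;\Longrightarrow\; M \le \frac{1}{\gamma^2}.
```

So neither bound alone proves anything; it is only the squeeze between the linear growth of wTw* and the at-most-square-root growth of ||w|| that caps the number of updates.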
Hi Professor. At 32:04, you write that w.T dot w_star = abs( w.T dot w_star ). How does it follow that the dot product of those two vectors is necessarily positive? My intuition says that the first update of w will point w in the direction of w_star making the dot product positive. It makes sense, but it does not seem a trivial statement to me. w.T and w_star could be pointing in opposite directions and thus yield a negative dot product. What am I missing? :) Thanks.
I have somehow figured out the answer, minutes after posting this question. w starts as the zero vector and w.T dot w_star can only increase after each iteration, by at least gamma. Thus, making w.T dot w_star positive and making the following statement true: w.T dot w_star = abs( w.T dot w_star )
Hi Professor Weinberger, it looks like this lecture is about a smart algorithm created by smart people which can classify data into two classes. But in the first introduction lecture you mentioned that machine learning is about the computer learning to design a program by itself to achieve our goal. So I'm confused: what's the relationship between this perceptron hyperplane algorithm and machine learning? It looks like we humans just design this algorithm, code it into a program, and feed it to a computer to solve the classification problem...
So the Perceptron algorithm is the learning algorithm which is designed by humans. However, given a data set, this learning algorithm generates a classifier and you can view this classifier as a program that is learned from data. The program code is stored inside the weights of the hyperplane. You could put all that in automatically generated C code if you want to and compile it. Hope this helps.
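To make that point concrete: the "learned program" is nothing more than a dot product with the learned weights. A toy sketch (the weight values here are made up for illustration, not learned from real data):

```python
import numpy as np

# Hypothetical weight vector, standing in for what the perceptron
# would produce after training on some dataset.
w = np.array([0.7, -1.2, 0.4])

def classify(x):
    """The learned 'program': its entire behavior is encoded in w."""
    return 1 if w @ x > 0 else -1

# Using the generated classifier on new inputs:
print(classify(np.array([1.0, 0.0, 0.0])))   # → 1
print(classify(np.array([0.0, 1.0, 0.0])))   # → -1
```

As the professor says, you could dump w into auto-generated C code and compile it; the "program" is just these numbers plus a sign function.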
In 27:05 you write:
2y(w^T • x) < 0
Why is it not 2y(w^T • x) <= 0?
Oh yes, good catch,
2y(w^T * x) <= 0 ==> that the data point was classified incorrectly. Even if it was exactly 0, the point lies on the hyperplane and still counts as misclassified.
What are the other 4 inequalities in computer science??
Wondering the same thing
I can't understand something:
M is positive, Gamma is positive then M times Gamma is positive.
After 1 update, M = 1.
(w^T)*(w^*) can be negative, since (w^T) might have started pointing the opposite direction of (w^*)
Then how can (w^T)*(w^*), a negative number be greater than a positive number of 1 times gamma (M*gamma)?
Here, w is initialized as the 0 vector, so (w^T)*(w^*) would be 0. Thus, after the first update, it will be at least gamma. And like this, (at least) gamma keeps getting added at each update, thus making it a positive value.
Refer, ua-cam.com/video/vAOI9kTDVoo/v-deo.html, to see how it converges when w is initialized randomly.
Why is the minimum distance between the point and the hyperplane = inner product between the point and w*?
Because (w^*)T(w^*)=1. (For details check out the detailed proof here: www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote03.html )
@@kilianweinberger698 Thank you Professor!! You're a legend!!!
I have a question about convergence. From my understanding, since there are different hyperplanes satisfying the margin, there would be a whole set of valid w*. So w* is not unique, which means that if there exists a set of separating hyperplanes, the algorithm will converge to some hyperplane in that set, but not to a fixed w*. Not sure if I understand correctly.
Hahahaha, see! Sitting in the class is not always the optimal choice :D:D:D:D The teacher led us along and crammed our heads for a while and nobody understood a thing :D:D:D Undeniable evidence, the teacher himself admitted it, it's not us badmouthing him :D:D:D
Would be cool if you also had a course in German.
0:40 Hang on, Mr. Weinberger, are you German? Your last name might give a hint, but you never know. But that German was perfect! Not only the chosen words, but also the way you pronounced them all, was absolutely perfect German.
Yes, I grew up in Bavaria. :-)
Why don't we have Nobel Prizes for computer science??? This algorithm is worth 10 Nobel Prizes, indeed.
plot twist
gamma = 0
M <= 1/gamma^2 = infinity
You defined the margin wrongly.
xt•w is the projection of the vector xt on w.
The distance is ||x-w||.
I think what he said was that the margin is the minimum distance of x to the hyperplane. Since w is the normal direction of the hyperplane, xTw is the projection of x on w, which is the distance from x to the hyperplane.
@@vincentxu1964 only assuming that ||w||=1
@@yrosenstein Yeah I think so. You can take a look at lecture 14. I think he redefines the margin with any w.
He said the distance to the hyperplane defined by w, not the distance to w itself. The distance of x to the hyperplane is equal to the projection of x onto w.
No, the Prof is correct. The margin is defined correctly as well (i.e. the distance of the closest point from the hyperplane). Read the proof here: www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote09.html#:~:targetText=Margin,closest%20point%20across%20both%20classes.
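To settle the thread: both readings are consistent once you write out the standard point-to-hyperplane distance (with the hyperplane through the origin, as in the lecture):

```latex
% Distance of a point x to the hyperplane \{z : w^\top z = 0\}:
d(x) = \frac{|w^\top x|}{\|w\|},
% which reduces to the projection length |w^\top x| exactly when \|w\| = 1,
% hence the margin in the proof:
\gamma = \min_i \, \left|{w^*}^\top x_i\right| \quad \text{with } \|w^*\| = 1.
```

So ||x - w|| is the distance to the point w, not to the hyperplane; the projection onto the unit normal w* is the correct distance.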
Starts at 0:53
At 43:02, that face on the blackboard