The best introduction to this topic I have seen so far. Very instructive and pedagogical, many thanks.
Hey, nice lecture. I'm studying from Foundations of Machine Learning by Mohri, Rostamizadeh, and Talwalkar, but some parts can be a bit tough when you don't have a formal math background. Your lecture really helped with the intuitions behind the math; thanks for putting it online!
+Giovanni Sirio Carmantini Thanks!
OMG, this is the best explanation of Schapire's notes.
+Zhenpeng Zhao Thanks! I did indeed first learn it in Rob's class.
Crisp explanation, thank you for this.
At 9:42, for R_m(H): what is the use of taking the expectation over all samples? As we saw previously (e.g., at 6:12), the calculation of the empirical Rademacher complexity does not use the true labels of the samples, just the size of the sample.
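A quick numerical sketch of the difference (hypothetical Python, not from the lecture; the threshold class, the sample size, and the Gaussian data are made up for illustration): the empirical Rademacher complexity R̂_S(H) is computed on one particular sample S, so it varies from draw to draw, while R_m(H) takes the expectation of that quantity over S ~ D^m, making it a property of the distribution rather than of one dataset.

    import numpy as np

    rng = np.random.default_rng(0)

    def empirical_rademacher(xs, thresholds, n_sigma=2000):
        # Monte Carlo estimate of the empirical Rademacher complexity
        # R_hat_S(H) for threshold classifiers h_t(x) = sign(x - t),
        # evaluated on one fixed sample xs of size m.
        m = len(xs)
        # One row per hypothesis: the labels h_t assigns to the sample.
        H = np.sign(xs[None, :] - thresholds[:, None])
        # Random sign vectors sigma, drawn uniformly from {-1, +1}^m.
        sigmas = rng.choice([-1.0, 1.0], size=(n_sigma, m))
        # For each sigma, the sup over h of (1/m) * sum_i sigma_i * h(x_i).
        sups = (sigmas @ H.T).max(axis=1) / m
        return sups.mean()

    thresholds = np.linspace(-2, 2, 41)
    m = 20

    # The empirical complexity depends on which sample you happened to draw:
    for _ in range(3):
        print("one sample S:", empirical_rademacher(rng.normal(size=m), thresholds))

    # R_m(H) averages away that dependence by also taking E over S ~ D^m:
    estimates = [empirical_rademacher(rng.normal(size=m), thresholds)
                 for _ in range(200)]
    print("average over samples, approx. R_m(H):", np.mean(estimates))

So the outer expectation doesn't need labels either; its job is just to remove the dependence on the particular draw of S.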
Hi Prof. Nice lecture. One question: on the Rademacher extrema slide, how can one take h(x_i) out of the summation? Since h(x_i) depends on i, it should stay inside the summation.
+Chandresh Kumar I think you are right; the result is the same, though. What's correct is that the expectation moves inside the sum, since the expectation of a sum of RVs is the sum of their expectations. The correct expression would be \sum_{i=1}^{m} h(x_i) E[\sigma_i] = 0. Correct me if I'm wrong :)
@@mychannel8307 Oh, that's a good catch. You're right, I can't do that!
But the logic holds: you have something multiplied by an expectation that's zero, so you get zero in the end. I got the "something" wrong, but thankfully anything times zero is zero.
Thank you, Jordan! This was a great explanation of both complexity measures. =)
At 19:30, I don't understand the equation at the bottom of this slide. Why is the left-hand side equivalent to the right-hand side after adding the expectation over sigma? Could you please add some explanation or recommend some supplementary materials?
Note that, as someone else mentioned, you can't pull h(x_i) out of the sum, because it depends on the summation index. The answer might be correct, but the logic is not.
Yes, the sum should have come out of the expectation as well (linearity of expectation).
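For anyone reading along, here is the corrected step written out; it is plain linearity of expectation applied to a single fixed h, in the book's notation:

E_\sigma\left[ \frac{1}{m} \sum_{i=1}^{m} \sigma_i h(x_i) \right] = \frac{1}{m} \sum_{i=1}^{m} h(x_i) \, E[\sigma_i] = \frac{1}{m} \sum_{i=1}^{m} h(x_i) \cdot 0 = 0.

The first equality holds because the expectation of a sum is the sum of the expectations, and each h(x_i) is a constant with respect to \sigma_i; the second because E[\sigma_i] = (1/2)(+1) + (1/2)(-1) = 0. Note this works only for a single fixed h: the sup over h in the Rademacher complexity does not vanish this way, since the expectation of a sup is not the sup of the expectations.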
Very nice explanation of a difficult topic. I have the following questions: at 27:25, for VC dimension d, the slide for the upper bound says "no set of (d+1)...": is it no set or every set? I ask this because for the [a,b] hypothesis class you can think of 3 points labeled, say, + - -, and you can put the interval around either the + or the -. Is the test perhaps an "exists", as in "there exist sets of size d+1 that cannot be shattered"? On another note, is there a video of the PAC learning lecture and of the lecture about applying VC dimension to SVMs? A very nice job of explaining a difficult topic.
+Rajiv Sambasivan This is the trickiest part of VC dimension. For the lower bound, you must simply show the existence of some set that can be shattered. For the upper bound, you must show that no set of size d+1 can be shattered, i.e., that no set of d+1 points (no matter what they are) can take on all possible labelings. So for your example, while + - - works, + - + would not.
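To make the exists/for-all distinction concrete, here is a brute-force shattering check for the interval class (a hypothetical sketch; the function names and test points are mine, not from the lecture). For intervals, some 2-point set can be shattered (the lower bound needs only existence), but no 3-point set can (the upper bound quantifies over every set), because the labeling + - + is never realizable:

    from itertools import product

    def interval_labels(xs, a, b):
        # Labels assigned by the hypothesis h_[a,b]: +1 inside [a, b], -1 outside.
        return tuple(1 if a <= x <= b else -1 for x in xs)

    def shatterable(xs):
        # Does the interval class realize every labeling of xs?
        # Endpoints at the points, at midpoints, and beyond the extremes
        # cover all distinct behaviors of an interval on xs.
        pts = sorted(xs)
        cands = ([pts[0] - 1] + [(p + q) / 2 for p, q in zip(pts, pts[1:])]
                 + [pts[-1] + 1] + pts)
        achieved = {interval_labels(xs, a, b) for a in cands for b in cands if a <= b}
        wanted = set(product([-1, 1], repeat=len(xs)))
        return wanted <= achieved, wanted - achieved

    # Lower bound: there EXISTS a 2-point set that is shattered.
    print(shatterable([0.0, 1.0]))       # (True, set())

    # Upper bound: NO 3-point set is shattered; (+1, -1, +1) always fails,
    # since an interval containing the two outer points must contain the middle.
    print(shatterable([0.0, 1.0, 2.0]))  # (False, {(1, -1, 1)})

The same failure happens for any 3 points, which is exactly the "no set of d+1" condition: VC dimension 2 for intervals.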
+Jordan Boyd-Graber Thanks!
Super clear!😀😀
thank you for this!