I am going through your entire ML playlist and I must say that it's been an incredible journey so far. Thank you so much for sharing your knowledge!
Excellent lecture - this explains very nicely why we need non-parametric models in certain situations. Thanks for recording and sharing this.
Thank you, Dr. Esarey. This is the clearest demo of some things I've used for a while; I just realized my understanding was not completely thorough.
This is way better than what I learned in class!! Thank you!!!
Thank you for a thorough, long, and very insightful video. This is very useful to someone just starting out!!
You're welcome!
Justin Esarey Thank you so much!
You're an amazing professor! I would totally love to go to your class!
Professor, at 2:45:00 I think you made a typo in the GCV formula. You might have wanted to write (1-h)^(-1/2) instead of (1-h)^(-2).
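For reference, the generalized cross-validation criterion is usually written with a squared correction factor:

$$\mathrm{GCV}(h) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{m}_h(x_i)}{1 - \operatorname{tr}(W_h)/n} \right)^2,$$

where $W_h$ is the smoother (hat) matrix; equivalently, the penalizing function is $\Xi_{\mathrm{GCV}}(u) = (1 - u)^{-2}$ evaluated at $u = \operatorname{tr}(W_h)/n$.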
Thank you, Dr. Esarey, for the great lesson! May I ask whether we can implement a multivariate kernel NW estimator from scratch (using the formula @ 2:30:33)?
For multivariate kernel regression (the NW estimator), how do we modify the univariate code?
It can be done, but the "curse of dimensionality" kicks in quickly. Generally speaking, you modify the code to calculate kernel weights according to distances in multiple dimensions, not just one.
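For anyone who wants to try it, here is a minimal sketch of that modification in R, using a product Gaussian kernel over two predictors; the variable names, simulated data, and bandwidths are illustrative assumptions, not taken from the lecture code.

    # Bivariate Nadaraya-Watson sketch with a product Gaussian kernel.
    # All names, data, and bandwidths below are illustrative.
    nw2 <- function(x0, z0, x, z, y, hx, hz) {
      # per-observation weight: product of one-dimensional Gaussian kernels
      w <- dnorm((x - x0) / hx) * dnorm((z - z0) / hz)
      sum(w * y) / sum(w)  # kernel-weighted average of y near (x0, z0)
    }

    # illustrative use on simulated data
    set.seed(1)
    n <- 500
    x <- runif(n); z <- runif(n)
    y <- sin(2 * pi * x) + z^2 + rnorm(n, sd = 0.2)
    nw2(0.5, 0.5, x, z, y, hx = 0.1, hz = 0.1)  # estimate of E[y | x = 0.5, z = 0.5]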
Thank you, Dr. Esarey! Your presentation is awesome!! Very clear and helpful!!!
I’m trying to understand how the values of the estimated PDF at the tail ends were predicted, where there aren't many data points nearby. At about 37:25 you create a bandwidth of length 1 to estimate x1. There are 2 data points in this bandwidth. But, how do you create an estimate at x = -2 (at the far left)? There are no data points within the bandwidth of length 1. Would you have to expand the bandwidth to ~1.8?
Erin Hunt I believe the answer to your question is that the graphic (which is taken from this book: sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/ebooks/html/spm/spmhtmlframe52.html) does not actually use a finite limit on its bandwidth, or uses a wider one. My discussion there was meant as an illustrative example of how such a bandwidth could be applied, not a comment on how that graphic was generated. Indeed, the code to generate that kernel density used to be at the link above, but it seems to be missing now.
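To see why a Gaussian kernel produces an estimate even at x = -2, here is a from-scratch sketch in R; the kernel has infinite support, so every observation contributes some weight at every evaluation point. The sample and bandwidth are made up for illustration.

    # From-scratch KDE with a Gaussian kernel (infinite support), so the
    # estimate is defined even far into the tails.
    kde <- function(x0, x, h) mean(dnorm((x0 - x) / h)) / h

    set.seed(2)
    x <- rnorm(50)       # illustrative sample
    kde(-2, x, h = 0.5)  # small but positive, even with no points within 0.5 of -2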
Justin Esarey Thanks for the link and speedy reply!
Since there is said to be a tradeoff when choosing h and the density function, could we not use a validation dataset to tune those better? I am just borrowing an idea from machine learning, where you partition your data into a holdout set (or several) for training (and cross validation), and validation, and testing. The testing set is for error estimation only at the end, and is not used during model development at all.
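For concreteness, here is a rough sketch of that idea in R for choosing a KDE bandwidth; the train/validation split, the candidate bandwidths, and the simulated data are all arbitrary illustrative choices.

    # Validation-set bandwidth selection for a kernel density estimate.
    # The split, candidate bandwidths, and data are arbitrary choices.
    set.seed(3)
    x <- rnorm(200)
    train <- x[1:150]
    valid <- x[151:200]

    kde <- function(x0, x, h) mean(dnorm((x0 - x) / h)) / h

    hs <- seq(0.1, 1.5, by = 0.1)
    # score each candidate h by its held-out log-likelihood (higher is better)
    score <- sapply(hs, function(h) sum(log(sapply(valid, kde, x = train, h = h))))
    hs[which.max(score)]  # bandwidth that generalizes best to the validation set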
Thanks a lot, Dr. Esarey. Very clear presentation. Could you please do a video on Gaussian Markov Random Fields, if possible? I would love to learn it from you.
Isn't the bandwidth (i.e., h) in the multivariate kernel case better approximated by a circle rather than a square in 3-dimensional space? (1:04:38) You, sir, have just got a new subscriber, by the way!
+GUEYE Nono Ghislain That's an interesting question! I think that, in principle, you could implement it either way. Certainly the normal kernel function I used in that part of the video has circular level curves, so in that sense you are completely correct: the kernel weighting declines with distance from the central point in a way that produces circular curves of equal weighting. But your question really bears on where you "cut off" the weighting, i.e., set weights of zero beyond some critical distance from the target point. That cutoff could be a (hyper-)square or a (hyper-)circle. The way I set it up, it's a square. But you *could* do it either way (e.g., by setting h as a fixed Euclidean distance from the central point, thereby making h_i and h_j in two dimensions i and j variable depending on the angle of a data point relative to the point at which the function is being estimated, so that h = sqrt(h_i^2 + h_j^2) at any point x(i,j)). Hope that helps!
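To make the square-versus-circle contrast concrete, here is a small illustrative sketch in R; the data, target point, and bandwidth are made up.

    # "Square" cutoff: within h of the target in each dimension separately.
    # "Circle" cutoff: within Euclidean distance h of the target.
    # The target point here is (0, 0); data and h are made up.
    set.seed(4)
    x <- runif(200, -1, 1); z <- runif(200, -1, 1)
    h <- 0.5

    in.square <- abs(x) <= h & abs(z) <= h
    in.circle <- sqrt(x^2 + z^2) <= h

    c(square = sum(in.square), circle = sum(in.circle))  # the circle is a subset of the square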
It certainly does. Thank you very much.
Hello, teacher, thank you so much for the explanation. Could I ask you a few things for my homework?
Can we use nonparametric kernel regression to forecast data? For example, I have data (1 variable) from January 2000 to January 2016, and I want to forecast with kernel regression through January 2017. Can I do that? If yes, do you have some references where I could learn how? Thank you so much.
Great tutorial! It explained so much of the intuition behind the nonparametric models to me. Really helpful!
Are there further videos? :)
There are! Check out my YouTube channel.
This really helps... It is indeed amazing to know that this is intended for Political science majors... :D
Where can I find his machine learning class online?
Thanks for the clear explanation. Any packages in Python?
Unfortunately, I'm kind of committed to teaching in R for the foreseeable future. There *are* packages in Python for most of the stuff I'm teaching, but I don't know much about them!
loess: Does it work well for out-of-sample prediction (extrapolation)? I'm assuming it is only good for in-sample prediction.
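For what it's worth, base R's loess() illustrates the caution: by default, predict() returns NA outside the training range, and you must explicitly request direct-surface fitting to extrapolate at all. A small sketch with simulated data:

    # By default, predict.loess() refuses to extrapolate.
    set.seed(5)
    x <- 1:50
    y <- sin(x / 5) + rnorm(50, sd = 0.2)

    fit <- loess(y ~ x)
    predict(fit, newdata = data.frame(x = 55))  # NA: no extrapolation by default

    fit2 <- loess(y ~ x, control = loess.control(surface = "direct"))
    predict(fit2, newdata = data.frame(x = 55))  # returns a value, but treat it with caution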
Regarding the code at 1:07:10, loading ggplot2 resets the random number generator, so the results won't be replicable. You can fix this just by resetting the seed after loading ggplot2. (See the discussion in this Stack Overflow thread: stackoverflow.com/questions/15261619/sample-gives-different-values-with-same-set-seed.)
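Concretely, the workaround looks like this; the seed value 123 is arbitrary.

    set.seed(123)
    library(ggplot2)  # loading the package can advance the RNG state
    set.seed(123)     # re-seed afterwards so the draws below are replicable
    sample(1:10, 3)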
Thanks for the tip!
Very helpful for the econometrics enthusiasts of the world!
Any chance we can get the R code you refer to in your lecture? It's going at the perfect pace, btw.
+Austin Land jee3.web.rice.edu/nonparametrics-lecture.r
+Justin Esarey Thanks a ton!
Thank you, Dr. Esarey! That was amazing!
Hey, awesome series of videos; I've watched most of them. Thanks a lot. One suggestion: it would be great if you could post the code used.
Thanks for your interest! The code for most of my videos (including this video) is posted on my teaching website, jee3.web.rice.edu/teaching.htm.
Thank you, indeed very much helpful. Please keep uploading more and more videos and keep sharing knowledge.
Justin Esarey Dear Dr. Esarey, is it possible to have the code in MATLAB?
great teacher!
You are the best of the best ...
This is really helpful. Thanks so much for sharing.
Thank you so much! Very helpful video!
Thank you so much! Your video really helps!!
You're welcome!
Thank You, that's very helpful.
Thanks a lot, Sir
Thanks!