My word! You are a fantastic communicator.
Really appreciate that!
Very good explanation; you made it a lot simpler than my teachers ever could. My undergraduate thesis was on Gaussian processes, so it was pretty nostalgic seeing you dive into this topic.
A note I'd like to make: the choice of the μ prior is very important depending on the spacing and number of data points you have; your model may depend on it heavily. It makes sense to set it to zero to develop the intuition, but when you try to apply it, you see that the model may just tend to zero if your data points are too far apart, if you make a poor choice of L, or if you don't have enough data.
So, to get the orange dashed line in the video, you'd also need to fit a regression on the data to get your μ prior. The problem is that you're adding more uncertainty to the model, since you're assuming that the mean of your distribution lies on that linear relation.
But as you said, it's great to have a distribution estimate and not rely only on point-estimate models; this is a great alternative.
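(For anyone curious, here's a minimal numpy sketch of that reverting-to-the-prior behavior; all the function names and toy numbers are mine, not from the video. Far from the data, the squared-exponential kernel goes to zero, so the posterior mean falls back to whatever prior mean you chose.)

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0, length=1.0):
    """Squared-exponential kernel: correlation decays with distance."""
    d = x1[:, None] - x2[None, :]
    return sigma**2 * np.exp(-0.5 * (d / length)**2)

def gp_posterior_mean(x_train, y_train, x_test, prior_mean_fn,
                      sigma=1.0, length=1.0, noise=1e-6):
    """Posterior mean of a GP with a (possibly nonzero) prior mean function."""
    K = rbf_kernel(x_train, x_train, sigma, length) + noise * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train, sigma, length)
    resid = y_train - prior_mean_fn(x_train)  # residuals from the prior mean
    return prior_mean_fn(x_test) + K_star @ np.linalg.solve(K, resid)

# toy data, plus a test point far from all training points
x_train = np.array([0.0, 1.0, 2.0])
y_train = np.array([3.0, 4.0, 5.0])
x_far = np.array([50.0])

zero_mean = lambda x: np.zeros_like(x)
linear_mean = lambda x: 3.0 + 1.0 * x  # e.g. from a linear regression fit

# far from the data, the posterior reverts to the prior mean:
print(gp_posterior_mean(x_train, y_train, x_far, zero_mean))    # ≈ 0
print(gp_posterior_mean(x_train, y_train, x_far, linear_mean))  # ≈ 53
```

With the zero-mean prior the prediction collapses to 0 out there, while the linear-mean prior keeps extrapolating the trend, which is exactly the trade-off described above.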
Happy to see you back here making great videos as always!
Hey, thanks!
Love it! Makes so much more sense to me now 😊
This video couldn't come at a better time, I have a statistical learning exam next week. Thank you so much!!!
I remember asking for this a while back. Thank you!!!
Of course!
Glad you cover this topic!
Hope you enjoyed it!
Thank you very much, you make even the hardest topics understandable and fun to watch! Could you delve a little deeper into the mathematical steps of marginalization and conditional probability that you talk about between 15:00 and 18:00?
This is elegantly explained!
If we try to predict the mean for the unknown points in between the data we have, would the mean always follow a straight line (ex: 0:45 one straight line, 24:05 two lines between 3 data points)?
Definitely not! That’s a great question. I drew the straight lines out of simplicity; if you work out the math, the straight line would imply a mean of 13.75 for x=30, but as we see on the second page we actually got a mean of 13.9 there. The shape of the mean curve will likely be nonlinear and will depend on the kernel that you choose.
@@ritvikmath ahh i see. so I can get something like polynomial interpolation of μ'(x) if I pick the right kernel?
thinking about it, straight line for the mean makes sense if our known data vector is the only thing that matters, but to get something "curvier" it makes sense that the distribution at one point is affected by the points nearby
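(You can check this numerically with a tiny numpy sketch; the toy numbers here are mine, not the video's. Even with two perfectly symmetric training points, the zero-mean GP posterior mean at the midpoint is not the straight-line interpolation, because the kernel couples the points.)

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0, length=1.0):
    # squared-exponential kernel
    d = x1[:, None] - x2[None, :]
    return sigma**2 * np.exp(-0.5 * (d / length)**2)

# two training points and the midpoint between them
x_train = np.array([0.0, 2.0])
y_train = np.array([0.0, 2.0])
x_mid = np.array([1.0])

K = rbf_kernel(x_train, x_train) + 1e-9 * np.eye(2)  # jitter for stability
K_star = rbf_kernel(x_mid, x_train)
gp_mean = (K_star @ np.linalg.solve(K, y_train))[0]

# straight-line interpolation between (0, 0) and (2, 2) gives exactly 1 here
print(gp_mean)  # ≈ 1.068, not 1: the mean curve bends between the points
```

Changing the kernel (or its length scale) changes how much the curve bends, which is the kernel-dependence mentioned above.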
Definitely interested in the math!
Thank you for another remarkable exposition Ritvik...
Glad you liked it!
Great explanation- really appreciate your effort in explaining this
Glad it was helpful!
Thanks for the great explanation!
Of course!
Thank you for the video, it was nicely explained. There are a lot of simplifications, though. Could you also talk about how to best select sigma and l? Is it all done empirically? Also, do you have any example implementation?
"Thanks for teaching me Gaussian processes - you're mean-t to be my tutor! And trust me, that's no variance from the truth." ChatGPT StatsDad Joke.
I'm a simple man. When Ritvik posts, I watch.
Thanks for this explanation. Ah, now if I can just convince the fish to swim in a normal distribution when I fish...
You and me both!
Well explained, and just what I needed right now!
Thanks, hope it helped!
Sorry, I am new to this topic and the math behind it. If this is a covariance matrix, then higher values should mean greater variance, and therefore less correlation, right? But in this case higher values mean more correlation for points closer together. I am confused as to why this is the case.
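(For anyone else puzzled by this: the diagonal entries of the kernel matrix are variances, but the off-diagonal entries are covariances, and correlation is covariance divided by the product of the standard deviations. A quick numpy sketch, with toy numbers of my own, may clear it up.)

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0, length=1.0):
    # squared-exponential kernel
    d = x1[:, None] - x2[None, :]
    return sigma**2 * np.exp(-0.5 * (d / length)**2)

x = np.array([0.0, 0.5, 3.0])
K = rbf_kernel(x, x)

# diagonal = variances (all sigma^2 here); off-diagonal = covariances
corr_close = K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])  # points 0.0 and 0.5
corr_far   = K[0, 2] / np.sqrt(K[0, 0] * K[2, 2])  # points 0.0 and 3.0
print(corr_close, corr_far)  # ≈ 0.88 vs ≈ 0.01: closer points more correlated
```

So a large off-diagonal value doesn't mean large variance; it means the two points co-vary strongly, which is exactly what "nearby points are more correlated" encodes.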
Thank you so much
I only took Intro to Stats, so I never learned about kernels or even conditional distributions. Nonetheless, this is very interesting!
Modelling an integer quantity that must be greater than zero, and he chooses a Gaussian over a Poisson... tsk tsk.
Haha fair point! That’s what I get for trying to use a too-simple example 😆
@@ritvikmath Oh of course, but it wouldn't be YouTube without the trolls. I felt I needed to truly be part of the community.
Poisson is the perfect distribution for fish
@@johningham1880 sounds like something so crazy only a Frenchman would say it.
I love you.
You are good, but you should just stick to the whiteboard version.
Man, your channel would blow up spectacularly if you invested the time into learning how to make really nice visuals. The whole poorly hand-drawn example thing is really 2005 and screams laziness and/or amateurishness.