Guys, realize for a sec how cool is that we are living in the time of the Internet.
I got a topic for my seminar (Monte Carlo sampling) where I need to cover Metropolis-Hastings sampling among others. So I started reading the book my prof recommended, couldn't understand sh*t, so I went to YouTube, searched for videos, found this one, and understood EVERYTHING. 30 years ago I would have had to go to the library, ask for another book, and spend ages there until I understood it. Now it's as simple as that!
Bro, thank you sooo much for the way you are explaining the stuff! Those parts with the toy examples and the intuition behind it are so helpful!
This is not the first time you are saving my ass!!!
From Belarus with Love!
This is seriously next level teaching. I’ve never heard such a clear explanation of M-H before! Amazing job.
Thanks!
This is a topic that has a lot of layers, but you did a great job of taking it apart and putting it back together! You’re a great teacher.
Thank you so much!
I had been trying for a whole day to understand what the heck the relationship between MC and the posterior probability is; but after watching your video, I understood it in just 20 minutes. The teaching is so clear and easy to understand! Very high-quality teaching!
Awesome. The last five minutes on intuition is especially good
This is an impressive alloy of math and intuition behind it - not something you get to see very often in short educational videos like this, because it's really REALLY hard to do. But you sir are one of the few exceptions. Bravo! Please never stop.
I'm sorry for my English, just wanted to say how impressed I am. Have a good one!
thanks for the kind words! Also, I really like how you used the word "alloy"; I'm going to start using that :)
wow I didn't notice how 18 minutes passed by... well done! thanks so much !
You are awesome. I watched from the Inverse Transformation Method and the Rejection Method through to this Metropolis-Hastings Method. I was previously confused about those concepts as taught by my university lecturer, but now I fully understand them all. Thank you so much for your wonderful and insightful teaching videos.
You even made someone who doesn't have English as their mother tongue understand the topic!! You have so much talent for teaching! Great job!
You explain this incredibly well! I can't put into words how helpful your videos have been.
I watched a lot of videos on this topic, and at around 15:51, thanks to your intuition, it flipped the switch in me and the reason behind all of this finally makes sense - feels good. Thank you so much!
The first thing I do now when I don't understand a concept is to see if you have a video on it. You make the best videos on the most complicated topics and make them so easy to understand. Simply the best! Thank you for your efforts!
I agree, veeray... this guy has the magic touch!
Haha absolutely! 😁
Came here thinking I understood Metropolis-Hastings, enriched myself with doubts during the lecture, wrapped everything up with you at the end. I'm now leaving with a more full understanding. You are an amazing teacher!
You did a fantastic job by explaining so many things within 20 minutes and with no jargon!
Landed here after watching a couple of videos on M-H, and none of them were remotely as clear as your explanation! and your explanation made me really appreciate the intuitive simplicity and beauty of the math. Great work! Really wish I had a teacher like you during my bachelors :D
Mate, this is the most amazing and clear content on MCMC I've yet seen. Incredible. Thank you so much!
Hey there! I'd just like to thank you for all this wonderful high-quality work you've made and shared with us. I've seen a bunch of different videos covering similar topics, but yours is definitely my favorite so far! Great pace control, clear explanations, and a wonderful teaching style. Well done, man. Please keep it up! Cheers! 💪👏🙏
thanks so much!!
Your explanation of the proposal density is the best I ever found! Thank you so much for your sharing!
Thanks!
Amazing explanation! I usually do not comment on YouTube, but here I make an exception. Good job!
You clarify complex concepts to make them easier to understand; this will significantly help me in my Advanced Workbook assignment, thanks.
You definitely deserve more exposure!! Thanks a lot for these great explanations:)
You're a saint. Thanks to people like you, the world has a chance.
aw thanks!
Awesome explanation, the best resource I have found to really understand the intuition behind MH. Thanks for your effort!
Glad it was helpful!
The series of your videos is indeed amazing! Thank you so so much!
This is so great. Best video I have found on this topic by far.
Amazing, deserves more views and could easily replace many of the lectures on MH out there!
thanks!
So clearly explained. After so many years, I finally understood this. Thank you so much! It would be really great if you could explain how we can differentiate between the sampling methods!
Kind reminder: there is a typo where MAX(1, r_{f}r_{g}) should be MIN(1, r_{f}r_{g}). Many thanks, Ritvik, your video helped me a lot.
In my statistics course they first presented the Markov chain and then proved that its stationary distribution is the one we are looking for, which was very confusing. What helped me a lot in this video is that you showed the derivation of the chain. Thanks for the great explanation and the intuition at the end!
Very clear explanation! In particular, I love the intuition part at the end so much. Thanks for your excellent work!
You're so welcome!
Absolutely the best math teacher on this planet. Every time I search for a math concept, if there's a video by ritvikmath, I know I am saved.
Watched this two years ago when I was an undergrad. Now I come back and watch it again and again as a grad student. Great video!
Fantastic job. This is the best explanation and description of MH that I've ever heard.
Your channel is super helpful. I finally understand MCMC and successfully programmed it!
Great to hear!
Man, this is the best presentation of Metropolis-Hastings I have seen, yet. Respect - keep up the good work!
Ritvikmath is the only person who was able to finally explain Bayes to me. By far the best explanation I have ever seen. A+
Thanks for sharing. I think I understand MH algorithm. You are so cool to explain profound theories in simple words!
The best explanation of Metropolis Hastings on the internet.
Dang thanks!
Wrapping up a statistics PhD and I still come back to this video every few months to re-calibrate my intuition
Very very clear summary of MH algorithm with explanation of every step. Really great and helpful work, thanks a lot !
Glad it was helpful!
This succeeded for me where all other videos failed.. great explanation!
Love this explanation!
I'm binging your videos. God tier teaching!
Great job Ritvik..such a cool explanation..love it!! Keep up the good work. Cheers!!
Great video and explanation. I wish the articles and videos that just dump math formulas would watch this video and learn how to explain.
Thank you so much!! This is the clearest explanation of MH I have ever seen.
Wonderful job Ritvik. Thank you.
Thanks!!
seriously you are saving me for upcoming exams
thank you!
of course!
Truly incredible clarity, thank you very much!
best video on MH. you make a great teacher!
This is extremely helpful! Thank you so much!! Also I appreciate your sharing your own experience learning this!
Hihi, thanks for the video. I paused before 10:23 and was working on the intuition of this, then realized it should not be the max of the two; then I dragged the bar and found you secretly changed max to min. But the explanation is perfect and helped a lot!!!
Great video, the intuition part is amazing. Thanks!
man, you are so gifted as a teacher, keep up the good work :)
Thank you for the video. All math is based on intuition, and you give it back when I'm about to lose mine. I paused for a while and paid attention to the max, then was surprised when it suddenly changed to min. LOL..
The explanation of the intuition is great!
Your explanation is next level. Thank you very much!
You're very welcome!
You are great!!! Keep going. Finally, I understood the Metropolis-Hastings algorithm idea xD
This is the best explanation of this I've come across. I've been trying to build the intuition outside of the math.
In implementing this, say through a computer simulation, I frequently see that if the acceptance probability is between 0 and 1, it's compared to a random draw of the uniform distribution. I'm missing a link in the intuition/math about this component specifically. Can you elaborate a bit more? I kind of get it, but kind of don't.
Looking forward to checking out the rest of your videos!
that's actually a tricky concept to grasp; it took me some time too.
Pretend the acceptance probability is 0.1. That means we want to accept this event 10% of the time and reject it 90% of the time. Now suppose we generate some uniform random number u between 0 and 1. Consider the two cases:
1) u < 0.1 : this happens with probability exactly 10% (since it came from a uniform random distribution)
2) u >= 0.1 : this happens with probability exactly 90% (since it came from a uniform random distribution)
So we can exactly use the value of u to decide whether to accept or reject.
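That trick can be sketched in a few lines of Python (a minimal sketch; the function name and the empirical check are illustrative, not from the video):

```python
import random

random.seed(0)

def accept(prob: float) -> bool:
    # Draw u ~ Uniform(0, 1). The event u < prob happens with
    # probability exactly `prob`, so over many calls this accepts
    # `prob` fraction of the time and rejects the rest.
    return random.random() < prob

# Empirical check: accepting with probability 0.1
trials = 100_000
accepted = sum(accept(0.1) for _ in range(trials))
print(accepted / trials)  # close to 0.1
```

The same one-liner (`u < alpha`) is what typically implements the "might accept the move" step inside a Metropolis-Hastings loop.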
@@ritvikmath What you describe above makes that step in the implementation so much more clear!
Thanks for circling back to this (and so quickly), I really appreciate it.
@@ritvikmath Why not sample from a binomial distribution with p = 0.1?
Wish there was a triple-like button. Perfect explanation. Thanks a lot!
Crystal clear! Thank you! :)
Glad it was helpful!
Thanks for your explanations!! Very useful and clear to help the understanding!!
Glad it was helpful!
I found this video very helpful after I got confused in my course. Thank you very much!
Great presentation and thanks for the intuition!
I'm doing a master on data science and you are saving me on bayesian stats! Thanks
this whole thing basically collapses to a moving mean distribution
Thank you! Really amazing lesson. I really appreciate the intuition part at the end!
A question about the unnormalized distribution f(x): In a practical situation can f(x) consist of empirical data, for example, formulated as a histogram of occurrences of some quantity?
Noticed your change from MAX to MIN at around 10:23. HAHAHAH, great move!
At 10:20, A(a->b) = MAX(1, r_f r_g) is flipped to A(a->b) = MIN(1, r_f r_g), but A(a->b) is still written as MAX(1, f(b)/f(a)). Why? Shouldn't it have been MIN(1, f(b)/f(a)), because going down should happen with probability f(b)/f(a) when f(b) < f(a)? Otherwise MAX(1, f(b)/f(a)) is always at least 1 and only climbing would happen. That would work for a single mode, but what about multiple modes? To handle multiple modes, we should also go down, not just climb to the global optimum?
Wow this was such a nice explanation, kudos!
insane quality video
very well done... thank u!
How do we know that the chain will collapse to this steady state? Shouldn't we check that the eigenvalues of the transition are all negative, or something similar? Or is that guaranteed by detailed balance?
Great video by the way, thank you very much!
Incredible explanation.
Honestly, I have a hard time thinking about these continuous Markov chains. In the past videos we had discrete Markov chains with stochastic matrices describing the transition probabilities of going from one state to another. Here we have a continuous distribution giving us endless transition probabilities. I just don't get how a Markov chain is behind all of this. So, if I'm not mistaken, in the MCMC case we don't yet have a transition probability (matrix), because such a matrix would be given by p(x), as it follows the condition that either the rows or the columns must add up to one, which is equivalent to having an NC in the continuous case. So our goal is to come up with a steady-state distribution that samples from p(x). In other words, we find a pi for a transition matrix which we don't know? Is there maybe a good explanation for a discrete case? I don't get what we already know about the transition matrix, because obviously we do know something - here f(x). But what is that f(x) in a discrete sense? I would highly appreciate an answer.
Hi, very amazing explanation! Do you have an intuition for how the Hastings factor (r_g in your video) works in the case where the proposal distribution is asymmetric?
Amazing explanation! MH was magic to me until I watched this! Thank you 🙏
First of all, many thanks for the nice and useful content and teaching approach. Secondly, could you recommend any textbook related to your video series on Monte Carlo, Markov chains, ...? Thanks in advance.
Look where you are currently at, look where you have been proposed to go. If the place where you have been proposed to go is of higher probability then you better go there. 👏👏❣
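That intuition drops straight into code. Below is a toy Metropolis sampler with a symmetric Gaussian proposal targeting an unnormalized standard normal; the target f, step size, and burn-in length are illustrative choices, not from the video:

```python
import math
import random

random.seed(42)

def f(x: float) -> float:
    # Unnormalized target density: proportional to a standard normal.
    return math.exp(-0.5 * x * x)

def metropolis(n_samples: int, x0: float = 0.0, step: float = 1.0) -> list:
    samples, x = [], x0
    for _ in range(n_samples):
        proposal = x + random.gauss(0.0, step)  # where we propose to go
        ratio = f(proposal) / f(x)              # how much "better" it is
        # Higher-probability proposals (ratio >= 1) are always taken;
        # lower ones are taken only with probability equal to the ratio,
        # so the chain can also go downhill and escape a single mode.
        if random.random() < min(1.0, ratio):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(50_000)
burned = samples[5_000:]  # discard burn-in
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(round(mean, 2), round(var, 2))  # near 0 and 1 for a standard normal
```

Because the proposal is symmetric, the Hastings correction r_g cancels and only the ratio f(b)/f(a) matters, which is exactly the "go to the higher-probability place, but sometimes go down" rule.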
Nice video! One question: would we risk getting stuck at one of the higher-density areas when there are several peaks in p(x)?
Could you do a video on the NUTS sampler and Hamiltonian samplers in general? Supposedly they are state-of-the-art and superior to Metropolis-Hastings.
Amazing and super helpful video! 👏🏻👏🏻
Glad it was helpful!
This explanation is golden. Could you do a coding example to solidify understanding, as a future video suggestion?
I got the impression that alpha satisfies the detailed balance condition. And since detailed balance implies x is from the true distribution, our first sample should already come from the true distribution. So no need to discard the first few samples. What am I thinking wrong?
I learned something! Very good video!
At 17:19, how is this "might accept the move" performed in practice? Awesome video btw.
Thank you so so much; I finally understand Metropolis-Hastings.🎉
I'm so glad!
Yes, I was wondering why you had max instead of min at the start. But you made the correction. Thanks!
You're welcome!
Hi, I have a question: why does making the transition balanced for p(x) make the Markov chain converge to that stationary distribution?
It is a very amazing lecture. You are a really gifted teacher. Please make more videos and keep educating us.
Wow, it's clearly the best tutorial for me. Thanks!
hey man I love watching your videos I learn a lot from each one of them. I have noticed that I'm more likely to watch the video if the thumbnail contains you. Black background is probably not good as well. Just wanted to share it with you, maybe you should change the thumbnail format. The format of the videos themselves is really nice in my opinion, no need for change there.
thank you for the feedback! I've been experimenting with different styles and direct feedback like this means so much!
Really effective presentation! I wonder how MH MCMC handles a bimodal distribution. Would it fail to find the next peak if the jumping parameter is too small?
That is precisely what I was wondering. Will it get stuck in a local maximum of the density and only uncover a portion of the distribution?
Thanks for the video! I have a question. From the graph, it's visible where the function f(x) takes its higher values. Why do we have to draw samples? Why can't we just say immediately at which points f(x) is high? Thanks for your attention!
I love your videos! thank you so much! One question: are the NCs for f(a) and f(b) the same values? I noticed that you canceled NC out.
Thank you for the intuitive explanation.
It is my understanding that MCMC works, namely asymptotically converges to p(x), ultimately based on the ergodic theorem. Correct me if I am wrong, please. And could you please make some videos on ergodic theory? I find it fascinating to think about. It might be right up your alley.
Super well explained!
simply amazing