The algorithm that (eventually) revolutionized statistics -

Поділитися
Вставка
  • Опубліковано 10 січ 2025

КОМЕНТАРІ • 148

  • @jkzero
    @jkzero 5 місяців тому +61

    This is brilliant, as a physicist turned pseudo-statistician, the physics-motivated approach of MCMC makes more sense to me than the statistical approach.
    Fun story: despite bearing his name, Nicholas Metropolis didn't contribute at all to the algorithm. The concept was originally developed by physicist Marshall Rosenbluth at Los Alamos for studying thermonuclear explosions. His wife Arianna Rosenbluth (also a physicist) turned the concept into code. The Rosenbluths are the true pioneers. They struggled with a consistent way to pick an appropriate acceptance function. One day having dinner with Edward Teller and his wife Augusta Teller, they described the problem, the implementation, and the challenge. The Tellers being a couple of physicist+mathematician suggested the obvious choice for a physicist: simply use the Boltzmann distribution. And the first MCMC algorithm was born. What did Metropolis do? He had the keys of the computer lab, and the alphabetical order used for naming authors made his last name the name of the algorithm. I personally call it the Rosenbluth algorithm.

    • @brian.westersauce
      @brian.westersauce 5 місяців тому +1

      JORGE!!! ❤

    • @jkzero
      @jkzero 5 місяців тому +1

      @@brian.westersauce 😮

    • @csaracho2009
      @csaracho2009 5 місяців тому

      "Having the keys" is important!

    • @yoanfardella3421
      @yoanfardella3421 3 місяці тому

      Other fun fact: they were actually inspired by the Buffon needle problem where one can determine pi by throwing needles on the ground.
      Ps: funily enough it actually does not work directly on a computer because we need pi to generate the needle orientation (there is a way to go over that though)

  • @jjmyt13
    @jjmyt13 5 місяців тому +57

    That "coming clean" bit really highlights the difference between theory and reality; to operationalize Metropolis-Hastings requires a lot of extra work.

  • @rugbybeef
    @rugbybeef 5 місяців тому +59

    I really respect the way the 'like' and 'sub' requests weren't explicit. It was made clear they were also important and prioritized them by making their consideration a natural part of the exposition. Bravo.

  • @josephlunderville3195
    @josephlunderville3195 5 місяців тому +12

    Not a statistician, but a machine learning person -- one of the more fascinating things about this algorithm to me is you can use it to explore a solution space by turning it on its head and inventing a probability distribution that makes better solutions more likely. It's not super widely used but it has applications. Important work that deserves better visibility!

    • @very-normal
      @very-normal  5 місяців тому +1

      That’s cool! This channel’s eventually going to have to get into machine learning at some point, and it sounds like there’s some cool stuff I’ve never heard of

    • @very-normal
      @very-normal  5 місяців тому

      That’s cool! This channel’s eventually going to have to get into machine learning at some point, and it sounds like there’s some cool stuff I’ve never heard of

    • @ramulakshmanan4906
      @ramulakshmanan4906 20 днів тому

      Sounds very interesting. When would you choose to use metropolis over other optimisation techniques like simulated annealing?

  • @barttrudeau9237
    @barttrudeau9237 5 місяців тому +9

    That was the best explanation of MCMC I have heard. I'm going to listen to it ten more times and see I start to understand. This stuff is really hard for people with no stats or higher math background, but it's fascinating and fun to learn, however slowly

  • @berjonah110
    @berjonah110 5 місяців тому +39

    Happy to see you covering this topic. MCMC is what made Bayesian statistics viable beyond conjugate priors and simple distributions. There's a lot more to dig into including Hamiltonian Monte Carlo (what NUTS is based on) and Gibbs sampling, as well as convergence metrics. Bayesian Data Analysis 3rd ed. (Gelman, Carlin, Stern, Dunson, Vehtari, & Rubin) is the go to book for this.

    • @mickaiba71
      @mickaiba71 5 місяців тому +1

      Is that book readable for someone who is just starting out in that area? Cause this got me hooked

    • @berjonah110
      @berjonah110 5 місяців тому +5

      @@mickaiba71 I'd say it's targeted towards someone with a Master's. There's a lot more besides just MCMC. It's really meant to be used more like a reference than something you read cover to cover. "Statistical Rethinking" by McElreath gives one of the best introductions to Bayesian Statistics out there. There's a few chapters on MCMC in that one as well.

    • @mickaiba71
      @mickaiba71 5 місяців тому +1

      @@berjonah110thanks a bunch, I’ll check it out!

    • @cameronkhanpour3002
      @cameronkhanpour3002 5 місяців тому +2

      ⁠@@mickaiba71 you can also tackle from the point of view of some of its applications. I had first learned of MH algorithm through learning about probabilistic graphical models. Bayesian Reasoning and Machine Learning by David Barber did a great job for me introducing and connecting these topics together

    • @mickaiba71
      @mickaiba71 5 місяців тому

      @@cameronkhanpour3002
      Yea actually I am taking a course in statistical modelling right now 🙂‍↕️
      Thank you also for the recommendation

  • @karansgarg
    @karansgarg 5 місяців тому +38

    I've never seen the acceptance rate expression explained/derived so intuitively - great job!
    By the way, to get your RWM to work, judging by the traceplots it seems that you would have needed to add some sort of pre-conditioning to change the variance of the proposal distribution for each dimension of your target separately, since alpha and beta seem to be on a different "scale" to the p variables, which is why finding a common "h" that allowed the chain to mix fast enough in all directions without getting stuck would be borderline impossible. RWM (even with a Gaussian proposal rather than uniform) tends to struggle with hierarchical models and high-dimensional models more generally because of this idea, but thankfully we have things like STAN using HMC. Would love to see an explainer on NUTS/HMC in your explanatory style if you ever feel like it!

    • @very-normal
      @very-normal  5 місяців тому +18

      yeah, I saw someone else implement a hierarchical beta-binomial, but they chose to update alpha and beta separately. Based on what you said, I can definitely see why now. At least it was a good learning experience on my part 😅

    • @karansgarg
      @karansgarg 5 місяців тому +7

      @@very-normal Learning the "hard way" is almost always the most valuable in that regard :)

  • @somashreechakraborty1129
    @somashreechakraborty1129 3 місяці тому

    BRILLIANT EXPLANATION! FINALLY I UNDERSTOOD WHERE THE INTEGRATION WENT!!! THANK YOU :)

  • @markkarlov1042
    @markkarlov1042 5 місяців тому +3

    The audio quality on your videos is progressively better. Keep up the good work!

  • @sjpbrooklyn7699
    @sjpbrooklyn7699 5 місяців тому +2

    Thank you for a stimulating video on an important statistical method. The hand-written equation appearing on-screen from 0:10 to 0:22 and briefly captioned “Despite how it looks, this is just an average,” is a fundamental building block of thermodynamics. It was at the core of my doctoral dissertation project in polymer physical chemistry in the 1960s. A generation earlier the average geometric properties of a polymer such as its end-to-end distance or radius of gyration (average distance of component atoms from center of mass) were simulated with pencil and paper as a Brownian motion type random walk with a fixed step length representing the distance between atoms along the connected chain, often constrained to a 2- or 3-dimensional spatial lattice. Their mathematical properties were delineated by Einstein and others. However, a Brownian particle can visit the same lattice point indefinitely, while two real atoms cannot occupy the same point in space. This gave rise to the concept of the self-avoiding random walk, a much more difficult mathematical problem. The 1950s saw the proliferation of electronic computers like the Illiac, that made it possible to simulate more realistic average polymer properties with self-avoiding walks, but it was quickly found that the proportion of walks without self-intersections decreased rapidly as the chain length (no. of steps) increased. A few clever biasing techniques involving re-starting a self-intersecting walk improved things slightly but these were only minor tweaks.
    In the 1960s Moti Lal, a chemist at Unilever Labs in the UK, became aware of Metropolis’s paper and applied it to the polymer problem. However, his available computing power (IBM 360/50) confined his polymers to 30 monomer units on a 2-dimensional spatial lattice. As a graduate student at NYU in 1968 I had access to a Control Data Corp. CDC 6600 Cray Supercomputer at the Courant Institute of Mathematical Sciences (they let us chemists use their facilities). I used the Metropolis-Lal method to generate polymers with free rotation about their backbone bonds, i.e., not restricted to an artificial lattice. I used your hand-written equation with Boltzmann weighting exp(-E/kT) to generate average geometric and thermodynamic properties from thousands of Monte Carlo samples. The Metropolis importance sampling algorithm generated samples from more “important” regions of polymer conformation space so it took fewer samples and far less computer time to get stable averages. I could even generate numerical distributions of end-to-end distances of polymers with sufficient accuracy to discriminate among several distribution functions proposed by theoreticians.

    • @csaracho2009
      @csaracho2009 5 місяців тому

      "Eniac"... (not Illiac).
      Regards.

  • @atlas4074
    @atlas4074 5 місяців тому +3

    Finally the MCMC video! I like your explanation of the acceptance function and the comind clean section

  • @Mystic2122
    @Mystic2122 5 місяців тому +1

    You explain everything so clearly, It's crazy

  • @halcyon107
    @halcyon107 5 місяців тому +7

    Awesome job at explaining the Metropolis-Hastings algorithm! For the hyperparameter tuning problem of finding the best half-width h of your proposal distribution there is the classic paper "Weak convergence and optimal scaling of random walk Metropolis algorithms" by A Gelman, WR Gilks, GO Roberts from 1997, where they assume a Gaussian proposal distribution and in the limit of a high dimensional parameter space, the optimal acceptance ratio comes out at ~0.23. So you can tune your h so that the average acceptance ratio is about 1/4 as a good heuristic to start with. It can get more complicated if the different parameter dimensions have vastly different optimal standard deviations for the gaussian proposal distribution and there are lots of elaborate papers about the topic that came afterwards, but it's a place to start and a nice simple heuristic.

  • @Khashayarissi-ob4yj
    @Khashayarissi-ob4yj 4 місяці тому

    👏👏👏
    excellent lecture.
    Thank you, Doctor, your scientists and your colleagues.
    and
    With luck and more power to you.
    hoping for more videos.

  • @Xylos144
    @Xylos144 5 місяців тому +1

    There's a cool application of this underlying principle with markov chains in that you can coordinate uncoordinated agents without having to track or communicate with them in real time.
    Taking your Portland Example, you would get your 100 people stationed at each activity in roughly the correct proportions by giving them a list of the other activities and the probability that they should stay put or go to each other place. They'll all shuffle around each hour, but they'll adhere closely to the target.
    This has applications for things like, say, a swarm of Drones spelling out a word in the sky. Sure, you could calculate a specific 3d coordinate for each specific drone and transmit that info to each of them 1 by 1 every time you want to change the word. Or you could just broadcast the 'pixel' locations you want the drones to occupy, and how densely filled each pixel is, and let them individually, randomly pick which space to occupy.
    Especially if you let them constantly cycle their positions, it'll do a good job with the task.

  • @bnmac7426
    @bnmac7426 5 місяців тому +1

    Dude, I love this. Keep doing what your doing.👍

  • @gma7205
    @gma7205 5 місяців тому

    Thank you for the video! Extremely insightful and well done. Keep up the good work!

  • @walterreuther1779
    @walterreuther1779 5 місяців тому

    Great video!
    At 6:35 it was said that one can take a continuous function and divide it by it’s integral to get a probability distribution.
    I think we are implicitly taking two more assumptions here to make it work:
    1. The function is non-negative (which we can usually assume if it has a minimum, as we can just subtract the minimum to make it non-negative (a function is a pdf if it is non-negative and it integrates to 1)).
    But more importantly
    2. The integral of the function needs to be a finite constant, dividing by infinity makes little sense. This is a given if the state space is bounded, but if it’s the real number line, it’s not. (The integral over the reals of exp(x) is divergent.)

    • @ps.2
      @ps.2 4 місяці тому

      Yes, but that seems to me like another way to say the function you're trying to convert to a pdf simply ... isn't suitable as one. To me it indicates that in deriving a function that you think should represent a probability space in some way - if you end up with negative numbers, or the area under the curve diverges to infinity - something has gone very wrong with your thinking or modeling. Possibly as simple as neglecting to specify that your actual problem domain is not infinite but has a known, fixed minimum and maximum.

    • @walterreuther1779
      @walterreuther1779 4 місяці тому

      @@ps.2 Yes, I just wanted to add this because - as we are in mathematics here - it's good to be precise about the assumptions we are making.
      If the continuous function is non-negative and its integral over the sample space Ⲭ exists then one can always do this normalisation (if I didn't miss anything).
      Otherwise, we'd need to make some adjustments (which will always be possible if Ⲭ is bounded). (afaik)

  • @kevinndayishimiye934
    @kevinndayishimiye934 5 місяців тому +9

    i didn’t really understand what was going on in the last part of the video with your hierarchical model. what should i search for to learn about that stuff?

    • @very-normal
      @very-normal  5 місяців тому +10

      I couldn’t get it to run with Metropolis’ original implementation, so I had to use another algorithm to do the analysis. I used Stan, which implements what’s called the algorithm called the No U-Turn Sampler. It’s an advanced descendent of Metropolis

  • @AIvarogarciaa
    @AIvarogarciaa 5 місяців тому +1

    Grest video, I hope see the explanation of Gibs Sampling 🙌🏻

  • @anilanderson5978
    @anilanderson5978 3 місяці тому

    Nice explanation of the algorithm!
    Looking at your code for the acceptance ratio, it appears you are computing the ratio of the log_posterior’s instead of computing their difference and exponentiating. Maybe adjusting that would improve performance?
    I worked on a project for which I implemented an MCMC sampler and also found it was difficult to tune it properly. One thing that helped was to use the adaptive Metropolis Hastings Algorithm from (Haario et al., 2001) so that the proposal distribution, a multivariate normal, is updated at each iteration using the covariance matrix of the previous draws. The variance term for each parameter should eventually settle at an appropriate value automatically without you having to play around with each one, which is nice when you’re dealing with a lot of them. I also found I needed a burn-in period of about 20,000 draws and about 70,000 total draws to get stable inference, so just running your sampler for longer may improve things. The computational burden is definitely a drawback of this algorithm. Thank goodness for Stan!

  • @AntonioRonde
    @AntonioRonde 5 місяців тому

    I like your style of explaining. You could maybe slow down a little during the math parts but pairing it with core made it understandable for me again as I am a software engineer by trade.

  • @pyrotas
    @pyrotas 5 місяців тому +1

    At 6:34: your f(X) must be non negative. Also, care must be taken about how f(X) behaves at infinity (see heavy tail distributions).

  • @PerriPaprikash
    @PerriPaprikash 4 місяці тому

    I heard Portland is pretty nice this time of year. I even think they did a show about Portland called Portlandia. Also based on what I've heard the "drinking craft brew" percentage should be increased by roughly 50 points.

  • @Trizzer89
    @Trizzer89 5 місяців тому

    This algorithm is amazing AF. I'm gonna use it in the travel example for sure

  • @Eloss69
    @Eloss69 5 місяців тому +1

    I’m a student working with estimation of covariance data, this video helped me take a step further into applied Bayesian statistics. It had just the right amount of information for me. Very thankful!
    If you’re familiar with it, are there researchers linking Bayesian statistics with copulas ?
    (Since you seem very eclectic on statistics I would gladly push for a video on copulas ^^)

    • @very-normal
      @very-normal  5 місяців тому +1

      I’m just starting to get into copulas myself! Still starting out, but I’ve been working through the copula textbook by Czado, if that helps any. I do recall a small section being dedicated to Bayesian stuff, so it might be worth it to check it out

  • @seongunness608
    @seongunness608 5 місяців тому +1

    havent watched the entire thing but it reminds me of that thing where you can survey everyone a random thing (eg: how many jelly beans in a jar) and you'll get very close to the answer

  • @musaurelius7144
    @musaurelius7144 5 місяців тому

    Great video. I really like the walkthorugh of the technicals details. A nearby topic could be the Gibbs-sampling? :)

  • @kipwsm
    @kipwsm 3 місяці тому

    7:08 - 7:17 is top quality

  • @drdca8263
    @drdca8263 5 місяців тому

    10:51 : is the setting it to 1 on the right side, the min(1, •) part of the eventual formula obtained at 11:27 ?
    So, it will only get capped at 1 if it is the one that ordinarily wouldn’t be likely enough, and this is fine because of it being handled by the other side in that case?

    • @very-normal
      @very-normal  5 місяців тому +1

      Yea you have it about right. You might see other versions of the algorithm that don’t calculate a minimum and just the ratio. It’ll still be the same algorithm in the end so long as you generate an independent standard uniform.

  • @Kram1032
    @Kram1032 5 місяців тому +1

    So in context of solving the rendering equation there fairly recently were a couple of papers that upgraded Multiple Importance Sampling to Continuous Multiple Importance Sampling (which is often intractable), Stochastic Multiple Importance Sampling (which makes CMIS practical at the cost of, I think, higher variance) and Marginal Multiple Importance Sampling (which further broadens the scope of distributions it can handle)
    Ever since I heard those, knowing MCMC has this tradeoff involving the half-width parameter h to direct how well things are explored, I wondered if any of those techniques could be used to basically Multiple Importance Sample over all choices of h. Do you happen to know whether this is feasible?

  • @diribigal
    @diribigal 5 місяців тому

    I like how at the end you first rejected the transition from "That's it for this one." to the next line.

    • @very-normal
      @very-normal  5 місяців тому +1

      totally planned on my part

  • @mujtabaalam5907
    @mujtabaalam5907 5 місяців тому +1

    What does lowercase yi represent?

  • @blacklistnr1
    @blacklistnr1 5 місяців тому

    I don't understand why the detailed balance is so strict(10:00), shouldn't the balance rule support the global static distribution instead of the local one?
    i.e. Maybe X goes to Y but Y doesn't go directly to X to balance it out, instead they go to Z for a few mins then to X and sit there for as long as needed to balance the global distribution.

    • @very-normal
      @very-normal  5 місяців тому +1

      In doing the research for the video, I saw that there are MCMC implementations that do use the weaker balance condition instead of detailed balance. But, I’m not sure as to why one would prefer one assumption over the other

  • @sleepyinseattle4615
    @sleepyinseattle4615 4 місяці тому

    But if we’re using Metropolis for importance sampling for MC estimation, we have to weight each contribution in the sum by it’s 1/probability, which means we still need to estimate the normalizing constant???

  • @RSLT
    @RSLT 5 місяців тому

    Very cool Fact to learn GREAT VIDEO! Liked and subscribed ❤

  • @sriharsha580
    @sriharsha580 5 місяців тому +3

    Next gibbs sampling, exponential families

  • @5eurosenelsuelo
    @5eurosenelsuelo 5 місяців тому +1

    Statistics is so hard... I appreciate your effort and the quality of your work but I'm definitely not ready for this content. Still subscribed and liked though. I'll try checking your channel for simpler videos I might be able to follow. Statistics is tremendously useful to understand our world but I'm still a Calculus boy who enjoys vectors.

  • @strayorion2031
    @strayorion2031 5 місяців тому

    5:32 Does this mean we are not assuming independent values of the samples? or I am thinking too much like frequentist and you don't need to keep these assumptions in mind when using bayesian statistics? and/or these random number generation algorythm do not have any effect on the assumptions on the data?

    • @AubreyBarnard
      @AubreyBarnard 5 місяців тому +1

      Yes, in a Markov chain the samples aren't independent. The magic of Markov chains is that, despite the dependent samples, you can accurately estimate properties of the target distribution (given assumptions, etc.).

    • @AubreyBarnard
      @AubreyBarnard 5 місяців тому +1

      Oh, and this isn't really anything to do with frequentist vs. Bayesian viewpoints.

    • @minjae92
      @minjae92 5 місяців тому +1

      the most example of markov chain is weather. today's weather depends on yesterday. but it doesn't depend on days ago. because we already considered previous effects by the weather of yesterday. likewise this some various is related to its previous state then we can use markov chain. So it's totally different concept with the bayesian
      of course it could possible their stochastic distribution follows bayesian but im not studied this deeply so im not sure that

    • @strayorion2031
      @strayorion2031 5 місяців тому

      @@AubreyBarnard thanks

    • @strayorion2031
      @strayorion2031 5 місяців тому

      @@minjae92 thanks

  • @TheThreatenedSwan
    @TheThreatenedSwan 5 місяців тому

    He can't keep getting away with it! (making these cool videos)

  • @mohammedfarhaan9410
    @mohammedfarhaan9410 5 місяців тому

    its sort of like the way u can break down signals into sin and cos waves u can break samples into well behaved distributions

  • @seanpinyo5089
    @seanpinyo5089 5 місяців тому

    I would like to note that the metropolis algorithm is a bit different than the metropolis hastings algorithm, it is a more general case and has a different acceptance function

  • @juanpabloferro4419
    @juanpabloferro4419 4 місяці тому

    oh!! please make a Markov's chains video next!. Some ppl doesn't know that they even exist.

  • @lawrencenash4351
    @lawrencenash4351 5 місяців тому +8

    I have no clue what is happening

    • @very-normal
      @very-normal  5 місяців тому +4

      this was me two weeks ago

  • @idrankwhatt
    @idrankwhatt 5 місяців тому +1

    Dang! I got accepted to present at JSM but currently interning at NASA. I look forward to seeing you there next year in Nashville! Fantastic as always by the way, love your work!

  • @whatitmeans
    @whatitmeans 4 місяці тому

    the acceptance function looks more like a "likelihood" than a "probability"

  • @ruudh.g.vantol4306
    @ruudh.g.vantol4306 10 днів тому

    Just accept that “guaranteed” means “most likely”, in any context. Yes, that can be a huge struggle.

  • @alberta5879
    @alberta5879 5 місяців тому +1

    Fantastic Spongebob reference!

  • @skyletoft
    @skyletoft 15 днів тому

    The code and data are not in the description?

    • @very-normal
      @very-normal  14 днів тому

      lol I was partially convinced that no one actually looks for those. You can find it in this Github repository in the "metropolis.Rmd" file: github.com/very-normal/explained

  • @Impatient_Ape
    @Impatient_Ape 5 місяців тому

    I hope you got a chance to go to Powell's!

    • @very-normal
      @very-normal  5 місяців тому +1

      powells was amazing, i got lost

  • @billfrug
    @billfrug 4 місяці тому

    whats a stationary distribution

    • @very-normal
      @very-normal  4 місяці тому

      a distribution that doesn’t change over time. For example, if I’m losing weight, the mean weight I’d see on my scale would change over time, so it wouldn’t be stationary. If I’m maintaining, then it would be stationary

  • @Inexorablehorror
    @Inexorablehorror 5 місяців тому

    Have you tried brms as a frontend for stan??? It makes it MUCH easier, you should try it out!!

    • @very-normal
      @very-normal  5 місяців тому +1

      YES! I try to use brms when I can, but this was thankfully easy enough to implement

  • @carlospena98
    @carlospena98 5 місяців тому +1

    STAN MENTIONED !!!!

  • @loadingUserID...
    @loadingUserID... 5 місяців тому

    Nice video. You should explain NUTS next.

  • @LenPopp
    @LenPopp 5 місяців тому +1

    Possible error: Your list of Portland activities doesn't include "going to Powell's Books".
    But maybe that's not something locals do, only tourists.
    But maybe that's not something tourists do either, only really nerdy tourists.

    • @AubreyBarnard
      @AubreyBarnard 5 місяців тому

      Or really nerdy locals?

    • @very-normal
      @very-normal  5 місяців тому +1

      I’ll check if there are any nerdy tourists here and report back

  • @isbestlizard
    @isbestlizard 4 місяці тому

    How are the population units initialised? You couldn't have every single person you ask have an identical preference, so how does the algorithm decide some people prefer coffee to hiking, wheras aothers prefer hiking to coffee?

    • @very-normal
      @very-normal  4 місяці тому

      My population is Portland locals, but I also I made a pretty strong assumption that all Portland locals would have the same preference distribution. If I assume this, then any local I ask would generally give me the same type of answers. Of course this isn’t really a reasonable assumption, just something to further the metaphor

  • @sitrakaforler8696
    @sitrakaforler8696 5 місяців тому

    Nice 🎉

  • @andrewkraevskii
    @andrewkraevskii 5 місяців тому +2

    7:10 missing ( literally unwatchable. Joking great video

    • @very-normal
      @very-normal  5 місяців тому +2

      you caught me, im a fraud

  • @DrSimulate
    @DrSimulate 5 місяців тому

    Great video! I love the way you are explaining. Somehow hierarchical Bayesian models always confuse me. The choice of the model structure and the choice of the distributions often seems arbitrary to me. Do you have a good reference (or did you already make a video) that explains this? Keep up the good work (and check out my SoME video if you like 😁)

  • @EvanBerkowitz
    @EvanBerkowitz 5 місяців тому +1

    The heart of the Metropolis algorithm---the accept/reject step---is still widely used in physics simulations today, though typically we do not make such simple proposals. Hamiltonian Monte Carlo (also invented by physicists!), for example, still includes a Metropolis accept/reject test to guarantee that the generated distribution matches the target.

  • @xniyazi
    @xniyazi 5 місяців тому

    Thnx

  • @MrEo89
    @MrEo89 5 місяців тому

    Is it just me or y’all hear “halfwit” every time he said “half-width” 😂😂😂

  • @ivanakinfiev7572
    @ivanakinfiev7572 5 місяців тому

    EM algo plz

    • @very-normal
      @very-normal  5 місяців тому +1

      this is my current nemesis 💀

  • @Mikee512
    @Mikee512 5 місяців тому

    Statistics: the mathematics of eventually.

  • @MooImABunny
    @MooImABunny 5 місяців тому +3

    Some parts of the video were really good, but at some point you keep introducing new ideas before I had the chance to understand the previous ideas.
    When you moved on to implementing the algorithm on your like/view ratio I thought this would be an example that will help me understand what you discussed previously, but all of a sudden you started introduction priors and hyper priors and I just completely lost you.
    I feel like I almost learned something very interesting, and at the last moment I was left confused...

    • @very-normal
      @very-normal  5 місяців тому

      rip

    • @MooImABunny
      @MooImABunny 5 місяців тому +2

      @@very-normal hey, all the other comments here are very positive, don't feel bad because of my opinion.
      I wrote my comment because I hope you'll pay more attention next time to what a novice might get from a video, but you tackled a specialized subject, and it's also fair to make content for people with a background

    • @very-normal
      @very-normal  5 місяців тому +1

      oh no sorry, I definitely took your comment to heart, the video’s not perfect. I saw what you were talking about and agreed it’s not the smoothest transition. I appreciate you taking the time to tell me

  • @ryuuguu01
    @ryuuguu01 4 місяці тому

    An Edward Teller paper. 😀

  • @API-Beast
    @API-Beast 5 місяців тому

    I don't understand, if this algorithm needs to already know the probability distribution to work then you don't really gain any new information from it. All it does is create fake data?

    • @very-normal
      @very-normal  5 місяців тому +1

      Even if you know the form of the distribution, it doesn’t mean you can derive anything useful from it, like an expectation or a credible interval. In easy problems you can do this straight from the PDF, but with harder ones you can’t do this. With MCMC, you generate a sample that you know comes from it, and then use it to calculate said expectations. It’s not “fake” data

  • @Trizzer89
    @Trizzer89 5 місяців тому

    I just wish you didnt use so much mathematical notation and used normal words. I majored in math and I still dont pick up the meaning that quickly. I dont find it difficult at all to translate concepts to normal language, but maybe I'm unusually gifted at it

    • @very-normal
      @very-normal  5 місяців тому

      you’d probably be better than me tbh, you should use your talent with SoME! We need more good math content in yt

  • @davidjohnston4240
    @davidjohnston4240 5 місяців тому

    I'm a local in Portland. I nevee go hiking. I drink a lot of coffee though.

    • @very-normal
      @very-normal  5 місяців тому

      do you have a few go-tos you like?

    • @davidjohnston4240
      @davidjohnston4240 5 місяців тому

      ​@@very-normalSyun in Hillsboro to the West. Best sushi outside of Japan. Powell's book store. Le Pigeon on Burnside for slightly fancy food. Elephant deli for food stuff.

  • @addmoreice
    @addmoreice 5 місяців тому

    Am Portlander. Can Agree. Drink Coffee.

  • @TheEVEInspiration
    @TheEVEInspiration 4 місяці тому

    Don't ask the bartender!

  • @humdah
    @humdah 5 місяців тому

    Your metropolis algorithm had drift because the hyperprior creates a prior distribution that is overparameterized. Despite saying the prior needs a hyper prior, it in fact does not. If you really wanted one, it would have been better fix the value of alpha+beta, rather than placing a hyper prior on it.

    • @very-normal
      @very-normal  5 місяців тому

      it worked fine with Stan tho lol

    • @humdah
      @humdah 5 місяців тому

      did it tho lol

  • @mateusvmv
    @mateusvmv 5 місяців тому

    So you're saying I can roll a RNG that tells me to study 90% of the time and then I'll stop procrastinating?

  • @strayorion2031
    @strayorion2031 5 місяців тому

    15:37 so your are telling me you NUTTED your data?

    • @very-normal
      @very-normal  5 місяців тому +1

      that’s the technical term for it, yes

  • @RSLT
    @RSLT 5 місяців тому

    ❤❤❤❤❤

  • @MatthisDayer
    @MatthisDayer 5 місяців тому

    alpha and beta too big, try between 0.1 and 10

  • @HarryFortyTwo
    @HarryFortyTwo 5 місяців тому +1

    until the sample it was accesible - then you lost me; too many assumptions (seemingly not related to the actual metropolis algorithm) and too quick succession. a way simpler sample would be better for pedagogical reasons.

  • @Trizzer89
    @Trizzer89 5 місяців тому

    I dont like your system for finding the average likes. It will likely be dependent on how many views you have, or how many non-subscribers view. The model should be dependent on total views

    • @very-normal
      @very-normal  5 місяців тому

      lol no one will really care if a model with data from a statistics UA-cam channel is misspecified; there are lots of different ways to approach the modeling and I chose the one where I could use the data I had

  • @holacabeza
    @holacabeza 5 місяців тому +6

    Can you explain the Deez NUTS algo next? I'm trying to understand testicular torsion

  • @mygamesm
    @mygamesm 5 місяців тому

    63

  • @ozlemelih
    @ozlemelih 5 місяців тому

    Lots of drugs in Portland

  • @ucngominh3354
    @ucngominh3354 5 місяців тому

    hi

  • @jimf2525
    @jimf2525 5 місяців тому +9

    I felt overwhelmed by this video. Too much information coming too quickly. I guess I’m not your target audience. By the end of it, I was wondering how it could be used in a real world example.

    • @very-normal
      @very-normal  5 місяців тому +5

      Sorry about that, the algorithm was also relatively new to me so I didn’t have the time to refine it further. I appreciate you gave some of your time to it tho
      You could get a more concrete real life example by replacing the videos in my examples with different groups of people who are taking a drug to cure a disease. What we’d be interested in is seeing how well the drug works overall. This type of model suits a Bayesian analysis, but it wouldn’t be possible to do it before algorithm like Metropolis.

    • @nabibunbillah1839
      @nabibunbillah1839 5 місяців тому

      I like the feeling of drowning with information not understanding most of it like a masochistic tendency.

    • @gunaysoni6792
      @gunaysoni6792 5 місяців тому

      ​@@very-normalI think you should link people to a video on bayesian statistics at the start of such a video

    • @hypebeastuchiha9229
      @hypebeastuchiha9229 4 місяці тому

      Good there’s a million channels full of pop science already
      And if you genuinely had trouble with this video then pop science really has deluded the every man into thinking he’s actually intelligent

    • @jimf2525
      @jimf2525 4 місяці тому

      @@hypebeastuchiha9229 I feel that you seem to be spending energy to be condescending. Did I miss something? I think you might want to ask yourself, ‘how many friends do I have?’ Then maybe read a book on the subject. Feel free to get the last word in.

  • @djangoworldwide7925
    @djangoworldwide7925 5 місяців тому

    Props do t makes sense, more like 80% doing nothing 😊

  • @ulamss5
    @ulamss5 4 місяці тому

    far too many (hyper-)parameters to work IRL. overfitting nightmare.

    • @very-normal
      @very-normal  4 місяці тому

      lol how many parameters would you recommend

  • @ianglenn2821
    @ianglenn2821 5 місяців тому

    I did like and subscribe, but you were talking about it, so you should remove this video from future analysis and meta-analysis, but definitely include it for meta-meta-analysis

    • @very-normal
      @very-normal  5 місяців тому

      will definitely consider it for the metameta-analysis