Tech talk: A practical introduction to Bayesian hierarchical modelling
- Published 28 Jun 2024
- When the data that you’re modelling naturally splits into sectors - like countries, branches of a store, or different hospitals within a region - it’s difficult to decide whether you should model jointly or separately. Modelling jointly takes advantage of all the data available, but ignores the subtleties that distinguish each individual sector. On the other hand, fitting a separate model to each sector is prone to overfitting, especially where there’s limited data.
However, an approach called hierarchical modelling can combine the best of both approaches, allowing us to take advantage of the overarching information across sectors without giving up on their distinctive features.
In this talk, Faculty Data Scientist Omar Sosa will provide an introduction to the approach, focusing heavily on its practical side. He’ll cover:
- When hierarchical modelling can be used.
- How to implement hierarchical modelling.
- The limitations of using this approach.
- What we can learn by implementing hierarchical modelling.
You’ll need some familiarity with Bayesian inference to get the most value from this talk. - Science & Technology
hands down the best explanation for hierarchical modelling on youtube, especially with the graph visualising the effect of pooling strength on the parameter values
Probably the clearest explanation of hierarchical models I’ve ever seen. Great video!
Very nice, thanks a lot for the talk. Very well explained and easy to follow!
Thank you for this well-paced video with its explanations. I feel much more confident in my understanding of Bayesian hierarchical modelling.
Excellent explanation...Thank you!
50:20 Make sure you know when to use variational inference instead of MCMC. (usually when working with large datasets)
One of the few well explained examples
Great one. I'm currently looking at the overlap between the same treatment used for different diseases, and this looks like a very helpful approach for synthesising the evidence.
Very good explanations. I've seen a lot of videos, but your presentation is very understandable. Thanks
Thanks Omar. Clear and helpful.
Thanks for clear explanation. Very helpful.
Perhaps the best explanation. Thanks a hell of a lot
Thanks for the informative video. In the final model, where the uncertainty for the different counties is very similar, isn't the model understating the uncertainty for the counties with just 1 or 2 data points? Could you elaborate? Thanks!
Amazing lecture!
this is really well explained.
Great video, thanks !
Under Partial Pooling, why does sigma_a represent the degree of pooling?
Omar, it is not clear to me why sigma controls the amount of pooling. Could you point me to some sources to learn more about this? I enjoyed your presentation. Thanks.
Hi Noé, maybe I can have a go at explaining. When doing the hierarchical modelling, we suppose that the parameters for each group themselves come from a distribution. If we assume that this distribution has zero variance, we are saying that all of the group-level parameters must be the same - they must be equal to the mean. This is because there is literally zero variance. This is the same as pooling all the data (the first example). On the other hand, if we assume the variance is very large, then each group parameter has the freedom to choose any value it wants, without penalisation from the group model-parameter distribution. This is the same as having no pooling - each group has its own parameter. We can choose sigma between these two extremes to specify how closely linked the group parameter should be.
Thank you and I hope that helped!
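The two extremes described above can be checked numerically. The following is only a minimal sketch, not the model from the talk: it assumes normally distributed observations with a known noise level `sigma_y` and a fixed population mean `mu`, and the function name, group means and group sizes are made up for illustration. Under those assumptions the estimate for each group is a weighted average of the group's own mean and the population mean, and `sigma_alpha` sets the weight.

```python
import numpy as np

def partial_pooling_mean(ybar, n, mu, sigma_alpha, sigma_y):
    """Conditional posterior mean of each group parameter in a normal-normal model.

    Assumes mu, sigma_alpha and sigma_y are known (a simplification).
    ybar: per-group sample means, n: per-group sample sizes.
    """
    ybar = np.asarray(ybar, dtype=float)
    n = np.asarray(n, dtype=float)
    # Weight on the group's own data: 0 when sigma_alpha = 0, close to 1 when it is large.
    w = n * sigma_alpha**2 / (n * sigma_alpha**2 + sigma_y**2)
    return w * ybar + (1.0 - w) * mu

# Hypothetical data: three groups with very different amounts of data.
ybar = np.array([0.5, 1.5, 2.5])
n = np.array([2, 10, 50])
mu = 1.5
sigma_y = 1.0

for sigma_alpha in [0.0, 0.5, 100.0]:
    est = partial_pooling_mean(ybar, n, mu, sigma_alpha, sigma_y)
    print(f"sigma_alpha={sigma_alpha}: {np.round(est, 3)}")
```

With `sigma_alpha = 0` every group estimate equals `mu` (complete pooling), and with a very large `sigma_alpha` each estimate is essentially the group's own mean (no pooling). At intermediate values, the group with only 2 observations is pulled towards `mu` much more strongly than the group with 50.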
Hi @williamchurcher9645. Correct me if I'm wrong: given that the prior distribution of alpha_i is assumed to be Normal(mu_alpha, sigma_alpha), if sigma_alpha = 0, the alpha_i may not be (and cannot be) equal, because mu_alpha is not a fixed number but follows a Normal(0, 5) distribution. Putting that into formulas to be clearer:
alpha_i ~ Normal(mu_alpha, sigma_alpha)
alpha_i ~ Normal(mu_alpha, 0)
alpha_i ~ Normal(Normal(0, 5), 0)
alpha_i ~ Normal(0, 5)
=> alpha_i is not a constant but a distribution
Following that understanding, it is still unclear why sigma_alpha represents the degree of pooling.
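One way to probe the formulas above is to simulate the prior directly. The following is a minimal sketch under stated assumptions: the second argument of Normal is read as a standard deviation, the Normal(0, 5) prior for mu_alpha is taken from the comment above, and the helper name, number of groups and number of draws are made up for illustration. It suggests that with sigma_alpha = 0 each alpha_i is indeed Normal(0, 5) on its own, but within any single draw all alpha_i share the same value of mu_alpha, so they are equal to each other even though that common value is uncertain.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior(sigma_alpha, n_groups=4, n_draws=5000):
    """Draw group parameters from the hierarchical prior.

    mu_alpha ~ Normal(0, 5), one value per draw.
    alpha_i  ~ Normal(mu_alpha, sigma_alpha), one value per group within each draw.
    Returns an array of shape (n_draws, n_groups).
    """
    mu_alpha = rng.normal(0.0, 5.0, size=n_draws)
    noise = rng.standard_normal((n_draws, n_groups))
    return mu_alpha[:, None] + sigma_alpha * noise

for sigma_alpha in [0.0, 1.0, 5.0]:
    alpha = sample_prior(sigma_alpha)
    # Spread between groups within the same draw (max minus min across groups).
    within = (alpha.max(axis=1) - alpha.min(axis=1)).mean()
    # Spread of a single group's parameter across draws.
    marginal_sd = alpha[:, 0].std()
    print(f"sigma_alpha={sigma_alpha}: "
          f"mean between-group range={within:.3f}, sd of alpha_0={marginal_sd:.3f}")
```

In this sketch the between-group range is exactly zero when sigma_alpha = 0 and grows as sigma_alpha grows, which is the sense in which sigma_alpha controls how tightly the groups are tied together.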