*Abstract*
This tutorial provides an introduction to Bayesian modeling with PyMC,
a probabilistic programming library in Python. It covers the
fundamental concepts of Bayesian statistics, including prior
distributions, likelihood functions, and posterior distributions. The
tutorial also explains Markov Chain Monte Carlo (MCMC) methods,
specifically the No-U-Turn Sampler (NUTS), used to approximate
posterior distributions. Additionally, it emphasizes the importance of
model checking and demonstrates techniques for assessing convergence
and goodness of fit. The tutorial concludes with examples of building
and analyzing models in PyMC, including predicting the outcomes of
sporting events.
*Summary*
*Introduction (**0:03**)*
- This tutorial is intended for data scientists and analysts interested in applying Bayesian statistics and probabilistic programming.
- No prior knowledge of statistics, machine learning, or Python is assumed.
- The tutorial provides a high-level overview of Bayesian statistics, probabilistic programming, and PyMC.
*Probabilistic Programming (**1:24**)*
- Probabilistic programming involves writing programs with outputs partially determined by random numbers.
- It allows for specifying statistical models using stochastic language primitives like probability distributions.
- The main purpose of probabilistic programming is to facilitate Bayesian inference.
*What is Bayes? (**3:30**)*
- Bayesian statistics uses probability models to make inferences from data about unknown quantities.
- It involves updating prior beliefs based on observed data to obtain posterior distributions.
- Bayes' formula is the foundation of Bayesian inference.
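For reference, Bayes' formula for parameters $\theta$ and data $y$ states that the posterior is proportional to the likelihood times the prior:

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \propto p(y \mid \theta)\, p(\theta)$$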
*Why Bayes? (**4:39**)*
- Bayesian inference is attractive due to its utility and conceptual simplicity.
- It allows for incorporating prior knowledge and quantifying uncertainty in estimates and predictions.
*Prior distribution (**6:51**)*
- Prior distributions quantify uncertainty in unknown variables before observing data.
- Uninformative priors can be used when little is known beforehand.
- Informative priors can be based on domain knowledge or previous data.
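As a minimal sketch (not the video's exact code), the two kinds of priors might be declared in PyMC like this; the variable names and scales are illustrative:

```python
import pymc as pm

with pm.Model() as model:
    # Weakly informative prior: a very wide Normal when little is known
    mu = pm.Normal("mu", mu=0, sigma=100)
    # Informative prior: domain knowledge says the scale is small and positive
    sigma = pm.HalfNormal("sigma", sigma=1)
```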
*Likelihood function (**8:13**)*
- The likelihood function describes how the data relates to the model.
- Formally, it is the probability of the observed data conditioned on the model's parameters, viewed as a function of those parameters.
- Different likelihood functions are appropriate for different types of data (e.g., normal, binomial, Poisson).
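Continuing the sketch above, the likelihood is declared like any other distribution, with the data passed through `observed=` (the data here is synthetic, purely for illustration):

```python
import numpy as np
import pymc as pm

y = np.random.default_rng(42).normal(5.0, 2.0, size=100)  # synthetic data

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sigma=100)      # prior on the mean
    sigma = pm.HalfNormal("sigma", sigma=10)   # prior on the noise scale
    # observed= conditions the distribution on the data, making it the likelihood
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y)
```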
*Infer values for latent variables (**9:34**)*
- Bayesian inference combines prior and likelihood information to generate the posterior distribution.
- The posterior distribution represents updated knowledge about unknown variables after observing data.
- Calculating the posterior distribution often requires numerical methods due to the complexity of integration.
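In PyMC those numerical methods amount to a single call; a sketch, reusing the `model` defined just above:

```python
with model:
    # Draw MCMC samples approximating the posterior; NUTS is the default sampler
    idata = pm.sample(1000, tune=1000)
```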
*Probabilistic programming in Python (**16:48**)*
- Several probabilistic programming libraries are available in Python, including PyMC, Stan, Pyro, and TensorFlow Probability.
- PyMC is specifically designed for fitting Bayesian statistical models using MCMC methods.
*PyMC and its features (**17:29**)*
- PyMC provides various features for Bayesian modeling, including:
  - Built-in statistical distributions
  - Tools for output analysis and plotting
  - Extensibility for custom distributions and algorithms
  - GPU support and different computational backends
*Example: Building models in PyMC (**22:30**)*
- The tutorial demonstrates building a changepoint model in PyMC to analyze baseball spin rate data.
- The model estimates the changepoint and mean spin rates before and after the sticky stuff crackdown.
- The example showcases specifying stochastic and deterministic variables, priors, likelihoods, and running MCMC sampling.
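The video's exact code is not reproduced here, but such a model typically follows PyMC's standard switchpoint pattern; a hedged sketch with synthetic data (all variable names, priors, and numbers below are assumptions):

```python
import numpy as np
import pymc as pm

# Synthetic stand-in for the video's data: daily mean spin rates (rpm)
days = np.arange(100)
spin_rate = np.where(days < 60, 2500, 2400) + np.random.default_rng(0).normal(0, 25, size=100)

with pm.Model() as changepoint_model:
    # Prior over the unknown day of the crackdown
    switchpoint = pm.DiscreteUniform("switchpoint", lower=days.min(), upper=days.max())
    # Mean spin rate before and after the changepoint
    mu_before = pm.Normal("mu_before", mu=2500, sigma=100)
    mu_after = pm.Normal("mu_after", mu=2500, sigma=100)
    sigma = pm.HalfNormal("sigma", sigma=50)
    # Deterministic switch between the two regimes, evaluated per day
    mu = pm.math.switch(switchpoint >= days, mu_before, mu_after)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=spin_rate)
    idata = pm.sample()  # assigns Metropolis to the discrete variable, NUTS to the rest
```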
*Markov Chain Monte Carlo and Bayesian approximation (**41:47**)*
- MCMC methods are used to approximate posterior distributions by simulating a Markov chain.
- The chain is constructed so that the posterior is its stationary distribution; after convergence, its samples approximate draws from the posterior (see the Metropolis sketch after this list).
- Metropolis sampling and Hamiltonian Monte Carlo (HMC) are two MCMC algorithms.
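To make the idea concrete, here is a minimal random-walk Metropolis sampler written from scratch (a sketch, not PyMC's implementation):

```python
import numpy as np

def metropolis(logp, x0, n_samples=5000, step=0.5, seed=0):
    """Random-walk Metropolis for a 1-D log-density."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, lp = x0, logp(x0)
    for i in range(n_samples):
        prop = x + step * rng.normal()            # symmetric proposal
        lp_prop = logp(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept with prob min(1, ratio)
            x, lp = prop, lp_prop
        samples[i] = x                            # on rejection, keep the current state
    return samples

# Example: the chain's samples approximate a standard normal target
draws = metropolis(lambda x: -0.5 * x**2, x0=0.0)
```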
*Hamiltonian Monte Carlo (**48:03**)*
- HMC uses gradient information to efficiently explore the posterior distribution.
- It simulates a physical analogy of a particle moving through the parameter space.
- The No-U-Turn Sampler (NUTS) is an improved HMC algorithm that automatically tunes the step size and number of leapfrog steps (sketched below).
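The physical simulation at the heart of HMC is usually done with the leapfrog integrator; a from-scratch sketch of one trajectory (again, not PyMC's actual implementation):

```python
import numpy as np

def leapfrog(grad_logp, x, p, step_size, n_steps):
    """Simulate one HMC trajectory for position x and momentum p."""
    x = np.asarray(x, dtype=float).copy()
    p = np.asarray(p, dtype=float).copy()
    p += 0.5 * step_size * grad_logp(x)       # initial momentum half-step
    for _ in range(n_steps - 1):
        x += step_size * p                    # full position step
        p += step_size * grad_logp(x)         # full momentum step
    x += step_size * p
    p += 0.5 * step_size * grad_logp(x)       # final momentum half-step
    return x, p
```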
*Example: Markov Chain Monte Carlo in PyMC (**53:06**)*
- The tutorial demonstrates fitting a model for predicting rugby scores using MCMC in PyMC.
- The example showcases specifying priors, likelihoods, running MCMC sampling, and analyzing the results.
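A rough sketch of the attack/defence structure commonly used for this rugby example (the team encodings, priors, and data below are assumptions, not the video's exact code):

```python
import numpy as np
import pymc as pm

# Hypothetical encoding: integer team codes and observed match scores
n_teams = 6
home_idx = np.array([0, 1, 2]); away_idx = np.array([3, 4, 5])
home_score = np.array([23, 17, 30]); away_score = np.array([16, 20, 12])

with pm.Model() as rugby_model:
    home = pm.Normal("home", mu=0, sigma=1)                  # home advantage
    atts = pm.Normal("atts", mu=0, sigma=1, shape=n_teams)   # attacking strength
    defs = pm.Normal("defs", mu=0, sigma=1, shape=n_teams)   # defensive strength
    theta_home = pm.math.exp(home + atts[home_idx] - defs[away_idx])
    theta_away = pm.math.exp(atts[away_idx] - defs[home_idx])
    pm.Poisson("home_points", mu=theta_home, observed=home_score)
    pm.Poisson("away_points", mu=theta_away, observed=away_score)
    idata = pm.sample()
```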
*Model checking (**1:11:58**)*
- Model checking is crucial to ensure the validity of the fitted model.
- It involves assessing convergence diagnostics and goodness of fit.
*Convergence diagnostics (**1:12:10**)*
- Convergence diagnostics verify whether the MCMC algorithm has effectively explored the posterior distribution.
- Techniques include visually inspecting trace plots, checking for divergences, analyzing energy plots, and calculating potential scale reduction (R-hat) statistics.
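These checks map directly onto ArviZ calls; a sketch, assuming `idata` is the `InferenceData` returned by `pm.sample`:

```python
import arviz as az

az.plot_trace(idata)                               # visual inspection of the chains
az.plot_energy(idata)                              # energy plot for HMC/NUTS
print(az.summary(idata)[["r_hat", "ess_bulk"]])    # R-hat and effective sample size
print(int(idata.sample_stats["diverging"].sum()))  # number of divergent transitions
```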
*Goodness of fit (**1:17:58**)*
- Goodness of fit assesses how well the model fits the observed data.
- The posterior predictive distribution is used to compare model predictions with the data.
- Visualizations like cumulative distribution plots can help evaluate goodness of fit.
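A sketch of a posterior predictive check, assuming a fitted PyMC `model` and its `idata`:

```python
import arviz as az
import pymc as pm

with model:
    # Simulate replicated data from the posterior predictive distribution
    idata.extend(pm.sample_posterior_predictive(idata))

# Compare the cumulative distribution of the data against predictive draws
az.plot_ppc(idata, kind="cumulative")
```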
*Making predictions (**1:20:43**)*
- PyMC allows for making predictions with fitted models by updating the data and sampling from the posterior predictive distribution.
- The tutorial demonstrates predicting the outcome of a rugby match between Wales and England.
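This workflow requires the model's inputs to be declared as mutable `pm.Data` containers (the rugby sketch above used plain arrays, so it would need that small change); the team codes below are hypothetical:

```python
import numpy as np
import pymc as pm

wales, england = 0, 1  # hypothetical integer team codes

with rugby_model:  # assumes home_idx/away_idx were wrapped with pm.Data(...)
    pm.set_data({"home_idx": np.array([wales]), "away_idx": np.array([england])})
    preds = pm.sample_posterior_predictive(idata, predictions=True)
```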
*Conclusion*
- The tutorial concludes by encouraging further exploration of Bayesian modeling with PyMC and suggesting additional resources.
Disclaimer: I used Gemini 1.5 Pro to summarize the YouTube transcript.
Lovely video. After watching it a third time and trying to build some models of my own, I finally started understanding it :)
Wow. I'm literally looking up PyMC3 because I'm writing a paper on Bayesian analysis of pitcher performance as a game progresses. Turns out this dude is doing the same thing.
It's always great to see a video on PyMC
@39:00 it takes more than 250 minutes in VS Code and about 50 minutes in my JupyterHub. Is there any reason for that?
This is a great video with a good practical application. Is it possible to have access to the notebooks?
Not sure if any of these are in the gallery, but I would like to see them there as well
I really enjoyed the video. It helped me a lot. Is there a document for the presentation?
Can we use PyMC to estimate DSGE models with Bayesian techniques?
When I ran the sticky-stuff baseball example it took 20 minutes on my computer! Does anyone know what could be wrong? He said it should take seconds.
Is it called the back-end because it's where the shit goes?
Can I get a book 📖 on Bayesian statistics in Python?
Sticky stuff is interest rates
Has anybody else noticed that Mr. Bayes has a very similar face to the speaker?
I like Chris and have been watching his videos for a couple of years. But his answer at ua-cam.com/video/911d4A1U0BE/v-deo.html is not correct. The difference between those languages (libraries) is not high level vs. low level; it is a difference in philosophy. PyMC and Stan are FOPPLs (first-order probabilistic programming languages), which means they limit themselves to a finite random-variable cardinality, while Pyro and PyProb are HOPPLs (higher-order probabilistic programming languages), which allow unbounded random-variable cardinality. Not that you should care at this level :D but it is what it is
Can you suggest a book on Bayesian statistics using Python?
@musiknation7218 Bayesian Analysis in Python