Chris Fonnesbeck - Probabilistic Python: An Introduction to Bayesian Modeling with PyMC

  • Published 9 Jan 2025

COMMENTS •

  • @wolpumba4099
    @wolpumba4099 9 months ago +2

    *Abstract*
    This tutorial provides an introduction to Bayesian modeling with PyMC,
    a probabilistic programming library in Python. It covers the
    fundamental concepts of Bayesian statistics, including prior
    distributions, likelihood functions, and posterior distributions. The
    tutorial also explains Markov Chain Monte Carlo (MCMC) methods,
    specifically the No U-Turn Sampler (NUTS), used to approximate
    posterior distributions. Additionally, it emphasizes the importance of
    model checking and demonstrates techniques for assessing convergence
    and goodness of fit. The tutorial concludes with examples of building
    and analyzing models in PyMC, including predicting the outcomes of
    sporting events.
    *Summary*
    *Introduction (**0:03**)*
    - This tutorial is intended for data scientists and analysts interested in applying Bayesian statistics and probabilistic programming.
    - No prior knowledge of statistics, machine learning, or Python is assumed.
    - The tutorial provides a high-level overview of Bayesian statistics, probabilistic programming, and PyMC.
    *Probabilistic Programming (**1:24**)*
    - Probabilistic programming involves writing programs with outputs partially determined by random numbers.
    - It allows for specifying statistical models using stochastic language primitives like probability distributions.
    - The main purpose of probabilistic programming is to facilitate Bayesian inference.
    *What is Bayes? (**3:30**)*
    - Bayesian statistics uses probability models to make inferences from data about unknown quantities.
    - It involves updating prior beliefs based on observed data to obtain posterior distributions.
    - Bayes' formula is the foundation of Bayesian inference.
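For reference, Bayes' formula can be written as below. The symbols are notation chosen here rather than taken from the talk: \theta stands for the unknown quantities and y for the observed data.

```latex
\[
  p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)},
  \qquad
  p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta
\]
```

Here p(\theta) is the prior, p(y \mid \theta) the likelihood, and p(\theta \mid y) the posterior.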
    *Why Bayes? (**4:39**)*
    - Bayesian inference is attractive due to its utility and conceptual simplicity.
    - It allows for incorporating prior knowledge and quantifying uncertainty in estimates and predictions.
    *Prior distribution (**6:51**)*
    - Prior distributions quantify uncertainty in unknown variables before observing data.
    - Uninformative priors can be used when little is known beforehand.
    - Informative priors can be based on domain knowledge or previous data.
    *Likelihood function (**8:13**)*
    - The likelihood function describes how the data relates to the model.
    - It is the probability of the observed data conditioned on the model parameters, viewed as a function of those parameters.
    - Different likelihood functions are appropriate for different types of data (e.g., normal, binomial, Poisson).
    *Infer values for latent variables (**9:34**)*
    - Bayesian inference combines prior and likelihood information to generate the posterior distribution.
    - The posterior distribution represents updated knowledge about unknown variables after observing data.
    - Calculating the posterior distribution often requires numerical methods due to the complexity of integration.
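As a small illustration of combining a prior with a likelihood numerically, here is a minimal grid-approximation sketch in plain Python. The data (6 successes in 9 trials), the flat prior, and the function name `grid_posterior` are assumptions made for this example; they do not come from the tutorial.

```python
import math


def grid_posterior(successes, trials, grid_size=101):
    """Approximate the posterior of a success probability on a grid.

    Assumes a flat prior and a binomial likelihood.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # flat prior over [0, 1]
    likelihood = [
        math.comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
        for p in grid
    ]
    # Bayes' formula: posterior is proportional to prior times likelihood.
    unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
    evidence = sum(unnormalized)  # the sum stands in for the integral
    posterior = [u / evidence for u in unnormalized]
    return grid, posterior


grid, posterior = grid_posterior(successes=6, trials=9)
posterior_mean = sum(p * w for p, w in zip(grid, posterior))
print(f"posterior mean is about {posterior_mean:.3f}")
```

With one parameter a grid is enough. The cost grows exponentially with the number of parameters, which is why sampling methods such as MCMC are used for realistic models.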
    *Probabilistic programming in Python (**16:48**)*
    - Several probabilistic programming tools can be used from Python, including PyMC, Stan (through interfaces such as PyStan and CmdStanPy), Pyro, and TensorFlow Probability.
    - PyMC is specifically designed for fitting Bayesian statistical models using MCMC methods.
    *PyMC and its features (**17:29**)*
    - PyMC provides various features for Bayesian modeling, including:
      - Built-in statistical distributions
      - Tools for output analysis and plotting
      - Extensibility for custom distributions and algorithms
      - GPU support and different computational backends
    *Example: Building models in PyMC (**22:30**)*
    - The tutorial demonstrates building a changepoint model in PyMC to analyze baseball spin rate data.
    - The model estimates the changepoint and mean spin rates before and after the sticky stuff crackdown.
    - The example showcases specifying stochastic and deterministic variables, priors, likelihoods, and running MCMC sampling.
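A changepoint model of this kind might look roughly like the sketch below. It is not the notebook from the talk: the simulated data, the priors, and the names `simulate_spin_rates` and `fit_changepoint_model` are assumptions for illustration. PyMC and NumPy are imported inside the fitting function, so only that function needs them installed.

```python
import random


def simulate_spin_rates(n_days=120, true_changepoint=70, seed=1):
    """Hypothetical daily spin rates (rpm) that drop after the changepoint."""
    rng = random.Random(seed)
    return [
        rng.gauss(2400.0 if day < true_changepoint else 2300.0, 30.0)
        for day in range(n_days)
    ]


def fit_changepoint_model(spin_rates, draws=1000, tune=1000, seed=0):
    """Build and sample a changepoint model (requires PyMC and NumPy)."""
    import numpy as np
    import pymc as pm

    y = np.asarray(spin_rates, dtype=float)
    days = np.arange(len(y))
    with pm.Model():
        # Prior: the changepoint is equally likely to fall on any day.
        changepoint = pm.DiscreteUniform("changepoint", lower=0, upper=len(y) - 1)
        # Priors for the mean spin rate before and after the changepoint.
        mu_before = pm.Normal("mu_before", mu=y.mean(), sigma=100.0)
        mu_after = pm.Normal("mu_after", mu=y.mean(), sigma=100.0)
        sigma = pm.HalfNormal("sigma", sigma=50.0)
        # Deterministic variable: the mean that applies on each day.
        mu = pm.Deterministic(
            "mu", pm.math.switch(changepoint > days, mu_before, mu_after)
        )
        # Likelihood of the observed spin rates.
        pm.Normal("spin_rate", mu=mu, sigma=sigma, observed=y)
        idata = pm.sample(draws=draws, tune=tune, random_seed=seed)
    return idata


spin_rates = simulate_spin_rates()
# To fit the model (needs PyMC installed; sampling may take a while):
# idata = fit_changepoint_model(spin_rates)
```

Because the changepoint is a discrete variable, PyMC cannot sample it with NUTS and assigns it a different step method, which can make sampling noticeably slower than for a purely continuous model.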
    *Markov Chain Monte Carlo and Bayesian approximation (**41:47**)*
    - MCMC methods are used to approximate posterior distributions by simulating a Markov chain.
    - The simulated chain converges to the posterior distribution as its stationary distribution.
    - Metropolis sampling and Hamiltonian Monte Carlo (HMC) are two MCMC algorithms.
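To make the idea of simulating a Markov chain concrete, here is a minimal random-walk Metropolis sketch in plain Python. The model (normal likelihood with known noise, normal prior on the mean), the data values, and the step size are assumptions chosen for this example, not taken from the tutorial.

```python
import math
import random


def log_posterior(mu, data, prior_sd=10.0, noise_sd=1.0):
    """Unnormalized log posterior: Normal(0, prior_sd) prior on mu,
    Normal(mu, noise_sd) likelihood for each observation."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_like = sum(-0.5 * ((y - mu) / noise_sd) ** 2 for y in data)
    return log_prior + log_like


def metropolis(data, n_samples=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for a single parameter."""
    rng = random.Random(seed)
    current = 0.0
    current_lp = log_posterior(current, data)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio), on the log scale.
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


data = [4.8, 5.1, 5.3, 4.9, 5.2]  # hypothetical measurements
samples = metropolis(data)
kept = samples[1000:]  # discard the early part of the chain as warm-up
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean of mu is about {posterior_mean:.2f}")
```

Only the ratio of posterior densities enters the acceptance step, so the normalizing constant in Bayes' formula never has to be computed.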
    *Hamiltonian Monte Carlo (**48:03**)*
    - HMC uses gradient information to efficiently explore the posterior distribution.
    - It simulates a physical analogy of a particle moving through the parameter space.
    - The No U-Turn Sampler (NUTS) is an extension of HMC that chooses the trajectory length automatically, so it needs far less hand tuning.
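The gradient-guided moves of HMC can be sketched with a leapfrog integrator, as below. This is plain HMC with a fixed step size and a fixed number of steps on a standard normal target, all chosen here for illustration. It is not NUTS and not PyMC's implementation.

```python
import math
import random
import statistics


def potential(q):
    """Negative log density of a standard normal target, up to a constant."""
    return 0.5 * q * q


def grad_potential(q):
    """Gradient of the potential with respect to the position q."""
    return q


def leapfrog(q, p, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p -= 0.5 * step_size * grad_potential(q)  # initial half step for momentum
    for step in range(n_steps):
        q += step_size * p  # full step for position
        if step < n_steps - 1:
            p -= step_size * grad_potential(q)  # full step for momentum
    p -= 0.5 * step_size * grad_potential(q)  # final half step for momentum
    return q, p


def hmc(n_samples=2000, step_size=0.2, n_steps=10, seed=0):
    """Basic HMC: draw a momentum, integrate, then accept or reject."""
    rng = random.Random(seed)
    q = 0.0
    samples = []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)  # fresh momentum for each trajectory
        q_new, p_new = leapfrog(q, p0, step_size, n_steps)
        h_old = potential(q) + 0.5 * p0 * p0
        h_new = potential(q_new) + 0.5 * p_new * p_new
        # Metropolis correction for the integrator's energy error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


hmc_samples = hmc()
print(statistics.fmean(hmc_samples), statistics.variance(hmc_samples))
```

The fixed `n_steps` is the quantity that NUTS avoids setting by hand: it keeps extending the trajectory until the path starts to turn back on itself.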
    *Example: Markov Chain Monte Carlo in PyMC (**53:06**)*
    - The tutorial demonstrates fitting a model for predicting rugby scores using MCMC in PyMC.
    - The example showcases specifying priors, likelihoods, running MCMC sampling, and analyzing the results.
    *Model checking (**1:11:58**)*
    - Model checking is crucial to ensure the validity of the fitted model.
    - It involves assessing convergence diagnostics and goodness of fit.
    *Convergence diagnostics (**1:12:10**)*
    - Convergence diagnostics verify whether the MCMC algorithm has effectively explored the posterior distribution.
    - Techniques include visually inspecting trace plots, checking for divergences, analyzing energy plots, and calculating potential scale reduction (R-hat) statistics.
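The idea behind the R-hat statistic can be shown with the classic (non-split) formula, which compares between-chain and within-chain variance. The chains below are simulated stand-ins, not sampler output. Libraries such as ArviZ report a more refined rank-normalized split version, so treat this as a sketch of the principle only.

```python
import random
import statistics


def r_hat(chains):
    """Classic potential scale reduction for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average within-chain variance.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B: between-chain variance, scaled by the chain length.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(42)
# Four chains drawn from the same distribution: R-hat should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# Two chains stuck in a different region: R-hat should be well above 1.
stuck = [[rng.gauss(offset, 1.0) for _ in range(500)] for offset in (0, 0, 3, 3)]

print(f"mixed chains: {r_hat(mixed):.3f}")
print(f"stuck chains: {r_hat(stuck):.3f}")
```

Values close to 1 indicate that the chains agree with each other. Values clearly above 1 suggest that they have not yet converged to the same distribution.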
    *Goodness of fit (**1:17:58**)*
    - Goodness of fit assesses how well the model fits the observed data.
    - The posterior predictive distribution is used to compare model predictions with the data.
    - Visualizations like cumulative distribution plots can help evaluate goodness of fit.
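A posterior predictive check can also be summarized numerically instead of with a plot. The sketch below assumes a Poisson model for hypothetical count data with a conjugate Gamma(1, 1) prior on the rate, so posterior draws can be generated without MCMC. The data, the prior, and the choice of the maximum as test statistic are all assumptions for this example.

```python
import math
import random


def sample_poisson(rng, rate):
    """Draw one Poisson variate with Knuth's method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def posterior_predictive_pvalue(observed, prior_shape=1.0, prior_rate=1.0,
                                n_draws=2000, seed=0):
    """Share of replicated datasets whose maximum is at least the observed one."""
    rng = random.Random(seed)
    # Conjugate update: Gamma prior plus Poisson data gives a Gamma posterior.
    post_shape = prior_shape + sum(observed)
    post_rate = prior_rate + len(observed)
    observed_stat = max(observed)
    exceed = 0
    for _ in range(n_draws):
        # gammavariate takes a scale parameter, hence 1 / rate.
        rate = rng.gammavariate(post_shape, 1.0 / post_rate)
        replicated = [sample_poisson(rng, rate) for _ in observed]
        if max(replicated) >= observed_stat:
            exceed += 1
    return exceed / n_draws


observed_counts = [2, 3, 1, 4, 2, 0, 3, 5, 2, 1]  # hypothetical data
p_value = posterior_predictive_pvalue(observed_counts)
print(f"posterior predictive p-value for the maximum: {p_value:.2f}")
```

A value near 0 or 1 would mean the model rarely reproduces the observed maximum, which points to a poor fit for that aspect of the data.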
    *Making predictions (**1:20:43**)*
    - PyMC allows for making predictions with fitted models by updating the data and sampling from the posterior predictive distribution.
    - The tutorial demonstrates predicting the outcome of a rugby match between Wales and England.
    *Conclusion*
    - The tutorial concludes by encouraging further exploration of Bayesian modeling with PyMC and suggesting additional resources.
    Disclaimer: I used Gemini 1.5 Pro to summarize the YouTube transcript.

  • @iliya-malecki
    @iliya-malecki 2 years ago +11

    lovely video, after watching it a third time and trying to build some models of my own i finally start understanding it :)

  • @donnymcjonny6531
    @donnymcjonny6531 1 year ago +2

    Wow. I'm literally looking up PyMC3 because I'm writing a paper on Bayesian analysis of pitcher performance as a game progresses. Turns out this dude is doing the same thing.

  • @CristianHeredia0
    @CristianHeredia0 2 years ago

    It's always great to see a video on pymc

  • @tobiasmuenchow9884
    @tobiasmuenchow9884 1 year ago

    @39:00 it takes like > 250 mins on my VSC and 50 mins in my jupyterhub. Is there any reason for that?

  • @tas47
    @tas47 2 years ago +7

    This is a great video with good practical applications. Is it possible to have access to the notebooks?

    • @SamuelMindel
      @SamuelMindel 8 months ago

      not sure if any of these are in the gallery, but I would like to see these as well

  • @tobiasmuenchow9884
    @tobiasmuenchow9884 1 year ago

    I really enjoyed the video. It helped me a lot. Is there any document for the presentation?

  • @teshex
    @teshex 1 year ago

    Can we use PyMC to estimate DSGE models with Bayesian techniques?

  • @BillTubbs
    @BillTubbs 1 year ago

    When I ran the sticky baseball example it took 20 mins on my computer! Anyone know what could be wrong? He said it should take seconds.

  • @JosephKings-j9f
    @JosephKings-j9f 8 months ago

    is it called the back-end because its where the shit goes?

  • @musiknation7218
    @musiknation7218 1 year ago

    Can I get a book 📖 on Bayesian methods in Python?

  • @JosephKings-j9f
    @JosephKings-j9f 8 months ago

    sticky stuff is interest rates

  • @pablolecce6931
    @pablolecce6931 2 years ago

    Has anybody noticed that Mr. Bayes has a very similar face to the speaker?

  • @haditime1665
    @haditime1665 2 years ago +1

    I like Chris and have been watching his videos for a couple of years, but his answer at ua-cam.com/video/911d4A1U0BE/v-deo.html is not correct. The difference between those languages (libraries) is not a matter of high level versus low level; it is a difference in philosophy. PyMC and Stan are FOPPLs (first-order probabilistic programming languages), which means they limit themselves to a finite random-variable cardinality, while Pyro and PyProb are HOPPLs (higher-order probabilistic programming languages), which allow an unbounded random-variable cardinality. Not that you should care at this level :D but it is what it is.