ERRATA (aka my brain on editing):
5:58: Bayes' Rule is incorrect here, the numerator should contain the likelihood P(X | theta) instead, but I have the condition flipped.
Again at 10:14. Every time I saw it I thought I was going crazy.
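For reference, here is the corrected form the erratum describes, with the likelihood P(X | theta) in the numerator. The prior P(theta) and the evidence P(X) are the standard remaining terms of Bayes' rule; the slide itself is not reproduced in this thread, so treat the notation as an assumption about how it was written.

```latex
P(\theta \mid X) \;=\; \frac{P(X \mid \theta)\, P(\theta)}{P(X)}
```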
As a statistics student, your channel is a blessing. Thank you, and keep up the great work.
Wow, I love this. I hope a thousand more channels about statistics pop up that are more informal but still accurate. Statistics is just as beautiful as most other fields of study but few people notice it because it feels so austere at first glance.
Linear regression is poetry for the resolute and painful humiliation for the dissolute.
Indeed. I always think of statistics as a toolbox for research and thinking. It may not be super exciting in itself, but if it performs an interesting job it is beautiful and satisfying. Also, some details in statistics are actually fun and mind-boggling. Another thing, similar to mathematics, is that statistics is often sold as niche. But it is everywhere and in everything. Evolution? Statistics. Transportation? Statistics. Social justice? Statistics. Personality? Statistics. I research statistics in psychology and I find it tragic that the link between maths, statistics, and all the so-called "soft" topics is not taught more in school. Almost regardless of what you want to achieve, maths and statistics are your tools - or, if you really endorse them: friends.
I always disliked statistics cuz of the concept itself lol… the math seems to be cool though when you can use it for something
@@I61void Statistics is literally "using math for something". Not sure where you tried to draw a line there.
Love this video, it's fantastic. When you show Bayes' Theorem, you wrote P(theta | X) in the numerator when it should be P(X | theta). As a self-proclaimed Bayesian, I love that you included Bayesian statistics in this video!
oh man, my advisor would kill me if she saw me do that. I've added it to my errata, thank you fellow Bayesian!
This is why I write P(H | D) where H is hypothesis and D is data. I've never made this mistake again.
I like Bayes and related stuff. Curious though what it means to be a 'bayesian'?
Tidyverse is indeed one of the main reasons for using R
We owe a lot to Hadley Wickham
tidyverse sucks. long live data.table
@@sereysothe.a no love for dtplyr?
@@awadafuk4863 dtplyr is interesting but in addition to the speed advantage the main reason I love data.table is the syntax, especially since I use SQL a lot too. the dtplyr docs also explicitly state that there are some data.table functionalities that have no direct translation in dtplyr
As a student of statistics your channel is a heaven ❤🎉.
Thank you so much for the videos.
Just starting to learn more about this in depth for my psychology degree and this was the longest I have sat still for a video thank you ! Keep up the good work.
At this point, I feel like machine learning can be considered one of the biggest improvements in "statistics" (especially for forecasting) even though it is not directly involved with statistics. If I wanted to forecast something, I would be hard-pressed to create a statistical model that could outperform a machine learning model.
Yeah I agree. Machine learning is mentioned a little more in the article than I’ve let on, but it’s enough out of my comfort zone that I would need to read up more on recent research
@The Stopper i can see your notification but can't see your comment, not sure if youtube blocked your comment or something. They have a horrible censorship filter.
I think it's implied that all those important ideas of statistics are also important largely through their influence on machine learning. Geoffrey Hinton's major step in the 90s was to link neural nets with statistics and import all statistical methods into machine learning.
I've always assumed machine learning is a subset of statistics. Statistics is an umbrella term for data analysis.
Machine learning is only statistics. Statistics and linear algebra form the entire basis of the machine learning field. All machine learning ends up being various forms of transformed linear projections from higher dimensional spaces mapped with their statistically most likely results given the data it trained on to lower spaces.
The trouble is how machine learning likes to sweep the subspaces where there is a lack of data under the rug. They assume that all inputs will resemble typical speech for ChatGPT. When you give it unexpected speech input like extreme repetition which it hasn't been trained for, it breaks down. It has no generalized intelligence or heuristics to deal with or even evaluate outside of the box issues.
If you have ever been stuck on an automated voice response line with a question that isn't among the preprogrammed options, you know the problem. What do you do when you have a non-emergency tech support issue and the only options are "Press 1 for emergency or say emergency, 2 for change of address or say mailing, 3 for billing or say bill, 9 to repeat or say repeat"? Many of these AI systems will just hang up if they can't fit you into their categories one too many times.
We keep applying machine learning to commonplace scenarios to make things more efficient but forget that other, uncommon configurations happen. AI is poorly suited for the uncommon. When faced with these and without adequate handling of poorly covered data space, the responses can be wildly unpredictable, just as statistical predictions are in the sparse and extrapolated areas of datasets. At least in statistics you are given a margin of error for how badly the model is handling your input; thus far ChatGPT etc. have no such indicators or guardrails.
AI is artificial, not intelligent. It will (just like statistics) bake in the bias that generated the data it was trained on. Black patients have lower hospitalization rates and healthcare utilization in commercial claims datasets overall and are less well represented in medical data. When Optum (United Healthcare) then used those models to recommend whether to approve continued coverage for hospital stays and the like, it underestimated the need of sicker Black patients when compared to some white patients. That's not due to anything but how they optimized their model, and it's a real problem.
www.theguardian.com/society/2019/oct/25/healthcare-algorithm-racial-biases-optum
When Tesla cars on autopilot see emergency vehicles with flashing lights at night, more than once they have driven into the emergency responders on the scene of an existing accident. Their AI image models also weren't adequately trained with Black people and don't reliably stop for Black pedestrians.
We need to be more careful with these tools
This is great! Could you do a video on "degree of freedom". I feel that NONE of the textbooks on the market is able to explain this idea clearly intuitively, or mathematically, or numerically.
I was JUST thinking about this, I’ll definitely take you up on this request
@@very-normal Thank you.
My computer science program didn't go into statistics this deeply, but I'm still learning it to improve the models I build in ML.
High quality video! Usually these videos just reinforce the stereotypical view of statistics "you must learn to understand linear algebra, MSE and matrices"
Oh shoot I'm at Columbia, I should see if I can find Gelman and have a chat with him
You make very informative & easy to digest videos! I have a small suggestion. At times, the background music becomes too loud or too complex. You might want to change that for your future works. I hope you get big as I would love to see more from you!
It always makes my day to see that you posted a new video!! Keep it up please!!!
It will be good to add an example with each concept.
As a mathematician, I love Mark Twain's comment:
There are lies, and there are damn lies, and then there is statistics!
Why would he say that.
@andrewnguyen3312 Because he didn't trust statistics. Since I have this background, I know how to distort such data. For example, a survey was taken on a Presidential election. It took place in the state one of the candidates lived in. The data said that he would win the Presidency. However, there are 50 states. The other candidate won in 48 of the states. See the problem?
If it's statistics (small "s") then that's ok, if it's Statistics, though, then we have a problem...
@Evan490BC I have a background in statistics. How do you know that there is a big or small statistic? A statistic can cause all kinds of problems if used the wrong way.
@@johncipolletti5611 You didn't get the joke, did you?
Thank you very much for creating this video. At 6:25 you discuss overparameterization. The Akaike information criterion (AIC) provides guidance on model order. Do you use it? What benefits and drawbacks have you found?
I have used it in the past for model selection, to figure out a set of confounders for a linear model, but not much else that I can remember. I usually think of it as just a tool for model selection, and not much past that.
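For anyone following the AIC question: the criterion is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood, and lower is better. Below is a minimal Python sketch of using it to pick between two candidate models. Everything in it is an assumption for illustration (the simulated data, the helper names `gaussian_aic` and `fit_line`, and the choice of Gaussian errors); it is not the workflow described in the reply above.

```python
import math
import random


def gaussian_aic(residuals, n_params):
    """AIC = 2k - 2 ln(L) for a model with i.i.d. Gaussian errors."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return 2 * n_params - 2 * log_lik


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (closed form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: a true linear trend plus Gaussian noise.
rng = random.Random(0)
x = [i / 10 for i in range(100)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

# Model A: intercept only (mean + error variance = 2 parameters).
mean_y = sum(y) / len(y)
aic_mean = gaussian_aic([yi - mean_y for yi in y], n_params=2)

# Model B: straight line (intercept + slope + error variance = 3 parameters).
b0, b1 = fit_line(x, y)
aic_line = gaussian_aic([yi - (b0 + b1 * xi) for xi, yi in zip(x, y)], n_params=3)

print(f"AIC, intercept only: {aic_mean:.1f}")
print(f"AIC, straight line:  {aic_line:.1f}")
```

The extra parameter in model B costs a penalty of 2, so the line only wins if it improves the log-likelihood by more than 1. That trade-off is the sense in which AIC guards against overparameterization.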
Thank you very much for sharing your insight- and wisdom-filled videos !! Best scientific channel on YouTube for a while !! Outstanding !!
Greetings from California … I wish you and folks good health , success and happiness !! Much Love ✌️😎💕
🎯 Key Takeaways for quick navigation:
00:00 🔍 Statistics evolves with influential ideas that change its trajectory.
00:13 🎓 As a biostatistics student, it's crucial to know revolutionary statistical ideas.
00:28 👥 The video aims to make complex statistical innovations accessible to a general audience.
00:43 🏆 Gelman and Vehtari's article in JASA highlights important statistical ideas from the past 50 years.
01:12 🧠 Gelman and Vehtari are renowned for their contributions to Bayesian statistics.
01:52 📆 The article by Gelman and Vehtari focuses on statistical innovations from 1970 to 2021, defining "Modern statistics."
02:07 📊 Observational data, unlike experimental data, does not readily allow for causal claims, only correlational ones.
02:33 🔄 Counterfactual causal inference provides a framework for making causal inferences from observational data.
03:58 📈 The counterfactual framework formalizes causal effects in mathematical models, beneficial in economics and psychology.
04:13 💻 The bootstrap is a significant statistical tool for estimating the sampling distribution of a statistic through resampling.
05:22 🤖 Computational power has allowed statisticians to perform extensive simulations, aiding in experimental model assessments.
06:18 🔍 Simulations in Bayesian statistics enable prior and posterior predictive checks, validating statistical models.
07:01 🧮 Increasing the number of parameters in a model can make it better represent complex real-world phenomena.
08:11 🌐 Overparameterized models, like neural networks, can approximate a wide array of functions due to their flexibility.
08:39 ⚖️ Regularization techniques in statistics help manage the complexity of highly flexible models.
08:55 📈 Multi-level models introduce additional structure to parameters, aiding in the analysis of complex clustered data.
09:50 🧩 Gelman advocates for multi-level models as a means to synthesize different information sources into a unified analysis.
10:05 📊 Multi-level models are important for their flexibility and the ability to incorporate prior knowledge, beneficial for estimating treatment effects with small samples.
10:46 💻 The advancement of computers and computational power has been crucial in developing complex statistical models and solving difficult problems.
11:16 🧮 The significance of statistical algorithms is highlighted by their variety and utility in solving diverse statistical problems.
11:29 🔄 The EM algorithm is noted for solving estimation problems involving parameters in models that can't be solved directly, like mixture models with latent classes.
12:25 📈 The Metropolis algorithm allows sampling from complex probability distributions, which is essential for dealing with difficult posterior distributions in Bayesian statistics.
13:21 🔬 Adaptive decision analysis in statistics has enabled the adaptation of experiments in real-time, improving their design and potentially stopping them early based on preliminary results.
14:31 🛡️ Robust inference provides trustworthy statistical analyses even when traditional assumptions are violated, highlighting the use of the sample median as a robust estimator.
15:26 🔗 Propensity score matching in causal inference is used to match treatment and control groups for more accurate causal effect estimation, with robust versions available to handle model misspecification.
16:09 📉 Visuals and plots are emphasized as crucial tools for examining data and assessing statistical models, marking an important skill set for statisticians and data scientists.
16:51 🧩 The tidyverse in R programming is celebrated for making data cleaning and visualization much easier, advocating for its use in exploratory data analysis.
17:05 🤔 The importance of statistical ideas is not measured by citation counts but by their influence on statistical practice and development of new methodologies.
Made with HARPA AI
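The bootstrap takeaway at 04:13 is easy to try out. Here is a minimal Python sketch using only the standard library. The simulated sample, the helper name `bootstrap_replicates`, the choice of the median as the statistic, and the 2,000 resamples are all assumptions for illustration, not something taken from the video or the article.

```python
import random
import statistics


def bootstrap_replicates(data, statistic, n_resamples=2000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement from `data`."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


# Hypothetical sample: 200 draws from a skewed (exponential) distribution.
rng = random.Random(42)
sample = [rng.expovariate(0.5) for _ in range(200)]

# The replicates approximate the sampling distribution of the median.
reps = sorted(bootstrap_replicates(sample, statistics.median))
se = statistics.stdev(reps)

# Simple percentile interval: cut off 2.5% of the replicates in each tail.
lo = reps[int(0.025 * len(reps))]
hi = reps[int(0.975 * len(reps)) - 1]

print(f"sample median: {statistics.median(sample):.3f}")
print(f"bootstrap SE:  {se:.3f}")
print(f"95% percentile interval: ({lo:.3f}, {hi:.3f})")
```

The point is that no formula for the standard error of the median is needed: the spread of the resampled medians stands in for it.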
"There's lies, there's damned lies, and then there's statistics." --Mark Twain
What’s your opinion on Nicholas Taleb and his ideas about how anti-fragility and fat tails affect mainstream statistics?
That would make for an interesting video
This reminds me of Ten Great Ideas about Chance by Persi Diaconis and Brian Skyrms... Great summary!
Fascinating. And the irony at the end of using statistics to discuss how ideas in statistics are rated.
13:21 Is that a magic keyboard?!
Most of these ideas seem to have arisen from the analysis of very large data sets (and the issues inherent in combining sets of data) and the availability of computing power and the increased tendency to model issues.
I wish you had spoken the titles of each section. Very hard to follow when driving and the title slide flashed away too quickly.
Can you make a video going over the statistics behind Credit Default Swaps?
Fantastic video. You're a good explainer!
A statistical idea is important if its influence is statistically significant 😂
Cool video!
Bayes' theorem is incorrectly shown. On the right side it should be P(X | theta), not P(theta | X). Loved the vid!
I'd love to see a video on using ideas from statistics to explore learnings from a data set such as IMDB (movies).
Excellent video! If only there were a way you could give us your knowledge map, that would be fantastic
As a prestigious graduate of business statistics I am proud to say I was effectively taught how to pronounce "statistics" and how to ignore Bayes Theorem.
Statistics is the number one reason I have to redo voice lines. Also you must break Bayesian my brother
I told my biostat prof and mentor I was thinking about switching to epidemiology because I kept stumbling over saying “statistics.”
Amazing work and narrative thank you for this video
Great vid! Did not know that statistics could be that beautiful-looking.
A useful set of tools that are accessible to all via spreadsheets.
Really enjoyed this video!
of course you earned it hombre
Is there a version without the "music" so I can listen properly?
😊Excellent video! 🎉😊
I could have used this when i was in school!
wow, great video!!!
Awesome video
Definitely make a class on Udemy with code and real world examples.
gold
I suspect this might be a fascinating video, but it seems there's a heart-beat-paced thumping in the audio that's causing me a panic attack. Didn't make it to 3 minutes.
Love to have access to a version with clean audio; statistics doesn't get enough love.
Sorry about that! I’m still learning how I should balance my audio, there was actually a version without music, which you can find here:
The most important ideas in modern statistics (no music)
ua-cam.com/video/NkkKF3JaTTY/v-deo.html
@@very-normal Super! Thanks.
Business school is to statistics 🤓 as LaCroix 💦 is to fresh fruit 🍌
Yes yes, balls balls
2:00 content starts
Wow. Your explanations are nearly impenetrable for normal humans.
I was really hoping to gain some insight here, but I gave up on your video after 4 minutes of, as you called it, ado.
background 'muzik' is noise -- so i bailed