RiskByNumbers
United States
Joined Jul 15, 2023
Civil engineering professor by day. Risk quantifier 24/7.
=============================================
UPDATE (July 8, 2024): We now have a RiskByNumbers blog available at riskbynumbers.org!
=============================================
Hello and welcome to RiskByNumbers!
I am a professor excited to share educational resources on probability, statistics, optimization methods, algorithms, and programming with a broad audience.
Outside of YouTube, you can currently find me in Vancouver, Canada at the University of British Columbia.
If this content resonates with you, or if you have further questions, leave a comment or reach out to me directly (while the channel is still relatively new).
Email: RiskByNumbers@gmail.com
LinkedIn Bio: www.linkedin.com/in/omar-swei/
Why Averages Are (Almost) Always Wrong: Jensen's Inequality and the Flaw of Averages
Today's video provides an overview of the 'flaw of averages'. The basic premise is that feeding average values into a model or function will typically not yield an average response.
We'll show why this is the case, prove why it happens via Jensen's inequality, and see the 'flaw of averages' in action via a realistic problem! The portion on Jensen's inequality should interest anyone working in data science, since it underlies many important results (e.g., the non-negativity of KL divergence)! A short numerical sketch follows the timestamps below.
0:00 Overview of Flaw of Averages
0:42 Expected Value Definition
1:36 'Average In' vs. 'Average Out' - 2 Examples
4:02 Jensen's Inequality Proof
7:38 Summary of Non-Linearity and Jensen's Inequality
8:18 Will You Arrive in 1 Hour?
#probability #datascience #average #mathematics #education #jensensinequality
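A minimal numerical sketch of the gap, using the example from the video (a 20-mile trip with possible average speeds of 10, 20, and 30 mph at probabilities 1/4, 1/2, and 1/4):

```python
import numpy as np

speeds = np.array([10.0, 20.0, 30.0])   # mph
probs = np.array([0.25, 0.50, 0.25])
distance = 20.0                          # miles

avg_speed = np.sum(probs * speeds)                # E[X] = 20 mph
time_of_avg = distance / avg_speed                # f(E[X]) = 1.0 hour
avg_of_times = np.sum(probs * distance / speeds)  # E[f(X)] = 7/6 ≈ 1.17 hours

# f(x) = 20/x is convex, so Jensen's inequality gives E[f(X)] >= f(E[X]):
print(time_of_avg, avg_of_times)
```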
=====================
September 27, 2024: As I've mentioned in the last couple of videos, we now have a RiskByNumbers blog (riskbynumbers.org and riskbynumbers.com)! I've finally started putting together some written explainer drafts, and I'll be posting them shortly. Feel free to share feedback on the type of content you'd like to learn more about by commenting on this video or reaching out to me directly.
======================
If this is your first video, welcome! I am a professor sharing educational resources on probability, statistics, optimization methods, algorithms, and programming with a broad audience.
Outside of YouTube, you can currently find me in Vancouver, Canada at the University of British Columbia.
Thank you, and I look forward to seeing you in future videos!
Email: RiskByNumbers@gmail.com.
LinkedIn: www.linkedin.com/in/omar-swei/
Views: 44,484
Videos
Make Better Decisions With Less Data - Bayesian Statistics (Part 2)
1.9K views · 3 months ago
In Part 2 of our Bayesian series (Part 1: ua-cam.com/video/NLKWLBJ-b9E/v-deo.html), we will learn how Bayes' theorem can be used in statistics! We will come up with our estimate of a model parameter via the Bayesian approach, and we will use our Bayesian result to answer a realistic question. This is a slight twist on the infamous 'sun rise problem', with some alterations made to make everythin...
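The video's exact numbers aren't reproduced here, but a minimal Beta-Binomial sketch in the spirit of the sunrise problem, with invented data, looks like this:

```python
from scipy import stats

# Hypothetical observations: the 'sun rose' on all 10 of the last 10 days.
n_days, n_rises = 10, 10
a0, b0 = 1, 1                                  # uniform Beta(1, 1) prior on p
a, b = a0 + n_rises, b0 + (n_days - n_rises)   # conjugate Beta posterior

posterior = stats.beta(a, b)
print(posterior.mean())   # (k + 1) / (n + 2) = 11/12: Laplace's rule of succession
```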
The Most Important Concept in Probability: Bayes' Theorem (Part 1)
3.1K views · 4 months ago
Today's video is the first of a multi-part series on Bayesian methods! We'll derive Bayes' theorem, demonstrate how it allows us to update our beliefs with new information, and we'll apply it to a realistic problem. 0:00 Bayesian Methods Introduction 0:42 Deriving Bayes' Theorem - Events and Probabilities 2:04 Conditional Probabilities and Bayes' Theorem 4:18 Updating Beliefs via Bayes' Theorem...
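A minimal sketch of the event-form update; all probabilities below are invented for illustration:

```python
# Bayes' theorem for events: P(H|E) = P(E|H) * P(H) / P(E).
p_h = 0.01              # prior P(H): the hypothesis is rare
p_e_given_h = 0.95      # likelihood P(E|H)
p_e_given_not_h = 0.10  # false-positive rate P(E|not H)

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)  # total probability
posterior = p_e_given_h * p_h / p_e
print(posterior)   # ~0.088: strong evidence still leaves a rare hypothesis unlikely
```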
Python Regression Made Easy: Master it in Just 4 Minutes!
904 views · 6 months ago
I've received requests from past students as well as subscribers to this channel via email to provide a quick tutorial on regression modeling in Python. So, here it is! 0:00 Import Data and Define Variables 0:29 Visualize Variables 1:20 Estimate Model 2:08 Interpret Model Results 2:37 Evaluate Fit 3:27 Use Model for Prediction I know that your time is valuable, and so I've aimed to be as brief ...
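A minimal sketch of the workflow above, using statsmodels on synthetic data (the dataset and library choices in the video may differ):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)                  # synthetic predictor
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)    # synthetic response

X = sm.add_constant(x)                       # add an intercept column
model = sm.OLS(y, X).fit()                   # estimate the model
print(model.params, model.rsquared)          # interpret coefficients, evaluate fit
print(model.predict([1, 4.0]))               # predict the response at x = 4
```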
The Right Statistic to Understand Financial Data Better: the Geometric Mean
1K views · 7 months ago
In finance and statistics, we look at past data to predict the future. Today, I delve into geometric mean and why we tend to prefer it relative to the arithmetic mean for finance and investments. We'll learn how each of these means are computed and see that the arithmetic mean is generally larger than the geometric mean both in theory and in practice via real data! 0:00 Intro - Did Frank Really...
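A minimal sketch of the arithmetic-vs-geometric gap on invented returns:

```python
import numpy as np

# Hypothetical growth factors: +50% one year, -50% the next (you end at 75%).
growth = np.array([1.5, 0.5])

arith = growth.mean()                      # 1.0 -> suggests 'flat on average'
geom = np.exp(np.mean(np.log(growth)))     # ~0.866 -> actual per-period factor
print(arith, geom, growth.prod())          # the geometric mean reflects compounding
```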
Make Smarter Financial Decisions With The Help of Natural Logarithms
4.6K views · 8 months ago
In finance, we frequently rely on natural logarithms to analyze financial data. Today's video aims to highlight 3 particularly important reasons why this is the case. 0:00 Introduction: Should you invest with Frank? 1:53 Reason 1: Linearize our non-linear data 4:04 Reason 2: Create interpretable statistical models 7:45 Reason 3: Model uncertainty and risk in financial investments 11:57 Key Vide...
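A minimal sketch of the first reason, with an invented price series: log returns turn compounded growth into a simple sum.

```python
import numpy as np

prices = np.array([100.0, 110.0, 99.0, 105.0])  # hypothetical prices
log_r = np.diff(np.log(prices))                 # per-period log returns

# Summing log returns recovers the total growth factor exactly:
print(np.exp(log_r.sum()), prices[-1] / prices[0])  # both 1.05
```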
A Really Simple Trick to Correctly Model Your Data (with Probability Plots)
4.1K views · 11 months ago
If you fit your data to a probability model/distribution, how do you know if it is the 'right' one? Today's video provides a conceptual overview of quantile plots, a simple, intuitive method to visually evaluate whether your data align with the assumptions of a probabilistic model/distribution. 0:00 Probability Distribution Selection Overview 0:39 Part 1: Compare 2 Probability Distributions Through Qu...
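A minimal sketch of one way to draw such a plot, using scipy's probplot on synthetic (deliberately skewed) data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.5, size=200)  # skewed synthetic sample

# Quantile-quantile check against a normal model: points off the line => poor fit.
stats.probplot(data, dist="norm", plot=plt)
plt.show()
```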
Make Smarter Decisions Faster (with Dynamic Programming)
13K views · 1 year ago
Dynamic programming is a really powerful algorithmic framework to make smarter decisions today that prepare you for the future. How exactly does it work, and how does it let you solve challenging problems quickly? Today's video is part of a new mini-series on this channel around sequential decision-making analytics! We'll cover reinforcement learning and other methods to solve these types of ch...
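A deliberately tiny sketch of the memoized Bellman recursion at the heart of dynamic programming; the stage costs are invented, and the video's examples are richer:

```python
from functools import cache

# Invented costs for choosing action 'a' or 'b' at each of 4 stages.
cost = {('a', 0): 3, ('b', 0): 5, ('a', 1): 2, ('b', 1): 1,
        ('a', 2): 4, ('b', 2): 2, ('a', 3): 1, ('b', 3): 6}

@cache
def best(stage: int) -> int:
    """Minimum total cost from `stage` to the end (each subproblem solved once)."""
    if stage == 4:
        return 0
    return min(cost[(action, stage)] + best(stage + 1) for action in ('a', 'b'))

print(best(0))  # 3 + 1 + 2 + 1 = 7
```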
A Simple Solution for Really Hard Problems: Monte Carlo Simulation
232K views · 1 year ago
Today's video provides a conceptual overview of Monte Carlo simulation, a powerful, intuitive method to solve challenging probability questions. And we get to see how we can use it to answer a realistic question in Python! 0:00 Monte Carlo Applications 0:22 Party Problem: What is The Chance You'll Make It? 1:16 Monte Carlo Conceptual Overview 3:00 Monte Carlo Simulation in Python: NumPy and mat...
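A minimal sketch of the simulation idea, assuming (as discussed in the comments below) two report times uniform on 1-5 and 2-6 hours and a 9-hour budget:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

t1 = rng.uniform(1, 5, n)   # assumed duration of report 1, in hours
t2 = rng.uniform(2, 6, n)   # assumed duration of report 2

# Fraction of simulated days on which both reports finish within 9 hours:
print(np.mean(t1 + t2 <= 9.0))   # ~0.875
```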
Python Data Analysis Hack: Fitting Data to a Distribution in 60 Seconds
6K views · 1 year ago
Previously, I provided a conceptual overview of likelihood methods and model estimation: ua-cam.com/video/uN7yIhXxHHk/v-deo.html. How can you actually fit data to a probability distribution in practice? Today, I provide a 60 second tutorial on how to do so in Python! This video is part of a series added to this channel where I am providing short, quick tutorials on implementing the concepts cov...
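A minimal sketch of the one-line fit, assuming a normal model and synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)   # synthetic sample

mu, sigma = stats.norm.fit(data)   # maximum likelihood fit in one line
print(mu, sigma)                   # ~5.0, ~2.0
```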
Maximum Likelihood Estimation: Clear and Simple Explainer
2.1K views · 1 year ago
Maximum likelihood estimation (MLE) is widely used in statistics to model systems and applications. How does it work? This video explains the principles of likelihood functions, MLE and model selection through an intuitive example: should I continue partaking in a coin-flipping bet? 00:00 Video Overview 0:26 Coin Flipping Problem Introduction 1:46 Concept of Independent and Identically Distribu...
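A minimal MLE sketch for the coin-flip setup, maximizing the i.i.d. Bernoulli log-likelihood over a grid; the flip data are invented:

```python
import numpy as np

flips = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # hypothetical flips (1 = heads)
heads, n = flips.sum(), len(flips)

p = np.linspace(0.01, 0.99, 99)              # candidate values of P(heads)
loglik = heads * np.log(p) + (n - heads) * np.log(1 - p)
print(p[np.argmax(loglik)])                  # 0.75 = sample proportion, the MLE
```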
Welcome to RiskByNumbers!
739 views · 1 year ago
Hello, and welcome to RiskByNumbers! I am a professor sharing educational resources on probability, statistics, optimization methods, algorithms, and programming with a broad audience. Outside of YouTube, you can currently find me in Vancouver, Canada at the University of British Columbia. Thank you, and I look forward to seeing you in future videos! Email: RiskByNumbers@gmail.com. LinkedIn: w...
Probability Distributions Clearly Explained Visually (PMF, PDF and CDF)
6K views · 1 year ago
A visual lesson about probability distributions for random variables. I cover the probability mass, probability density, and cumulative distribution functions for discrete and continuous random variables. 00:00 Video Overview 0:31 Random Variable Definition 1:02 Probability Mass Function (PMF - Discrete Random Variable) 2:04 Cumulative Distribution Function (CDF - Discrete) 4:32 Continuous Rand...
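A minimal PMF/CDF sketch for a discrete random variable (a fair six-sided die):

```python
import numpy as np

outcomes = np.arange(1, 7)     # faces of a fair die
pmf = np.full(6, 1 / 6)        # PMF: P(X = x) for each face
cdf = np.cumsum(pmf)           # CDF: P(X <= x), a step function for discrete X

print(dict(zip(outcomes, cdf)))   # e.g. P(X <= 4) = 4/6
```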
Probability Crash Course: Key Definitions and Mathematical Concepts
2K views · 1 year ago
This video is a crash course on key probability concepts including: (1) key definitions; (2) sample spaces and their events; (3) complementary, intersection, and union of events; (4) mutually exclusive and collectively exhaustive events; (5) the mathematics of probability; (6) conditional, joint, and marginal probabilities; (7) independence Time Stamps: 00:00 Video Overview 1:00 Sample Space, P...
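A minimal sketch of joint, marginal, and conditional probabilities plus an independence check, on an invented 2x2 joint table:

```python
import numpy as np

# Invented joint distribution of two events A and B:
joint = np.array([[0.30, 0.20],    # row 0: A,     columns: B, not B
                  [0.10, 0.40]])   # row 1: not A

p_a = joint[0].sum()               # marginal P(A) = 0.5
p_b = joint[:, 0].sum()            # marginal P(B) = 0.4
p_a_given_b = joint[0, 0] / p_b    # conditional P(A|B) = 0.75

# Independence would require P(A and B) == P(A) * P(B):
print(p_a, p_b, p_a_given_b, np.isclose(joint[0, 0], p_a * p_b))  # ... False
```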
Demystifying Probability and Statistics: A 5-Minute Explainer
3.1K views · 1 year ago
Simply superb ❤ ... Can you make more videos on topics like: why most of the formulas are squared, why we need the second derivative, why we do not often use the third, fourth, and so on...
OMG! I just read that you're in Van! That's so cool! I go to school at UVic.
@jamesdennis6120: great to hear from you! Thanks for the note, and I hope that you are enjoying your time at UVic. If you find yourself roaming around UBC, feel free to reach out!
Was hoping Monte Carlo could help me get the reports done in 5 minutes so I could duck out early and pregame before my party.
@anzov1n: 😂😂😂 I suppose if the reports required that you compute some underlying probability distribution, then maybe!
thanks, needed some refresher on stats n prob 😂 and found this channel
Awesome! Thanks so much for the positive note, and welcome!
The best explanation of Bayes that I've come across so far. Thank you!
This is wonderful to hear, @mehtubbhai9709! Really appreciate you taking the time to reach out!
Such good content, thanks for clarifying
Thanks, @braiandeivid! Appreciate the kind words.
Excellent explanation ❤❤ Can you suggest some good books to study statistics the way you explained it, or your favourite ones?
Thanks, @omgupta2012! I've actually been slowly writing up my own explainers (yet to be posted) which will be my own version of a 'text'. A way to expand on some of the points not expanded on in the videos. I really enjoy 'Introduction to Statistical Learning'. In particular, the newest version now demonstrates how to apply the concepts in Python, which I think is fantastic. Conversely, I've always enjoyed Ang and Tang's 'Probability Concepts in Engineering' for a good overview of probability. It is particularly helpful, I find, for those who have some background in engineering to understand where these concepts may show up! Happy reading, and thanks again for the positive note!
@RiskByNumbers: now I'm more curious about the text you are writing and the knowledge in it. Waiting eagerly for your explainers. ❤️❤️
Very interesting! I would have approached it differently before I knew about this: assuming the average of both reports is in the middle of the time ranges given. For instance, if the ranges are 1-5 hrs and 2-6 hrs, then I would assume 3 hrs and 4 hrs respectively for each report and get a quick estimate that I would take about 7 hrs total. So can you also say it's a 50/50 chance of taking longer than 7 hrs?
@washington_pc3306: thanks for the note! Your reasoning is very intuitive and sound. In this particular case, you are correct, which is due to the 'linearity of expectation'. If I sum multiple random variables, their expected value (i.e., mean) is the sum of their individual expected values. As the expected values for these 2 reports were 3 and 4 hours, respectively, then their summation is equal to 7 hours. Now, a couple of nuances. Linearity of expectations speaks to the mean. The mean may differ from the median (i.e., 50th percentile), which may complicate things. Second, determining the probability of exceeding, say, 9 hours is actually a bit involved. One distribution to check out is the Irwin-Hall distribution, which is the distribution for the sum of 'n' independent uniform distributions lying between 0 and 1. It is quite an interesting result. To solve this problem in the video, one can use convolutions. The nice aspect of Monte Carlo is that it can get you 'approximately' the right answer pretty quickly these days. And, furthermore, there is usually so much uncertainty in the underlying distributions (e.g., is Report 1 really going to take 1-5 hours, or perhaps 0.5-5.5 hours?) that an approximate answer is usually quite good. Thanks again for the comment. Feel free to follow up with me!
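A minimal sketch of the convolution route mentioned above, discretizing the two uniform densities on a grid (using the same 9-hour figure):

```python
import numpy as np

dx = 0.005
x = np.arange(0, 12, dx)
f1 = np.where((x >= 1) & (x <= 5), 0.25, 0.0)   # U(1, 5) density for report 1
f2 = np.where((x >= 2) & (x <= 6), 0.25, 0.0)   # U(2, 6) density for report 2

fsum = np.convolve(f1, f2) * dx                 # density of the sum, on a grid
t = np.arange(len(fsum)) * dx                   # both input grids start at 0
print(np.sum(fsum[t > 9.0]) * dx)               # P(total > 9) ≈ 1/8
```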
The car example in the video is incorrect. An average speed of 20 means that in the past many drivers drove these 20 km, and they did it in exactly 1 hour. So your expectation is 1 hour of driving, and the video is wrong. However, if you define average speed as 'many drivers look instantly at their speedometer and report the number, and I averaged this and got 20', then the expected travel time would be more than 1 hour. But this is not how average speed is defined.
What is the probability that I will like this video? 😂😂
Hopefully higher than the probability that Todd causes you to be late to the party 😂
thank you, now i finally understand what the CDF is
@tommasoc.2207: great to hear -- thanks!
If average in does not lead to average out... then you're using the wrong average.
This analysis is very insightful. The concept is on full display in financial markets with daily leveraged ETFs where the average return of the underlying instrument(s) over time is meaningless. It is the distribution of daily returns of the underlying that determines the ETF return.
Great example. The original plan for this video was providing the example of the value of a financial option, so it is great to hear that this concept resonated with your past experience!
@RiskByNumbers: In fact, ignoring transaction costs and fees, etc., the excess return on a 2x leveraged ETF expressed as a factor on the original investment is roughly the SQUARE of the excess return (also expressed as a factor) on the underlying, reduced by multiplying by a function of the total variance (<1) due to slippage from “buying high / selling low”. For a 3x leveraged ETF, read “CUBE”, and the total-variance reduction factor is lower, etc. This analysis can be shown to be exact in the limit of continuous trading by applying Ito's lemma - which is itself a statement exemplifying the “flaw of averages” - and, for a given underlying return and total variance, the return on the ETF is NOT path-dependent (but the margin is too small for me to put the proof here 😊). NOT investment advice or a recommendation to buy or sell any financial instrument. For academic interest only.
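A minimal simulation sketch of the claim above, with invented daily returns and no fees; the last line compares the daily-rebalanced 2x ETF's growth factor against underlying**2 shrunk by a variance-drag term:

```python
import numpy as np

rng = np.random.default_rng(7)
r = rng.normal(0.0003, 0.01, 252)    # invented daily returns for one year

underlying = np.prod(1 + r)          # buy-and-hold growth factor
etf_2x = np.prod(1 + 2 * r)          # daily-rebalanced 2x leveraged factor

# For small daily returns, etf_2x ≈ underlying**2 * exp(-total variance):
print(etf_2x, underlying**2 * np.exp(-np.sum(r**2)))
```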
In summary: for a non-linear system, the expected value of the output differs from the output at the expected input.
Your channel is fascinating. Thanks for taking the time to compile these insightful videos.
@jonmacdonald1413: thanks for this comment and for taking the time to visit the channel. Really appreciate it.
Which of the two functions represents the correct value, f(avg(x)) or avg(f(x))? Or are both wrong, because the distribution is unknown? And if so, is there a way to calculate the correct value without a simulation?
Great question, @TimschneiderSchneider! As highlighted at the start of this video, we are oftentimes interested in E[f(x)]. We want to know the distribution around some response, say 'Z', which is uncertain due to uncertainty for 'X'. In this video, we can directly determine the distribution for 'Z' (i.e., travel speed) by simply mapping the probability of each possible outcome for 'X' to 'Z'. However, that is not always the case. Or, it may be possible, but the problem we are looking at is quite large. Simulation is what I've found to be the quick solution in these cases. However, analytical approaches do exist, such as the use of Taylor expansions. Again, very good question(s).
Isn't there a confusion here between 'average' and 'expected' speed? Average speed by definition comes from total distance over total time. So I think the example isn't correct to begin with.
@denizersoz7012: thanks for the clarifying question! Indeed, I found myself nervous using a speed example in a discussion around 'averages'. By 'average speed', I do mean total distance over time. By 'expected average speed', I mean the expected value across all possible average speeds. Here, we have three possible average speeds: 10, 20, and 30. As some have mentioned, this case is a great problem to highlight the harmonic mean. I'll think about how to fit that into future videos. Feel free to reach out to me directly to continue the conversation -- Omar
In the example provided at the end of the video, let's see what happens with travelled distance. So, 15 minutes at 10 mph gives us 2.5 miles. 30 mins at 20 mph - 10 miles. 15 mins at 30 mph - 7.5 miles. And that gives us 20 miles travelled in 1 hour with an average speed of 20 mph. I understand the concept, but either you assume that distance is a linear function of time with speed as a coefficient, or you can't just say "the probabilities of finding 10, 20 and 30 mph are distributed the way shown". And average speed in physics is total distance over total time by definition.
Thanks, @DenisZlokazov, for the comment! As I mentioned in a couple of responses, I was a bit hesitant around this example given the understanding of 'average' in a physics sense. Here, I do mean average speed as the total distance between 2 points, d2-d1, versus the total travel time, t2-t1. By its expected value, we can imagine three scenarios (given a deterministic distance): the travel time will be 120 minutes (10 mph), 60 minutes (20 mph), or 40 minutes (30 mph). With the travel times listed above, it becomes quite obvious what the expected travel time should be. Thanks for the comment and feedback.
This is my new favorite field to study 😅
@adelshahbakhsh2683, awesome and welcome!
Are you sure that the distribution of r is Gaussian?
@alex-craft: great comment and question! Having looked at log returns for financial assets in the past, I've found it to be a good 'approximation', but there is definitely evidence out there that they do not perfectly follow a Gaussian distribution (nor a random walk with drift). I think Black-Scholes is a nice example of a case where strong assumptions have been made that may not be perfectly true, but that does not mean that the Black-Scholes model has not been immensely valuable. A very interesting paper around the modeling of returns can be found in Andrew Lo's paper: "Stock Market Prices do not Follow Random Walks: Evidence from a Simple Specification Test". It is a great, interesting read around the nature of stock prices explored through an intuitive variance ratio test. Highly recommend. Happy to continue the conversation, and thanks again for the question.
I wonder if it’s just me, but I think this is easier to observe by simply going over operations on random variables. It’s clearer to me that both sum and integral will behave this way, so a moment defined with them will as well. Sums of averages work, but multiplication/division fails.
Thanks, @theondon, for the feedback! Great idea and food for thought.
Thank you! This is actually very helpful 👍👍
Thanks, @vlad_objective! Very much appreciated.
"imagine a river bed of pebbles, now imagine the agerage pebble weighs just 5 grams, whats the chance of you putting your hand in and grabbing a 5 gram pebble?" - jung (as best as i can remember)
I believe it comes from "The Undiscovered Self": "If, for instance, I determine the weight of each stone in a bed of pebbles and get an average weight of 145 grams, this tells me very little about the real nature of the pebbles. Anyone who thought, on the basis of these findings, that he could pick up a pebble of 145 grams at the first try would be in for a serious disappointment. Indeed, it might well happen that however long he searched he would not find a single pebble weighing exactly 145 grams."
@RiskByNumbers: was this written before or after the mathematics you referenced in your video?
I think a better lesson would be that the weights matter: equal-weighted, time-weighted, and distance-weighted mean speeds will all be different. And only the distance-weighted mean speed has the property that its reciprocal multiplied by distance equals the time taken.
@hdthor: this is a wonderful comment. A couple of others have also mentioned that the harmonic mean is something worth bringing up (I've discussed the geometric in the past), so I'll think about how to do so in future videos. Thanks again for the great comment and feedback -- definitely helpful food for thought!
"Think about subscribing to the channel." You already earned my subscription when you said we were rolling a singular die, and not """singular dice""".
Haha, @RandomBurfness! Perhaps next video I’ll try to throw in the word ‘datum’. Appreciate the note and subscription!
I have three pages listing YT channels on mathematics/programming, but RiskByNumbers stands above the rest. Analyzing financial data is a great topic choice. Please keep it coming.
@lioncaptiv: wonderful to hear from you. Really appreciate the kind words, and thanks for supporting the channel!
It's so obvious when you put it like that
@orterves: that's wonderful to hear -- much appreciated!
You just saved me from a backlog in my degree
@deepakbhaiya.shorts: thank you so much for your kind words! Really appreciate it.
Who you trying to dazzle with your bs? Just show that when f(x) is nonlinear usually avg[f(x)] is not equal to f(avg(x)).
Superb content. Keep it going, brother.
@Rdffuguihug: many thanks - cheers!
This came up for me at work in the context of agile software development. The agile instructor explained that the estimated difficulty of fixing a problem would be coded as follows: 0 = minutes, 1 = hours, 2 = days, 3 = weeks, 4 = months, 5 = years. Then he went on to say that we could use the average of the codes for all of our bugs to estimate how long it would take to fix them all. Say what??? You can't do that; the encoding is not linear! In general, avg(f(x)) = f(avg(x)) only if f is linear. He didn't understand what that meant, unfortunately, and thought I was just being a PITA. OK, imagine you have 1000 bugs, all 0s except for one 5. The average difficulty estimate is going to be very close to 0, but it will take you years to fix them all!
@ClearerThanMud: this is such an excellent example. Thanks for sharing!
I wouldn't be so quick to dismiss the idea! The proposed metric lies between the median and the actual average, which seems useful! The actual average might be of little interest when you know the total time. The median might be bad since it goes to 0 faster. It's a tradeoff between two location measures.
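A minimal sketch of @ClearerThanMud's example above; the hours assigned to each difficulty code are invented stand-ins:

```python
import numpy as np

codes = np.array([0] * 999 + [5])    # 999 'minutes' bugs and one 'years' bug
hours = {0: 0.1, 1: 4, 2: 48, 3: 300, 4: 1500, 5: 17500}  # assumed durations

avg_code = codes.mean()                    # 0.005: 'nearly all trivial'
total = sum(hours[c] for c in codes)       # dominated by the single code-5 bug
print(avg_code, total / 8760)              # ~0.005 vs ~2 years of work
```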
Mhm, the speed example is excellent but tricky at the same time. If you do the physics on paper you always have an absolute value; if you look at the dash of your car you always get the average speed that got you from A to B. Speed is usually not experienced as a mean. Also, once you do statistics you get a very weird-looking probability function. You can't get faster without driving illegally, and you get a few local maxima due to intersections or common jams. And you never drive at the average; you're almost always faster, as it needs to average out stopping times.
@MrHaggyy: thanks for the note -- you brought up some wonderful points.
Bravo! 👏
@vitalysarmaev many thanks!
Amazingly clear explanation; you didn't make overly big jumps without explaining them, which is not a common thing (and of course hard to do as well). Thanks.
@andrashorvath2411 really appreciate the kind note and message! It means a lot. Cheers! -Omar
Great visuals and explanations.
Thanks, @kevon21, for the kind note!
Interesting video. Can you recommend any literature on this topic, please? Aside from the example in the video, where/how can this idea be applied in real life? Like business, engineering, social studies...
@leo_tra: great questions! First, in terms of applications, I'll note a couple of items. Jensen's inequality can show up quite a bit for certain proofs important in the areas of probability, statistics, and machine learning. A common example is Kullback-Leibler divergence (where we are measuring the difference between 2 distributions), where Jensen's inequality can be used to prove its non-negative property.

For myself, the more interesting application of this idea is in the modeling of dynamic systems. A very intuitive example that I use to motivate one of my courses is the construction of a parking garage to maximize financial return. It costs you a certain amount of money to build each floor. Your revenue is based on demand for your parking facility, though there is a capacity constraint. You now want to balance your added revenue with each floor versus its cost to build. For an average demand, E[X], I can determine the design that maximizes my profit, f(E[X]). However, the expected profit, E[f(X)], is likely lower for that design when we introduce uncertainty. If demand is low, we are making less than we thought. If demand is high, the capacity constraint comes into play, and we can't take advantage of 'good times'. The solution then, in this uncertain system, may be to create a garage that is smaller (so that, if demand is low, we spent less upfront) but with beefed-up columns and a foundation (so that, if demand is high, we can take advantage of that high demand). The reason that I like this example is that it does not require a background in optimization, reinforcement learning, etc. to appreciate that recognizing uncertainty allows you to identify adaptable policies that allow you to do well in the real world. As I worked out the script for the above example, I realized it was getting to be a bit much, which is why I just did this travel speed example.

In terms of references, Warren Powell has provided excellent references on the topic, and he does a good job of trying to unify the perspectives of those working in the areas of optimization and reinforcement learning. Feel free to shoot me an email if you'd like to know more -- I'm planning to expand on the topic soon. Cheers -- Omar
@RiskByNumbers: Thank you for the details. I'll try to go through Powell's work, and if I have further questions I'll message you.
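A minimal sketch of the parking-garage idea from the reply above; every number (spaces per floor, prices, costs, demand distribution) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
demand = rng.normal(300, 100, 100_000).clip(min=0)   # uncertain demand (spaces)

def profit(floors, d):
    capacity = floors * 50                                # assumed spaces per floor
    return 100 * np.minimum(d, capacity) - 3500 * floors  # revenue minus build cost

floors = np.arange(1, 15)
best_for_mean = floors[np.argmax([profit(f, 300.0) for f in floors])]
best_expected = floors[np.argmax([profit(f, demand).mean() for f in floors])]
print(best_for_mean, best_expected)   # designing for E[X] != maximizing E[profit]
```

With these invented numbers, the design that is optimal for average demand (6 floors) differs from the one that maximizes expected profit (5 floors), which is exactly the adaptability point made above.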
The problem with the video is the definition of average speed, which is exactly total distance divided by total time.
@PeterZaitcev: thanks for the comment -- very much appreciated. I mentioned a couple of points in other comments that I'll mention here. Indeed, when I put together this video, I got a bit worried about using the term 'average' speed in the same video where I would be discussing expectations/means. By 'expected average speed', I meant travel distance over time, as you noted, but considering that there are 3 possible situations: there is a 25% chance that total distance to total time is one ratio, etc. Now, the original problem was going to be a case of modeling a financial option, as I find them to be a great example of this idea. I actually buy one each term in my class to motivate the topic and keep folks in the class engaged. They require knowing quite a few terms, though, hence I switched things up. Very much appreciate the feedback -- it is helpful as I think through ways to improve things in the future. Thanks again.
"Average speed" is universally understood as the true average though, and not a mean of multiple speeds over time so the beginning of the video is confusing to me
What is “true average”? Wouldn’t that literally be a mean of multiple speeds over time?
@ryanlohbrunner7760: if you were driving for 1 hour at a speed of 20 km/h and the last 2 seconds at a speed of 100 km/h, would you say that your average speed is (20 + 100)/2?
You're thinking of the average speed of a trip, which by definition would be the actual average speed of that specific trip. The video would make no sense at all if that's what he was talking about. He's talking about a trip with an unknown true average speed, but whose average speed is some distribution based on traffic that day, etc.
The speed given at the start of the video is not "average speed" it is "expected average speed". In fact, it was just written as "expected speed" omitting the words average entirely in the video graphics, because it's not the important bit. The important word is "expected". "Average speed" is just total average speed of a trip computed the natural way as you say. "Expected average speed" is the mean of those calculated average speeds over many trips. This also gives a useful intuition as to why they are different: "average speed" of a trip is a time-weighted average of your travel-speeds. "Expected average speed" is not a time-weighted average, it's just one data point per trip regardless of how long that trip took.
It is wonderful to see all of these great comments. Apologies as well for just chiming in -- we just got back from the hospital with our newborn, so a bit sleep deprived. @OMGclueless: spot on! I subtly mentioned 'expected' value at the start of the video to try and highlight a really important point. The motivation for this video (outside of trying to distill Jensen's inequality in an understandable manner) stemmed from some of my past consulting experiences I've done outside of my university day job. I have found that there is a tendency to frequently only discuss and use 'expected values' in making decisions without recognizing that, for non-linear systems, it may be quite important to know the underlying distribution for your random variable of interest. My hope is that this video helps clarify that point. The original motivation for this video was going to be determining the value of a financial option (e.g., call or put), but I realized pretty early on that it would make more sense to use an example familiar to most everyone. I might, though, come back to that in the future (ideally when I put together a couple of videos around reinforcement learning).
Can you share the video's code?
@maths.visualization: great to hear from you. I'll work on cleaning up the code on my end and eventually share it on GitHub. We are welcoming a new member to the family this week, so apologies ahead of time for the delay!
The beginning of the video is not fair. 1.167 hours is not the answer if you only have information about the expected value of the velocity, which will follow an unknown distribution. Indeed, if you are driving at about 20 mph, then under certain assumptions 1 hour is an approximation for the time taken to cover 20 miles.
@33gbm: absolutely spot on and correct! One of the motivators for this video was that I have worked quite a bit with industry over the years and found that there is a tendency to avoid modeling uncertainties and to operate in a deterministic world. The consequences of doing so are not necessarily intuitive or apparent, particularly for more complex problems. Therefore, the goal was to show the possible consequences in a more straightforward example. Great job and catch! - Omar
@RiskByNumbers: thanks for the reply. Nice to hear from you. By the way, the discussion about the problem is very well presented and I hope to see a lot more from you! 😊
Hey, nice video, but before/after the formal proof you could have just shown why this happens; it's easy to visualise why the equality holds for linear functions and why it deviates for convex/concave functions.
Thanks for the positive comment, @boltez6507, and feedback. Really do appreciate it.
For the many variations where f and f' are monotonic over the range of relevant values of x, the cookie-cutter proof for the relative size of E(f) and f(E) revolves around a first-order Taylor series with remainder. Lots of results (the harmonic mean being smaller than the arithmetic, et al.; the economic idea of risk aversion) can be spun out of applying this to a suitable f(x).
@theupson: love this point. I debated if it would be worthwhile to delve into that towards the end of the video, but the video was already feeling a bit long. Great point again, and I'll see if I can bring this up in a follow up video. Thanks!
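Not the remainder-form proof described above, but a related second-order Taylor check, applied to the video's f(x) = 20/x travel example:

```python
import numpy as np

speeds = np.array([10.0, 20.0, 30.0])
probs = np.array([0.25, 0.50, 0.25])

mu = np.sum(probs * speeds)                # E[X] = 20
var = np.sum(probs * (speeds - mu) ** 2)   # Var[X] = 50
exact = np.sum(probs * 20 / speeds)        # E[f(X)] = 7/6 ≈ 1.167

# E[f(X)] ≈ f(mu) + f''(mu) * var / 2, with f''(x) = 40 / x**3:
approx = 20 / mu + (40 / mu**3) * var / 2
print(exact, approx)                       # 1.167 vs 1.125: the convexity term
```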
Very well done! For travel speed, I'm interested in using the harmonic mean as opposed to the arithmetic mean, but I couldn't figure out how this generalizes to that.
Thanks for the comment, @fibbooo1123! Here, you could compute the inverse of the expected value of 1/X. E[1/X] = 1/4 x 1/10 + 1/2 x 1/20 + 1/4 x 1/30 = 7/120. Therefore, 1/E[1/X] = 120/7. We can then plug in our distance and 20/(120/7) = 140/120 = 7/6. Hopefully I've made no mistakes (welcoming a newborn to the family this week, so I've been a bit out of it...). Thanks again for the comment, and great catch around the relationship and importance of the harmonic mean!
@@RiskByNumbers looks right to me, thanks!
@fibbooo1123: The harmonic mean works for the average when sectional speeds are known for equal lengths. This is actually the case here, where the "1/4, 1/2, 1/4" probabilities determine the distribution along the *path*, not over time. It's not "1/4 of the time the speed is 10" etc., it's "1/4 of the path the speed is 10". Sure, the speed 20 is twice as likely, but that's a nice multiple - so just use it twice; basically, the overall average speed is the harmonic mean of 10, 20, 20, 30.
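A quick check of the two computations in this thread:

```python
import numpy as np
from scipy import stats

speeds = np.array([10.0, 20.0, 30.0])
probs = np.array([0.25, 0.50, 0.25])

h = 1 / np.sum(probs / speeds)        # 1 / E[1/X] = 120/7 ≈ 17.14 mph
print(h, 20 / h)                      # expected travel time: 7/6 hours for 20 miles
print(stats.hmean([10, 20, 20, 30]))  # same value: weight 20 twice, as noted above
```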
Okay, so. Referring to functions not all being straight lines as "flaw of averages" is really messed up.
The video's not bad, but it's mostly just a phrasing thing. If you had initially phrased the question as "Given that a destination is 20 miles away, and given that the average speed across many different journeys to that same destination is 20 miles per hour, what is the average journey time to the destination, over all possible journeys?", then I doubt everyone would jump to answer 1 hour. Plenty would say "who knows", or phrase "1 hour?" as a guess.

Here is a much simpler explanation: if, out of every 100 journeys you end up going, a single journey goes at a rate of 1/(20/100)th of a mile per hour, then that one journey would take 100 hours. But we know that the sum of all 100 journeys needs to be = 100. Therefore, adding that single case eliminates our conclusion from being possible. It is also the case that we can validly add a 1/5th mile per hour average into the distribution of average travel speeds, as the average of all miles per hour needs to be 20, and we can simply add 20+(20-1/5) and have the distribution of average travel speeds back exactly on expectation with a single next journey.

I'm also not at all convinced we should be talking about "most" things following or not following these types of rules. I would instead ask when it makes sense for a distribution to be equally weighted (the actual mechanism behind when this works or not). If there were a situation where someone did not drive past any red lights to a specific location, then there are only a few other red-light-ish factors, like whether someone is already occupying the road in the same direction when you come to a turn. It's realistically possible that, for some drivers going to some regular destinations, the mean of average speeds matches the mean of travel times.

I would personally have delved into how to identify, without direct observation, what a distribution might look like, and then, from that, how we might be able to make judgements about the relationship between the two averages. For instance, the squared distribution obviously has its "weight" distributed to the right. So if we take an equally weighted distribution as an input, we should expect that to line up to the left of the true mean. No math needed!
Great work
Thanks, @wstaempfli!
Another banger to watch while I eat
Haha, cheers and thanks, @berlinisvictorious!
1:47 Faces 3 and 4 are adjacent. This is not true on a normal d6 die, as opposite faces always sum to 7.
Thanks for catching this, @ReneKnuvers74rk! You know, I kept looking at this die thinking 'something looked off...' but could not put my finger on it. This is a good point that, in the future, I should be willing to spend a few bucks for a stock image and spend my time solely on the animations. Thanks again!
0:24 In the caption, it should be the "flaw" of averages, instead of "law". The explanation is simple and intuitive. Thank you!
Thanks, @NicolasChanCSY! Just updated the subtitle -- appreciate the comment and positive feedback!
should've used the harmonic mean :p
@lonjil: I have gone over geometric means in a past video, but I really should also find a point to bring up the harmonic mean (and its applications). Thanks for the comment!