Thanks for the suggestion, @johnmaskell5124! I really enjoy the intuition and simplicity of Cholesky decomposition when introducing folks to generating correlated random variables, so I'll plan on adding that to the list of future videos. Given the time it takes to create each of these animated videos, I am thinking that I'll make a blog soon, where I can provide more rapid thoughts on these topics. I'll announce that once I've set it up. Thanks again for your comment -- Omar
Hi Omar, really enjoying the way you teach. Is there any advice you have for those new to probability theory (where to start, etc.), especially those interested in the practical application of risk? It's fascinating!
@LiamRohan_: really great question, and thanks for the positive feedback! In terms of learning about probability theory itself, I find that there are fewer 'accessible' resources than those related to statistics. This may be due to the fact that it is quite easy these days to import different libraries and use them to estimate a model. Having said that, I think that MIT OCW has a great primer on probability via 18.05. If you want to purchase a text, Ang and Tang have a good text titled 'Probability Concepts in Engineering' that I've seen used for courses in civil engineering, nuclear engineering, mining, and others. Now, there are even fewer resources in terms of the practical application of risk analysis. Richard de Neufville is one author/researcher who comes to mind that provides practical resources around risk analysis and modeling for real problems. He focuses quite a bit on flexibility (which, in principle, underlies methodologies such as reinforcement learning) to better handle uncertainty and discusses it via quantitative as well as qualitative means. Is there a particular application you are working on these days?
You say that Monte Carlo analysis assumes a uniform distribution - meaning that all possibilities between the specified min and max values are equally likely to happen. But this is very rare, isn't it? If you know the shortest person is 3 feet tall and the tallest person is 8 feet tall and you programmed an MCA to predict a stranger's height with this min/max range, wouldn't it be just as likely to predict a stranger is 3 feet tall as it would be to predict that they are 5'8"? You're going to be right MUCH more often if you guess a stranger is 5'8" than if they are 3 feet tall. Is this a weakness of MCA? Thanks. (Or wait...does lots of sampling within that range predict/simulate a normal distribution? But that's also a false assumption often, for example with stock market returns, isn't it?)
@dalrax4: really great set of questions! Thanks as well for reaching out! For our Monte Carlo simulation, we could have actually assumed our distributions came from any available distribution to us. In this video, I used NumPy, and as you can see we actually have a broad set of distributions available to us for random sampling: numpy.org/doc/stable/reference/random/legacy.html. Your point highlights an important aspect of Monte Carlo simulation. Unless we specify our distributions (reasonably) correctly, we may be facing a case of creating a model that is 'garbage in, garbage out'. This point also highlights why, in many cases, an individual may be okay with just 1,000 Monte Carlo simulations. Given that we typically do not know our underlying distribution (it is estimated through various statistical means) for our inputs, we are oftentimes okay with an approximate estimate of the distribution for our response variable of interest (in this case, the total time to complete both reports). Your point about stock market returns is great (and relevant to the channel)! I'll actually be talking about the usefulness of natural logarithms for modeling financial data in the next video. I've recorded the audio and have made the animations -- I'm just finalizing the stitching of the two together.
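For instance, here is a sketch of how swapping in other NumPy distributions changes the answer (the normal and triangular parameters below are illustrative choices, not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Uniform assumptions from the video: Report A takes 1-5 hours, Report B 2-6 hours.
uniform = rng.uniform(1, 5, n) + rng.uniform(2, 6, n)
# Hypothetical alternatives with the same means (3 and 4 hours):
normal = rng.normal(3, 1, n) + rng.normal(4, 1, n)
triangular = rng.triangular(1, 3, 5, n) + rng.triangular(2, 4, 6, n)

for name, total in [("uniform", uniform), ("normal", normal), ("triangular", triangular)]:
    print(name, (total > 9).mean())  # P(total time exceeds 9 hours)
```

The uniform case lands near the 12.5% from the video, while the thinner-tailed alternatives give smaller exceedance probabilities.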
The distribution at 4:55 seems a bit strange. Could be because of the way Python plots it. I did the experiment in MATLAB and I had a distinct peak at 7 (which is indeed the expected value of the random variable "duration", i.e. E[duration] = E[A] + E[B] = 3 + 4 = 7), and the probability that it will take more than nine hours came out to be 12%. (I calculated it using the CDF, i.e. P(X > 9) = 1 - P(X ≤ 9).)
@wanderer291: thanks for the comment! Do you want to provide your MATLAB script? The plot at 3:00, in theory, is the same plot at 4:55, but a bit more cleaned up with the number of bins increased. A couple of other comments: 1. If the time to complete both reports is modeled as a uniform, continuous random variable, then the resulting distribution should be triangular. This can be shown via convolutions. In addition, check out the Irwin-Hall distribution, which helps explain the resulting distribution when summing a series of uniform random variables. 2. The answers are close, but slightly different (12% vs. 12.5%), if you model the time to complete each report as discrete vs. continuous. For the discrete case, you can work out the answer quite quickly. For Report 1 and Report 2, you have 5 possible outcomes with equal probability (i.e., 1 in 5). Therefore, if we assume independence, the joint occurrence for a time to complete both Report 1 and Report 2 is 1 in 25 (i.e., 20% x 20%). We have 3 discrete combinations that exceed 9 hours: 4-6, 5-5, and 5-6. As each combination is mutually exclusive, we therefore have a 3 in 25 (or 12%) chance of exceeding 9 hours.
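The discrete enumeration above can be checked with a few lines of Python (a sketch, not code from the video):

```python
from itertools import product

# Each report's duration is equally likely to be any whole hour in its range.
outcomes = list(product(range(1, 6), range(2, 7)))  # 5 x 5 = 25 combinations
exceed = [(a, b) for a, b in outcomes if a + b > 9]

print(len(outcomes))                 # 25
print(exceed)                        # [(4, 6), (5, 5), (5, 6)]
print(len(exceed) / len(outcomes))   # 0.12
```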
@@RiskByNumbers Thanks for the explanation. I am sure I saved the code but I can't find it anymore (I have a lot of m files). I did use a discrete distribution though. I think it was the number of bins, as you said, that was causing the difference.
Very interesting! I would have approached it differently before I knew about this: assuming the average for each report is in the middle of the time range given. For instance, if 1-5 hrs and 2-6 hrs, then I would assume 3 hrs and 4 hrs respectively for each report and get a quick estimate that it would take about 7 hrs total. So can you also say it's a 50/50 chance of taking longer than 7 hrs?
@washington_pc3306: thanks for the note! Your reasoning is very intuitive and sound. In this particular case, you are correct, which is due to the 'linearity of expectation'. If I sum multiple random variables, their expected value (i.e., mean) is the sum of their individual expected values. As the expected values for these 2 reports were 3 and 4 hours, respectively, then their summation is equal to 7 hours. Now, a couple of nuances. Linearity of expectations speaks to the mean. The mean may differ from the median (i.e., 50th percentile), which may complicate things. Second, determining the probability of exceeding, say, 9 hours is actually a bit involved. One distribution to check out is the Irwin-Hall distribution, which is the distribution for the sum of 'n' independent uniform distributions lying between 0 and 1. It is quite an interesting result. To solve this problem in the video, one can use convolutions. The nice aspect of Monte Carlo is that it can get you 'approximately' the right answer pretty quickly these days. And, furthermore, there is usually so much uncertainty in the underlying distributions (e.g., is Report 1 really going to take 1-5 hours, or perhaps 0.5-5.5 hours?) that an approximate answer is usually quite good. Thanks again for the comment. Feel free to follow up with me!
@dmitriyivkov7578: great question. Technically, you could actually solve this problem via convolutions. A neat, related distribution for this problem is the Irwin-Hall distribution, which represents the distribution for the summation of 'n' independent, standard uniform random variables. I'd suggest checking it out! If you sum 2 standard uniform random variables (assuming independence), you arrive at a triangular distribution. However, as 'n' grows large, the resulting distribution tends to the normal distribution per the central limit theorem.
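As a sketch of the convolution route mentioned above (the grid spacing below is an arbitrary choice), one can recover the triangular density of A + B and its tail numerically:

```python
import numpy as np

# Densities of A ~ U(1, 5) and B ~ U(2, 6) sampled on a grid.
dx = 0.001
x = np.arange(0, 12, dx)
f_a = ((x >= 1) & (x <= 5)) / 4.0
f_b = ((x >= 2) & (x <= 6)) / 4.0

# Convolution gives the density of A + B (triangular on [3, 11], peaked at 7).
f_sum = np.convolve(f_a, f_b) * dx
s = np.arange(len(f_sum)) * dx
p_exceed = f_sum[s > 9].sum() * dx

print(p_exceed)  # close to the analytical 1/8 = 0.125
```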
Ask Todd what is his priority out of the two reports? He’s the boss and knows the answer to that question (even if you know the answer in your opinion). If he continues to say both then you say attempting both increases the risk that neither report will be finished so again, what is the priority? I’ve learnt over many years to not break your back for a boss. When I get treated by overly demanding bosses and customers is to hold back asking questions they have the answers to when time constraints start to cause problems you can raise the question giving many options and say you can’t complete it or even start working on it till the answer comes in. This does need working out before hand. But seems to be exactly what solicitors do in the uk when you chase them up….
@MrMeltdown: it's been pretty neat for me to see the range of responses to this video. Although the intended focus of this video was less so around the correct course of action for this situation, I enjoyed reading your response and your perspective. Personally, I have found that providing a rationale and acknowledging the other's perspective can go a long way in these situations. I find this even when I'm teaching. Explaining 'why' I've structured an assignment in a certain manner goes a long way in terms of student buy in. Similarly, acknowledging how student feedback from past years has informed certain choices in the current year gives students a sense of voice and acknowledgement. I think this extends to more traditional employer-employee relationships, too. Now, whether that actually happens is another story...
Thanks, @briancruz3551! I appreciate the kind words. For the visuals in this video, I relied on a few tools. First, for the graphical animations (e.g., the Monte Carlo sampling), I wrote up a small Python script. I am doing that more and more in my videos, primarily with matplotlib.animation. As I use matplotlib so frequently for work, it's what I have used to date, but this summer I'll explore other libraries. Second, for the screen recording I used OBS Studio (which is free!). Finally, to bring all the animated clips together, I use Premiere Pro. Launching this channel has given me a much greater appreciation for the time and effort behind the other channels that I watch on YouTube! Final exams end this Friday for me, so I'll be much more timely with my output soon!
What doesn’t make any sense with this scenario is why would you randomize both A and B scenarios? How does finishing report A in 1 hour have the same equal chance of finishing it in 6 hours? Shouldn’t these hours have probabilities attached to them as well? Could you weight each hour into the randomizers?
@a1cswiz1611: great question (and apologies for the delayed response!). For each of these two reports, we are assuming that there is uncertainty around the time to complete them. For Report A, it is 1-5 hours and, for Report B, 2-6 hours. We can use our probability density function to define this uncertainty. You could also use a probability mass function, if we wanted to model the time to complete each report as a discrete random variable. In NumPy, you can do this via np.random.choice(), and you can alter the probability masses as you see fit (i.e., each outcome does not need to have an equal probability of occurrence). The key aspect here is that the probability densities/masses should be representative of the uncertainty. If you believe certain times to completion have a higher relative likelihood, you should select a probability density/mass function representative of that situation. The nice aspect of Monte Carlo simulation is that it can adapt to alternative distributions. I hope that makes sense. Do not hesitate to follow up if you have any further questions. Omar
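A minimal sketch of that weighted discrete case with np.random.choice (the probability masses below are hypothetical, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Hypothetical masses: middle durations judged more likely than the extremes.
hours_a, p_a = [1, 2, 3, 4, 5], [0.1, 0.2, 0.4, 0.2, 0.1]
hours_b, p_b = [2, 3, 4, 5, 6], [0.1, 0.2, 0.4, 0.2, 0.1]

a = rng.choice(hours_a, size=n, p=p_a)
b = rng.choice(hours_b, size=n, p=p_b)

print((a + b > 9).mean())  # much lower than under equal weights
```

With these weights, the exact answer is 0.2*0.1 + 0.1*0.2 + 0.1*0.1 = 0.05, versus 0.12 under equal weights.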
Really appreciate the comment and feedback, @muzahmad2104! Now that I've received quite a few comments around distribution selection, I'm switching up the order of upcoming videos and having the next one focus on that topic. The comments received have been really valuable in determining what to focus on next. Thanks again for the feedback and positive note!
Thanks, @OutOfRangeDE, for highlighting another application for Monte Carlo! It has been neat to hear from many folks around how they've applied Monte Carlo simulation. Your comment sent me down a 30-minute rabbit hole of reading up on its use for circuit designs -- it's been great to learn via the comments.
I was planning on covering Markov decision processes eventually, so this would be a great moment to discuss these topics. I sincerely appreciate the interest in the channel, the suggestion, and the patience in awaiting new content!
It seems like what you do is calculate (or, more specifically, approximate) the area under the curve with a limited number of points. Let's say you have N points, and you want to use these N points to approximate the area under the curve. The Monte Carlo simulation now expects you to throw these N points "randomly" (uniformly) onto the curve, increasing the chances of covering enough area for good approximation accuracy.
@statistik4908 you may be interested (if you have not read it already) in McKay et al.'s 1979 paper that compares random sampling (as done in this video) vs. stratified sampling: www.jstor.org/stable/1268522. A really good read!
@@RiskByNumbers I have a complex physics problem that I soon need to simulate, involving ionic particles (with a velocity distribution) that come in at different incident angles with some probability to neutralize on a surface, AFTER which I'd then have to simulate the directions they go afterward. Monte Carlo came up and so I needed to brush up my knowledge and build some intuition to solve this problem. Your video greatly helped me get down some ideas on how to solve my problem :)
@@Tommybotham thanks for the reply, and apologies for just getting back to you. I've really enjoyed learning about the different problems that others have analyzed/solved via Monte Carlo simulation. Glad the video could be of some utility, and I hope that you were able to solve it!
Great video! However, I am confused by what the output probability would be. In the video, would the 0.125168 be the probability of the tasks being completed in time or the probability that the deadline is missed? Thanks!
Thanks, @charliehunter_! It was meant to be the probability the deadline would be missed. I summed up the number of instances that exceeded a threshold (in this case 9 hours) versus the total number of simulations. So, 12.5% is approximately the probability it takes us more than 9 hours (implying the deadline is not met). Thanks again!
I don't understand the benefit of initiating a random distribution here when the values between 1-5 hours and 2-6 hours should statistically occur with equal frequency. Why do I need the computer to generate random numbers when, in an infinite number of attempts, there should always be a 20% probability of each occurrence?
@SubZero101010: thanks for the great question. This particular problem was an instance where you could analytically solve for the distribution around the sum of these two random variables (per the discussion with @gokulr8755 in the comments). As mentioned by @andrewevanyshyn1709, seemingly 'simple' problems become quite complex quickly, and so this technique is a nice way to deal with this situation. Furthermore, this approach is a nice way to validate one's intuition and understanding of probability. For example, it is not necessarily obvious that the sum of a series of standard uniform distributions follows the Irwin-Hall distribution. Therefore, I sometimes like to use this technique as a way to validate my derivations for simpler problems and ensure that they are sound.
Ok, I could be wrong, but to find the percentage that you will miss the barbecue, you divided by sims, which is 1 mil. But the entire data set that is computed is 2*sims data points (A and B are each 1 million points). So to find the actual chance you will miss the barbecue, isn't it actually closer to half of what you computed?
@taylorolp8090: really great question! The length of all 3 arrays is actually 1 million. As you mentioned, for "A" and "B" we have generated 1 million random values. When we implement "duration = A + B", we will actually create another array that is the same length as "A" and "B". The value for the first indexed value, duration[0], will be A[0] + B[0]. In fact, for any index value, i, duration[i] = A[i] + B[i]. In other words, we will have 1 million values, where each value is the sum of a random realization of A and another random realization of B. I hope that makes sense. Feel free to reply or follow up with me directly if anything is still ambiguous.
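A short sketch of that elementwise behavior (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
sims = 1_000_000

A = rng.uniform(1, 5, sims)
B = rng.uniform(2, 6, sims)
duration = A + B  # elementwise: duration[i] = A[i] + B[i]

print(len(A), len(B), len(duration))  # all three arrays hold 1,000,000 values
print(duration[0] == A[0] + B[0])     # True
print((duration > 9).mean())          # ~0.125
```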
Thanks! That is definitely something I can introduce. The next few videos will focus on a few algorithms (e.g., dynamic programming, reinforcement learning), but the content afterwards is still up in the air. Thanks again for the comment and request.
Why, thank you, @retajmohamed859! That is quite the compliment. The short answer is “no”. Thanks again for the comment and for checking out the channel!
Had you started to work on the reports immediately instead of writing a program to inform you of your chances of making your party, you’d be done already 😊
@kimsourtat601: great question! In this case, the expected value (the median happens to be the mean) of our two random variables is 3 and 4. The expected value of their summation is simply 3 + 4 = 7. However, ignoring the full distribution means that we cannot comment on other statistics (e.g., variance) nor explore, for example, the probability the total duration exceeds some threshold (in this case, 9 hours). Monte Carlo simulation, however, helps us do that without working out the solution analytically! I hope that helps.
@@RiskByNumbers Thank you so much, the expected value is 7 hours (my bad, I said 8 hours). In this case he has 9 hours left to complete the task, but there is a 12.5% chance he won't complete it in time; if he takes on this task, there is a risk he will miss his party. : (
Question which bothered me for long which also suits this example: Say X and Y are random variables representing the time taken for each report, I'd have X+Y as the total time taken. Now, the distribution of X+Y can be evaluated by computing the convolution of X and Y and normalizing it. On doing so, will we get the same distribution as what the Monte Carlo simulation has given us? Does Monte Carlo perform essentially the same thing as convolutions? If it does, why is it better than convolutions? Eager to see your answer as I'm working on a project where I need to decide between using these two...(also do correct me if I went wrong somewhere)
My project is to estimate the GPA distribution of my batch through the grading stats of each course that we get. The grade in each course is a random variable and the GPA is a weighted average. I was thinking of the convolution approach for this, but now Monte Carlo seems computationally inexpensive compared to it (if it could work for this scenario, that is).
Thanks, @gokulr8755, for this really fantastic question. You’ve made a really good point. In this problem, we are summing not just two random variables, but two independent random variables. This is the perfect case for convolutions to estimate the distribution of A + B. Here, then, is when I see Monte Carlo as being appropriate: 1. Fairly large, complex problems, where determining the underlying distribution for some application is difficult (or at least fairly involved) to compute. 2. Situations where you are satisfied with an “approximate” answer. From my own personal experience, the latter situation is what we typically find in practice. There are a couple of reasons. One of which is that, in the video problem, the true distributions of A and B are likely unknown. We are likely estimating those distributions from data, but in statistics the true distributions for A and B are not truly known. In addition, I should note that we may use Monte Carlo simulation to learn a “good” set of decision rules. These days, I mainly work on reinforcement learning problems, where we will (1) simulate different realizations of the future; (2) make decisions in this simulation environment; (3) observe the reward of those decisions; and (4) update our decision-making policies based on the observed rewards. This did not really apply in this problem, as I was only interested in the distribution for a single decision (i.e., do the two reports), but Monte Carlo simulation is quite valuable in this context. I’ll plan on covering this in 3 videos. Lastly, I should note that, depending on the problem, you may be interested in other sampling methods such as Latin Hypercube sampling, which are generally more efficient and require fewer samples. Really great questions. I sincerely appreciate it.
@@RiskByNumbers Ah yes, I'll look into that, professor... It'd be great if you could bring out videos about RL, Markov chains, queuing, simulated annealing, etc., as these are some really cool topics of applied math in ML. I'd also love to get some ideas from you for good RL projects in a video (ones which involve a bit of coding)... as an undergrad, I (and many others) would need to show and test what we learned by doing projects.
Hi @alperkaya8919: correct! Perhaps more precisely: there is (approximately) a 12% chance that the 2 reports will take you more than 9 hours to complete. As the party is in 9 hours, there is a 1 - 0.12 = 88% chance that you will finish both reports prior to the party's start time. Thanks for engaging with this content.
@@RiskByNumbers Many things, mostly verifying my calculations. For example, when I was a beginner and was not sure how to calculate the probability of one random variable from a Normal dist being greater than another one from a different Normal dist. Or, during COVID, instead of solving the diff equations to find the number of sick people etc., I just simulated it using probabilities and many iterations.
1. "You can pretty quickly estimate that there is a reasonable chance that you won't make it to the bbq in 9 hrs. Around 12-13%." How is 12-13% reasonable? That is very low. It sounds like you have quite a good chance to make it to the bbq, no? 2. In most cases, the values will not be uniformly distributed, obviously. Worst-case scenarios are less likely to happen than non-worst-case scenarios. How do you input this programmatically into MATLAB? Can you make a video on that?
Hi Val -- these are all excellent comments and points. With regards to your first comment: good point around the word choice "reasonable". It is making me think that I should put together a short video around risk and probabilities across contexts. As a simple example, engineering structures such as buildings and bridges are designed with a much lower probability of failure than 12-13%. A 12-13% probability of a failure would be considered quite high given the consequences of a failure could be catastrophic. Conversely, a 12-13% chance you are late to a meet-up would likely be viewed as more acceptable, in part because the consequences are much lower. As for your second comment: yes, I can definitely provide a tutorial on other distributions! I similarly received a good question in my video where I covered fitting data to a distribution in Python. I was asked how do we determine which distribution is "appropriate" (e.g., Gaussian, log-normal, etc.). So, I'll provide a tutorial on that topic soon. Feel free to reach out to me via email to continue the conversation!
Find the probability p(A) of the event A that, when throwing two "fair" dice, the sum of the numbers obtained is 7. Create a program for the simulation of n series (n > 30) of 25 such throws, which at the output gives Xi, the number of realizations of event A in the i-th series (i = 1, ..., n). Consider the obtained sequence X1, ..., Xn as a random sample from Bin(25, p), 0 < p < 1.
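A possible sketch of that simulation (n = 400 series is an arbitrary choice satisfying n > 30; the analytical p is 6/36 = 1/6):

```python
import random

random.seed(1)
n = 400          # number of series (n > 30)
series_len = 25  # throws per series

# X[i] counts how often the two-dice sum equals 7 in the i-th series.
X = []
for _ in range(n):
    hits = sum(1 for _ in range(series_len)
               if random.randint(1, 6) + random.randint(1, 6) == 7)
    X.append(hits)

print(sum(X) / (n * series_len))  # sample estimate of p, close to 1/6
```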
Only looking at those uniform values, I'd be able to know with simple calculations the average amount of time to finish those reports... just saying, you used a nuclear bomb to crush a mosquito instead of crushing it with your feet...
@joacimjohnsson6704: there is actually a nice analytical solution to this particular problem. I'd also suggest checking out the Irwin-Hall distribution, which is the distribution for the sum of "n" independent standard uniform random variables: en.wikipedia.org/wiki/Irwin-Hall_distribution. When "n" is large for the Irwin-Hall distribution, indeed it approximately follows a normal distribution. Conversely, if n = 2, we arrive at a probability density that resembles the one in the video. Based on this comment as well as one other in the thread, I am thinking that it may be worthwhile to step through convolutions on this channel. Thanks for the feedback, and definitely reach out to me directly to continue the conversation.
I don't really find this helpful. There is no explanation for what exactly a monte carlo simulation's defining idea is. Does the distribution have to be uniform? What if it's not? Based on the example, it seems to be adding two random variables and looking at their distribution, is that all? More (complex) examples would have helped too. This video left me with more questions than I came with.
@scalex1882: thanks for the comment. I appreciate that you took the time to (1) view this video; (2) provide feedback; and (3) list the clarification questions that you have after watching it. I'd be happy to expand on this topic with you, perhaps over email. Feel free to reach out at riskbynumbers@gmail.com.
Don't we need to calculate the standard deviation? For example, we assume the tolerance is within 3 standard deviations and divide each tolerance by 3 to calculate the standard deviation of each report? So in this case the standard deviation of the first report would be 4/3, and the second report would also be 4/3. I've done Monte Carlo for mechanical quality engineering before using Minitab, and we had to assume a standard deviation for the process (3 sigma or 6 sigma). I've never done it in Python; it looks pretty cool, and I'm hoping to learn it. Thanks for the video!
@TheGooner-uh2jx: these are really great points/questions. We actually don't need to define the standard deviation explicitly, as it is implicitly defined already via our distribution. What do I mean by that? The variance for a uniform distribution happens to be 1/12*(max-min)^2. Based on that, the variance for our problem is 16/12 = 4/3 for each report, and the standard deviation is therefore (4/3)^0.5. The above equation can be derived by knowing the following relationship: Var[X] = E[X^2] - E[X]^2. I'm thinking that this is worthy of discussion on the blog that I've just started to put together. Long story short, when you define any parametric distribution, the parameters selected define the variance implicitly. For certain distributions, such as the normal distribution, you need to define the standard deviation/variance given that it defines its probability density. Let me know if that helps, and feel free to reply with further questions!
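That implicit variance is easy to check by simulation (a quick sketch, with an arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(3)
samples = rng.uniform(1, 5, 1_000_000)  # Report 1: U(1, 5)

analytical_var = (5 - 1) ** 2 / 12  # 1/12 * (max - min)^2 = 4/3

print(analytical_var)         # 1.333...
print(samples.var())          # close to 4/3
print(analytical_var ** 0.5)  # implied standard deviation, ~1.155
```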
@@RiskByNumbers Thanks for the reply. I didn't realize your example was Uniform Distribution (didn't even know what that was until now). In my past experiences we mostly assumed Normal Distribution in Minitab. Excited to learn from your blog! I'm a mechanical engineer with very basic statistics background. I have a lot to learn!
I didn't use numpy or matplotlib, but I was able to estimate it with just the random library (changing num_of_trials as I like):

import random

num_of_trials = 10000
time_limit = 9
num_of_successes = 0
num_of_failures = 0
for _ in range(num_of_trials):
    first_report = random.randint(1, 5)
    second_report = random.randint(2, 6)
    if first_report + second_report <= time_limit:
        num_of_successes += 1
    else:
        num_of_failures += 1
print(num_of_successes / num_of_trials)
@seyproductions: awesome! As a small comment, you could actually use the random library to model the time to complete each report as either discrete (as you've done above with 'randint') or continuous! To do the latter, you could just change the following 2 lines of code to:

first_report = random.random()*(5-1)+1
second_report = random.random()*(6-2)+2

This is inverse transform sampling, which forms the basis of other sampling methods. The idea is that the CDF for our uniform random variables (and any other random variable, for that matter) lies between 0 and 1. The CDF for a uniform distribution for the range between the minimum and maximum is: F(x) = (x-min)/(max-min). We can also rewrite this equation as: x = F(x)*(max-min)+min. Therefore, we use random.random() to generate a random CDF value between 0 and 1 (i.e., F(x)) and map that out to x. It is a pretty neat concept. Interestingly, the probability of completing the reports in 9 hours is extremely close for the discrete (88%) vs. continuous (87.5%) approaches (though just slightly different)! Finally, thanks for posting this comment -- really appreciate the effort and thought.
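That inverse transform idea can be sketched end-to-end (the helper function name below is made up for illustration):

```python
import random

random.seed(0)

def uniform_via_inverse_cdf(lo, hi):
    # Draw u = F(x) uniformly on [0, 1), then invert the uniform CDF:
    # F(x) = (x - lo) / (hi - lo)  =>  x = u * (hi - lo) + lo
    u = random.random()
    return u * (hi - lo) + lo

n = 100_000
totals = [uniform_via_inverse_cdf(1, 5) + uniform_via_inverse_cdf(2, 6)
          for _ in range(n)]
p_done = sum(t <= 9 for t in totals) / n

print(p_done)  # ~0.875 for the continuous case
```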
“Alexa, tell Chatgpt to write my 2 reports. And message Monty and Carlo that I’m on my way to the BBQ. “
🤣 @RichardRietdijk: one of the best comments received yet -- it made my night!
Great video! It took me about 15 minutes to set up the convolution integral and do all the algebra to get the same result: 1/8.
Thanks, @vitor00! Also glad that you were able to work out the analytical solution -- well done!
How does this video not have many views? Well done, very clear and interesting.
Thank you for the positive feedback and kind note -- I sincerely appreciate it!
Even with a background in probability, Monte Carlo is still used frequently. This is because, even in simple scenarios, calculating the expected value becomes very complex very quickly.
Definitely true! Thanks for providing this feedback.
You can tell Todd, that one of the tasks will be ready. Freshly waiting for him on his deck, and the next one will be on his desk the next morning.
Poor Todd, @zp5808. He's really just upset that he was not invited to the BBQ. 😂
@@RiskByNumbers 🤣
Well he certainly isn't saving time running this simulation before getting to work.......
LOL @jameslinzmeier368, touché!
Although it's a trivial example for demonstration, and you could get away with the CLT and the sum of two Gaussians for an easier approximation (mean 7 and std 1.632), it's a very good explanation of Monte Carlo simulation for a freshman.
Thanks, @matholive6994! Definitely a simple example that you could solve through alternative means. When I find the time, I'm going to create a blog where I can discuss in greater depth the different ways to solve a problem like this one. Thanks again for the kind words!
@@RiskByNumberslooking forward to it!!
In fact, the assumed probability distribution has an impact on the result (as it always does, in general). For example, in the case illustrated in the video, under Monte Carlo simulation the probability of needing at least 9 hours is around 12.5% under a uniform distribution, around 2.2% under a normal distribution, and around 4.2% under a triangular distribution.
Your channel is fascinating. Thanks for taking the time to compile these insightful videos.
@jonmacdonald1413: thanks for this comment and for taking the time to visit the channel. Really appreciate it.
This is my new favorite field to study 😅
@adelshahbakhsh2683, awesome and welcome!
Very helpful, this is exactly what I've been looking to do in Python.
Thank you, @KpxUrz574! I really appreciate the positive feedback.
This video makes Monte Carlo simulation easy to understand. Thank you! I wish my prof had been able to explain it that simply.
Thank you for the kind note! I really appreciate it.
This video is just crap. Nothing else!! Monte Carlo is NOT by any means used in statistics or data science.
OMG! I just read that you're in Van! That's so cool! I go to school at Uvic.
@jamesdennis6120: great to hear from you! Thanks for the note, and I hope that you are enjoying your time at UVic. If you find yourself roaming around UBC, feel free to reach out!
Beautiful, brief and very informative. Well done
Thank you, @eamonmolloy8308! Sincerely appreciative of the kind note.
This was an excellent demonstration, thank you
Thank you, @adamjc86, for the kind comment! Really appreciate it.
This was much simpler than what I was expecting. Wish you create a video on MCMC as well.
Thanks, @othmanmurad5267! That was the goal of the video, so I'm glad it resonated with you. And, yes, definitely intending to put together a video on Markov chain Monte Carlo in the not too distant future.
Todd has a 100% chance of receiving a letter of resignation
🤣
It would be great to have a video where you simulate the time to do both reports but where the time to do report 2 has some dependency on report 1.
Thanks for the suggestion, @johnmaskell5124! I really enjoy the intuition and simplicity of Cholesky decomposition when introducing folks to generating correlated random variables, so I'll plan on adding that to the list of future videos.
Given the time it takes to create each of these animated videos, I am thinking that I'll make a blog soon, where I can provide more rapid thoughts on these topics. I'll announce that once I've set it up. Thanks again for your comment -- Omar
@@RiskByNumbers i support this, keep going
Nice, informative, and to-the-point video. Thanks for the video and explanation.
Thanks, @imyashbhatt! I appreciate the kind words.
thanks for explaining it so clearly that even I can understand
Thanks for the kind words, @YuXingSong!
Thank you sir, great video. I am a research scholar and want to apply this in my research. Your video has cleared up the MCS concepts, thanks!
Thank you, @azkaazmi1374! I appreciate the note. Good luck on your research, too!
Hi Omar, really enjoying the way you teach. Is there any advice you have for those new to probability theory, e.g. where to start for those interested in the practical application of risk? It's fascinating!
@LiamRohan_: really great question, and thanks for the positive feedback!
In terms of learning about probability theory itself, I find that there are fewer 'accessible' resources than for statistics. This may be because it is quite easy these days to import different libraries and use them to estimate a model.
Having said that, I think that MIT OCW has a great primer on probability via 18.05. If you want to purchase a text, Ang and Tang have a good text titled 'Probability Concepts in Engineering' that I've seen used for courses in civil engineering, nuclear engineering, mining, and others.
Now, there are even fewer resources on the practical application of risk analysis. Richard de Neufville is one author/researcher who comes to mind who provides practical resources around risk analysis and modeling for real problems. He focuses quite a bit on flexibility (which, in principle, underlies methodologies such as reinforcement learning) to better handle uncertainty, and discusses it via quantitative as well as qualitative means.
Is there a particular application you are working on these days?
best for explanation wow .. thank you
Thank you, @sonalikamble5929, for the kind words!
You say that Monte Carlo analysis assumes a uniform distribution - meaning that all possibilities between the specified min and max values are equally likely to happen. But this is very rare, isn't it? If you know the shortest person is 3 feet tall and the tallest person is 8 feet tall and you programmed an MCA to predict a stranger's height with this min/max range, wouldn't it be just as likely to predict a stranger is 3 feet tall as it would be to predict that they are 5'8"? You're going to be right MUCH more often if you guess a stranger is 5'8" than if they are 3 feet tall. Is this a weakness of MCA? Thanks. (Or wait...does lots of sampling within that range predict/simulate a normal distribution? But that's also a false assumption often, for example with stock market returns, isn't it?)
@dalrax4: really great set of questions! Thanks as well for reaching out!
For our Monte Carlo simulation, we could have actually assumed our distributions came from any available distribution to us. In this video, I used NumPy, and as you can see we actually have a broad set of distributions available to us for random sampling: numpy.org/doc/stable/reference/random/legacy.html.
Your point highlights an important aspect of Monte Carlo simulation. Unless we specify our distributions (reasonably) correctly, we may be facing a case of creating a model that is 'garbage in, garbage out'.
This point also highlights why, in many cases, an individual may be okay with just 1,000 Monte Carlo simulations. Given that we typically do not know our underlying distribution (it is estimated through various statistical means) for our inputs, we are oftentimes okay with an approximate estimate of the distribution for our response variable of interest (in this case, the total time to complete both reports).
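To make the 'garbage in, garbage out' point concrete, here is a quick sketch comparing three input distributions. Note that mapping each report's range to a mode at the midpoint (triangular case) and to a 3-sigma band (normal case) are assumptions for illustration, not something fixed by the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
sims = 200_000

def p_late(a, b):
    """Fraction of simulated totals exceeding the 9-hour deadline."""
    return np.mean(a + b > 9)

# Uniform inputs, as in the video
p_uni = p_late(rng.uniform(1, 5, sims), rng.uniform(2, 6, sims))

# Triangular inputs with the mode at the midpoint (an assumption)
p_tri = p_late(rng.triangular(1, 3, 5, sims), rng.triangular(2, 4, 6, sims))

# Normal inputs, treating each +/-2 hour range as a 3-sigma band (an assumption)
p_norm = p_late(rng.normal(3, 2 / 3, sims), rng.normal(4, 2 / 3, sims))

print(p_uni, p_tri, p_norm)  # the uniform case is the most pessimistic
```

The ranking (uniform highest, then triangular, then normal) echoes the observation elsewhere in the comments that the assumed distribution materially changes the answer.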
Your point about stock market returns is great (and relevant to the channel)! I'll actually be talking about the usefulness of natural logarithms for modeling financial data in the next video. I've recorded the audio and have made the animations -- I'm just finalizing the stitching of the two together.
@@RiskByNumbersThank you for clarifying that!
The distribution at 4:55 seems a bit strange. Could be because of the way Python plots it. I did the experiment in MATLAB and I had a distinct peak at 7 (which is indeed the expected value of the random variable "duration", i.e., E[duration] = E[A] + E[B] = 3 + 4 = 7), and the probability that it will take more than nine hours came out to be 12%. (I calculated it using the CDF, i.e., P(X > 9) = 1 - P(X ≤ 9).)
@wanderer291: thanks for the comment! Do you want to provide your MATLAB script?
The plot at 3:00, in theory, is the same as the plot at 4:55, but a bit more cleaned up, with the number of bins increased.
A couple of other comments:
1. If the time to complete both reports is modeled as a uniform, continuous random variable, then the resulting distribution should be triangular. This can be shown via convolutions. In addition, check out the Irwin-Hall distribution, which helps explain the resulting distribution when summing a series of uniform random variables.
2. The answers are close, but slightly different (12% vs. 12.5%), if you model the time to complete each report as discrete vs. continuous. For the discrete case, you can work out the answer quite quickly. For Report 1 and Report 2, you have 5 possible outcomes with equal probability (i.e., 1 in 5). Therefore, if we assume independence, the joint occurrence for a time to complete both Report 1 and Report 2 is 1 in 25 (i.e., 20% x 20%). We have 3 discrete combinations that exceed 9 hours: 4-6, 5-5, and 5-6. As each combination is mutually exclusive, we therefore have a 3 in 25 (or 12%) chance of exceeding 9 hours.
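That 3-in-25 argument is small enough to verify by brute-force enumeration (a quick sketch):

```python
from itertools import product

# Enumerate every equally likely (Report 1, Report 2) combination
outcomes = list(product(range(1, 6), range(2, 7)))  # 25 combinations
late = [(a, b) for a, b in outcomes if a + b > 9]

print(late)                        # [(4, 6), (5, 5), (5, 6)]
print(len(late) / len(outcomes))   # 0.12
```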
@@RiskByNumbers Thanks for the explanation.
I am sure I saved the code but I can't find it anymore (have a lot of m files). I did use discrete distribution though. I think it was the number of bins, as you said, that was causing the difference.
Pretty good explanation, thanks!
Thanks, @tkmushroomer. Much appreciated!
Very interesting! I would have approached it differently before I knew about this: assuming the average for each report is the middle of the time range given. For instance, for 1-5 hrs and 2-6 hrs, I would assume 3 hrs and 4 hrs respectively for each report and get a quick estimate of about 7 hrs total. So can you also say it's a 50/50 chance of taking longer than 7 hrs?
@washington_pc3306: thanks for the note!
Your reasoning is very intuitive and sound. In this particular case, you are correct, which is due to the 'linearity of expectation'. If I sum multiple random variables, their expected value (i.e., mean) is the sum of their individual expected values. As the expected values for these 2 reports were 3 and 4 hours, respectively, then their summation is equal to 7 hours.
Now, a couple of nuances. Linearity of expectations speaks to the mean. The mean may differ from the median (i.e., 50th percentile), which may complicate things. Second, determining the probability of exceeding, say, 9 hours is actually a bit involved. One distribution to check out is the Irwin-Hall distribution, which is the distribution for the sum of 'n' independent uniform distributions lying between 0 and 1. It is quite an interesting result. To solve this problem in the video, one can use convolutions.
The nice aspect of Monte Carlo is that it can get you 'approximately' the right answer pretty quickly these days. And, furthermore, there is usually so much uncertainty in the underlying distributions (e.g., is Report 1 really going to take 1-5 hours, or perhaps 0.5-5.5 hours?) that an approximate answer is usually quite good.
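The linearity-of-expectation point above is easy to check numerically (a sketch, not the video's exact script):

```python
import numpy as np

rng = np.random.default_rng(1)
sims = 1_000_000

A = rng.uniform(1, 5, sims)  # Report 1: mean 3 hours
B = rng.uniform(2, 6, sims)  # Report 2: mean 4 hours

total = A + B
print(total.mean())       # close to 3 + 4 = 7 (linearity of expectation)
print(np.median(total))   # also close to 7 here, since the sum is symmetric
```

In this symmetric case the mean and median coincide, which is why the 50/50-at-7-hours intuition works; for skewed input distributions the two can differ.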
Thanks again for the comment. Feel free to follow up with me!
Such a great video, thank you so much for it. You got another subscriber!
Thank you, @pedrocolangelo5844, for the really kind message! I'll be getting the next video finally out next week!
Could this particular problem be solved with central limit theorem instead of MC
@dmitriyivkov7578: great question. Technically, you could actually solve this problem via convolutions.
A neat, related distribution for this problem is the Irwin-Hall distribution, which represents the distribution for the summation of 'n' independent, standard uniform random variables. I'd suggest checking it out!
If you sum 2 standard uniform random variables (assuming independence), you arrive at a triangular distribution. However, as 'n' grows large, the resulting distribution tends to the normal distribution per the central limit theorem.
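A quick sketch of that convergence, using the sample moments of Irwin-Hall sums (mean n/2, variance n/12):

```python
import numpy as np

rng = np.random.default_rng(2)
sims = 200_000

def excess_kurtosis(x):
    """0 for a normal distribution; negative for flatter distributions."""
    z = x - x.mean()
    return (z**4).mean() / ((z**2).mean() ** 2) - 3

results = {}
for n in (2, 12):
    # Irwin-Hall: sum of n independent standard uniform random variables
    s = rng.uniform(0, 1, (sims, n)).sum(axis=1)
    results[n] = (s.mean(), s.var(), excess_kurtosis(s))
    print(n, results[n])
# n = 2 gives the triangular distribution (excess kurtosis -0.6); by n = 12
# the excess kurtosis is close to 0, as the central limit theorem predicts
```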
excellent video.
Thank you, @ever55!
Great video. Simple and useful. Thanks!
Thanks, @altr0403! Your kind note means a lot. Thanks as well for subscribing!
Ask Todd what his priority is out of the two reports. He's the boss and knows the answer to that question (even if you have your own opinion on the answer). If he continues to say both, then you say that attempting both increases the risk that neither report will be finished, so again: what is the priority?
I've learnt over many years not to break your back for a boss. What I do with overly demanding bosses and customers is hold back questions they have the answers to; when time constraints start to cause problems, you can raise the question, give many options, and say you can't complete it (or even start working on it) until the answer comes in. This does need working out beforehand. But it seems to be exactly what solicitors do in the UK when you chase them up...
@MrMeltdown: it's been pretty neat for me to see the range of responses to this video. Although the intended focus of this video was less so around the correct course of action for this situation, I enjoyed reading your response and your perspective.
Personally, I have found that providing a rationale and acknowledging the other's perspective can go a long way in these situations. I find this even when I'm teaching.
Explaining 'why' I've structured an assignment in a certain manner goes a long way in terms of student buy in. Similarly, acknowledging how student feedback from past years has informed certain choices in the current year gives students a sense of voice and acknowledgement. I think this extends to more traditional employer-employee relationships, too. Now, whether that actually happens is another story...
Beautify stuff! And nice visuals, where do you edit your visuals?
Thanks, @briancruz3551! I appreciate the kind words.
For the visuals in this video, I relied on a few tools. First, for the graphical animations (e.g., the Monte Carlo sampling), I wrote up a small Python script. I am doing that more and more in my videos, primarily with matplotlib.animation. As I use matplotlib so frequently for work, it is what I have used to date, but this summer I'll explore other libraries. Second, for the screen recording I used OBS Studio (which is free!). Finally, to bring all the animated clips together, I use Premiere Pro.
Launching this channel has given me a much greater appreciation around other channels that I watch on UA-cam in terms of their time and effort!
Final exams end this Friday for me, so I'll be much more timely around my output soon!
Really intuitive and cool
@bernardogeocometto5562: thank you!
What doesn’t make any sense with this scenario is why would you randomize both A and B scenarios? How does finishing report A in 1 hour have the same equal chance of finishing it in 6 hours? Shouldn’t these hours have probabilities attached to them as well? Could you weight each hour into the randomizers?
@a1cswiz1611: great question (and apologies for the delayed response!).
For each of these two reports, we are assuming that there is uncertainty around the time to complete them. For Report A, it is 1-5 hours and, for Report B, 2-6 hours. We can use our probability density function to define this uncertainty. You could also use a probability mass function, if we wanted to model the time to complete each report as a discrete random variable. In NumPy, you can do this via np.random.choice(), and you can alter the probability masses as you see fit (i.e., each outcome does not need to have an equal probability of occurrence).
The key aspect here is that the probability densities/masses should be representative of the uncertainty. If you believe certain times to completion have a higher relative likelihood, you should select a probability density/mass function representative of that situation. The nice aspect of Monte Carlo simulation is that it can adapt to alternative distributions.
I hope that makes sense. Do not hesitate to follow up if you have any further questions.
Omar
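For reference, a minimal sketch of the weighted-discrete idea mentioned above, via the Generator form of choice() (the probability masses here are made up for illustration, not calibrated to anything):

```python
import numpy as np

rng = np.random.default_rng(3)
sims = 1_000_000

# Hypothetical probability masses: mid-range durations are more likely
hours_a = [1, 2, 3, 4, 5]
hours_b = [2, 3, 4, 5, 6]
weights = [0.1, 0.2, 0.4, 0.2, 0.1]

A = rng.choice(hours_a, size=sims, p=weights)
B = rng.choice(hours_b, size=sims, p=weights)

p_late = np.mean(A + B > 9)
print(p_late)  # lower than 12% once the extreme durations are down-weighted
```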
Very nice video. One of the best I’ve come across. It’d be great to cover when probabilities aren’t evenly distributed.
Really appreciate the comment and feedback, @muzahmad2104!
Now that I've received quite a few comments around distribution selection, I'm switching up the order of upcoming videos and having the next one focus on that topic. The comments received have been really valuable in determining what to focus on next. Thanks again for the feedback and positive note!
its a pleasure, look forward to it. Keep up the good work
Nice presentation of the power of Monte Carlo. Also used in all its glory (many parameters) in analog integrated circuit design.
Thanks, @OutOfRangeDE, for highlighting another application for Monte Carlo! It has been neat to hear from many folks around how they've applied Monte Carlo simulation. Your comment sent me down a 30-minute rabbit hole of reading up on its use for circuit designs -- its been great to learn via the comments.
well explained. thanks
Thank you!
❤ Plz. do consider an Experiment with Sub or Super martingale system in Future with Python
I was planning on covering Markov decision processes eventually, so this would be a great moment to discuss these topics. I sincerely appreciate the interest in the channel, the suggestion, and the patience in awaiting new content!
It seems like what you do is calculate (or, more specifically, approximate) the area under the curve with a limited number of points. Let's say you have N points, and you want to use these N points to approximate the area under the curve. The Monte Carlo simulation now expects you to throw these N points "randomly" (uniformly) onto the curve, increasing the chances of covering enough area for good approximation accuracy.
@statistik4908 you may be interested (if you have not read it already) in McKay et al.'s 1979 paper that compares random sampling (as done in this video) vs. stratified sampling: www.jstor.org/stable/1268522. A really good read!
Excellent video.
@Tommybotham: thank you!
@@RiskByNumbers I have a complex physics problem that I soon need to simulate, involving ionic particles (with a velocity distribution) that come in at different incident angles with some probability to neutralize on a surface, AFTER which I'd then have to simulate the directions they go afterward. Monte Carlo came up, and so I needed to brush up my knowledge and build some intuition to solve this problem.
Your video greatly helped me get down some ideas on how to solve my problem :)
@@Tommybotham thanks for the reply, and apologies for just getting back to you. I've really enjoyed learning about the different problems that others have analyzed/solved via Monte Carlo simulation. Glad the video could be of some utility, and I hope that you were able to solve it!
excellent and straightforward🎉
Thank you! Trying my best to make sure next week's video is straightforward, too!
Great Video! However, I am confused by what the output probability would be. In the video, would the 0.125168 be the probability of the tasks being completed in time or the probability that the deadline is missed? Thanks,
Thanks, @charliehunter_! It was meant to be the probability the deadline would be missed. I summed up the number of instances that exceeded a threshold (in this case 9 hours) versus the total number of simulations. So, 12.5% is approximately the probability it takes us more than 9 hours (implying the deadline is not met). Thanks again!
I don't understand the benefit of initiating a random distribution here when the values between 1-5 hours and 2-6 hours should statistically occur with equal frequency. Why do I need the computer to generate random numbers when, in an infinite number of attempts, there should always be a 20% probability of each occurrence?
@SubZero101010: thanks for the great question.
This particular problem was an instance where you could analytically solve for the distribution around the sum of these two random variables (per the discussion with @gokulr8755 in the comments).
As mentioned by @andrewevanyshyn1709, seemingly 'simple' problems become quite complex quickly, and so this technique is a nice way to deal with this situation.
Furthermore, this approach is a nice way to validate one's intuition and understanding of probability. For example, it is not necessarily obvious that the sum of a series of standard uniform distributions follows the Irwin-Hall distribution. Therefore, I sometimes like to use this technique to validate my derivations for simpler problems and ensure that they are sound.
Ok I could be wrong, but to find the percentage that you will miss the barbecue, you divided by sims which is 1mil. But the entire data set that is computed is 2*sims data points (A+B and both are 1 million points). So to find the actual chance you will miss the barbecue isn’t it actually closer to half of what you computed?
@taylorolp8090: really great question! The length of all 3 arrays is actually 1 million.
As you mentioned, for "A" and "B" we have generated 1 million random values. When we implement "duration = A + B", we will actually create another array that is the same length as "A" and "B".
The value for the first indexed value, duration[0], will be A[0] + B[0]. In fact, for any index value, i, duration[i] = A[i] + B[i]. In other words, we will have 1 million values, where each value is the sum of a random realization of A and another random realization of B.
I hope that makes sense. Feel free to reply or follow up with me directly if anything is still ambiguous.
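A tiny sketch of the indexing point above (illustrative, not the video's exact script):

```python
import numpy as np

rng = np.random.default_rng(4)
sims = 1_000_000

A = rng.uniform(1, 5, sims)
B = rng.uniform(2, 6, sims)
duration = A + B  # elementwise: duration[i] = A[i] + B[i]

assert len(duration) == sims       # still 1 million values, not 2 million
assert duration[0] == A[0] + B[0]  # each entry pairs one A with one B
print(np.mean(duration > 9))       # so dividing the count by sims is correct
```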
Thank you for the simple explanation.
Many thanks!
Please continue to post content. Subscribed!
Thanks, @ambrosianas7505, for subscribing! I'll definitely continue to post content -- thanks again!
Great intro.
Thank you, @danielzuzevich4161! Much appreciated.
I love this
Thanks, @stt5858! Another video will be out this Saturday.
What a great video!
Can you do a video on hypothesis testing using Python?
Thanks! That is definitely something I can introduce. The next few videos will focus on a few algorithms (e.g., dynamic programming, reinforcement learning), but the content afterwards is still up in the air.
Thanks again for the comment and request.
Can you help me in my project on Monte Carlo simulations
are you and the organic chemistry tutor blood related?
Why, thank you, @retajmohamed859! That is quite the compliment. The short answer is “no”. Thanks again for the comment and for checking out the channel!
Had you started to work on the reports immediately instead of writing a program to inform you of your chances of making your party, you’d be done already 😊
😂 Funny enough, my spouse said roughly the same thing when I showed her the video, @dang4546.
New sub right here
Awesome @raymondnepomuceno8815! Welcome to the channel! Really appreciate the support.
Why not just find the middle of them. 1 to 5 = 3.5, and 2 to 6 = 4.5, total 8 hours. What do you think?
@kimsourtat601: great question! In this case, the expected values of our two random variables are 3 and 4 (and, for these symmetric distributions, the median happens to equal the mean). The expected value of their summation is simply 3 + 4 = 7. However, ignoring the full distribution means that we cannot comment on other statistics (e.g., variance) nor explore, for example, the probability that the total duration exceeds some threshold (in this case, 9 hours). Monte Carlo simulation, however, helps us do that without working out the solution analytically! I hope that helps.
@@RiskByNumbers Thank you so much, the expected value is 7 hours (my bad is 8 hours). In this case he has 9 hours left to complete the task but there is 12.5% chance he could complete, if he take this task, he is unlikely to join his party. : (
@@RiskByNumbers This helps. Thx!
You tell your boss you can do 1 report today and part of the other, finish it in the morning.
Question which bothered me for long which also suits this example: Say X and Y are random variables representing the time taken for each report, I'd have X+Y as the total time taken. Now, the distribution of X+Y can be evaluated by computing the convolution of X and Y and normalizing it. On doing so, will we get the same distribution as what the Monte Carlo simulation has given us? Does Monte Carlo perform essentially the same thing as convolutions? If it does, why is it better than convolutions? Eager to see your answer as I'm working on a project where I need to decide between using these two...(also do correct me if I went wrong somewhere)
My project is to estimate the GPA distribution of my batch through the grading stats of each course that we get. The grade in each course being a random variable and the GPA being a weighted average. Was thinking of convolution approach to do this but now Monte Carlo seems computationally inexpensive compared to it(if it could work for this scenario that is)
Thanks, @gokulr8755, for this really fantastic question.
You’ve made a really good point. In this problem, we are summing not just two random variables, but two independent random variables. The perfect case for convolutions to estimate the distribution of A + B.
Here is then when I see Monte Carlo as being appropriate:
1. Fairly large, complex problems, where determining the underlying distribution for some application is difficult (or at least fairly involved) to compute.
2. Situations where you are satisfied with an “approximate” answer.
From my own personal experience, the latter situation is what we typically find in practice. There are a couple of reasons. One of which is that, in the video problem, the true distributions of A and B are likely unknown. We are likely estimating those distributions from data, but in statistics the true distributions for A and B are not truly known.
In addition, I should note that we may use Monte Carlo simulation to learn a “good” set of decision-rules. These days, I mainly work on reinforcement learning problems, where we will (1) simulate different realizations of the future; (2) make decisions in this simulation environment; (3) observe the reward of those decisions; and (4) update our decision-making policies based on the observed rewards. This did not really apply in this problem, as I was only interested in the distribution for a single decision (I.e., do the two reports), but Monte Carlo simulation is quite valuable in this context. I’ll plan on covering this in 3 videos.
Lastly, I should note that, depending on the problem, you may be interested in other sampling methods such as Latin Hypercube sampling, which are generally more efficient and require less samples.
Really great questions. I sincerely appreciate it.
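For completeness, the convolution route for the discrete version of this problem takes only a few lines (a sketch using NumPy, assuming independence as discussed above):

```python
import numpy as np

# PMFs for the discrete case: Report 1 in {1..5}, Report 2 in {2..6}
pmf_a = np.full(5, 0.2)  # values 1, 2, 3, 4, 5
pmf_b = np.full(5, 0.2)  # values 2, 3, 4, 5, 6

# Convolving the two PMFs gives the PMF of the sum, over values 3..11
pmf_sum = np.convolve(pmf_a, pmf_b)
values = np.arange(3, 12)

p_late = pmf_sum[values > 9].sum()
print(p_late)  # ~0.12, matching the direct enumeration answer
```

For a weighted GPA with many courses, repeated convolutions like this stay exact but grow in bookkeeping, which is where Monte Carlo's approximate-but-simple trade-off becomes attractive.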
@@RiskByNumbers Ah yes I'll look into that professor... It'd be great if you could bring out videos about RL, Markov chains, queuing, simulated annealing etc as these are some really cool topics of applied math in ML. Wanna get some ideas from you for good RL projects too in a video (which involve a bit of coding)...as an undergrad I(and many others) would need to show and test what they learned by doing projects
Bro, amazing video!
Thanks, @davidclercrodriguez5801! Really appreciate the kind words.
So it means you will make it to the party with 88% of chance?
Hi @alperkaya8919: correct!
Perhaps more precisely: there is (approximately) a 12% chance that the 2 reports will take you more than 9 hours to complete. As the party is in 9 hours, there is a 1 - 0.12 = 88% chance that you will finish both reports prior to the party's start time.
Thanks for engaging with this content.
@@RiskByNumbers Thank you very much for explaining!!!!!!
What is the probability that I will like this video? 😂😂
Hopefully higher than the probability that Todd causes you to be late to the party 😂
Invite Todd to the barbecue 😊
@aalb1970: on it! I'll invite him for this coming weekend.
great!
Thank you!
Was hoping monte carlo could help me get the reports done in 5 minutes so i could duck out early and pregame before my party.
@anzov1n: 😂😂😂 I suppose if the reports required that you compute some underlying probability distribution, they maybe!
I'm more of a fan of the Jose Cuervo simulation
So what you are telling me is that I have been using MCS for years without knowing it
I enjoyed this note, @matejnovosad9152. If you don't mind me asking, for which applications/domain?
@@RiskByNumbers Many things, mostly verifying my calculations. For example, when I was a beginner and was not sure how to calculate the probability of one random variable from a normal dist being greater than another one from a different normal dist. Or during covid, instead of solving the diff equations to find sick people etc., I just simulated it using probabilities and many iterations.
Monte carlo is so intuitive i thought of it before i knew it was a thing
Haha that is awesome, @KinomaroMakhosini!
u good
thanks
Thanks!
1. "You can pretty quickly estimate that there is a reasonable chance that you won't make it to the bbq in 9 hrs. Around 12-13 %". How is 12-13 % reasonable? That is very low. It sounds like you have quite a good chance to make it to the bbq no?
2. In most cases, the values will not be uniformly distributed obviously. Worst case scenarios are less likely to happen than non-worst case scenarios. How do you input this programmatically into matlab? can you make a video on that?
Hi Val -- these are all excellent comments and points.
With regards to your first comment: good point around the word choice "reasonable". It is making me think that I should put together a short video around risk and probabilities across contexts. As a simple example, engineering structures such as buildings and bridges are designed with a much lower probability of failure than 12-13%. A 12-13% probability of a failure would be considered quite high given the consequences of a failure could be catastrophic. Conversely, a 12-13% chance you are late to a meet-up would likely be viewed as more acceptable, in part because the consequences are much lower.
As for your second comment: yes, I can definitely provide a tutorial on other distributions! I similarly received a good question in my video where I covered fitting data to a distribution in Python. I was asked how do we determine which distribution is "appropriate" (e.g., Gaussian, log-normal, etc.). So, I'll provide a tutorial on that topic soon.
Feel free to reach out to me via email to continue the conversation!
A video about this would be great too! I'm also interested in more of this kind of content @@RiskByNumbers
On it, @@tito9641!
Find the probability p(A) of the event A: that when throwing two "fair" dice, the sum of the numbers obtained is 7. Create a program to simulate n series (n > 30) of 25 such throws, which at the output gives Xi, the number of realizations of event A in the i-th series (i = 1, ..., n). Consider the obtained sequence X1, ..., Xn as a random sample from Bin(25, p), 0 < p < 1.
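A minimal sketch of the requested simulation (assuming two fair six-sided dice, so the true p is 6/36 = 1/6; n and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)

n, series_len = 40, 25  # n > 30 series of 25 throws each

# Event A: the sum of two fair dice equals 7 (true p = 6/36 = 1/6)
d1 = rng.integers(1, 7, (n, series_len))
d2 = rng.integers(1, 7, (n, series_len))
X = np.sum(d1 + d2 == 7, axis=1)  # realizations of A in each series

print(X)         # each Xi ~ Bin(25, 1/6), so values cluster around 25/6
print(X.mean())
```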
Just by looking at those uniform values I'd be able to know, with simple calculations, the average amount of time to finish those reports... just saying, you used a nuclear bomb to crush a mosquito instead of crushing it with your feet...
bayesian statistic 101
Thanks, @kdifiskeicis284kd72j, for engaging with this content! I appreciate it.
thank you
Many thanks!
The result should be a normal dist. But it looks more like a triangular..... something is wrong in the plot.
@joacimjohnsson6704: there is actually a nice analytical solution to this particular problem. I'd also suggest checking out the Irwin-Hall distribution, which is the distribution for the sum of "n" independent standard uniform random variables: en.wikipedia.org/wiki/Irwin-Hall_distribution. When "n" is large for the Irwin-Hall distribution, indeed it approximately follows a normal distribution. Conversely, if n = 2, we arrive at a probability density that resembles the one in the video.
Based on this comment as well as one other in the thread, I am thinking that it may be worthwhile to step through convolutions on this channel. Thanks for the feedback, and definitely reach out to me directly to continue the conversation.
I don't really find this helpful. There is no explanation for what exactly a monte carlo simulation's defining idea is. Does the distribution have to be uniform? What if it's not? Based on the example, it seems to be adding two random variables and looking at their distribution, is that all? More (complex) examples would have helped too. This video left me with more questions than I came with.
@scalex1882: thanks for the comment. I appreciate that you took the time to (1) view this video; (2) provide feedback; and (3) list clarifications questions that you have after watching it. I'd be happy to expand on this topic with you, perhaps over email. Feel free to reach out at riskbynumbers@gmail.com.
Something is deeply wrong with Todd
Get Lost, Todd
😂😂😂
Yeah, Tod blows
Don't we need to calculate standard deviation?
For example we assume the tolerance is within 3 standard deviations, and divide each tolerance by 3 to calculate standard deviation of each report?
So in this case the standard deviation of the first report would be 4/3 and the second report would also be 4/3.
I've done Monte Carlo for mechanical quality engineering before using Minitab and we had to assume standard deviation for the process (3 sigma or 6 sigma).
I've never done it in Python; it looks pretty cool and I'm hoping to learn it.
Thanks for the video!
@TheGooner-uh2jx: these are really great points/questions.
We actually don't need to define the standard deviation explicitly, as it is implicitly defined already via our distribution. What do I mean by that? The variance for a uniform distribution happens to be 1/12*(max-min)^2. Based on that, the variance for our problem is 16/12 = 4/3 for each report, and the standard deviation is therefore (4/3)^0.5.
The above equation can be derived by knowing the following relationship: Var[X] = E[X^2] - E[X]^2. I'm thinking that this is worthy of discussion on the blog that I've just started to put together.
Long story short, when you define any parametric distribution, the parameters selected define the variance implicitly. For certain distributions, such as the normal distribution, you need to define the standard deviation/variance given that it defines its probability density.
Let me know if that helps, and feel free to reply with further questions!
@RiskByNumbers
Thanks for the reply.
I didn't realize your example was Uniform Distribution (didn't even know what that was until now). In my past experiences we mostly assumed Normal Distribution in Minitab.
Excited to learn from your blog! I'm a mechanical engineer with very basic statistics background. I have a lot to learn!
I didn't use numpy or matplotlib, but I was able to estimate it with just the random library (changing num_of_trials as I like):
import random
num_of_trials = 10000
time_limit = 9
num_of_successes = 0
num_of_failures = 0
for _ in range(num_of_trials):
    first_report = random.randint(1, 5)
    second_report = random.randint(2, 6)
    if first_report + second_report <= time_limit:
        num_of_successes += 1
    else:
        num_of_failures += 1
print(num_of_successes / num_of_trials)
@seyproductions: awesome!
As a small comment, you could actually use the random library to model the time to complete each report as either discrete (as you've done above with 'randint') or continuous! To do the latter, you could just change the following 2 lines of code to:
first_report = random.random()*(5-1)+1
second_report = random.random()*(6-2)+2
This is inverse transform sampling, which forms the basis of other sampling methods. The idea is that the CDF for our uniform random variables (and any other random variable, for that matter) lies between 0 and 1. The CDF for a uniform distribution for the range between the minimum and maximum is: F(x) = (x-min)/(max-min). We can also rewrite this equation as: x = F(x)*(max-min)+min.
Therefore, we use random.random() to generate a random CDF value between 0 and 1 (i.e., F(x)), and map that out to x.
It is a pretty neat concept. Interestingly, the probabilities of completing the reports in 9 hours are extremely close for the discrete (88%) vs. continuous (87.5%) approaches (though just slightly different)! Finally, thanks for posting this comment -- really appreciate the effort and thought.
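Putting the continuous version together (a sketch; the trial count and seed are my own arbitrary choices, not from the video):

```python
import random

random.seed(42)
trials = 100_000
hits = 0
for _ in range(trials):
    # Inverse transform sampling: x = F(x) * (max - min) + min,
    # where random.random() plays the role of the CDF value F(x).
    first_report = random.random() * (5 - 1) + 1   # Uniform(1, 5) hours
    second_report = random.random() * (6 - 2) + 2  # Uniform(2, 6) hours
    if first_report + second_report <= 9:
        hits += 1

print(hits / trials)  # estimate of the 87.5% figure quoted above
```

With enough trials the estimate settles near the exact continuous answer of 0.875, while the discrete randint version converges to 22/25 = 0.88.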