Thanks for watching! Checkout the description for the MEDIUM article (published in Towards Data Science) that accompanies this video. Hopefully that should answer questions. Also please follow here and on medium for fun updates like this!
Tell me how do I use intuition vs probability to predict outcome of my 5 lottery deep training model? 😂
Could you please explain why we used mean and standard deviation when attempting to calculate the likelihood?
I can’t speak for every case. But in linear regression, we assume the distribution of the labels follows a normal distribution. And the normal distribution can be characterized by a mean and standard deviation. And if you substitute this into the “maximum likelihood estimation”, the math will simplify to optimizing the residual sum of squares (which is proportional to the mean squared error) to compute the coefficients in the linear regression hypothesis.
I explain this in the entire probability and likelihood videos too if that helps
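A minimal sketch of the reduction this reply describes, with made-up data and an assumed known noise level (sigma = 1): maximizing the Gaussian likelihood of the labels gives the same line as ordinary least squares, because the only parameter-dependent term in the log-likelihood is the residual sum of squares.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: labels assumed to be a linear trend plus
# i.i.d. Gaussian noise (the assumption described above).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)

def neg_log_likelihood(params, sigma=1.0):
    """Negative Gaussian log-likelihood of the data under the line a*x + b."""
    a, b = params
    resid = y - (a * x + b)
    n = x.size
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum(resid**2) / (2 * sigma**2)

# Minimizing the negative log-likelihood is the same as minimizing the
# residual sum of squares, so it matches ordinary least squares.
mle_fit = minimize(neg_log_likelihood, x0=[0.0, 0.0]).x
ols_fit = np.polyfit(x, y, 1)  # [slope, intercept] from least squares
```

The two fits agree to numerical precision; changing sigma rescales the objective but does not move its minimum.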
Thank you so much. Such a good and simple explanation sir.
It was probably a subject that I had been trying to clarify in my head for a month and could not clarify. Maybe because I'm a little detail-oriented. Thanks to you, brother, I understood the subject. Thanks to YouTube, you have a brother from the other side of the world. Thank you very much.
finally, my searching of 2-3 hours and many videos on the likelihood rests. thanks man...
Thank you. Studying mathematics and statistics in college, and I really like this video. My professor told me that “the most important thing for statistics is: you have to understand the basic logic first using a basic or daily-life example, and know what you want and what you need to do.” The second important thing is to “remember the notation, read the books, and study by myself.” I really like the first part of the video---that’s the key and core idea of the likelihood function. Why did I watch this video? 😂 Because I wanted to refresh the idea. Doing harder problems with only notation and symbols, I get lost.
This is such a clear explanation. Great job my dude
This is great! However it's really important to not confuse the probability density function (p(x)) with the probability of x. For one p(x) can be larger than 1!
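A quick way to see the point this comment makes, using scipy: for a narrow normal distribution the density at the mean is well above 1, which a probability could never be. Only the total area under the curve equals 1.

```python
from scipy.stats import norm

# Density of a normal distribution with mean 0 and a small standard
# deviation (0.1), evaluated at its mean. The value is about 3.99,
# so p(x) here is clearly not a probability.
density_at_mean = norm(loc=0, scale=0.1).pdf(0)
```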
it's great that you thought of making a video on comparison between probability and likelihood. However, I think in the initial graphs, the y-axis do not represent probability values. They are probability-density values at various x.
Very well done, clear and concise!
Thank you! My first time trying this style out. So I’m glad it turned out well :)
one of the best explanations on youtube! well done sir!
Thanks a ton for watching
This is the best explanation of likelihood function. thank you so much for the video.
Another 🔥video! This man has an insane brain
Thanks Shashank! I’m just happy it’s useful 🙂🙂
This is very well explained, thank you!
Thank you for watching!
Thank you so much!! You made complicated concepts so easy to understand!!! Thanks again!
Super welcome and also very glad to hear :D
Thank you! My confusion goes away after watching this. Thumb up.
You are very welcome. Thanks for watching !
You also take the logarithm of both sides because that leads to nice properties when differentiating (because log is strictly increasing, it maintains the property that if x1 < x2, then l(x1) < l(x2)). Addressing arithmetic underflow is definitely a useful added benefit too.
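The underflow this comment mentions is easy to demonstrate with a made-up example: a product of many small density values vanishes in floating point, while the sum of their logs stays finite.

```python
import numpy as np

# 1000 observations, each with a density value of about 1e-3.
densities = np.full(1000, 1e-3)

product = np.prod(densities)         # 1e-3000 underflows to 0.0 in float64
log_sum = np.sum(np.log(densities))  # stays a finite, usable number
```

Optimizing `log_sum` instead of `product` is safe precisely because log is strictly increasing, as the comment notes.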
Seriously, one of the best explanations !
Nicely explained! I got better understanding of this, could you also include some examples which give some feel about the calculations...
Thank you so much. This video solved so many things for me.
Thank god, I clicked the videoooo
Thanks man, people out there really like to make easy things difficult. ty og
The mean values are not well selected. Most of the samples are distributed around 200k. So the means have to be around 200k
Very good explanation of MLE. Amazing
Nice explanation
Awesome stuff! Just to clarify: logistic regression uses the binomial distribution; let's not confuse viewers with link functions and sigmoids.
Aren't sigmoids a whole family of functions that have certain properties?
Great explanation sir! Thx a lot!
Great job man. Thanks so much!
Thank you so much that was really helpful
wonderful, thanks for your clear explaining, pretty good
You are very welcome
Thanks! Great explanation at the beginning (up to about minute 8, which is how far I have gotten). Aren't your example choices of mu and sigma off by a factor of more than 1000, though? Just want to make sure I am clear about it.
awesome video. thank you!
Nice introduction! Very clear and helpful, thanks. My only nitpick would be that, when you change to logarithms, maybe "L proportional to P" (i.e. "L = kP") should become "log L = log k + log P" - not a proportionality anymore, but a constant offset. The idea of monotonicity is still maintained.
Yep. Good catch. I think that's technically correct. I guess when making this type of video, teaching on the spot, sometimes details like this slip my mind.
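In symbols, the correction the comment proposes, with $k$ the proportionality constant:

```latex
L = kP \;\Longrightarrow\; \log L = \log k + \log P
```

Since $\log k$ does not depend on the parameters, the location of the maximum is unchanged, so the constant offset is harmless in practice.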
Simply amazing!
Nice video! New topic...👍 Please make a video on ML binary classification for time series forecasting using the likelihood equation. Waiting for the next video!
Hi sir, can you do a video on why we use Bayesian inference and how to use it?
Excellent! Thanks :)
You were talking about sigma and mean, and everything was clear until you started talking about theta. Where did the sigma and mean go? Are we training the model to make predictions on the model parameters or the distribution parameters? Thanks tho.
It should be probability density on the y-axis, not probability, since X is a continuous random variable.
Yep! Going to make some videos around probability theory soon to clear this up. Good catch!
@@CodeEmporium yes please more probability theory videos is what we need
Super explanation. Thanks
Welcome! Thanks for watching:)
you are honestly #1
You are too kind :)
Thanks for the video
iid? I thought it was independent but not identically distributed, i.e. that our data may come from different parameter values.
Observations y1, y2, ..., yn are a joint probability? I didn't get that part.
With X values in the six figures, how can mu be a double digit number?
Great video dude
Thanks a lot!
I think you should include keywords like "Maximum Likelihood" and "Log Likelihood Ratio" in your title to reach more of an audience.
Yea. I’ll keep this in mind. Thanks for the tip. Maybe I’ll change this title soon
Very nice review. Thanks.
You are very welcome!
Good stuff.
Is it possible to find the probability distribution? It looks like in the real world we only see the likelihood, because we can't observe the whole population, can we?
7.52
Another great video
Thank youu
Should have clarified that housing prices in practice are not independent. Perhaps use a better example.
Great explanation! Thanks, man. By the way, what Blackboard App are you using in this video?
Thank you! The app is called “Explain Everything”.
Bro I got 4 ads watching this video. I hope this guy is making bank off of these videos
Great video.
Thanks a ton!
Why do we use pdf with well fitted parameter instead of histogram?
Nice
Thank you!
I don't see any math explanation in this other than showing the equations, but it's a good explanation theoretically. Sorry to comment this, but I would appreciate it if I could see the actual math and its explanations. Thanks.
Thanks for commenting! This was my first time teaching in this way with a white boarding strategy. I have tried more for future videos (hopefully they have turned out better)
Writing red and green on a black background is very hard to read for colourblind people
Yea. I didn’t think it would look this dark. In future videos , I try to correct this. :)
Probably a stupid question, but: P(y1,y2,y3...) is written as P(y1)·P(y2)·P(y3)... Isn't P(y1,y2,y3...) a function, while taking the product P(y1)·P(y2)·P(y3)... gives me a number? And these two are the same thing?
P(y1,y2,y3) is the probability that the first random variable (RV) has value y1 AND the 2nd RV has value y2 AND the 3rd RV has value y3. This is a number.
Now, if each of these RVs is independent of the others, then yes, you can write it out as the product P(y1)P(y2)P(y3). This too is a product of 3 numbers, which gives us a number. If they aren't independent RVs, you will have to use the chain rule of probability to write it out as a more complex expression, P(y1)P(y2|y1)P(y3|y1,y2). Ultimately, though, the outcome is still some real number.
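A tiny concrete case of the reply above, using a made-up example of three independent fair coin flips: the joint probability of one specific outcome is just a product of numbers.

```python
# Hypothetical example: three independent fair coin flips. The joint
# probability P(y1, y2, y3) of a specific outcome (say heads, heads,
# heads) factors into a plain product under independence.
p_y1 = p_y2 = p_y3 = 0.5       # marginal probabilities
joint = p_y1 * p_y2 * p_y3     # P(y1, y2, y3) = 0.125, a single number
```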
Waiting
Not much longer now :)
Smarter version of Aziz Ansari!
Next Level Explanation , Subscriber+=1 :)
Welcome aboard! Thanks a ton!
At no point in this video did you ever state what likelihood actually is, only what it is proportional to. I recognize you're trying to educate but this is a very poor job, similar to the article you wrote on this subject.
Fake accent nothing else
God damn you explain so much better than my college prof.
Thanks a ton ! Hope you enjoy the rest of these videos :)
I got the benefit and enjoyment thank you
Thanks. It is such a nice explanation of the topic. Everything is explained well
Thanks so much for the compliment! And I am glad you liked it :)
Thx, life saver