For anyone that has trouble wrapping their head around why variable elimination is more efficient, writing out the explicit for loops to compute P(Y) was really helpful for me:
If we assume W,X,Y,Z each have K possible values, then the naive approach needs about K^4 operations to fill out the complete table for P(Y). The naive triple sum (over W, X and Z) has K^3 terms, and we need to compute this triple sum for each of the K possible values of Y, giving us a total of K^4 terms.
If we do variable elimination then we first compute f_W(x):
for each value of X, call this x:
    f_W(x) = 0
    for each value of W, call this w:
        f_W(x) += P(w)*P(x|w)
Note:
- capital letters denote a random variable, lower case letters denote a value of the corresponding random variable.
- f_W(x) is a table containing K numbers, one for each value of X
- the innermost operation is "constant time" because we are just looking up these values in a table.
- in total it takes K^2 operations to compute the f_W(x) table and then we store it away.
Next we compute f_X(y):
for each value of Y, call this y:
    f_X(y) = 0
    for each value of X, call this x:
        f_X(y) += P(y|x) * f_W(x)
Note:
- f_X(y) is a table containing K numbers, one for each value of Y
- the innermost operation is "constant time" because we are just looking up these values in a table!
- in total it takes K^2 operations to compute the f_X(y) table and then we store it away.
Last, we can now compute P(y) for each value of Y:
for each value of Y, call this y:
    P(y) = 0
    for each value of Z, call this z:
        P(y) += P(z|y) * f_X(y)
Note:
- P(Y) is a table containing K numbers, one for each value of Y
- the innermost operation is "constant time" because we are just looking up these values in a table.
- in total it takes K^2 operations to compute this last table.
Computing P(Y) by variable elimination takes 3 * K^2 operations, which is much less than K^4 for large K!
Basically, by computing each of these tables in the right order we avoid repeating work that we already did.
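In case it helps anyone, the loops above can be turned into a small runnable Python sketch. This is only an illustration under assumptions: it assumes the chain W -> X -> Y -> Z implied by the factors P(w), P(x|w), P(y|x), P(z|y), it fills the tables with random numbers, and the function names (make_chain, marginal_y_naive, marginal_y_ve) are made up for this example. It counts the innermost updates so the K^4 versus 3 * K^2 difference shows up directly.

```python
import random


def random_dist(k, rng):
    """Return k positive numbers that sum to 1 (a random distribution)."""
    weights = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(weights)
    return [v / total for v in weights]


def make_chain(k, seed=0):
    """Hypothetical random tables for the assumed chain W -> X -> Y -> Z."""
    rng = random.Random(seed)
    p_w = random_dist(k, rng)                        # p_w[w]      = P(w)
    p_x_w = [random_dist(k, rng) for _ in range(k)]  # p_x_w[w][x] = P(x|w)
    p_y_x = [random_dist(k, rng) for _ in range(k)]  # p_y_x[x][y] = P(y|x)
    p_z_y = [random_dist(k, rng) for _ in range(k)]  # p_z_y[y][z] = P(z|y)
    return p_w, p_x_w, p_y_x, p_z_y


def marginal_y_naive(p_w, p_x_w, p_y_x, p_z_y):
    """Brute force: a triple sum over w, x, z for every value of y."""
    k = len(p_w)
    p_y = [0.0] * k
    ops = 0
    for y in range(k):
        for w in range(k):
            for x in range(k):
                for z in range(k):
                    p_y[y] += p_w[w] * p_x_w[w][x] * p_y_x[x][y] * p_z_y[y][z]
                    ops += 1
    return p_y, ops


def marginal_y_ve(p_w, p_x_w, p_y_x, p_z_y):
    """Variable elimination: build f_W(x), then f_X(y), then P(y)."""
    k = len(p_w)
    ops = 0
    f_w = [0.0] * k  # f_W(x): W summed out
    for x in range(k):
        for w in range(k):
            f_w[x] += p_w[w] * p_x_w[w][x]
            ops += 1
    f_x = [0.0] * k  # f_X(y): X summed out, reusing the stored f_W table
    for y in range(k):
        for x in range(k):
            f_x[y] += p_y_x[x][y] * f_w[x]
            ops += 1
    p_y = [0.0] * k  # P(y): Z summed out
    for y in range(k):
        for z in range(k):
            p_y[y] += p_z_y[y][z] * f_x[y]
            ops += 1
    return p_y, ops


if __name__ == "__main__":
    tables = make_chain(k=5)
    _, naive_ops = marginal_y_naive(*tables)
    _, ve_ops = marginal_y_ve(*tables)
    print(naive_ops, ve_ops)  # prints "625 75" for K = 5
```

Both functions return the same P(Y) table up to floating point rounding. One side observation: since the P(z|y) values sum to 1 over z, the last loop just multiplies f_X(y) by 1, so Z could be dropped entirely here. I kept it so the code matches the three steps above.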
Thanks!
Best explanation of probability I've received in my whole academic career, thank you
Completely agree, my current professor makes it way too hard to understand, and I never understood what the use is of making things so abstract that students don't understand. What is then the point of education?
My professor for AI explained this so badly that I had no idea what was going on. Thanks for this in-depth and logical explanation of these topics
I was struggling to understand this in my class. Glad I came here.
By far the best explanation of variable elimination; thanks for motivating via brute force/enumeration. For the longest time, it wasn't clear to me that VE was about computational spend not about being the only possible mathematical solution to a problem.
i have an assignment on this that i need to deliver in two hours and this video is saving me right now!
At 12:23, doesn't c,r mean car wash AND (not OR) rain, as mentioned in the lecture?
I struggled to understand this in my class, I'm glad I watched this video. These are very helpful.
Around 9:43 you simply say that P(S|W,R) is reduced to P(S|W) but you never give a more formal explanation of why. I know it's because of conditional independence.
You could have easily added clarity by stating that you started with the chain rule of probability and then applied the conditional independence assumption. That would save anyone who has learned basic probability theory a few minutes of their time, instead of making them pause to think through what just happened there.
Thanks! I'll definitely try to clarify that better next time I teach this topic.
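Thanks for that note. To make it even more explicit for people who still had to think about it (like me): if two variables A and B are conditionally independent given C, then P(A|B,C) = P(A|C). Here, S and R are conditionally independent given W (which is counterintuitive, as mentioned in the video): once we know whether the ground is wet, knowing about the rain tells us nothing more about slipping. Therefore, P(S|W,R) = P(S|W).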
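For anyone who wants to check this with numbers, here is a minimal Python sketch. It assumes the network structure suggested by the notation in this thread (C and R are parents of W, and W is the only parent of S), and every probability in it is invented purely for illustration, not taken from the video. It computes the conditionals by brute force enumeration of the joint and shows that P(S|W,R) equals P(S|W), while P(S|R) still differs from P(S), so S and R are conditionally independent given W rather than independent outright.

```python
from itertools import product

# Hypothetical numbers, made up for this example only.
P_C = {True: 0.1, False: 0.9}   # P(car wash)
P_R = {True: 0.2, False: 0.8}   # P(rain)
P_W_TRUE = {                    # P(W = true | c, r), keyed by (c, r)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.9,
    (False, False): 0.05,
}
P_S_TRUE = {True: 0.3, False: 0.01}  # P(S = true | w), keyed by w


def bern(p_true, value):
    """Probability of a boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(c, r, w, s):
    """P(c, r, w, s) under the assumed structure C -> W <- R, W -> S."""
    return P_C[c] * P_R[r] * bern(P_W_TRUE[(c, r)], w) * bern(P_S_TRUE[w], s)


def prob(**fixed):
    """Sum the joint over every assignment consistent with the fixed values."""
    total = 0.0
    for c, r, w, s in product([True, False], repeat=4):
        assignment = {"c": c, "r": r, "w": w, "s": s}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(c, r, w, s)
    return total


p_s_given_w_r = prob(s=True, w=True, r=True) / prob(w=True, r=True)
p_s_given_w = prob(s=True, w=True) / prob(w=True)
p_s_given_r = prob(s=True, r=True) / prob(r=True)
p_s = prob(s=True)

print(p_s_given_w_r, p_s_given_w)  # the same value: R adds nothing once W is known
print(p_s_given_r, p_s)            # different values: S and R are not independent
```

The first pair matches because S only depends on the rest of the network through W. The second pair differs because rain makes a wet ground more likely, which in turn makes slipping more likely.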
Thank you ! Good introduction
Excellent video. You brought up a lot of small things that I was confused about and explained them
This was great- please do more!🙏🏼
Your explanation is brilliant, it gives a very good intuition for the theory. Thanks a ton
How come condition is "Rain or Carwash" not "Rain and Carwash"?
Great video. Would love to see the code for that assignment.
What's the difference between enumeration and variable elimination anyway, still think it's only a difference in notation.
Very impressive, you make the model crystal clear. Now I see that computing with a Bayesian network is nothing more than calculating a probability (for discrete variables), or a probability distribution (for continuous variables), efficiently.
Best explanation on the internet
Does "variable-elimination" imply: "the overall network's functionality got changed"? thanks
This is a great video on Bayesian Network. Other people creating videos should take a note from this one.
Thanks for great video! Helped me a lot in understanding this stuff for my Uni course :)
For the slip node, can we say that the slip node is conditionally independent from rain? Or is it independent? Or is it still related indirectly?
Does the order of summations in variable elimination matter?
Also what are observed and unobserved variables? Ie are ancestor variables observed variables? Or are they the marginalized variables? Or something else?
Came here searching for coal , found Gold ✌🏻✌🏻✌🏻✌🏻✌🏻
Great video, extremely clear and helpful. :)
Good lecture, that is a big help for me to understand Bayesian networks and the formula.
which book is he using for the reference?
Wait. What do the commas actually denote? It seems confusing that they're being used to denote both "AND" and "OR" (Union and Intersection) like at 14:49.
Can someone explain what's going on?
The commas mean AND: P(W | C, R) is the probability of W given both C and R together. Note that it does not factor as P(W | C) P(W | R) in general.
your voice doesn't sound like your photo
When you eliminate c, you have f(w), but where does r go?
wtf is this how is it so simple. had it always been this simple. thanks
Very simple explanation, thanks!
This was a literal saviour! Thanks a ton!
your voice is gorgeous!!!
Damn, what a voice. Thanks for this
Now I finally understand Bayesian networks and the notation.
Top tier video without a doubt.
thanks, for sharing this lecture video!
Thanks ! Very nice explanation !
Great video. Thanks a lot!
Really useful, thanks!
good explanation !
Thank you for the video!! :)
Can you please tell me what else we need to know about this data mining method, other than this?
Great video !
great
love your voice bro!
awesome
Very unclear and confusing. Using Venn diagrams to represent some of the probabilities and giving a detailed example of the math, using numbers to show how it runs, would be of great help for people discovering the subject. I am fairly sure this is a great video for people who already understand the subject or have some grasp of it. But for a newcomer it is very confusing. Not to mention the rise in difficulty between the first part, which is quite easy to understand (although Venn diagrams would help), and the second part, which looks like Elvish.
Very good
Did what my teacher tried to do in 1 hour in 5 minutes, and better so
Great video, but for the slipping bit your intuition isn't always true. If the ground is wet, it doesn't necessarily mean it was raining, as you said, so it could be not raining and you could slip on dew-covered grass. Loving this video though, as I don't know probability or Bayesian classifiers, which are in my literature for NNs. Okay, you crossed out the intuition lol, I paused the video, MB
from Bihar (INDIA)
absolutely useless.