It was great up until "renormalize," which is tossed out without explanation.
Clear, concise, to the point. Tutorial still holds up 9 years later.
*12 years 😂
Landed on this after bumping head first on many books, articles and videos. This helped, so thanks.
You nailed it. Please explain renormalize in details.
Thanks! One way to do it is as a search problem. Goal states are states where all variables have been put into the ordering. The state is the set of variables eliminated so far. An action is the choice of the next variable to eliminate. The cost could be, for example, the size of the factor generated during the elimination, or (a little more intricate) max(0, size of factor generated - largest factor generated so far). A uniform-cost (or A*) search would allow you to find the optimal ordering.
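To make that concrete, here is a minimal sketch of the uniform-cost formulation in Python. The graph, the `factor_size` heuristic, and the total-size cost are my own illustrative assumptions: I model the moralized graph as an adjacency dict, take the factor created by eliminating a variable to be of size domain^(number of not-yet-eliminated neighbors), and (for simplicity) ignore fill-in edges, which a full implementation would add between the eliminated variable's neighbors.

```python
import heapq

def factor_size(graph, var, eliminated, domain=2):
    """Size of the factor created by eliminating `var`: product of the
    domain sizes of its not-yet-eliminated neighbors.
    (Simplification: fill-in edges from earlier eliminations are ignored.)"""
    neighbors = {n for n in graph[var] if n not in eliminated}
    return domain ** len(neighbors)

def best_ordering(graph):
    """Uniform-cost search over elimination orderings.
    State = frozenset of eliminated variables; cost = sum of factor sizes.
    (For the max-factor objective, use max(0, size - largest_so_far) instead.)"""
    start, goal = frozenset(), frozenset(graph)
    frontier = [(0, [], start)]          # (cost so far, ordering, state)
    best = {start: 0}
    while frontier:
        cost, order, state = heapq.heappop(frontier)
        if state == goal:
            return order, cost
        for var in graph:
            if var in state:
                continue
            new_cost = cost + factor_size(graph, var, state)
            new_state = state | {var}
            if new_cost < best.get(new_state, float("inf")):
                best[new_state] = new_cost
                heapq.heappush(frontier, (new_cost, order + [var], new_state))
    return None

# Tiny chain U - V - W: eliminating the middle variable first is worst.
order, cost = best_ordering({"U": {"V"}, "V": {"U", "W"}, "W": {"V"}})
```

On the chain, any ordering that leaves V for later gives total cost 2 + 2 + 1 = 5, while eliminating V first costs 4 + 1 + 1 = 6, so the search returns an ordering that starts at an endpoint.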
For a query of the type P(Q_1, Q_2, ..., Q_m | e_1, e_2, ..., e_n) we call the Q_i the query variables, the e_i the evidence variables, and the remaining variables in the Bayes net the hidden variables. Variable elimination proceeds by eliminating one hidden variable at a time. When all hidden variables have been eliminated, all the remaining factors are multiplied together to obtain P(Q_1, Q_2, ..., Q_m, e_1, e_2, ..., e_n).
Does ordering of elimination matter? If not, then how did you think of the order?
Video is great but I don't know how to renormalize and I am crying now.
Could you please explain how to "renormalise" with an example? Thanks in advance
The "renormalise" part just means that you would normalize the final answer over the query variable.
So, we know that P(U|+z) = α P(U, +z) from the start.
That alpha (α) is the normalization constant: it rescales the result so that the probabilities sum to 1 (P(u|+z) and P(!u|+z) must sum to 1, since they form a probability distribution).
After you have finished all the variable elimination you still must normalize your result. You do this by dividing by the sum of your resulting expression over both values of U, u and !u (true and false).
So, you get:
P(U | +z) = (P(U)f4(+z,U))/(Σu P(u)f4(+z,u))
(That Σu is meant to represent summing over all U values)
@@backwardsman1 Thank you!
@@backwardsman1 Thanks!
@@backwardsman1 what if query was P(!U | +z), we still sum over all U values?
Many thanks, it really helps :). When you say renormalize to find, e.g., P(U|+z), this implies P(+z,U)/P(+z). Is that right?
UNITO: IaLab project, leave a like.
Hero
How to choose the elimination ordering ??
I think the elimination ordering include variables that are not in the query.
*includes
You should choose the ordering based on the size of the factor each choice will create, so as to avoid raising the complexity.
For example, he first chose to eliminate W. This resulted in a factor with two argument variables (Y and V). The size of this factor is 4, assuming a binary domain, and 4 is also the largest factor size he creates.
However, if he were to first choose V, he would have had a resulting factor:
f1(X,Y|U,W) = Σv P(v)P(X|U,v)P(Y|v,W)
This results in a factor that takes four variables! In a binary domain that's a factor of size 16 (2^4).
If you think about it in terms of a tree, it will make much more sense, and this is essentially how variable elimination works. It works its way down the tree, multiplying and summing factors and taking variables as arguments. If a factor is created with four variables, it must branch on all four! Meaning it has to compute f1 for X true and false, Y true and false, U true and false, and W true and false.
So, for this reason, you want it to branch as little as possible.
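The branching picture can be made concrete by enumerating the factor's table entries, one per leaf of the tree. A quick sketch (binary domains assumed, as above):

```python
from itertools import product

def factor_entries(variables, domain=(True, False)):
    """Every assignment the factor must store -- one per leaf/branch."""
    return list(product(domain, repeat=len(variables)))

# Eliminating W first: f1(Y | V) has arguments {Y, V} -> 4 entries.
w_first = len(factor_entries(["Y", "V"]))          # 4

# Eliminating V first: f1(X, Y | U, W) has four arguments -> 16 entries.
v_first = len(factor_entries(["X", "Y", "U", "W"]))  # 16
```

So the bad ordering quadruples the work at that single step, and the gap grows exponentially with each extra argument variable.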
good even after 11 years
what if query was P(!U | +z)?
thank you for your great tutorial
When you join the factors you aren't supposed to get a joint probability every time.
For example, when you eliminated W, you should have gotten the factor as:
f1(Y|V) = Σw P(w)P(Y|V,w)
The factor that results from joining two factors keeps, in front of the conditioning bar, every variable that appeared in front of the bar in any of the joined factors; all other variables go behind the bar (as conditions).
I thought I should mention this, because the way variable elimination is being done here is not entirely correct.
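Here is a small sketch of that join-then-sum-out step, with factors stored as dicts from assignment tuples to numbers. The CPT values are made up for illustration; only the shape of the computation, f1(Y|V) = Σw P(w) P(Y|V,w), matches the discussion above.

```python
from itertools import product

# Illustrative CPTs (made-up values), all variables binary.
P_W = {(True,): 0.6, (False,): 0.4}                 # P(W)

P_Y_given_VW = {}                                    # P(Y | V, W), key (y, v, w)
for v, w in product((True, False), repeat=2):
    p = 0.9 if (v and w) else 0.3                    # arbitrary numbers
    P_Y_given_VW[(True, v, w)] = p
    P_Y_given_VW[(False, v, w)] = 1 - p

# Sum out W:  f1(Y | V) = Σw P(w) * P(Y | V, w)
f1 = {}
for y, v in product((True, False), repeat=2):
    f1[(y, v)] = sum(P_W[(w,)] * P_Y_given_VW[(y, v, w)]
                     for w in (True, False))

# Note f1 is still a conditional, not a joint:
# for each fixed v, the entries sum to 1 over y.
for v in (True, False):
    assert abs(f1[(True, v)] + f1[(False, v)] - 1.0) < 1e-9
```

The final assertion is exactly the commenter's point: summing W out of P(W) P(Y|V,W) leaves a factor that behaves like P(Y|V), not like a joint distribution over Y and V.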
I don't think so