I haven't seen such a great series on decision trees. Great work you have done. I mean, all the concepts are crystal clear. Thanks a lot
You genius, you explained it in a way that my teacher could not. Thank you. The terms you use are different than the ones we use but the concepts are the same.
Thanks a lot for the kind words!
Thanks for providing the tree-based models lecture. Nice explanation.
Thank you very much for this video! It was helpful for comparing the different metrics for classification trees!
Great explanation! That was exactly what I was missing for my bachelor thesis :) But as @danieleboch3224 already said - the last short example with the [25, 25] child nodes (from the [50, 50] parent node) does not work, as the gain will be zero there regardless of whether Gini, entropy or misclassification error is used. I also replied to their post with an explanation of why:
"If the class proportions in both child nodes are exactly the same as in the parent node, the information gain will be zero. Regarding the plot at the end of the video, both child nodes would lie in exactly the same spot as the parent node, and the child-node average would also sit at the same spot as the parent node. But in any other case, where the class proportions in the child nodes change compared to the parent node (*), the information gain will be positive, as seen in the example in the video. But you are right, his last short example with the two [25, 25] child nodes was wrong.
(*) It can never happen that only one of the child nodes has a different proportion. As soon as one child node has a different proportion than the parent node (for example a smaller percentage of one class), the other child node will also have a different proportion than the parent node (in the opposite direction - e.g. a bigger percentage). In that case one of the child nodes will sit above the parent node and the other one will sit below it (thinking of the last plot in the video again)."
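To make it concrete, here is a quick Python sketch (mine, not from the video) that plugs the [50, 50] -> [25, 25] + [25, 25] split into the standard impurity formulas - the information gain comes out to zero for Gini, entropy and misclassification error alike:

```python
import math

def gini(counts):
    # Gini impurity: 1 - sum(p_i^2)
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    # Shannon entropy: -sum(p_i * log2(p_i))
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def misclassification(counts):
    # Misclassification error: 1 - max(p_i)
    n = sum(counts)
    return 1 - max(c / n for c in counts)

def information_gain(impurity, parent, children):
    # Parent impurity minus the weighted average impurity of the children
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

parent = [50, 50]
children = [[25, 25], [25, 25]]
for name, f in [("gini", gini), ("entropy", entropy), ("misclass.", misclassification)]:
    print(name, information_gain(f, parent, children))  # all print 0.0
```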
wait, in the case where we split the [50, 50] node and go to two [25, 25] nodes, our information gain is still 0...
You are right. Thank you for pointing it out! In case you or anyone else wants to know more:
If the class proportions in both child nodes are exactly the same as in the parent node, the information gain will be zero. Regarding the plot at the end of the video, both child nodes would lie in exactly the same spot as the parent node, and the child-node average would also sit at the same spot as the parent node. But in any other case, where the class proportions in the child nodes change compared to the parent node (*), the information gain will be positive, as seen in the example in the video. But you are right, his last short example with the two [25, 25] child nodes was wrong.
(*) It can never happen that only one of the child nodes has a different proportion. As soon as one child node has a different proportion than the parent node (for example a smaller percentage of one class), the other child node will also have a different proportion than the parent node (in the opposite direction - e.g. a bigger percentage). In that case one of the child nodes will sit above the parent node and the other one will sit below it (thinking of the last plot in the video again).
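If it helps, here is a small Python sketch illustrating the (*) point with made-up counts (just for illustration, not from the video): a [50, 50] parent split into [40, 10] and [10, 40] gives child class-0 proportions of 0.8 and 0.2, which straddle the parent's 0.5, and the entropy-based information gain is positive:

```python
import math

def entropy(counts):
    # Shannon entropy: -sum(p_i * log2(p_i))
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

parent = [50, 50]
left, right = [40, 10], [10, 40]   # made-up counts, just for illustration

# Class-0 proportions: 0.8 in the left child, 0.2 in the right child.
# Their weighted average (50/100 * 0.8 + 50/100 * 0.2) is exactly the
# parent's 0.5, so one child must lie above the parent and the other below.
p_left, p_right = left[0] / sum(left), right[0] / sum(right)
print(p_left, p_right)  # 0.8 and 0.2, straddling the parent's 0.5

# And because the proportions changed, the information gain is positive:
n = sum(parent)
gain = entropy(parent) - (sum(left) / n * entropy(left)
                          + sum(right) / n * entropy(right))
print(round(gain, 3))  # ~0.278 > 0
```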
very good explanation
Thanks a lot 🌸🌸
The flaw with the example is that splitting on x_2 would give the greatest information gain in all of the cases. There is no need to be concerned with x_1 at all - you could do it all with an x_2 stump. You say to "make the assumption x_1 is a better split", but this is clearly not the case, since x_2 splits everything perfectly, as seen in the leaf nodes.
Not a good presentation. The speaker does not explain anything.