In feature bagging (at 25:52), is a random subset of the available features selected at every node (to pick the best feature and threshold) while building a single tree, or is the random subset selected only once per decision tree?
I think it should be fine to sample with replacement, considering there is another random sampling going on with the data points. So the chance of two trees having the same combination of (feature subset, data points) should be very low.
If every dataset after bagging has the same number of data points as the parent dataset, then aren't they all equal, i.e. D = D_i (for all i = 1, 2, ..., m)? How will this work then?
Well, replacement is allowed. So if you have D with 5 data points (x1, x2, x3, x4, x5), then the total number of possible ordered draws for D_i is 5 * 5 * 5 * 5 * 5 = 5^5 = 3125. So bags like x1, x1, x1, x1, x1 or x1, x3, x3, x4, x2 are legitimate.
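To make the counting concrete, here is a rough Python sketch of the 5-point example. The helper name `bootstrap_sample` and the seed are my own choices for illustration, not something from the lecture. It draws a few bags with replacement and then brute-forces all 5^5 ordered draws to count how many of them reproduce D exactly.

```python
import random
from itertools import product

D = ["x1", "x2", "x3", "x4", "x5"]

def bootstrap_sample(data, rng):
    """Draw len(data) points uniformly from data, with replacement."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)  # arbitrary seed, only so the run is repeatable
bags = [bootstrap_sample(D, rng) for _ in range(3)]
for i, bag in enumerate(bags, start=1):
    print(f"D_{i} = {bag}")

# Enumerate every ordered draw of 5 points from D with replacement.
all_draws = list(product(D, repeat=len(D)))
n_ordered = len(all_draws)

# Draws that contain each original point exactly once (D in some order).
n_identical = sum(1 for d in all_draws if sorted(d) == sorted(D))

# Distinct bags when the order of the points is ignored.
n_distinct_bags = len({tuple(sorted(d)) for d in all_draws})

print(n_ordered, n_identical, n_distinct_bags)
print(f"P(D_i has the same points as D) = {n_identical / n_ordered:.4f}")
```

So only 5! = 120 of the 3125 ordered draws give back exactly the original points, which is under 4% even for this tiny dataset. That fraction shrinks quickly as the dataset grows, so in practice the bagged datasets almost never equal D.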