only person who explained this properly imo
Agreed :)
yep
I was just going to type the exact same comment. Beautifully explained.
Wow - great explanations on concepts most other people take 30 mins to explain with half the effect. Subscribed!
The best explanation of what the word replacement really means. Thanks
He explained it in a very simple manner. Thanks
Really precise and easy-to-understand explanation, thank you, sir! Really saved my life :)
Why does it feel like it's Ron Swanson who's teaching? 😀 Well explained concept!!
This is the best explanation of bagging I've ever seen. Many thanks 👍🙌👌
Your explanation makes it very understandable!
Great explanation: concise and direct.
Very concise explanation - thanks for posting this!!
Random forest randomly selects the samples as well as the features for each tree model.
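To make that concrete, here's a minimal scikit-learn sketch (toy data generated on the spot, not from the video):

```python
# Minimal sketch: a random forest draws a bootstrap sample of rows for each
# tree, and at each split considers only a random subset of the features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    bootstrap=True,       # each tree trains on rows drawn with replacement
    max_features="sqrt",  # random feature subset considered at each split
    random_state=0,
)
forest.fit(X, y)
print(forest.predict(X[:5]))
```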
Thank you so much for such a great explanation!
Thats awesome explanation in a nutshell
Wow, so amazing explained!
it was really understandable and easy way explanation. thank you
excellent explanation - many thanks, sir
This is soo awesome! Thank you!
very good explanation!
The sub-sample size is always the same as the original input sample size, but the samples are drawn with replacement. So if your training set is 60% of the data, each subsample should also be 60% of the data, not 60% of the training set (which would only be 36% of the full data).
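To see the "with replacement" part in action, a quick numpy sketch (the numbers here are illustrative only):

```python
# Sketch: each bag is the same size as the training set, drawn WITH
# replacement, so some points repeat and others are left out entirely.
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(100)       # pretend the full dataset has 100 points
train = data[:60]           # a 60% training split

bag = rng.choice(train, size=len(train), replace=True)
print(len(bag))             # 60 -- same size as the training set
print(len(np.unique(bag)))  # typically ~38 unique points (about 63%)
```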
Thank You!
(In under 3 minutes!)
But scikit-learn's RF uses n samples for each bag, with or without replacement. Is that something new?
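For reference, a sketch of the knobs that control this in scikit-learn (max_samples only exists in newer versions, 0.22+ if I remember right):

```python
# Sketch: bag size and replacement in scikit-learn's random forest.
from sklearn.ensemble import RandomForestClassifier

# bootstrap=True (the default) draws n rows with replacement for every tree;
# max_samples shrinks each bag, here to 60% of the training set.
rf = RandomForestClassifier(bootstrap=True, max_samples=0.6, random_state=0)

# bootstrap=False makes every tree see the whole training set instead.
rf_plain = RandomForestClassifier(bootstrap=False, random_state=0)
```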
What algorithm is used in the model step? Is this applicable to any algorithm?
Decision trees are used, and an ensemble of decision trees makes a forest.
georgia tech quality!!! thanks :))
very clear!! Thanks !!
Great video. I have two questions. Since the same data point can appear more than once in a subset, one subset could even end up exactly the same as another (although the chance is low, of course), right? In the case of random forests, since features are also selected randomly from the dataset, can the same features show up in different decision trees? Thanks in advance.
I'll try to answer this. For your first question, yes, it's possible to have the same data points in different subsets.
For random forests, since the features are selected randomly, it's again possible.
well explained, thank you very much
very well done.... gr8
So do we use a single algo for all the different bags, or different learners for different bags?
In practice, we use a single algo for all the different bags. For example, have a look at BaggingClassifier in sklearn: there you set one single base_estimator, i.e. the learning algorithm.
However, nothing is stopping you from using different algorithms for different bags and comparing the results. I think you would lose some important mathematical guarantees though, but I'm not entirely sure.
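A minimal sketch of that setup (note: recent scikit-learn versions renamed base_estimator to estimator, which is what's used below):

```python
# Sketch: bagging one base learner (a decision tree) across all bags.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

bagger = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # one algorithm for every bag
    n_estimators=50,                     # number of bags / models
    bootstrap=True,                      # draw each bag with replacement
    random_state=0,
)
bagger.fit(X, y)
print(bagger.score(X, y))
```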
extremely clear
Could someone please explain to me what he meant by "you can simply wrap this up in an API"? How do you do that, and how does it help?
He meant you can use libraries that have implemented random forest, like scikit-learn
Well explained.
Where does the unknown X come from?
X is from the testing set (blue color)
We test X with all models and pick the prediction that gets the majority of the votes
or in this case, the mean Y
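A tiny sketch of that aggregation step, with made-up predictions:

```python
# Sketch: combining the ensemble's outputs for an unknown X.
from statistics import mean, mode

# Suppose each of the 5 models has already predicted a label for X...
class_votes = ["cat", "dog", "cat", "cat", "dog"]
print(mode(class_votes))  # classification: majority vote -> 'cat'

# ...or a numeric Y for X (regression, as in the video).
y_preds = [2.1, 1.9, 2.4, 2.0, 2.2]
print(mean(y_preds))      # regression: mean of the predictions -> 2.12
```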
Thanks a lot!
Listen at 5X speed.
Note: standard bagging is insensitive to class imbalance, instance importance, and feature importance.
Weird, my school teaches that n' should be the same size as n. Anyway, it doesn't change that much, I guess.
Cheers
2:04
Hi, what tablet is it? Is it a Wacom? Would that be very expensive?
those bags look like slug aliens from mars. also i sneezed while watching this which proves illuminati did 9/11
Nice explanation
In every video, what you explain is fine, but it's not enough to cover the topic in your title.
Because it's a lesson series. The entire lesson is available on Udacity, if they haven't uploaded it to YouTube.
Saved my life
Pretty clear explanation, but his voice makes me sleepy
You can watch it after you sleep