Just did this yesterday in my organisation. Kudos to you guys for posting videos like this.
For smart picking: we can apply any of the clustering algorithms on Du, then pick a small number of points from each cluster and provide them to experts for labelling.
But picking points randomly from each cluster in Du may not always guarantee an improvement with M1, as most of the points picked this way could be very similar to points in DL.
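A minimal sketch of this cluster-then-sample idea, assuming Du is already a NumPy feature matrix and using scikit-learn's KMeans; the number of clusters and points per cluster are arbitrary illustrative values, not values from the video.

```python
import numpy as np
from sklearn.cluster import KMeans

def pick_points_by_cluster(X_unlabelled, n_clusters=10, points_per_cluster=5, seed=0):
    """Cluster the unlabelled pool Du and sample a few points from each cluster
    to hand to human experts for labelling."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    cluster_ids = km.fit_predict(X_unlabelled)

    rng = np.random.default_rng(seed)
    picked = []
    for c in range(n_clusters):
        idx = np.where(cluster_ids == c)[0]
        # a cluster may contain fewer points than requested, so take at most len(idx)
        take = min(points_per_cluster, len(idx))
        picked.extend(rng.choice(idx, size=take, replace=False))
    return np.array(picked)  # indices into Du to send for labelling
```

As the reply above notes, random sampling within clusters can still return points that look like those already in DL, so this is only a starting point.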
I am really trying hard to arrange money for your course.
Of course you are the greatest teacher, sir. The reason is that many institutions charge lakhs for this 😅
I picked this video smartly so that I can learn it manually and write it in exams easily.
Active learning in my view 😊
For multiclass classification (say we have k classes), can we use softmax logic and choose those points which give out probabilities around 0.5 for at least k/2 classes (since the model is confused about half of the classes, such points could be a good pick for our subset)?
For multi-class classification, we can go ahead with human labelling for the classes where the model is not performing up to the mark, in other words, the classes where it makes a higher % of errors. By introducing human-labelled data for such classes, we can bring some confidence to our new model M1.
But we don’t have class labels for the points in Du. So, how can we determine whether a point is erroneously classified or not?
For multiclass classification (for example, 3 classes), we have to choose data points for which the model gives roughly equal probability (about 0.33) for all 3 classes.
This thinking is in the right direction. Would you not pick those points where the class probabilities in a three-class classification are 0.4, 0.5, 0.1? In this case, the model is confused between two of the three classes.
@@AppliedAICourse yes, we should consider this case as well.
For multiclass classification, we will set a threshold on the probability: if the maximum probability among all the classes is less than a certain threshold value (e.g. < 0.7), we will choose that data point for human labelling.
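A small sketch of this least-confidence rule, assuming the model exposes predict_proba (as most scikit-learn classifiers do); the 0.7 cutoff is the example value from the comment above.

```python
import numpy as np

def select_low_confidence(model, X_unlabelled, threshold=0.7):
    """Pick points from Du whose highest predicted class probability is below the threshold."""
    proba = model.predict_proba(X_unlabelled)  # shape (n_points, n_classes)
    max_conf = proba.max(axis=1)               # model's confidence in its best guess
    return np.where(max_conf < threshold)[0]   # indices to send for human labelling
```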
Hi, please correct me if I am wrong: for the multiclass setting we can calculate entropy, since entropy works on a similar principle. When the probabilities are spread almost equally across the classes, entropy is high compared to when one class clearly dominates, so we can filter out the low-entropy points and keep the high-entropy ones for labelling.
Perfect. Entropy is a very popular metric that can be used to numerically quantify the uncertainty in class labels. That's why we use cross-entropy as the loss function in multi-class classification.
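A minimal sketch of entropy-based selection, assuming the predicted class probabilities for Du are already available; picking the top-k highest-entropy points is one common way to apply the idea, and scipy.stats.entropy does the per-point computation.

```python
import numpy as np
from scipy.stats import entropy

def select_by_entropy(proba, k=100):
    """proba: (n_points, n_classes) predicted probabilities for points in Du.
    Returns indices of the k most uncertain (highest-entropy) points."""
    # entropy is largest when the probabilities are spread almost equally across classes
    point_entropy = entropy(proba, axis=1)
    return np.argsort(point_entropy)[::-1][:k]
```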
Excellent approach
A doubt: why can't we use unsupervised techniques like clustering to label the large unlabelled data? Just curious 🤔
But picking points randomly from each cluster in Du may not always guarantee an improvement with M1, as most of the points picked this way could be very similar to points in DL.
Could you please give an explanation for the "Extreme multi-label classification" problem? It is also covered in the course (the StackOverflow tag prediction problem), but in that case we limited ourselves to using only a few labels. What kind of solution can we apply to this type of problem?
The method you mentioned might not work if we don't have a probabilistic model (a model which only gives the class, not the probability). What should we do then?
Most machine learning and deep learning models can be slightly modified to obtain class probabilities. Hence, this is not a major issue in the real world.
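As one hedged illustration of that point: a margin-based model such as a linear SVM does not output probabilities directly, but scikit-learn's CalibratedClassifierCV can wrap it to produce them. The dataset below is a synthetic stand-in, not data from the course.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Toy stand-ins for the small labelled set DL and the large unlabelled pool Du.
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=0)
X_labelled, y_labelled = X[:200], y[:200]
X_unlabelled = X[200:]

# LinearSVC gives only class decisions / margin scores, not probabilities.
base = LinearSVC()
# Probability calibration wraps it so we get predict_proba for uncertainty-based picking.
model = CalibratedClassifierCV(base, cv=3)
model.fit(X_labelled, y_labelled)
proba = model.predict_proba(X_unlabelled)  # (n_unlabelled, 3) class probabilities
```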
My opinion on how to extend it to the multi-class setting: in ML, if we're given a k-class classification problem, we can solve it using k binary classifiers. So I think we can employ One vs Rest. Just my thinking.
While this is a possible solution, can you think of alternative and simpler methods where you don’t have to build k binary classifiers?
Suppose we have a three-class classification; our threshold could be 0.33. But if we get probabilities like 0.1, 0.4 and 0.5, we can choose both the 0.4 and the 0.5 data as samples for labelling.
Note that there is only one xi that has these three probabilities for the three classes. It’s just a single point, not two or three. But you are right that we will pick such xi’s, where the model is less certain, based on the fact that no single class has a high probability like 0.9 or so.
What I think is: we can do 1 vs all. Let's say we have 10 classes; for each class we can get the probability of whether a point belongs to that class or not. If the probability of a point belonging to a class is very high, we will leave that point out, and we will do the same for all classes. In the end we will be left with points that have no high probability of belonging to any class, and we will choose those points.
Go with One vs Rest approach.
While this is a possible solution, can you think of alternative and simpler methods where you don’t have to build k binary classifiers?
@AppliedAICourse Method 1: Sample points from Du where the model fails to achieve a 0.9 probability for any particular class.
Method 2: For a given point, pick the two highest class probabilities; if their difference is less than some alpha, e.g. 0.3, then that particular point is an "unsure" point.
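A minimal sketch of Method 2, often called margin sampling, assuming the predicted probabilities for Du are available; alpha = 0.3 follows the example value above.

```python
import numpy as np

def select_by_margin(proba, alpha=0.3):
    """proba: (n_points, n_classes) predicted probabilities for points in Du.
    Flags points where the two highest class probabilities are within alpha of each other."""
    top2 = np.sort(proba, axis=1)[:, -2:]  # two largest probabilities per point
    margin = top2[:, 1] - top2[:, 0]       # best minus second best
    return np.where(margin < alpha)[0]     # the "not sure" points to send for labelling
```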