Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
I have watched only 4 minutes so far and I had to pause and write this comment. I will say this is one of the best tutorials I have seen in data science. Sir, you need to take this to another level. What a great teacher you are!
Thanks for the feedback my friend 😊👍
100% aligned... I am doing an external course but have to refer to your sessions to understand its topics... amazing effort.
For anyone stuck with the categorical features error.
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([("town", OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)
X
Then you should be able to continue the tutorial without further issue.
thanks bro
thanks a lot! it helps
Thank you brother.
Hey, thanks for the code.
I tried using your code, but despite converting X to an array, it still gives me this error:
"TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array."
@@Ran_dommmm I know you said "despite converting X to an array", but just double check you have used the .toarray() method correctly. The error message seems pretty clear on this one.
This function may help confirm that a dense numpy array is being passed:
import numpy as np

def is_dense(matrix):
    # a scipy sparse matrix is not an np.ndarray, so this returns False for it
    return isinstance(matrix, np.ndarray)
Pass in X for matrix and it should return True.
Good luck fixing this.
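To make the fix above concrete, here is a hedged sketch (with made-up town/area data, not the video's exact dataframe) showing where .toarray() fits in:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Illustrative stand-in for the tutorial's dataframe
df = pd.DataFrame({
    "town": ["monroe township", "west windsor", "robinsville", "monroe township"],
    "area": [2600, 2800, 3300, 3000],
})

ct = ColumnTransformer([("town", OneHotEncoder(), [0])], remainder="passthrough")
X = ct.fit_transform(df)

# Depending on the sklearn version and the data, X may come back sparse;
# densify it only in that case.
if not isinstance(X, np.ndarray):
    X = X.toarray()

print(X.shape)  # 3 one-hot columns for the towns, plus the passthrough 'area'
```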
Sir, please continue your machine learning tutorials; they are among the best I have seen so far.
Sure Gaurav, I just started a deep learning series. Check it out.
@@codebasics
Kindly explain the concept of dummies in deep learning as well
Anyone can be a teacher, but a real teacher eliminates fear from students... you did exactly that!! Excellent knowledge and skills.
Sreenivasulu, your comment means a lot to me, thanks 😊
Hi,
Your explanation is very simple and effective.
Answers for the practice session:
A) Price of Mercedes Benz, 4 yr old, mileage 45000 = 36991.31721061
B) Price of BMW X5, 7 yr old, mileage 86000 = 11080.74313219
C) Accuracy = 0.9417050937281082 (94 percent)
Same bro
same bro.... thx for replying so that i can check my results
Exercise solution: github.com/codebasics/py/blob/master/ML/5_one_hot_encoding/Exercise/exercise_one_hot_encoding.ipynb
Everyone, the error with categorical_features is fixed. Check the new notebook on my github (link in video description). Thanks to Kush Verma for sending me a pull request with the fix.
Thank you for the wonderful explanation, sir. However, I am getting the error __init__() got an unexpected keyword argument 'catergorical_features' for my line of code onehotencoder = OneHotEncoder(catergorical_features = [0]). Is it because of a change of versions?
What is the solution to this?
Sir, I get this error when I specify categorical features: __init__() got an unexpected keyword argument 'categorical_features'
@@urveshdave1861 Have you got any answer for this? I am having the same error
@@urveshdave1861 okay .. i will do that. thanks
@@urveshdave1861 Hey I am also getting the same error. how did you resolve it?
This guy is AMAZING! I have spent 2 days trying dozens of other methods, and this is the only one that worked for my data and didn't end in an error. This guy totally saved my sanity; I was growing desperate, as in DESPERATE! Thank you, thank you, thank you!
I am glad it was helpful to you 🙂👍
Your ability to simplify things is amazing, thank you so much. You are a natural teacher.
I was confused about where to start studying ML, and then my friend suggested this series... It's great :-)
Are you following any other courses or sources? And have you begun any development?
I want to know how helpful this playlist is. Kindly reply.
@@sauravmaurya6097 It's quite helpful if you are a beginner, in the sense of not being from an engineering or programming background. You can accompany this with Coursera's Andrew Ng course.
@@sauravmaurya6097 If you already know calculus and Python programming (intermediate level), ML will feel easy. After doing this, go to the deep learning series, because that's what is used in industry.
Wonderful Video.
This is by far the easiest explanation I have seen for one hot encoding. I had been struggling for a long time to find a proper video on this topic, and my quest ended today.
Thanks a lot, sir.
This ML tutorial is by far the best one I have seen. It is so easy to learn from and understand, and your exercises also help me apply what I have learned so far. Thank you.
Glad it helped!
15:50 write your code like this:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],
    remainder='passthrough'
)
X = ct.fit_transform(X)
X
Then it will work fine; otherwise it will give an error.
What is the use of (categories='auto') and of the name 'one_hot_encoder'?
Thank you, you're a lifesaver! I was trying multiple ways, since categorical_features has now been deprecated.
@@jollycolours correct, the categorical_features parameter is deprecated, and these are the steps to follow instead:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X), dtype=float)
Even in '23 your video is such a relief... kudos to your teaching.
First of all, 1000*Thanks for sharing such content on youtube..
I got an accuracy of 94.17% on training data.
Bandham, I am glad you liked it buddy 👍
I was shocked after the first 5 minutes of the video; I never thought it would be so easy and fast! Thanks A LOT!
Miyuki... I am glad you liked it
You really made it very easy to understand such new concepts, thanks a lot.
Starting from minute 12:30 about OneHotEncoder: some updates in sklearn prevent using categorical_features=[0]. Here is the code update as of April 2020:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

columnTransformer = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
# use a numeric dtype; np.str is deprecated and the regression needs numbers
X = np.array(columnTransformer.fit_transform(x), dtype=float)
X = X[:, 1:]  # drop one dummy column to avoid the dummy variable trap
model.fit(X,y)
model.predict([[1,0,2800]])
model.predict([[0,1,3400]])
The code is working but gives a different prediction compared to dummies.
Plus, my X is showing 5 columns instead of 4.
I was entering the 0 and 1 wrongly. I am getting the same answer now; thank you for the code.
thanks buddy
I must say this is the best course I've come across so far.
The god of data science... amazing explanation, sir. Kudos to your patience while explaining.
Glad it was helpful!
You are the best teacher on YouTube; I have never seen anyone like you before.
You have a gift for explaining things even to the layman. Big up to you!
Thanks a ton Wangs for your kind words of appreciation.
Merc: 36991.317
BMW: 11080.743
Score: 94.17%
Your answer is perfect Ankit. Good job, here is my answer sheet for comparison: github.com/codebasics/py/blob/master/ML/5_one_hot_encoding/Exercise/exercise_one_hot_encoding.ipynb
thanks for posting the answer bro
Could we upvote this comment to the top? Been looking for this for quite some time now. This is important, and this comment matters.
@@codebasics I used pandas dummy variables instead of one-hot encoding, because the latter is too confusing.
Got the same answer using OneHotEncoder after correcting tons of errors and watching videos over and over.
I will say this is one of the best tutorials I have seen in ML.
This is the best machine learning playlist I have come across on youtube 😃👍. Hats off to you, sir.
This was really well done! Kudos to you! It's hard to find clear and concise free tutorials nowadays. Subscribed and hope to see more awesome stuff!
I achieved the same result using a different method that doesn't require dropping columns or concatenating dataframes. This alternative approach can lead to cleaner and more efficient code:
df = pd.get_dummies(df, columns=['CarModel'], drop_first=True)
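For illustration, a small sketch of what that one-liner does (made-up data; the 'CarModel' column name is assumed from the exercise):

```python
import pandas as pd

df = pd.DataFrame({
    "CarModel": ["BMW X5", "Audi A5", "Mercedez Benz C class"],
    "Mileage": [69000, 59000, 67000],
})

# drop_first=True drops the first (alphabetically) dummy column,
# avoiding the dummy variable trap in one step
df = pd.get_dummies(df, columns=["CarModel"], drop_first=True)
print(list(df.columns))
```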
One of the best explanations of encoding 👌👍
Glad it was helpful!
I wish I could give this videos 2 thumbs up! Great explanation of all the steps in one-hot encoding! Thank you!!
I am getting 84% accuracy without encoding the variable, but after encoding I am getting 94% accuracy with the model. Thank you for your teaching. You are doing a great job.
Awesome, you're explaining concepts in a very simple manner.
Vishwa I am happy to help 👍
How can I like this video more than 100 times!
I am happy this was helpful to you.
The Data Science GOAT! One day I will send you a nice donation for all that you have contributed to my journey sir!
My model scored 94% accuracy. Thank you sir for the amazing video.
the best video series on ML sir ....Thank you very much sir....
You teach with passion! thank you for the series!
I'm reading a textbook that has an exercise studying this same dataset to predict 'survived'. I just finished the exercise from the book, but I can't seem to get past an 81% score.
Thanks for your awesome explanation
Thank you very much... wonderful work... special thanks from Morocco in North Africa.
This is really the best series to get started with ML
How are you starting?
Glad it was helpful!
@@shinosukenohara.123 I am watching this channel, Krish Naik and Andrew NG course on Coursera
This is an amazing tutorial! saved me so much time and brought so much clarity!!! Thank you!
Highly Qualitative.
If anyone got stuck at OneHotEncoder at 16:26, run this command: pip install -U scikit-learn==0.20
Thanks 😃
I am stuck, and it still did not execute using your solution.
I also got them correct. Sir, this course is amazing. You have made it so easy to understand.
Glad to hear that
@14:01, please explain how come you applied label encoding to nominal categories. Moreover, LabelEncoder should be applicable to the target column only.
superb and precisely explained
Thank you 🙂
A PLACE TO RUN TO WHEN ONE IS STUCK. THANK YOU SO MUCH SIR
@13:20 we need to do:
dfle = df.copy()
because otherwise changes in dfle will reflect back to df.
Thanks :)
Yes, you are right.
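A tiny demonstration of that point (illustrative data, not the video's dataframe):

```python
import pandas as pd

df = pd.DataFrame({"town": ["monroe township", "west windsor"]})

df_alias = df        # same underlying object as df
df_copy = df.copy()  # independent copy

df_alias["town"] = ["a", "b"]
print(df["town"].tolist())  # changed: the alias writes through to df

df_copy["town"] = ["x", "y"]
print(df["town"].tolist())  # unchanged: the copy is independent
```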
Thank you sir🎉. You made my ML Journey Better.. 🤩
You are doing a wonderful job; people like you inspire me to learn and to share the knowledge I gain. It is very useful for me. All the best.
For Mercedes Benz I got 51981.26, for BMW I got 39728.19, and the score is 94.17%. Thank you very much for making ML easy.
Why did we apply LabelEncoder and then OneHotEncoder in the 2nd method, when we can directly apply OHE itself to the data?
Difficult topics are easily understood. Thank you so much for the content, sir.
thank you, this helped me so much with multivariate regression with many categorical features!
15:50 write this code
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([('town', OneHotEncoder(), [0])], remainder = 'passthrough')
x = ct.fit_transform(x)
x
The label encoding done for the independent variable column 'town' in the second half of the video isn't needed, I think. Just doing one-hot encoding is enough. Wonderful contribution anyway. Thanks!!
I agree
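For anyone reading along, a sketch of that point (illustrative data): a modern OneHotEncoder accepts string categories directly, so the intermediate LabelEncoder step isn't required.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

towns = pd.DataFrame({"town": ["monroe township", "west windsor", "robinsville"]})

ohe = OneHotEncoder()  # fit directly on the string column
encoded = ohe.fit_transform(towns[["town"]]).toarray()

print(ohe.categories_[0])  # categories are stored in sorted order
print(encoded)
```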
I learned a lot from the exercise that you gave at the end of the video, thank you so much sir!
Someone please help!! At 15:14 I am getting an error for { y = df.price }.
It shows "AttributeError: 'DataFrame' object has no attribute 'price'"
That means there is no column labelled price. Redo it; you might have lost the column while executing a drop command multiple times.
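A quick way to confirm this (a generic sketch, not the tutorial's dataframe) is to inspect df.columns before using attribute access:

```python
import pandas as pd

df = pd.DataFrame({"town": ["monroe township"], "price": [550000]})
df = df.drop("price", axis=1)  # e.g. a drop cell re-run one time too many

print(list(df.columns))        # 'price' is gone, so df.price would now raise
print("price" in df.columns)   # False
```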
Thanks sir, nice lecture.
Sir, you are really a great teacher.
You teach everything so nicely;
even tough things become easy when you teach.
Thanks a lot.
Excellent video - thank you!
You make it easy with your explanation !! Thank you !!
Simply excellent explanation with very simple examples!
Thanks for the excellent video... but due to recent changes, ColumnTransformer from sklearn.compose is to be used for one-hot encoding.
Preeti, can you send me a pull request?
model.predict([[45000,4,0,0]])=array([[36991.31721061]]),
model.predict([[86000,7,0,1]])=array([[11080.74313219]]),
model.score(X,Y)=0.9417050937281082.
Thanks sir for these exercise
definitely one of the best videos to learn from!
That's a great tutorial on one-hot encoding. I was unable to find a complete example anywhere. Thanks for sharing.
Thanks Adnan for your valuable feedback
Your tutorial videos are helping me so much to learn more about ML.
I am happy this was helpful to you.
I am here in 2024, after 6 years, and I want to say that this playlist is wonderful!
I hope you update it, because there have been many changes in the syntax of sklearn.
Hey, next week I am launching an ML course on codebasics.io which will address this issue. It has the latest API, in-depth math, and end-to-end projects.
Hi,
Since OneHotEncoder's categorical_features has been deprecated... can you please mention here how to proceed?
That image on one hot encoding 🤣🔥
Hi Dhaval, your explanation of all the topics is crystal clear.
Can you please make videos on NLP also?
Sir, what is the best method to do label encoding for job designations like management, blue-collar, technician, etc.? Please let me know the best practice.
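Not an official answer, but a common practice: designations like these have no natural order (they are nominal), so one-hot encoding is usually preferred over assigning arbitrary integers. A minimal sketch with made-up data:

```python
import pandas as pd

jobs = pd.DataFrame({"job": ["management", "blue-collar", "technician", "management"]})

# One column per designation; no spurious order between jobs is implied
encoded = pd.get_dummies(jobs, columns=["job"])
print(list(encoded.columns))
```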
Wait wait... I don't see the point 😕
The first half of the video does the same thing as one hot encoding (the second half of the video), but the second half is more tedious and takes more steps.
Then why not use pd.get_dummies instead of OneHotEncoder?
What's the advantage of using one-hot?
I personally like pd.get_dummies as it is convenient to use. I wanted to show two different ways of doing the same thing, and there are some subtle differences between the two. Check this: stackoverflow.com/questions/36631163/pandas-get-dummies-vs-sklearns-onehotencoder-what-is-more-efficient
@@codebasics thank you :]... btw, you make great videos
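One concrete difference worth noting (a sketch under assumed column names, not from the video): OneHotEncoder is a fitted transformer, so it reproduces the same columns on new data, while pd.get_dummies re-derives columns from whatever frame it is given.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"town": ["monroe township", "west windsor", "robinsville"]})
new = pd.DataFrame({"town": ["west windsor"]})

# get_dummies only sees the categories present in `new`: a single column
dummies_new = pd.get_dummies(new["town"])
print(dummies_new.shape)

# OneHotEncoder remembers all three training categories
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train[["town"]])
row = ohe.transform(new[["town"]]).toarray()
print(row)  # three columns, matching the training layout
```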
Hello Sir @codebasics, in the exercise question, which columns should I choose as the X and Y values to plot the scatterplot graph?
Bro, from 16:43 onwards, why did you drop the first column? And why did you assign the entire thing to X?
Your videos are awesome
Glad you like them!
Mercedes = array([[36991.31721061]])
BMW = array([[11450.86522658]])
Accuracy = 0.9417050937281082
Thanks for your time and knowledge once again!
The import linear regression statement lol. Amazing tutorial. :D
12:45 What is the need for LabelEncoder? Why can't we use OneHotEncoder directly?
Excellent as usual!
Please make a regression video using the preprocessing library with standardization and normalization of variables.
I was learning through a paid course, and then I had to come here to understand this concept of dummy variables.
❤🎉🎉 Thank you. You earned a subscriber
Excellent video.., thank you so much.
Dhawal sir, how do we plot a scatter graph for this exercise, as it has 3 independent variables (car model, mileage, age) and 1 dependent variable (sell price)?
At 15:04, why did you transform it into a 2D array? Please explain.
Very nice explanation, appreciated
I'm not quite understanding sklearn.preprocessing. Firstly, we assign X with 2 values (town and area), then we get 4 columns when transform(X) is converted to an array. Secondly, how do we know which index in the array (of 4 values) we should pick to drop?
Thank you for the very well explained tutorial. I have one question though: you are training on all of your data here, and yet the model score is only 0.95. Why is that? Shouldn't it be 1? If you were to split your data and train on part of it, that would make sense, but in your case it doesn't. What am I missing?
Alper, it is not true that the score is always 1 if you use all your training data. Ultimately, for a regression problem like this, you are making a guess at a best-fit line, and real data is noisy, so the line will rarely pass through every point; the fit is an *approximation* and will generally not be perfect. I am not saying you can never get a score of 1, but a score less than 1 is normal and accepted.
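A small synthetic sketch supporting this: even when scoring on the training set itself, noisy data keeps R² below 1, because a straight line cannot pass through every noisy point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + rng.normal(scale=5.0, size=50)  # a line plus noise

model = LinearRegression().fit(X, y)
score = model.score(X, y)  # R^2 on the *training* data
print(score)  # high, but strictly below 1
```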
What would help inform a decision to drop one of the dummy variables? You mentioned the linear regression model will typically be able to handle the collinearity between the dummy variables. When should we drop one?
Hi sir!! You teach ML in the easiest way, thanks a lot!!! I am going through your videos and assignments. I got the answers Mercedes: 36991.31, BMW: 11080.74, and model score: 0.9417, i.e. 94.17%. My question is: how do I improve the model score? Is there any way to use the features for this?
Many Thanks ! Great Explanation :)
Nice teaching, really outstanding. Thanks a lot.
Hi, can we know which integer values are assigned to the categories by 'LabelEncoder'? Here only three categories are available, so the encoded values can easily be distinguished. What if there are 10 categories? How do we know the exact encoded value for each category?
yes sir same question
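A sketch answering this (illustrative labels): LabelEncoder exposes its mapping through the classes_ attribute; the integer code of a category is simply its index in that sorted array, however many categories there are.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["monroe township", "west windsor", "robinsville"])

# classes_ holds the categories in sorted order; code i means classes_[i]
mapping = {label: i for i, label in enumerate(le.classes_)}
print(mapping)
print(list(codes))
```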
They're basically the same; however, pd.get_dummies variables are easier to use.
Thank u, sir.
yes I agree
This helped me a lot in my assignment, thank you so much code basics
Glad it helped!
Thank you so much for the detailed step by step explanation.
Glad it was helpful!
First of all, thank you for making life easier for people who want to learn Machine Learning. You explain really well. Big fan. When I was trying to execute categorical_features=[0], it gave an error. It seems this parameter has been deprecated in the latest version of scikit-learn; instead they recommend using ColumnTransformer. I was able to get the same accuracy, 0.9417050937281082. Another thing I wanted to know: when you had initially used the label encoder and converted categorical values to numbers, why did we specify the first column as categorical when it was already an integer value?
You won my heart; this is exactly what I was looking for.
Great videos! Unfortunately, it becomes harder and harder to code along with the videos because there are more and more changes in the libraries you use. For example, the sklearn library removed the categorical_features parameter of the OneHotEncoder class. This was also the case for other videos in the playlist. It would be great to have the same playlist in 2022 :)
Point noted. I will redo this playlist when I get some free time from the tons of priorities on my plate at the moment.
@@codebasics Thank you for the reply and again : Great job for all the quality tutorials!
This module makes my code hot!
Thank you for this series. Such great help
Glad it was helpful!