Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
I have watched only 4 minutes so far and I had to pause and write this comment. I will say this is one of the best tutorials I have seen in data science. Sir, you need to take this to another level. What a great teacher you are!
Thanks for the feedback my friend 😊👍
100% aligned... I am doing an external course but have to refer to your sessions to understand its topics... amazing effort.
For anyone stuck with the categorical features error.
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([("town", OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)
X
Then you should be able to continue the tutorial without further issue.
thanks bro
thanks a lot! it helps
Thank you brother.
Hey, thanks for the code.
I tried using your code, but despite converting X to an array, it still gives me this error:
"TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array."
@@Ran_dommmm I know you said "despite converting X to an array", but just double check you have used the .toarray() method correctly. The error message seems pretty clear on this one.
This function may help confirm that a dense numpy array is being passed:
import numpy as np

def is_dense(matrix):
    # a scipy sparse matrix is not an np.ndarray, so this returns False for it
    return isinstance(matrix, np.ndarray)
Pass in X for matrix and it should return True.
Good luck fixing this.
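To make the fix above concrete, here is a hedged sketch (with made-up town/area data, not the video's exact dataframe) showing where .toarray() fits in:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Illustrative stand-in for the tutorial's dataframe
df = pd.DataFrame({
    "town": ["monroe township", "west windsor", "robinsville", "monroe township"],
    "area": [2600, 2800, 3300, 3000],
})

ct = ColumnTransformer([("town", OneHotEncoder(), [0])], remainder="passthrough")
X = ct.fit_transform(df)

# Depending on the sklearn version and the data, X may come back sparse;
# densify it only in that case.
if not isinstance(X, np.ndarray):
    X = X.toarray()

print(X.shape)  # 3 one-hot columns for the towns, plus the passthrough 'area'
```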
Sir, please continue your machine learning tutorials; they are among the best I have seen so far.
Sure Gaurav, I just started a deep learning series. Check it out.
@@codebasics
Kindly explain the concept of dummies in deep learning as well
Anyone can be a teacher, but a real teacher eliminates fear from students... you did exactly that!! Excellent knowledge and skills.
Sreenivasulu, your comment means a lot to me, thanks 😊
Hi,
Your explanation is very simple and effective.
Answers for the practice session:
A) Price of Mercedes Benz, 4 yr old, mileage 45000 = 36991.31721061
B) Price of BMW X5, 7 yr old, mileage 86000 = 11080.74313219
C) Accuracy = 0.9417050937281082 (94 percent)
Same bro
same bro.... thx for replying so that i can check my results
Exercise solution: github.com/codebasics/py/blob/master/ML/5_one_hot_encoding/Exercise/exercise_one_hot_encoding.ipynb
Everyone, the error with categorical_features is fixed. Check the new notebook on my github (link in video description). Thanks to Kush Verma for sending me a pull request with the fix.
Thank you for the wonderful explanation, sir. However, I am getting the error __init__() got an unexpected keyword argument 'catergorical_features' for my line of code onehotencoder = OneHotEncoder(catergorical_features = [0]). Is it because of a change of versions?
What is the solution to this?
Sir, I get this error when I specify categorical features: __init__() got an unexpected keyword argument 'categorical_features'
@@urveshdave1861 Have you got any answer for this? I am having the same error
@@urveshdave1861 okay .. i will do that. thanks
@@urveshdave1861 Hey I am also getting the same error. how did you resolve it?
This guy is AMAZING! I have spent 2 days trying dozens of other methods, and this is the only one that worked for my data and didn't end in an error. This guy totally saved my sanity; I was growing desperate, as in DESPERATE! Thank you, thank you, thank you!
I am glad it was helpful to you 🙂👍
Your ability to simplify things is amazing, thank you so much. You are a natural teacher.
I was confused about where to start studying ML, and then my friend suggested this series... It's great :-)
Are you following any other courses or sources? And have you begun any development?
I want to know how helpful this playlist is. Kindly reply.
@@sauravmaurya6097 It's quite helpful if you are a beginner, in the sense of not being from an engineering or programming background. You can accompany this with Coursera's Andrew Ng course.
@@sauravmaurya6097 If you already know calculus and Python programming (intermediate level), ML will feel easy. After doing this, go to the deep learning series, because that's what is used in industry.
Wonderful Video.
This is by far the easiest explanation I have seen for one hot encoding. I had been struggling for a long time to find a proper video on this topic, and my quest ended today.
Thanks a lot, sir.
This ML tutorial is by far the best one I have seen. It is so easy to learn from and understand, and your exercises also help me apply what I have learned so far. Thank you.
Glad it helped!
15:50 write your code like this:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],
    remainder='passthrough'
)
X = ct.fit_transform(X)
X
Then it will work fine; otherwise it will give an error.
What is the use of (categories='auto') and of the name 'one_hot_encoder'?
Thank you, you're a lifesaver! I was trying multiple ways, since categorical_features has now been deprecated.
@@jollycolours correct, the categorical_features parameter is deprecated, and these are the steps to follow instead:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X), dtype=float)
Even in '23 your video is such a relief... kudos to your teaching.
First of all, 1000*Thanks for sharing such content on youtube..
I got an accuracy of 94.17% on training data.
Bandham, I am glad you liked it buddy 👍
I was shocked after the first 5 minutes of the video; I never thought it would be so easy and fast! Thanks A LOT!
Miyuki... I am glad you liked it
You really made it very easy to understand such new concepts, thanks a lot.
Starting from minute 12:30 about OneHotEncoder: some updates in sklearn prevent using categorical_features=[0]. Here is the code update as of April 2020:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

columnTransformer = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
# use a numeric dtype; np.str is deprecated and the regression needs numbers
X = np.array(columnTransformer.fit_transform(x), dtype=float)
X = X[:, 1:]  # drop one dummy column to avoid the dummy variable trap
model.fit(X,y)
model.predict([[1,0,2800]])
model.predict([[0,1,3400]])
The code is working but gives a different prediction compared to dummies.
Plus, my X is showing 5 columns instead of 4.
I was entering the 0 and 1 wrongly. I am getting the same answer now; thank you for the code.
thanks buddy
I must say this is the best course I've come across so far.
The god of data science... amazing explanation, sir. Kudos to your patience while explaining.
Glad it was helpful!
You are the best teacher on YouTube; I have never seen anyone like you before.
You have a gift for explaining things even to the layman. Big up to you!
Thanks a ton Wangs for your kind words of appreciation.
Merc: 36991.317
BMW: 11080.743
Score: 94.17%
Your answer is perfect Ankit. Good job, here is my answer sheet for comparison: github.com/codebasics/py/blob/master/ML/5_one_hot_encoding/Exercise/exercise_one_hot_encoding.ipynb
thanks for posting the answer bro
Could we upvote this comment to the top? Been looking for this for quite some time now. This is important, and this comment matters.
@@codebasics I used pandas dummy variables instead of one-hot encoding, because the latter is too confusing.
Got the same answer using OneHotEncoder after correcting tons of errors and watching videos over and over.
I will say this is one of the best tutorials I have seen in ML.
This is the best machine learning playlist I have come across on youtube 😃👍. Hats off to you, sir.
This was really well done! Kudos to you! It's hard to find clear and concise free tutorials nowadays. Subscribed and hope to see more awesome stuff!
I achieved the same result using a different method that doesn't require dropping columns or concatenating dataframes. This alternative approach can lead to cleaner and more efficient code:
df = pd.get_dummies(df, columns=['CarModel'], drop_first=True)
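For illustration, a small sketch of what that one-liner does (made-up data; the 'CarModel' column name is assumed from the exercise):

```python
import pandas as pd

df = pd.DataFrame({
    "CarModel": ["BMW X5", "Audi A5", "Mercedez Benz C class"],
    "Mileage": [69000, 59000, 67000],
})

# drop_first=True drops the first (alphabetically) dummy column,
# avoiding the dummy variable trap in one step
df = pd.get_dummies(df, columns=["CarModel"], drop_first=True)
print(list(df.columns))
```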
One of the best explanations of encoding 👌👍
Glad it was helpful!
I wish I could give this videos 2 thumbs up! Great explanation of all the steps in one-hot encoding! Thank you!!
I am getting 84% accuracy without encoding the variable, but after encoding I am getting 94% accuracy with the model. Thank you for your teaching. You are doing a great job.
Awesome, you're explaining concepts in a very simple manner.
Vishwa I am happy to help 👍
How can I like this video more than 100 times!
I am happy this was helpful to you.
The Data Science GOAT! One day I will send you a nice donation for all that you have contributed to my journey sir!
My model scored 94% accuracy. Thank you sir for the amazing video.
the best video series on ML sir ....Thank you very much sir....
You teach with passion! thank you for the series!
I'm reading a textbook that has an exercise studying this same dataset to predict 'survived'. I just finished the exercise from the book, but I can't seem to get past an 81% score.
Thanks for your awesome explanation
Thank you very much... wonderful work... special thanks from Morocco in North Africa.
This is really the best series to get started with ML
How are you starting?
Glad it was helpful!
@@shinosukenohara.123 I am watching this channel, Krish Naik and Andrew NG course on Coursera
This is an amazing tutorial! saved me so much time and brought so much clarity!!! Thank you!
Highly Qualitative.
If anyone got stuck at OneHotEncoder at 16:26, run this command: pip install -U scikit-learn==0.20
Thanks 😃
I am stuck, and it still did not execute using your solution.
I also got them correct. Sir, this course is amazing. You have made it so easy to understand.
Glad to hear that
@14:01, please explain how come you applied label encoding to nominal categories. Moreover, LabelEncoder should be applicable to the target column only.
superb and precisely explained
Thank you 🙂
A PLACE TO RUN TO WHEN ONE IS STUCK. THANK YOU SO MUCH SIR
@13:20 we need to do:
dfle = df.copy()
because otherwise changes in dfle will reflect back to df.
Thanks :)
Yes, you are right.
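A tiny demonstration of that point (illustrative data, not the video's dataframe):

```python
import pandas as pd

df = pd.DataFrame({"town": ["monroe township", "west windsor"]})

df_alias = df        # same underlying object as df
df_copy = df.copy()  # independent copy

df_alias["town"] = ["a", "b"]
print(df["town"].tolist())  # changed: the alias writes through to df

df_copy["town"] = ["x", "y"]
print(df["town"].tolist())  # unchanged: the copy is independent
```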
Thank you sir🎉. You made my ML Journey Better.. 🤩
You are doing a wonderful job; people like you inspire me to learn and to share the knowledge I gain. It is very useful for me. All the best.
For Mercedes Benz I got 51981.26, for BMW I got 39728.19, and the score is 94.17%. Thank you very much for making ML easy.
Why did we apply LabelEncoder and then OneHotEncoder in the 2nd method, when we can directly apply OHE itself to the data?
Difficult topics are easily understood. Thank you so much for the content, sir.
thank you, this helped me so much with multivariate regression with many categorical features!
15:50 write this code
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([('town', OneHotEncoder(), [0])], remainder = 'passthrough')
x = ct.fit_transform(x)
x
The label encoding done for the independent variable column 'town' in the second half of the video isn't needed, I think. Just doing one-hot encoding is enough. Wonderful contribution anyway. Thanks!!
I agree
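For anyone reading along, a sketch of that point (illustrative data): a modern OneHotEncoder accepts string categories directly, so the intermediate LabelEncoder step isn't required.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

towns = pd.DataFrame({"town": ["monroe township", "west windsor", "robinsville"]})

ohe = OneHotEncoder()  # fit directly on the string column
encoded = ohe.fit_transform(towns[["town"]]).toarray()

print(ohe.categories_[0])  # categories are stored in sorted order
print(encoded)
```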
I learned a lot from the exercise that you gave at the end of the video, thank you so much sir!
Someone please help!! At 15:14 I am getting an error for { y = df.price }.
It shows "AttributeError: 'DataFrame' object has no attribute 'price'"
That means there is no column labelled price. Redo it; you might have lost the column while executing a drop command multiple times.
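A quick way to confirm this (a generic sketch, not the tutorial's dataframe) is to inspect df.columns before using attribute access:

```python
import pandas as pd

df = pd.DataFrame({"town": ["monroe township"], "price": [550000]})
df = df.drop("price", axis=1)  # e.g. a drop cell re-run one time too many

print(list(df.columns))        # 'price' is gone, so df.price would now raise
print("price" in df.columns)   # False
```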
Thanks sir, nice lecture.
Sir, you are really a great teacher.
You teach everything so nicely;
even tough things become easy when you teach.
Thanks a lot.
Excellent video - thank you!
You make it easy with your explanation !! Thank you !!
Simply excellent explanation with very simple examples!
Thanks for the excellent video... but due to recent changes, ColumnTransformer from sklearn.compose is to be used for one-hot encoding.
Preeti, can you send me a pull request?
model.predict([[45000,4,0,0]])=array([[36991.31721061]]),
model.predict([[86000,7,0,1]])=array([[11080.74313219]]),
model.score(X,Y)=0.9417050937281082.
Thanks sir for these exercise
definitely one of the best videos to learn from!
That's a great tutorial on one-hot encoding. I was unable to find a complete example anywhere. Thanks for sharing.
Thanks Adnan for your valuable feedback
Your tutorial videos are helping me so much to learn more about ML.
I am happy this was helpful to you.
I am here in 2024, after 6 years, and I want to say that this playlist is wonderful!
I hope you update it, because there have been many changes in the syntax of sklearn.
Hey, next week I am launching an ML course on codebasics.io which will address this issue. It has the latest API, in-depth math, and end-to-end projects.
Hi,
Since OneHotEncoder's categorical_features has been deprecated... can you please mention here how to proceed?
That image on one hot encoding 🤣🔥
Hi Dhaval, your explanation of all the topics is crystal clear.
Can you please make videos on NLP also?
Sir, what is the best method to do label encoding for job designations like management, blue-collar, technician, etc.? Please let me know the best practice.
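Not an official answer, but a common practice: designations like these have no natural order (they are nominal), so one-hot encoding is usually preferred over assigning arbitrary integers. A minimal sketch with made-up data:

```python
import pandas as pd

jobs = pd.DataFrame({"job": ["management", "blue-collar", "technician", "management"]})

# One column per designation; no spurious order between jobs is implied
encoded = pd.get_dummies(jobs, columns=["job"])
print(list(encoded.columns))
```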
Wait wait... I don't see the point 😕
The first half of the video does the same thing as one hot encoding (the second half of the video), but the second half is more tedious and takes more steps.
Then why not use pd.get_dummies instead of OneHotEncoder?
What's the advantage of using one-hot?
I personally like pd.get_dummies as it is convenient to use. I wanted to show two different ways of doing the same thing, and there are some subtle differences between the two. Check this: stackoverflow.com/questions/36631163/pandas-get-dummies-vs-sklearns-onehotencoder-what-is-more-efficient
@@codebasics thank you :]... btw, you make great videos
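One concrete difference worth noting (a sketch under assumed column names, not from the video): OneHotEncoder is a fitted transformer, so it reproduces the same columns on new data, while pd.get_dummies re-derives columns from whatever frame it is given.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"town": ["monroe township", "west windsor", "robinsville"]})
new = pd.DataFrame({"town": ["west windsor"]})

# get_dummies only sees the categories present in `new`: a single column
dummies_new = pd.get_dummies(new["town"])
print(dummies_new.shape)

# OneHotEncoder remembers all three training categories
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train[["town"]])
row = ohe.transform(new[["town"]]).toarray()
print(row)  # three columns, matching the training layout
```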
Hello Sir @codebasics, in the exercise question, which columns should I choose as the X and Y values to plot the scatterplot graph?
Bro, from 16:43 onwards, why did you drop the first column? And why did you assign the entire thing to X?
Your videos are awesome
Glad you like them!
Mercedes = array([[36991.31721061]])
BMW = array([[11450.86522658]])
Accuracy = 0.9417050937281082
Thanks for your time and knowledge once again!
The import linear regression statement lol. Amazing tutorial. :D
12:45 What is the need for LabelEncoder? Why can't we use OneHotEncoder directly?
Excellent as usual!
Please make a regression video using the preprocessing library with standardization and normalization of variables.
I was learning through a paid course, and then I had to come here to understand this concept of dummy variables.
❤🎉🎉 Thank you. You earned a subscriber
Excellent video.., thank you so much.
Dhawal sir, how do we plot a scatter graph for this exercise, as it has 3 independent variables (car model, mileage, age) and 1 dependent variable (sell price)?
At 15:04, why did you transform it into a 2D array? Please explain.
Very nice explanation, appreciated
I'm not quite understanding sklearn.preprocessing. Firstly, we assign X with 2 values (town and area), then we get 4 columns when transform(X) is converted to an array. Secondly, how do we know which index in the array (of 4 values) we should pick to drop?
Thank you for the very well explained tutorial. I have one question though: you are training on all of your data here, and yet the model score is only 0.95. Why is that? Shouldn't it be 1? If you were to split your data and train on part of it, that would make sense, but in your case it doesn't. What am I missing?
Alper, it is not true that the score is always 1 if you use all your training data. Ultimately, for a regression problem like this, you are making a guess at a best-fit line, and real data is noisy, so the line will rarely pass through every point; the fit is an *approximation* and will generally not be perfect. I am not saying you can never get a score of 1, but a score less than 1 is normal and accepted.
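A small synthetic sketch supporting this: even when scoring on the training set itself, noisy data keeps R² below 1, because a straight line cannot pass through every noisy point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + rng.normal(scale=5.0, size=50)  # a line plus noise

model = LinearRegression().fit(X, y)
score = model.score(X, y)  # R^2 on the *training* data
print(score)  # high, but strictly below 1
```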
What would help inform a decision to drop one of the dummy variables? You mentioned the linear regression model will typically be able to handle the collinearity between the dummy variables. When should we drop one?
Hi sir!! You teach ML in the easiest way, thanks a lot!!! I am going through your videos and assignments. I got the answers Mercedes: 36991.31, BMW: 11080.74, and model score: 0.9417, i.e. 94.17%. My question is: how do I improve the model score? Is there any way to use the features for this?
Many Thanks ! Great Explanation :)
Nice teaching, really outstanding. Thanks a lot.
Hi, can we know which integer values are assigned to the categories by 'LabelEncoder'? Here only three categories are available, so the encoded values can easily be distinguished. What if there are 10 categories? How do we know the exact encoded value for each category?
yes sir same question
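A sketch answering this (illustrative labels): LabelEncoder exposes its mapping through the classes_ attribute; the integer code of a category is simply its index in that sorted array, however many categories there are.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["monroe township", "west windsor", "robinsville"])

# classes_ holds the categories in sorted order; code i means classes_[i]
mapping = {label: i for i, label in enumerate(le.classes_)}
print(mapping)
print(list(codes))
```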
They're basically the same; however, pd.get_dummies variables are easier to use.
Thank u, sir.
yes I agree
This helped me a lot in my assignment, thank you so much code basics
Glad it helped!
Thank you so much for the detailed step by step explanation.
Glad it was helpful!
First of all, thank you for making life easier for people who want to learn Machine Learning. You explain really well. Big fan. When I was trying to execute categorical_features=[0], it gave an error. It seems this parameter has been deprecated in the latest version of scikit-learn; instead they recommend using ColumnTransformer. I was able to get the same accuracy, 0.9417050937281082. Another thing I wanted to know: when you had initially used the label encoder and converted categorical values to numbers, why did we specify the first column as categorical when it was already an integer value?
You won my heart; this is exactly what I was looking for.
Great videos! Unfortunately, it becomes harder and harder to code along with the videos because there are more and more changes in the libraries you use. For example, the sklearn library removed the categorical_features parameter of the OneHotEncoder class. This was also the case for other videos in the playlist. It would be great to have the same playlist in 2022 :)
Point noted. I will redo this playlist when I get some free time from the tons of priorities on my plate at the moment.
@@codebasics Thank you for the reply and again : Great job for all the quality tutorials!
This module makes my code hot!
Thank you for this series. Such great help
Glad it was helpful!