Data Preprocessing 06: One Hot Encoding python | Scikit Learn | Machine Learning

Stats Wire


Published 12 Mar 2021
GitHub Jupyter Notebook: github.com/siddiquiamir/Pytho...
GitHub Data: github.com/siddiquiamir/Data
About this video: In this video, you will learn about One Hot Encoding in Python

COMMENTS • 102

@ngneerin 2 years ago ⁺²⁵
This is so straightforward. No other source puts it so simply.
@StatsWire 2 years ago
Thank you
@_danfiz 2 years ago ⁺⁶
These are good, direct steps for using OHE. This helps me a lot. Thank you!
@StatsWire 2 years ago
You're welcome!
@aliyildirim5343 8 months ago
Amazing explanation! Left no questions in my mind...
@StatsWire 8 months ago ⁺¹
Thank you!
@nirajkhatri2017 11 months ago ⁺¹
Great video on OHE using sklearn. It describes everything we need to understand. Thank you
@StatsWire 11 months ago
Glad it was helpful!
@kyleiong7311 7 months ago
REALLY, REALLY HELPFUL. YOU SAVED MY DAY!
@StatsWire 7 months ago
You're welcome!
@fefefefezzz 1 year ago
Very good video! Helped me a lot! Thanks ❤❤
@StatsWire 1 year ago
You're welcome!
@tymothylim6550 2 years ago
Thanks a lot! Great tutorial!
@StatsWire 2 years ago
You're welcome!
@Hellios92 1 year ago
Thanks a lot, really helpful video! :)
@StatsWire 1 year ago
You're welcome!
@mdhasanuzzaman4039 1 year ago
Thank you so much
Great Tutorial
@StatsWire 1 year ago
You're welcome!
@emanuelea9967 2 years ago ⁺¹
Hi! Great video, but I have a question for you. For one categorical column, e.g. "STATE" with 30 different states, can I use a one-hot encoder that maps, for example, 5 states to a new category Europe, 6 to America, and so on, without creating 30 new binary columns, one per state?
@StatsWire 2 years ago
Yes, you can!
@poojakumarirollno9880 10 months ago
Great explanation, sir. Your video helped us. Please make more videos regarding data science.
@StatsWire 10 months ago
Thank you for your kind words!
@ammarayounas170 8 months ago
thank you so much
@StatsWire 8 months ago
You're welcome!
@ibragim_on 2 years ago
Great tutorial!
@StatsWire 2 years ago
Thank you
@ashutoshdongare5370 1 year ago ⁺³
Great tutorial... The only thing is that ravel() does not work for uneven arrays; one needs to use concat or hstack.
@StatsWire 1 year ago
Thank you for sharing. I will try.
@ZAZ069 5 months ago
thanks my man
@SMoon453 3 months ago
Thank you dude! I was wondering why ravel() wasn't working for me
@rubennadevi 1 year ago
Thank you!
@StatsWire 1 year ago
You're welcome!
@arenashawn772 5 months ago
I think if you specify "sparse_output=False" when initializing the OneHotEncoder, the transformed result will not be a SciPy csr_matrix and you won't need the toarray() method to see the resulting matrix. But obviously it uses more memory this way…
@StatsWire 5 months ago
Yes
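
The point about sparse_output can be sketched as follows, on hypothetical data. The fallback to the older `sparse` keyword is an assumption about which scikit-learn version is installed: the parameter is called `sparse_output` in recent releases and `sparse` in older ones.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["red", "blue", "red"]})  # hypothetical data

# Default behaviour: the result is a SciPy sparse matrix,
# so toarray() is needed to look at it as a dense array.
sparse_result = OneHotEncoder().fit_transform(df[["color"]])
dense_from_sparse = sparse_result.toarray()

# Dense output directly from the encoder.
try:
    dense_ohe = OneHotEncoder(sparse_output=False)
except TypeError:
    # Older scikit-learn releases only know the `sparse` keyword.
    dense_ohe = OneHotEncoder(sparse=False)
dense_result = dense_ohe.fit_transform(df[["color"]])
```

Both routes give the same values; the dense form simply stores every zero explicitly, which matters when a column has many categories.
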
@mazharalamsiddiqui6904 3 years ago
Very nice tutorial
@StatsWire 3 years ago
Thank you
@waleedahmad2012 9 months ago
I'm getting many rows at the bottom where the entire row is NaN except for the encoded columns. What could be the issue?
@waleedahmad2012 9 months ago
I did remove all null values before encoding
@StatsWire 9 months ago
Can you please check your code again or post it here?
@mousabmohammadshtayat4788 1 year ago ⁺³
Hi, great video. I faced one problem: I have three categorical columns, and the number of unique values differs between them. I tried adding dtype=object to the np.array call (a solution I found on many sites), but the result was three separate arrays, not one. Please help if you can. Thank you.
@franfernandez795 1 year ago ⁺¹
I have the exact same problem as you, have you found the solution?
@franfernandez795 1 year ago ⁺²⁰
I've found the solution! The method np.hstack(x) worked for me
@taylorgood1007 1 year ago ⁺¹
@@franfernandez795 life saver! thanks :)
@fatihmercan6023 1 year ago
@@franfernandez795 #danke
@bvbyballena472 1 year ago
@@franfernandez795 BLESS YOUR SOUL
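
The np.hstack fix from this thread can be sketched like this. The data is hypothetical, and it is an assumption that `x` in the comment refers to the encoder's `categories_` list. When the columns have different numbers of unique values, that list holds arrays of different lengths, which ravel() cannot merge into one flat array; np.hstack joins them end to end.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: "color" has 3 unique values, "country" has 2.
df = pd.DataFrame({
    "color": ["red", "blue", "green", "red"],
    "country": ["US", "UK", "US", "UK"],
})

ohe = OneHotEncoder()
encoded = ohe.fit_transform(df[["color", "country"]]).toarray()

# categories_ is a list of 1-D arrays, one per input column, here of
# lengths 3 and 2. hstack concatenates them into a single flat array.
labels = np.hstack(ohe.categories_)

# The flat array has one entry per encoded column, so it works as column names.
encoded_df = pd.DataFrame(encoded, columns=labels)
```
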
@umeshk0697 1 year ago
Can you please help me? I got the error "For a sparse output, all columns should be a numeric or convertible to a numeric" for pipe.fit(X_train, y_train). I double-checked all of this and I don't know where the encoder error comes from.
PS: car is already defined.
X = car.drop(columns='Emissions_CO_[mg/km]')
y = car['Emissions_CO_[mg/km]']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline

ohe = OneHotEncoder()
ohe.fit(X[['Manufacturer', 'Model', 'Fuel_Type']])
ohe.categories_

column_trans = make_column_transformer((OneHotEncoder(categories=ohe.categories_), ['Manufacturer', 'Model', 'Fuel_Type']),
remainder='passthrough')

lr = LinearRegression()
pipe = make_pipeline(column_trans, lr)

pipe.fit(X_train, y_train)
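
The car dataset is not available here, so the following is only a guess at the cause. The column transformer raises this error when its output is sparse and a column passed through by remainder='passthrough' still contains text. A minimal sketch of one possible fix is to encode every non-numeric column instead of a hand-picked list. The frame below is a hypothetical stand-in: its column names and values are invented, with "Transmission" playing the role of a text column that was left out of the encoder.

```python
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the commenter's `car` frame.
car = pd.DataFrame({
    "Manufacturer": ["A", "B", "A", "C", "B", "C"],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Diesel", "Petrol", "Diesel"],
    "Transmission": ["Manual", "Auto", "Auto", "Manual", "Manual", "Auto"],
    "Engine_Size": [1.2, 2.0, 1.6, 2.2, 1.4, 1.8],
    "Emissions": [100.0, 180.0, 130.0, 200.0, 115.0, 160.0],
})
X = car.drop(columns="Emissions")
y = car["Emissions"]

# Encode every non-numeric column, so only numeric columns are passed through.
# handle_unknown="ignore" avoids errors on categories unseen during fit.
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), make_column_selector(dtype_exclude="number")),
    remainder="passthrough",
)

pipe = make_pipeline(column_trans, LinearRegression())
pipe.fit(X, y)
predictions = pipe.predict(X)
```

Checking X.dtypes for leftover object columns before fitting is a quick way to confirm or rule out this cause.
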
@alonzoslim 7 months ago
Hello. Thanks for this video. It's quite informative.
How can I deal with a situation where the categories are of varying lengths?
I got this error message: "ValueError: all arrays must be same length"
@StatsWire 7 months ago
It occurs when the arrays you are combining have different lengths, for example the per-column category arrays when the columns have different numbers of unique categories. One-hot encoding itself does not require every column to have the same number of unique categories.
@lucykelly499 6 months ago
@@StatsWire I got the same error. How can it be resolved?
@vamsikrishna-ft8rn 2 years ago
Well explained bro
@StatsWire 2 years ago
Thank you
@codingzone4690 2 years ago
Very quick and so simple! Unexpected, bruh!
@StatsWire 2 years ago
Thank you
@user-fp9zn7on4z 9 months ago
Amazing
@StatsWire 9 months ago
Thank you!
@dhm9818 1 year ago
I got an error in line 17 that says: ValueError: Shape of passed values is (988, 35), indices imply (988, 7). How can I fix it?
@StatsWire 1 year ago
You made a mistake. Please follow the steps again and you won't get the error.
@fullnesmindcristiano8638 1 year ago ⁺¹
Hello friend, is this method used to predict data, or what method is used to predict data?
@StatsWire 1 year ago ⁺¹
This method converts categorical columns into numerical columns for a machine learning model.
@user-gq6vr7jn1v 8 months ago
Good video, bro, but it could be better if you dropped the unnecessary columns like color and country.
@StatsWire 8 months ago
Thank you. This video was only for encoding purposes, not for feature selection :)
@jeanfabraruiz7994 1 year ago ⁺¹
After doing this, should I remove the columns color and country?
@StatsWire 1 year ago ⁺¹
Yes, use the dummy columns.
@Intellectual_House 10 months ago
What's the role of the toarray() method?
@StatsWire 10 months ago
Its primary role is to convert a sparse matrix into a dense NumPy array.
@aakashrai2749 10 months ago
OneHotEncoder will convert the word into binary or number format, right?
@StatsWire 10 months ago
One-hot encoding is a preprocessing technique used in machine learning to convert categorical data (e.g., words, categories, labels) into a numerical format.
@aakashrai2749 10 months ago
@@StatsWire OK, thanks 👍
@bthekhoa2704 1 year ago
Hi, I don't know where I can download the dataset
@StatsWire 1 year ago
Hi, you can download the data and the Jupyter notebook from my GitHub account: github.com/siddiquiamir/Python-Data-Preprocessing
@ngneerin 2 years ago ⁺¹
.ravel() or .flatten() is just not working; it's returning the array of arrays as it is
@StatsWire 2 years ago
Can you please check all the steps to see that you are not missing anything?
@victorarayal 1 year ago
It seems that ravel() does not work if the columns have different numbers of unique values =(
@StatsWire 1 year ago
I did not try that. Can you check the official documentation?
@RR-hq4cv 1 year ago ⁺²
Thank you for the tutorial! In cells [14] & [15] I couldn't make a flat array to later pass as column names, so I used this line of code (from the sklearn documentation):
feature_labels = ohe.get_feature_names_out()
print(feature_labels)
@StatsWire 1 year ago
Great!
@ajeyamandikal2010 1 year ago ⁺¹
Thanks bro, was searching for this!
@StatsWire 1 year ago
@@ajeyamandikal2010 You're welcome
@AbdullahAlMamun-jm4qm 2 years ago
Could you please share the CSV file of this data?
@StatsWire 2 years ago
Sure. Here is the dataset link:
GitHub: github.com/siddiquiamir/Data/blob/master/data-one-hot-encoder.csv
@violetasaguier1370 1 year ago ⁺²
Very good video! Greetings from Argentina, land of LEO MESSI
@StatsWire 1 year ago ⁺¹
Thank you. I like Leo Messi a lot.
@anshulsharma7080 1 year ago
Please include the dummy variable trap problem in one-hot encoding.
@StatsWire 1 year ago
Thank you for your feedback. I have added it to my list.
@roshini_begum 2 years ago
Hi, what's the difference between one-hot encoding and label encoder?
@StatsWire 2 years ago ⁺²
Hi Roshini, label encoder is used to label your target variable (Y), and one-hot encoder is used to encode independent variables (X). One-hot encoding creates new columns, whereas label encoding just replaces strings with numbers and does not create new columns.
@roshini_begum 2 years ago
@@StatsWire thanks a lot
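
The difference described in the reply above can be sketched on hypothetical data: LabelEncoder turns a single target column into integer codes, while OneHotEncoder turns a feature column into one new column per category.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical data: "color" is a feature, "bought" is the target.
df = pd.DataFrame({
    "color": ["red", "blue", "green"],
    "bought": ["yes", "no", "yes"],
})

# Target: one integer code per class, still a single 1-D array.
y = LabelEncoder().fit_transform(df["bought"])

# Feature: one new binary column per category.
X = OneHotEncoder().fit_transform(df[["color"]]).toarray()
```
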
@roshini_begum 2 years ago
Also, when do we use min-max scaler and standard scaler, and what's the difference between them?
@StatsWire 2 years ago ⁺¹
@@roshini_begum When we have outliers in the dataset we use standard scaler; otherwise min-max scaler is good to use
@roshini_begum 2 years ago
@@StatsWire thanks
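
A small sketch of what the two scalers do, on hypothetical values that include one outlier. MinMaxScaler rescales to the range 0 to 1, and StandardScaler rescales to zero mean and unit variance. Both are fitted on statistics that an outlier influences (the min and max, or the mean and standard deviation), so it is worth comparing the two outputs on your own data before choosing.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical single feature with one outlier (100.0).
values = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Min-max scaling: (x - min) / (max - min), so the result lies in [0, 1].
minmax = MinMaxScaler().fit_transform(values)

# Standardization: (x - mean) / std, so the result has mean 0 and std 1.
standard = StandardScaler().fit_transform(values)
```
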
@ngneerin 2 years ago
It's also sad that such a common use case requires so many steps. It should be available in one step, like pandas dummies
@StatsWire 2 years ago
Yes, pandas dummies is easier
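
The one-step pandas route mentioned here is pd.get_dummies. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: two categorical columns and one numeric column.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "country": ["US", "UK", "US"],
    "price": [10, 20, 30],
})

# Encodes the listed columns, names the new columns, and drops the
# originals in a single call; other columns are left untouched.
dummies = pd.get_dummies(df, columns=["color", "country"])
```

One design difference to keep in mind: get_dummies does not remember the categories it saw, whereas a fitted OneHotEncoder can apply the same encoding to new data later.
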
@uchennaonyema989 1 year ago
.ravel() isn't working, bro. It returns the same two arrays as before
@StatsWire 1 year ago
Please re-run the code and check.
@aravindng5157 9 months ago
Bro paithiyama neee avlo variables podra
@StatsWire 9 months ago
I did not understand the language but thank you :D
@VinitKhandelwal 2 years ago
.ravel() did not work. I used .flatten()
@StatsWire 2 years ago
That's great. I hope it's working for you

Next

Data Preprocessing 07: Ordinal Encoding Sklearn | Machine Learning | Python