Python Pandas Tutorial 6. Handle Missing Data: replace function

codebasics

Published 14 Oct 2024

COMMENTS • 158

@codebasics 2 years ago ⁺¹
Check out our premium machine learning course with 2 industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
@darrennewell3845 2 years ago ⁺⁹
This helped me tremendously with cleaning up messy serial data that I was logging from a microcontroller into a useful data frame. Thank you for posting these free of charge and helping me finish my senior design project!
@engineerbaaniya4846 4 years ago ⁺⁹
It was short, sweet, and to the point. Best tutorials I have seen on the whole YouTube platform. If anyone is planning to learn pandas, go through his playlist line by line; it is amazing (best of all).
@anushachand2443 4 years ago ⁺²
You are so great at explaining the concepts; anyone can understand.
@s.sidharttan9241 4 years ago
Yeah, I'm literally learning pandas from his videos.
@ijeffking 7 years ago ⁺³
Your videos always have more to offer. Very useful for data analysis and, in the process, eventually for machine learning. Thank you very much.
@joelu3440 5 years ago ⁺²³
Thanks, this is a fantastic tutorial. Just one question: at 7:14, you modified the temperature column to hold 32 F and 32 C. Then you just removed the letters so that both of them became 32. Should you have done some sort of conversion first? 32 F and 32 C are not equal, so shouldn't you have used the (°F − 32) × 5/9 = °C formula to normalize all of them to C or all of them to F?
@anweshgandham6776 5 years ago
Have you found the solution? How do we go ahead in such situations:
km/h and mph in the same column,
Centigrade and Fahrenheit in the same column?
@kannanv8831 4 years ago ⁺⁷
Technically you are correct. These values will cause confusion. But he is only teaching how to chop off the letter attached to these values. In the real world, you would make the temperature column either F or C. You cannot hold both without the letters.
@abhinavreddy1083 4 years ago ⁺³
You are correct. If we replace like that, the values can't give a correct prediction in analysis.
@PrasannaChowdharyborn9th 3 years ago
@@anweshgandham6776 We can use regex to identify whether C or F is present in the cell and apply the respective conversion to the filtered records.
@eafadeev 3 years ago
@@anweshgandham6776 You could use regex with the replacement being a function.
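The regex idea from the replies above could look roughly like the following minimal sketch. The sample values, the `temperature_c` column name, and the choice to treat a reading without a unit letter as Celsius are assumptions made for illustration; none of them come from the video.

```python
import pandas as pd

# Hypothetical sample: Fahrenheit and Celsius readings mixed in one column.
df = pd.DataFrame({"temperature": ["32 F", "32 C", "50 F", "10"]})

# Capture the number and the optional unit letter as two separate columns.
parts = df["temperature"].str.extract(
    r"^\s*(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[CF])?\s*$"
)
value = pd.to_numeric(parts["value"])
is_fahrenheit = parts["unit"].isin(["F"])

# Convert only the Fahrenheit rows: (°F - 32) * 5/9 = °C.
# Assumption: a value without a unit letter is already in Celsius.
df["temperature_c"] = value.where(~is_fahrenheit, (value - 32) * 5 / 9)
print(df)
```

The same pattern should carry over to km/h versus mph: extract the unit, then scale only the rows that carry the unit you want to get rid of.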
@binodrai3653 3 years ago ⁺¹
Thank you for making it free. One of the best pandas tutorials.
@codebasics 3 years ago
Glad it was helpful!
@ZeeshanYounas-m5v 1 year ago
Sir! I am very excited to see this tutorial. I am starting your data science roadmap. This is very useful.
@prashantkumarvishwakarma8645 3 years ago
Sir, you teach in a very good way.
@diplomatic_koboko 6 years ago ⁺³
You are absolutely fantastic. I am looking forward to whatever you do next
@rickrs5289 6 years ago ⁺²
Thank you very much for this awesome video! I have 2 small questions:
1. How to replace multiple occurrences of the same value in different columns with (a) the same single value and (b) different values?
2. How to replace n values in different columns with (a) a single value and (b) different values?
Could you please add example code for this on GitHub?
Best regards.
@GoldPhoenix99 6 years ago ⁺³
This is an excellent video series, by the way. You've got my sub!
@SulemanTheTraveller 3 years ago
Extremely helpful. Thank you, sir, for helping us understand in such an easy way.
@abhi9029 5 years ago
Thanks for making this wonderful tutorial. It shows how powerful Python is.
@codebasics 5 years ago
Oh definitely, Python rules the world!
@bhats230284 3 years ago
Superb explanation. I have started with this series and it's helping me a lot. Many thanks.
@Lifelicious28 4 years ago
At 10:35,
you can also use the code below, I guess,
new_pr = df.replace('[A-Za-z]', '', regex=True)
instead of using the dictionary. Worked well for me.
@codebasics 4 years ago
Yes, regex can also be used. Thanks for the tip, Bhakti 👍
@1716_anujpradhan-wz7lu 8 months ago
But it will also remove the data from the event column, which we don't want.
@jennythedancer5139 2 years ago
Watched again. Thank you very much, it's very helpful.
@kunjalsahu3504 2 years ago
Thank you, sir. I was struggling with a question related to replace and was able to solve it. Thank you.
@roopagaur8834 5 years ago ⁺²
Amazing teaching! Thank you.
@Raja-tt4ll 4 years ago ⁺¹
Awesome tutorial! Thanks
@talamuslu 6 years ago ⁺¹
You are absolutely fantastic. More videos, please.
@HawkingMerchant 4 years ago
A 13-minute video felt like a 1-hour class; that's the richness of your content. Excellent 😍😍
And Anaconda was the best part; I never knew of it before your video. It's so useful, thank you so much.
@codebasics 4 years ago
Vinay, I am glad you liked it
@Raj_indian10 1 year ago
Good explanation.
@sihlengena5022 4 years ago
Thank you for helping me out. I struggled the whole week, and this helped me.
@codebasics 3 years ago ⁺¹
Glad it helped!
@stormhawk252 4 years ago
Amazing, this came in so handy when completing my assignment.
@codebasics 4 years ago ⁺¹
👍😊
@ritikajaiswal3824 2 years ago
Over all these months, your content has helped me a lot. But it's still tough for a newbie like me to understand how SQL, Python (pandas, matplotlib, numpy, seaborn), and Power BI/Tableau help in data analysis. Can you make a sample project, PLEASE!!!
@codebasics 2 years ago
I already have a few projects on Power BI/Tableau on my channel. I don't have projects on SQL/pandas etc., and I will definitely add those in the future.
@kishorekumarviswanadhuni5055 6 years ago ⁺¹
Excellent videos. Thanks for uploading them.
@nomoreospf 4 years ago
Really great and clever tutorials. Thank you!
@codebasics 4 years ago
Glad it was helpful!
@mansijain2250 3 years ago ⁺¹
Sir, there are different values, 32F and 32C. What if we want to convert all F values to C values? How would we handle this type of data?
@petiwalas 1 year ago
Excellent (as usual :-). One comment/question on the DatetimeIndex: I noticed that when you create a date range with dt = pd.date_range("01-01-2017","01-11-2017"), dt itself is of type DatetimeIndex, so you shouldn't need to create another object instance idx and could instead use df.reindex(dt) directly. Can you please explain the need to create this separate instance idx? Thank you.
@gowthamshetty9276 4 years ago ⁺¹
Hey! Thanks for the tutorial.
I have one doubt about this tutorial: can we also use the na_values option to replace those 99999 values with NaN in the data frame?
@blackisfav7222 4 years ago ⁺³
Gowtham Shetty, we can. We can also replace all NaNs with the average or mode of the respective column, as the practical problem demands. This is one of the better ways to deal with them.
@codebasics 4 years ago ⁺²
Step by step guide on how to learn data science for free: ua-cam.com/video/Vn_mmOuQkSA/v-deo.html
Machine learning tutorials with exercises:
ua-cam.com/video/gmvvaobm7eQ/v-deo.html
@lyricathelyricsworld8945 3 years ago
Sir, please make a video on regex.
@gurpreettata 4 years ago
Explained nicely.
@codebasics 4 years ago
Gurpreet, I am happy this was helpful to you
@subee128 1 year ago ⁺¹
Thanks
@harshKumar-uk3jx 4 years ago
Amazing sir 👍🙌
@anveshreddy5905 3 years ago
Really awesome👍
@codebasics 3 years ago
Glad it was helpful!
@shaikansari6882 6 years ago
Thanks for the wonderful tutorial. It helped me a lot.
@pratipkhandelwal1101 6 years ago ⁺¹
Suppose there are multiple special values in a column, so we can't add them manually to the replace list. How can we find the special values without checking the data columns row by row or viewing the whole dataframe?
@su80061 5 years ago ⁺¹
Thank you so much. I learnt a great deal.
@godwingeorgethekkanath 3 years ago
Great video. But 32F != 32C. We have to convert at least one unit. How do we do that if there are multiple units in a column?
@nataliaielnykova3173 6 years ago
It is simple and helpful! Thanks!!!
@skkkks2321 5 years ago
Again, a big job done. A great thank you.
@PRATIK1900 6 years ago
Great tutorials, sir, really helpful :)
One question. In the last section, where you showed how to replace a list of values with another list, that looked like it applied to the entire data frame (e.g. we had multiple columns with "exceptional", "average", etc.). So it would carry out this replacement in all the columns. Suppose we have 5 such columns (5 exams/subjects) and I want this replacement in only two columns. Then do I need to do something like this?
df_new = df.replace({
'exam1' : ["poor", "average", "excellent"]
'exam2' : ["poor", "average", "excellent"]
}, [0, 1, 2] )
Is this correct code?
@PRATIK1900 6 years ago
So we have to write a line of code for every column we want to do this for, right? Also, technically, was my code wrong?
@rajatpati8808 6 years ago
df9 = pd.DataFrame({
'score': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'],
'student': ['rob', 'maya', 'parthiv', 'tom', 'julian', 'erica'],
'score1': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'],
'score2': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'],
})
df9
df9['score'] = df9['score'].replace(['poor', 'average', 'good', 'exceptional'], [1,2,3,4])
df9['score1'] = df9['score1'].replace(['poor', 'average', 'good'], [1,2,3])
df9
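One way to limit a value mapping to selected columns without writing one line per column is a nested dictionary of the form {column: {old: new}}. This is a minimal sketch; the `exam1` to `exam3` columns and the `grades` mapping are made-up sample data in the spirit of the question above.

```python
import pandas as pd

# Hypothetical data: three exam columns, only two of which should be recoded.
df = pd.DataFrame({
    "student": ["rob", "maya", "tom"],
    "exam1": ["poor", "average", "excellent"],
    "exam2": ["excellent", "poor", "average"],
    "exam3": ["average", "average", "poor"],
})

grades = {"poor": 0, "average": 1, "excellent": 2}

# Nested dict: the outer key names the column, the inner dict maps old to new.
# Columns that are not listed (student, exam3) are left untouched.
df_new = df.replace({"exam1": grades, "exam2": grades})
print(df_new)
```

Depending on the pandas version, the recoded columns may come back as integer or as object dtype, so an explicit `astype(int)` afterwards may be worth adding.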
@lakshmanmaddi3763 3 years ago
Excellent presentation, sir. I would like to know your name, please.
@muhammadazam8422 1 year ago
Hi Brother,
Thanks for making this wonderful tutorial. You are great at explaining the concepts.
How do I extract the year from the string "June 13, 1980 (United States)"?
Kind regards,
MA
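For the year question above, a regex capture with `Series.str.extract` is one possible approach. This is a sketch only: the second sample string is invented, and the pattern assumes each cell contains exactly one standalone four-digit number.

```python
import pandas as pd

# Hypothetical column of release strings in the format quoted in the question.
released = pd.Series([
    "June 13, 1980 (United States)",
    "May 4, 1999 (Canada)",
])

# \b(\d{4})\b captures a standalone four-digit number; expand=False returns a Series.
year = released.str.extract(r"\b(\d{4})\b", expand=False).astype(int)
print(year.tolist())
```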
@ak47ava 5 years ago ⁺²
If replacing values with the mean of that column, could I just do this ->
new_df = df.replace({
'temperature':-99999,
'windspeed':0,}, { df['temperature'].mean(), df['windspeed'].mean()} )
new_df
I got an error saying "Value argument must be scalar, dict, or series"?
@codebasics 5 years ago
df.temperature.replace(-99999, df.temperature.mean())
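One caveat with the one-liner above: if the mean is computed while the placeholder is still in the column, the placeholder drags the mean down. A two-step variant, sketched here on made-up numbers, first turns the placeholders into NaN and then fills each column with its own mean.

```python
import numpy as np
import pandas as pd

# Hypothetical data: -99999 marks a missing temperature, 0 a missing windspeed.
df = pd.DataFrame({
    "temperature": [32, -99999, 28, 30],
    "windspeed": [6, 0, 9, 0],
})

# Step 1: turn the placeholder values into NaN so they do not distort the mean.
cleaned = df.replace({"temperature": -99999, "windspeed": 0}, np.nan)

# Step 2: fill each column's NaN with that column's own mean (NaN is skipped).
filled = cleaned.fillna(cleaned.mean())
print(filled)
```

Passing the Series returned by `cleaned.mean()` to `fillna` fills column by column, which is the per-column behaviour the set literal in the question was trying to express.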
@balsamshallal6805 7 years ago
Hi, thank you for this tutorial. I would like to ask how we could combine half-hourly monitoring data into hourly data?
@balsamshallal6805 7 years ago
It worked, thank you.
@geocarvalhont 7 years ago ⁺¹
I'm really grateful, thanks.
@nareshgb1 6 years ago
When you do the regex replace, the number format for the temperature and windspeed columns changes from 99.9 to 99; in fact, it's not clear whether the data is considered numeric anymore.
@mukeshkumar-kh2fh 2 years ago
Sir, can we replace the NaN values of a column with a mean computed only over the rows where another parameter falls in a particular range?
Example: if the BMI column has a NaN value and the age of that person is 45, then we first find the mean BMI of people aged 40 to 50 and replace the NaN with it. Similarly, for other people with a NaN BMI, first check the age of that person, set an age interval, find the mean, and replace.
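The age-band idea described above can be sketched with `pd.cut` plus a grouped mean. The column names, the sample values, and the ten-year bins are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two BMI values are missing.
df = pd.DataFrame({
    "age": [42, 45, 48, 55, 58, 52],
    "bmi": [24.0, np.nan, 26.0, 30.0, 28.0, np.nan],
})

# Assign every row to an age band: [30, 40), [40, 50), [50, 60).
age_band = pd.cut(df["age"], bins=[30, 40, 50, 60], right=False)

# Mean BMI of each band, broadcast back to the rows of that band.
band_mean = df.groupby(age_band, observed=True)["bmi"].transform("mean")

# Use the band mean only where the original value is missing.
df["bmi_filled"] = df["bmi"].fillna(band_mean)
print(df)
```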
@rony979 1 year ago
Hi,
Your tutorials are really helpful. Thanks for these clips.
My questions are:
1. How can I keep the changes I am making? Every time I try another option, the data goes back to its original state and the changes are applied to that.
2. How can I combine multiple pieces of code?
For example,
I used this code and it worked on the dataset.
new_df.replace({
'temperature': '-99999',
'windspeed': ['-99999', '-88888'],
'event': '0'
}, 'NaN')
Then when I used the following code, the dataset went back to its original shape, meaning the changes from the previous code were no longer there.
new_df.replace({
'temperature':'[A-Za-z]',
'windspeed': '[A-Za-z]',
},'', regex=True)
test
So, how can I make the changes stick without creating a DataFrame every time?
I really appreciate your suggestion.
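The behaviour described above follows from `replace` returning a new DataFrame by default and leaving the original untouched, so the result has to be assigned to a name before the next step builds on it. A minimal sketch on made-up data (note that it uses `np.nan` where the question uses the string 'NaN', which pandas would treat as ordinary text):

```python
import numpy as np
import pandas as pd

# Hypothetical data in the shape used in the question (everything read as text).
df = pd.DataFrame({
    "temperature": ["32 F", "-99999", "28 C"],
    "windspeed": ["6 mph", "-88888", "7 mph"],
    "event": ["Rain", "0", "Sunny"],
})

# Step 1: placeholders become real missing values; keep the result in new_df.
new_df = df.replace(
    {"temperature": "-99999", "windspeed": ["-99999", "-88888"], "event": "0"},
    np.nan,
)

# Step 2: continue from new_df (not from df) and assign the result again.
new_df = new_df.replace(
    {"temperature": "[A-Za-z]", "windspeed": "[A-Za-z]"}, "", regex=True
)
print(new_df)
```

Each step reads from the name the previous step wrote to, so the changes accumulate while the original `df` stays available for comparison.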
@jayshreedonga2833 1 year ago ⁺¹
thanks sir
@easydatascience2508 1 year ago
Hey, you can watch mine too. The channel has both Python and R playlists, and source files can be downloaded (link is in the video description).
@brendachirata2283 5 years ago
I love your vids, thank you.
But may I please ask how to deal with missing values that come in the form of a hyphen in my data set?
Kind regards.
@brendachirata2283 5 years ago
@@codebasics ok, thank you
@shockey3084 5 years ago
good job dear
@jpcam4781 3 years ago
I have a CSV and an xlsx file (both with the same data), but I can't use parse_dates or .astype to convert to the datetime64 type. Any suggestions? Thanks for the videos, very informative.
@biswajitmondal7807 3 years ago
Sir, is it mandatory to cover pandas, matplotlib, and seaborn in order to learn ML?
@boubacaramaiga4408 4 years ago
Many thanks.
@TechieDishant 3 years ago
nice sir
@moatlaredikamogelo6126 6 years ago
In the case where ? represents the missing value, how do you implement the replace method? It seems not to work when the value is a ? sign.
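For placeholder characters such as "?" (or the hyphen asked about earlier), two options are sketched below on a tiny in-memory CSV that stands in for the real file. One possible reason for a failed attempt is that "?" is a regex metacharacter, so it would need escaping if `regex=True` is passed; a plain `replace` without regex matches the literal cell value.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "day,temperature\n1,32\n2,?\n3,28\n"

# Option 1: declare the placeholder while reading, so the column is parsed as numeric.
df = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

# Option 2: read first, then replace the literal string (regex is off by default)
# and convert the remaining text to numbers.
raw = pd.read_csv(io.StringIO(csv_text))
fixed = raw.replace("?", np.nan)
fixed["temperature"] = pd.to_numeric(fixed["temperature"])

print(df)
print(fixed)
```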
@mohammedrashidakhtaransari8267 3 years ago
Hello sir. I have a question. How do I use Excel sheet cell data to modify/replace text in a text file? E.g. I have an Excel file in which cell A1 holds data, e.g. A1 = 10 20 30, and I want to use this cell value in a .replace on a text file:
.replace('cell A1 data', '202020 (which is available in text').
@fatinafiqah.y938 6 years ago
Hi, your video is great. But I don't know why, when I import the Excel file and try to handle the missing values in it with your method, it doesn't work and the values are still NaN. By the way, is there any way I can communicate with you to discuss my problem? Also, when I try to use the scikit-learn method, I get an error that my 'imputer' is not subscriptable. What does that mean? Please help me solve this error.
@bartdziubek327 3 years ago
Great tutorial :)
@kishorekumarviswanadhuni5055 6 years ago
Do you have training videos on machine learning and the R language as well?
@kishorekumarviswanadhuni5055 6 years ago
Thank you so much. This is a big help.
@raunakpatni1403 6 years ago
While dealing with 32 F, you used regex, but the data is now object type, not int64, and you cannot compute the mean, mode, and similar statistics on object-type data.
@lakshsinghania 1 year ago
Then how do we deal with that, I mean change the type?
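After the letters are stripped, the column still holds text, so one possible follow-up step is `pd.to_numeric`. The sample values below are made up; `errors="coerce"` is used here so that anything that still does not parse becomes NaN instead of raising.

```python
import pandas as pd

# Hypothetical column after loading: numbers with unit letters, plus one junk cell.
df = pd.DataFrame({"temperature": ["32 F", "28.5 C", "-"]})

# Remove the letters and surrounding whitespace; the result is still text.
stripped = df["temperature"].replace("[A-Za-z]", "", regex=True).str.strip()

# Convert the text to numbers; unparseable cells ("-") become NaN.
df["temperature"] = pd.to_numeric(stripped, errors="coerce")

print(df.dtypes)
print(df["temperature"].mean())
```

With a numeric dtype in place, `mean`, `median`, and similar statistics work again.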
@osamashawky622 4 years ago
Amazing man
@codebasics 4 years ago
Osama, I am glad you liked it
@bhavyashreethimmarayappa4945 3 years ago
Thanks for the wonderful explanation! I have a query; kindly address it. When the 'replace' function was used on the 'Temperature' and 'Windspeed' columns, the values were converted from 'int' to 'float'. Could you please explain how we can replace a few values with NaN and keep the column's type as integer (not float)?
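The switch to float happens because NaN is itself a float value and a plain int64 column cannot hold it. One possible way to keep integers is pandas' nullable integer dtype, written "Int64" with a capital I. This is a sketch on made-up data and is not something the video covers.

```python
import numpy as np
import pandas as pd

# Hypothetical integer column with a -99999 placeholder.
df = pd.DataFrame({"temperature": [32, -99999, 28]})

# replace() yields float64 because of NaN; casting to the nullable "Int64"
# dtype restores integers and stores the missing entry as <NA>.
df["temperature"] = df["temperature"].replace(-99999, np.nan).astype("Int64")

print(df)
print(df.dtypes)
```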
@sonalithakker6517 3 years ago
Thank you
@codebasics 3 years ago
You're welcome
@manojjha6597 6 years ago
Hi,
I had a small doubt here.
The data set that you are using to explain this concept, "missing values treatment", is very small, so I can see the entire data set, visually check whether it contains any missing values, and then treat them accordingly. But if the data set is too big to inspect visually, how would I figure out whether it contains any form of missing values?
@didi098710 8 months ago
df.isna().any(axis=1)
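Building on the reply above, a few summary calls can surface both real NaN values and suspicious placeholders without scrolling through the data. The sample frame is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a real NaN, a None, and two placeholder values.
df = pd.DataFrame({
    "temperature": [32.0, np.nan, 28.0, -99999.0],
    "event": ["Rain", "Sunny", None, "0"],
})

# Count of real missing values per column.
missing_per_column = df.isna().sum()

# The rows that contain at least one missing value.
rows_with_missing = df[df.isna().any(axis=1)]

# Extremes of numeric columns: a minimum such as -99999 hints at a placeholder.
numeric_range = df.select_dtypes("number").agg(["min", "max"])

# Frequency table of a text column: odd entries such as "0" stand out.
event_counts = df["event"].value_counts(dropna=False)

print(missing_per_column, rows_with_missing, numeric_range, event_counts, sep="\n\n")
```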
@mvcutube 3 years ago
the best
@chapidi99 4 years ago
Why do missing values need to be -99999? Is it efficient to do so? Does replacing with 999, 9999, or 99999 make any difference?
@wingochambers2562 7 years ago
Great stuff -- thanks!
@pulkitprakash7566 5 years ago ⁺¹
Why did you use "np.NaN", and not simply "NaN", in "replace" at 2:05?
@codebasics 5 years ago ⁺²
There is no bare NaN name in Python. You have to use np.nan from the numpy module.
@kneelakanta8137 2 years ago
Can we have a cheat sheet for all these pandas tutorials?
@RAZONEbe_sep_aiii_0819 4 years ago
Let's say that I have a big dataset where a feature has many negative values and I want to replace all of them with 0. Can you tell me how to do that? You have used a very small dataset here.
@naveenkuppili2889 6 years ago
How do I apply the replace function below only to the "Score" column?
df.replace(['poor', 'average', 'good', 'exceptional'], [1,2,3,4])
@yashchavan1350 3 years ago
But 32 F is not equal to 32 C, so how is the data correct? Is there any way to make this right, or do we need to apply the conversion manually?
@md.shafiqulislam5692 5 years ago
Great !!!
@ak47ava 5 years ago
Hello, what if I wanted to replace 0 values with, let's say, the mean of that column, and -9999 with the mean of another column?
What I mean is, how would I replace values in a column with the mean of that respective column?
@codebasics 5 years ago
df.column1.mean() should give you the mean value, and then you use that in your replace function.
@suchismitadash2399 3 years ago
I have a situation where I need to exchange the column values between two data frames based on some criteria. I have code ready for that using replace(), but it is not working in a few scenarios.
Can you please help? I can email you my code and data frame details.
@sameer-verma 7 years ago
When doing the below, I also want to replace 'No Event' with 'Sunny':
df_new = df.replace({
'temprature':'[A-Za-z]',
'windspeed':'[A-Za-z]'
},'',regex=True)
Is it possible? I tried doing it this way:
df_new = df.replace({
'temprature':'[A-Za-z]',
'windspeed':'[A-Za-z]'
},'',regex=True, 'No Event', '')
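The second attempt above cannot run because Python does not allow positional arguments after a keyword argument such as `regex=True`. One possible alternative is to chain a second `replace` call onto the first. The sketch uses made-up rows and assumes the column is spelled `temperature`.

```python
import pandas as pd

# Hypothetical data in the shape used in the video.
df = pd.DataFrame({
    "temperature": ["32 F", "28 C"],
    "windspeed": ["6 mph", "7 mph"],
    "event": ["Rain", "No Event"],
})

df_new = (
    # First call: strip letters, but only in the two numeric-looking columns.
    df.replace({"temperature": "[A-Za-z]", "windspeed": "[A-Za-z]"}, "", regex=True)
    # Second call: plain (non-regex) replacement restricted to the event column.
    .replace({"event": "No Event"}, "Sunny")
)
print(df_new)
```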
@badamsuraj2327 3 years ago
Sir, I just coded an ML program with an Excel sheet containing a small amount of data, and when I ran the program it showed the error "no such file or directory". Is there any solution for this?
@jatinfalwaria7087 7 years ago
Man, hats off.
@jatinfalwaria7087 7 years ago
Buddy, can you share your email with me or mail me at jatin97.intruder@gmail.com, because I have some doubts to clear up 😊
@StasoMalgaray 5 years ago
Please tell me how I can replace this pattern (#R##N##R##N#). The columns contain text too.
@nexusbiswa7895 2 years ago
While using a dictionary for replace, my temperature column is not being replaced.
@shivatarun9125 6 years ago
I have one question.
How do I replace all values in a particular column that are greater than a particular value?
Example: x['ApplicantIncome']>5070, where every value greater than 5070 has to be replaced with 5000?
@shivatarun9125 6 years ago
Awesome, thanks for your response.
train['ApplicantIncome']=train.ApplicantIncome.apply(lambda x: train.mean if x >5070 else x).
For me this gives TypeError: '>' not supported between instances of 'method' and 'int',
while giving any value instead of train.mean works.
@shivatarun9125 6 years ago
train['ApplicantIncome']=train.ApplicantIncome.apply(lambda x: train.mean() if x >5070 else x)
TypeError: object of type 'int' has no len()
Please help me out.
@rajatpati8808 6 years ago
df10 = pd.DataFrame({
'score': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'],
'student': ['rob', 'maya', 'parthiv', 'tom', 'julian', 'erica'],
'income': [5071,6000, 6500, 7500, 8000, 3000],
})
df10.income = df10.income.apply(lambda x: 5000 if x >5070 else x)
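A vectorised alternative to `apply` for this kind of conditional replacement is `Series.mask`, sketched below on made-up incomes. The errors quoted above appear to stem from `train.mean` (a method, or with parentheses the mean of every column of the DataFrame) being used where the mean of the single column was intended; the sketch computes that column mean once, up front.

```python
import pandas as pd

# Hypothetical incomes; stored as floats so a non-integer mean fits the column.
train = pd.DataFrame({"ApplicantIncome": [5071.0, 6000.0, 3000.0, 4000.0]})

income = train["ApplicantIncome"]
too_high = income > 5070

# Replace every value above 5070 with the fixed value 5000.
capped = income.mask(too_high, 5000)

# Or replace those values with the mean of this one column.
with_mean = income.mask(too_high, income.mean())

print(capped.tolist())
print(with_mean.tolist())
```

The same pattern would cover the other conditional questions in this thread, for example `income.mask(income < 0, 0)` to zero out negative values.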
@codebasics 5 years ago
Machine learning tutorials with exercises are available at:
ua-cam.com/video/gmvvaobm7eQ/v-deo.html
@shihabuddinahmed8955 6 years ago
How can I download the CSV file from the GitHub link that is given? Is it possible?
@mrakesh00 3 years ago
How do I replace particular column values with their mean where some column values are 0?
Please help me out.
@panagiotisgoulas8539 5 years ago
How do I replace values under a certain condition? For example, I want to replace all values in the temperature column that are above 32 with a word.
@panagiotisgoulas8539 5 years ago
@@codebasics Thank you teacher, your videos are really helpful
@mostafaalaywan3704 5 years ago ⁺¹
The code (new_df=df.replace(-9999,np.NaN)
new_df ) doesn't work. What do I have to do?
@Abhishek-jy4ul 5 years ago ⁺¹
df = pd.read_csv('filename',na_values=(-1))
df
@yanamadalaharishkumar5041 4 years ago
Do converters and replace act the same?
@sudheerpapasani541 4 years ago
How do we handle data like, suppose, age=200? How do we rectify this?
@theengineervlogger1470 4 years ago ⁺²
Why do we write '0' in quotes only for the 0 value?
@anuradha3868 5 months ago
Because the event type is 'str'. You can check it with "print (type(df.event[0]))". Hope this helps 😊
@mansijain2250 3 years ago ⁺¹
How does np.NaN work? Can anybody help me out?
@Arvindraj-os9ep 8 months ago
You will have to import numpy as np to use it.
@PavanVarma369 3 years ago
You haven't taught how to replace special characters like '?'.
@dhananjaykansal8097 5 years ago
For some really, really strange reason, my replace function just doesn't seem to work. It doesn't show any error; it basically does nothing. I really fail to understand what's wrong.
new_df = df.replace({
'Temperature': -99999,
'Windspeed':[-99999,-88888],
'Event': 0
},np.NaN)
new_df
@prickingpringle5187 5 years ago
It should be lowercase 'temperature', 'windspeed', 'event', and also 'event':'0':
new_df = df.replace({
'temperature': -99999,
'windspeed':[-99999,-88888],
'event':'0'
},np.NaN)
new_df
@dhananjaykansal8097 5 years ago
@@prickingpringle5187 In my file I've written them like that, with the first letter of each capitalized.
@codebasics 5 years ago ⁺¹
Can you print df just before the replace call and make sure the column names and values you want to replace are the same as what you are passing to the replace call as parameters? Your code looks correct to me, so I'm not sure why it would not work!
@dhananjaykansal8097 5 years ago
@@codebasics I tried each and every thing. Lastly, I'll try on another PC or reinstall Anaconda and try again with Jupyter, because in PyCharm pandas doesn't work for me.
@ak47ava 5 years ago
@@codebasics Hello, what if I wanted to replace 0 values with, let's say, the mean of that column, and -9999 with the mean of another column?
What I mean is, how would I replace values in a column with the mean of that respective column?
@abhinandansingh39 6 years ago
Why do we use np before NaN in np.NaN?
@jugrajsingh9450 5 years ago
codebasics, have you made any videos on numpy?
@gunjankumar2267 6 years ago
Hello,
data.replace({'Dependents':['+']},'', regex=True)
data.replace({'Dependents':'+'},'', regex=True)
data.replace('+',' ', regex=True)
I tried all these methods and get the same error every time:
error: nothing to repeat at position 0
How do I remove that '+' sign from the data set?
@gunjankumar2267 6 years ago
The Dependents column has the values ---> 0, 1, 2, 3+, nan
The total number of rows is 614.
With regex=False
it prints the complete dataset, like print(data), with no change in the values of the Dependents column.
@gunjankumar2267 6 years ago
codebasics, thanks a lot... Finally it worked. Thank you.
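The "nothing to repeat" error above arises because "+" is a regex quantifier and has nothing before it to repeat; with `regex=False`, on the other hand, `replace` only matches whole cell values, so "3+" is never equal to "+". The fix that worked is not shown in the thread; one possible fix is to escape the character, sketched here on a made-up column.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical column with the values listed in the comment above.
data = pd.DataFrame({"Dependents": ["0", "1", "2", "3+", np.nan]})

# re.escape("+") produces the pattern \+ which matches a literal plus sign.
cleaned = data.replace({"Dependents": re.escape("+")}, "", regex=True)
print(cleaned)
```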
@hackytech7494 4 years ago
Why did you use np.NaN in replace?
@tomparatube6506 3 years ago ⁺¹
np.nan means NumPy "Not a Number". This tells pandas to flag the value as missing.
@hackytech7494 3 years ago
@@tomparatube6506 ok. Got it 👍
