This helped me tremendously with cleaning up messy serial data that I was logging from a microcontroller into a useful data frame. Thank you for posting these free of charge and helping me finish my senior design project!
It was sweet, short, and to the point. Best tutorials I have seen on the whole YouTube platform. If anyone is planning to learn pandas, go through his playlist line by line; it is amazing (best of all).
You are so great in explaining the concepts, anyone can understand.
Yeah, I'm literally learning pandas from his videos.
Your videos have always more to offer. Very useful for data analysis and in the process eventually for Machine Learning. Thank you very much
Thanks, this is a fantastic tutorial. Just one question - at 7:14, you modified the temperature column to hold 32 F and 32 C. Then you just removed the letters so that both of them became 32. Should you have done some sort of conversion first? 32 F and 32 C are not equal, so shouldn't you have used the (°F − 32) × 5/9 = °C formula to normalize all of them to C, or the reverse to normalize all of them to F?
Have you found a solution for how to proceed in such situations? For example, kmph and mph in the same column, or Centigrade and Fahrenheit in the same column.
Technically you are correct. These values will be confusing. But he is only teaching how to chop off the letter that is attached to these values. In the real world, you would make the temperature column either F or C. You cannot hold both without the letters.
You are correct. If we replace like that, the values can't give a correct prediction in the analysis.
@@anweshgandham6776 We can use regex to identify whether C or F is present in the cell and multiply the filtered records with respective conversion.
@@anweshgandham6776 you could use regex with a replace pattern being a function
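A minimal sketch of the regex idea from the replies above, assuming a hypothetical `temperature` column whose entries look like `32 F` or `32 C` (the sample data and the `temperature_c` column name are made up for illustration). It splits each entry into a number and a unit, then converts only the Fahrenheit rows with (°F − 32) × 5/9:

```python
import pandas as pd

# Made-up sample: the same column mixes Fahrenheit and Celsius readings.
df = pd.DataFrame({"temperature": ["32 F", "32 C", "50 F", "10 C"]})

# Split each entry into a numeric part and a unit letter.
parts = df["temperature"].str.extract(
    r"^\s*(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[CF])\s*$"
)
value = parts["value"].astype(float)
is_fahrenheit = parts["unit"] == "F"

# Keep Celsius rows as they are, convert Fahrenheit rows to Celsius.
df["temperature_c"] = value.where(~is_fahrenheit, (value - 32) * 5 / 9)
print(df)
```

The same pattern should carry over to kmph/mph in one column: extract the unit, then apply the matching conversion factor only to the rows that need it.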
Thank you for making it free. One of the best pandas tutorial
Glad it was helpful!
Sir! I am very excited to see this tutorial. I am starting your data science roadmap. This is very useful.
Sir, you teach in a very good way.
You are absolutely fantastic. I am looking forward to whatever you do next
Thank you very much for this awesome video! I have 2 small questions:
1. How to replace multiple occurrences of a same value in different columns with (a) same single value and (b) with different values ?
2. How to replace n number of values in different columns by (a) single value and (b) with different values ?
Could you please add example codes for this on Github?
Best regards.
These are an excellent video series, by the way. You've got my sub!
Extremely helpful. Thank You sir for making us understand in such a easy way.
Thanks for making this wonderful tutorial. It shows how powerful python is.
Oh definitely, Python rules the world!
Superb explanation, I have started with this series and its helping me a lot. Many Thanks.
At 10:35,
you can also use the code below, I guess,
new_pr = df.replace('[A-Za-z]', '', regex=True)
instead of using the dictionary. Worked well for me.
Yes regex can also be used. Thanks for the tip bhakti 👍
But it will remove the data from event also, which we don't want.
Watched Again. Thank You Very Much, Its' Very Helpful.
Thank you sir, I was struggling with one question related to replace and was able to solve it. Thank you.
Amazing teaching.!!!!! Thank you.
Awesome tutorial! Thanks
You are absolutely fantastic. more videos pls
A 13 minute video felt like a 1 hour class; that's the richness of your content. Excellent 😍😍
And Anaconda was the best part, I never knew of it before your video. That's so useful, thank you so much.
Vinay I am glad you liked it
Good explanation.
thank you for helping me out.... I struggled the whole week... this helped meee
Glad it helped!
Amazing this came in so handy when completing my assignment.
👍😊
from all these months, your contents have helped me a lot. But still its tough for a newbie like me to understand how SQL,Python(pandas,matplotlib,numpy,seaborn),PowerBI/Tableau helps in data analysis. Can you make a sample project PLEASE!!!
I have few projects on power bi/ tableau already on my channel. I don't have projects on SQL/Pandas etc and I will definitely add those in the future.
Excellent Videos. Thanks for Uploading Videos
Really great and clever tutorials. Thank you!
Glad it was helpful!
Sir, there are different values, 32F and 32C. What if we want to convert all F values to C values? How would we handle this type of data?
excellent (as usual :-) one comment/question on the DatetimeIndex - I noticed that when you create a date_range dt = pd.date_range("01-01-2017","01-11-2017") dt itself is of type DatetimeIndex, so you shouldn't need to create another object instance idx and instead could use df.reindex(dt) directly... can you please explain the need to create this separate instance idx? Thank you.
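As far as I can tell the question is right: `pd.date_range` already returns a `DatetimeIndex`, so passing it straight to `reindex` should work. A small sketch with made-up data (the frame and its dates are assumptions, not the video's file):

```python
import pandas as pd

# Made-up frame with two missing days (Jan 2 and Jan 3).
df = pd.DataFrame(
    {"temperature": [32, 28, 30]},
    index=pd.to_datetime(["2017-01-01", "2017-01-04", "2017-01-05"]),
)

dt = pd.date_range("01-01-2017", "01-05-2017")  # already a DatetimeIndex
filled = df.reindex(dt)  # the missing days show up as NaN rows
print(type(dt).__name__)
print(filled)
```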
Hey! thanks for the tutorial.
I have one doubt about this tutorial: can we also use the na_values() method to replace those 99999 values with NaN in the data frame?
Gowtham Shetty, yes we can. We can also replace all NaNs with the mean, median, or mode of the respective column, depending on what the practical problem demands. This is one of the better ways to deal with them.
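For reference, `na_values` is a parameter of `pd.read_csv` rather than a method on the data frame. A minimal sketch, assuming a small made-up CSV (read from a string here so it runs without a file) in which -99999, -88888 and the string 0 stand for missing data:

```python
import io

import pandas as pd

# Made-up CSV content standing in for a weather file.
csv_text = (
    "day,temperature,windspeed,event\n"
    "1/1/2017,32,6,Rain\n"
    "1/2/2017,-99999,7,Sunny\n"
    "1/3/2017,28,-88888,0\n"
)

# Per-column sentinel values are turned into NaN while the file is read.
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={
        "temperature": [-99999],
        "windspeed": [-99999, -88888],
        "event": ["0"],
    },
)
print(df)
```

Using the dictionary form keeps a legitimate 0 in `windspeed` from being treated as missing.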
Step by step guide on how to learn data science for free: ua-cam.com/video/Vn_mmOuQkSA/v-deo.html
Machine learning tutorials with exercises:
ua-cam.com/video/gmvvaobm7eQ/v-deo.html
Sir please make a video on regex
explained nicely
Gurpreet , I am happy this was helpful to you
Thanks
Amazing sir 👍🙌
Really awesome👍
Glad it was helpful!
Thanks for the wonderful tutorial. It helped me a lot.
Supposedly there are multiple special values in a column , so we are not able to add them manually into the replace list , so anyway how to know the special values without us checking the data columns row by row or without us seeing the dataframe ?
Thank you so much. i learnt a great deal.
Great video. But 32F != 32C. We have to convert at least one of the units. How to do that if there are multiple units in a column?
It is simple and helpful! Thanks!!!
Again ,a big job done,A Great thank you.
Great tutorials Sir, really helpful :)
One question. In the last section, where you showed how to replace a list of values with another list, it looked like that applied to the entire data frame (e.g. we had multiple columns with "exceptional", "average", etc.), so it would carry out the replacement in all the columns. Suppose we have 5 such columns (5 exams/subjects) and I want this replacement in only two columns. Do I then need to do something like this?
df_new = df.replace({
'exam1' : ["poor", "average", "excellent"]
'exam2' : ["poor", "average", "excellent"]
}, [0, 1, 2] )
is this correct code?
so we have to write a line of code for every column we want to do this, right? Also, technically was my code wrong?
df9 = pd.DataFrame({
'score': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'],
'student': ['rob', 'maya', 'parthiv', 'tom', 'julian', 'erica'],
'score1': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'],
'score2': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'],
})
df9
df9.score.replace(['poor', 'average', 'good', 'exceptional'], [1,2,3,4],inplace = True)
df9.score1.replace(['poor', 'average', 'good'], [1,2,3],inplace = True)
df9
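One alternative to calling `replace` with `inplace=True` on each column is `Series.map` with a lookup dictionary, applied only to the columns you pick. This is a sketch with a made-up frame; the column names `exam1` to `exam3` follow the question above and the `grades` mapping is hypothetical:

```python
import pandas as pd

# Made-up frame: three exam columns, only two should be recoded.
df = pd.DataFrame({
    "student": ["rob", "maya", "tom"],
    "exam1": ["poor", "average", "excellent"],
    "exam2": ["excellent", "poor", "average"],
    "exam3": ["average", "average", "poor"],
})

grades = {"poor": 0, "average": 1, "excellent": 2}

df_new = df.copy()
for col in ["exam1", "exam2"]:
    # map() looks every value up in the dictionary.
    df_new[col] = df_new[col].map(grades)
print(df_new)
```

Note that `map` turns any value missing from the dictionary into NaN, whereas `replace` leaves unlisted values untouched, so the dictionary should cover every label in the chosen columns.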
Excellent presentation sir. I would like to know your name please.
Hi Brother,
Thanks for making this wonderful tutorial.You are great in explaining the concepts.
How to extract the year in the string "June 13, 1980 (United States)".
Kindly Regards,
MA
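A possible approach for the year question, assuming the strings sit in a hypothetical `released` column and always contain exactly one four-digit year (the second sample row is made up):

```python
import pandas as pd

df = pd.DataFrame({
    "released": ["June 13, 1980 (United States)", "July 2, 1999 (Canada)"],
})

# Grab the first run of four digits and store it as an integer.
df["year"] = df["released"].str.extract(r"(\d{4})", expand=False).astype(int)
print(df)
```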
If replacing values with the mean of that column, could i just do this ->
new_df = df.replace({
'temperature':-99999,
'windspeed':0,}, { df['temperature'].mean(), df['windspeed'].mean()} )
new_df
I got an error saying Value argument must be scalar, dict, or series?
df.temperature.replace(-99999,df.temperature.mean())
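The error in the question most likely comes from the second argument: curly braces without keys create a Python set, which `replace` does not accept as a value. A sketch of one way around it, using made-up numbers: turn the sentinels into NaN first, then fill each column with its own mean, so the sentinel values do not distort the mean:

```python
import numpy as np
import pandas as pd

# Made-up data: -99999 and 0 stand for missing readings.
df = pd.DataFrame({
    "temperature": [32, -99999, 28, 30],
    "windspeed": [6, 0, 8, 10],
})

# Step 1: mark the sentinels as missing, column by column.
cleaned = df.replace({"temperature": -99999, "windspeed": 0}, np.nan)

# Step 2: mean() skips NaN, and fillna() matches each mean to its column.
cleaned = cleaned.fillna(cleaned.mean())
print(cleaned)
```

Calling `df.temperature.mean()` directly on the raw column would include the -99999 entries in the average, which is probably not what is wanted.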
Hi, thank you for this tutorial, I would like to ask how could we combine monitoring data for each half hour to hourly data?
it has worked, thank you
I am really grateful, thanks.
when you do the regex replace, the number format for the temp and windspeed columns changes from 99.9 to 99 - in fact its not clear whether the data is considered numeric anymore.
Sir, can we replace the NaN values of a column with a mean in such a way that, if another parameter's value is in a particular range, we find the mean within that range and replace with it?
Example: if column BMI has a NaN value and the age of that person is 45, then we first find the mean BMI of people in the age range 40 to 50 and replace with this. Similarly, for other persons with NaN BMI, first check the age of that person, set an age interval, find the mean, and replace.
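One way this could be done is to bucket the ages with `pd.cut` and fill each missing BMI with the mean of its bucket. This is only a sketch; the data, the ten-year bands and the `bmi_filled` column name are assumptions:

```python
import numpy as np
import pandas as pd

# Made-up data: one missing BMI in the 40s and one in the 20s.
df = pd.DataFrame({
    "age": [42, 45, 48, 23, 27, 25],
    "bmi": [26.0, np.nan, 30.0, 21.0, 23.0, np.nan],
})

# Age bands [20, 30), [30, 40), [40, 50).
age_band = pd.cut(df["age"], bins=[20, 30, 40, 50], right=False)

# Mean BMI per band, broadcast back to every row of that band.
band_mean = df.groupby(age_band, observed=True)["bmi"].transform("mean")

df["bmi_filled"] = df["bmi"].fillna(band_mean)
print(df)
```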
Hi,
Your tutorials are really helpful. Thanks for these clips.
My questions are:
1. How can I keep the changes I am making? Every time I try the other option, the data goes back to its original state and the changes are made on that.
2. How can I combine multiple pieces of code together?
For example,
I used this code and worked on the dataset.
new_df.replace({
'temperature': '-99999',
'windspeed': ['-99999', '-88888'],
'event': '0'
}, 'NaN')
Then when I used the following code, data set went back to original shape. Meaning, the changes occurred due to previous code were no longer there.
new_df.replace({
'temperature':'[A-Za-z]',
'windspeed': '[A-Za-z]',
},'', regex=True)
test
So, how can I make it stick without creating Dataframe every time?
I really appreciate your suggestion.
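A likely explanation: `replace` returns a new data frame and leaves the original untouched, so the result has to be assigned to a variable, and the second call has to start from that result. A minimal sketch with made-up string data (it also uses `np.nan` instead of the string 'NaN', which is an assumption about what was intended):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": ["32 F", "-99999", "28 C"],
    "windspeed": ["6 mph", "-88888", "7"],
    "event": ["Rain", "0", "Snow"],
})

# Step 1: sentinels become NaN; keep the result in new_df.
new_df = df.replace(
    {"temperature": "-99999", "windspeed": ["-99999", "-88888"], "event": "0"},
    np.nan,
)

# Step 2: build on new_df (not df) so step 1 is not lost.
new_df = new_df.replace(
    {"temperature": "[A-Za-z]", "windspeed": "[A-Za-z]"}, "", regex=True
)
print(new_df)
```

The original `df` still holds the raw values, which is handy when a step needs to be redone.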
Sir please make a video on regex
thanks sir
Hei, you can watch mine too. The channel has both Python and R playlists, and source files can be downloaded(link is in video description).
I love your vids, thank you.
But may i please ask how to deal with missing values that comes in form of a hyphen in my data set.
Kind regards.
@@codebasics ok, thank you
good job dear
i have a CSV and xlsx file (both the same data) but it cant use parse_dates or .astype to convert to datetime64 type. ?? any suggestions? Thanks for the videos very informative
Sir is it mandatory to learn ML we have to cover pandas,matplotlib,seaborn?
Many thanks.
nice sir
In the case where ? represents the missing value, how do you implement the replace method? It does not seem to work when the value is a ? sign.
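Two things may be going on here: `?` is a special character in regular expressions, and in some files the marker carries stray spaces (for example ` ?`). A sketch that handles both, using a made-up `workclass` column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"workclass": ["Private", "?", " ?", "State-gov"]})

# \? matches a literal question mark; ^\s* and \s*$ allow padding spaces
# while still requiring the whole cell to be just the marker.
cleaned = df.replace(r"^\s*\?\s*$", np.nan, regex=True)
print(cleaned)
```

Without `regex=True`, `df.replace("?", np.nan)` only matches cells that are exactly `?`, so a padded ` ?` would be left alone.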
Hello sir. I have a question. How can I use Excel sheet cell data to modify/.replace a text file? E.g. I have an Excel file in which cell A1 holds data, e.g. A1=10 20 30, and I want to use this cell value to .replace text in a text file.
.replace('cell A1 data', '202020 (which is available in text').
Hi, your video is great. But I don't know why, when I import the Excel file and try to handle the missing values in it with your method, it just does not work and the values are still NaN. Btw, is there any way that I can communicate with you to discuss my problem? Then when I try to use the scikit-learn method, I get an error that my 'imputer' is not subscriptable. What does that mean? Please help me solve this error.
Great tutorial :)
Do you have training videos on Machine Learning and the R language also?
Thank you so much. This is big help.
While dealing with 32 F, you used regex, but the data is now of object type, not int64, and you cannot compute mean, mode and similar statistics on object type data.
Then how do we deal with that, I mean how do we change it?
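One way to get numbers back after stripping the letters is `pd.to_numeric`. A minimal sketch, assuming made-up string columns like the ones in the video:

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": ["32 F", "-99999", "28 C"],
    "windspeed": ["6 mph", "7", "-88888"],
    "event": ["Rain", "Sunny", "0"],
})

cleaned = df.copy()
for col in ["temperature", "windspeed"]:
    # Drop the letters, trim leftover spaces, then convert to numbers.
    text = cleaned[col].str.replace(r"[A-Za-z]", "", regex=True).str.strip()
    # errors="coerce" turns anything that still is not a number into NaN.
    cleaned[col] = pd.to_numeric(text, errors="coerce")

print(cleaned.dtypes)
print(cleaned["temperature"].mean())
```

After this step `mean`, `median` and similar methods work on the two columns again, while `event` stays as text.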
Amazing man
Osama, I am glad you liked it
Thanks for the wonderful explanation! I have a query, Kindly address. When the 'replace' function was used on 'Temperature' and 'Windspeed' columns, the values were converted from 'int' to 'float'. Could you please explain how can we replace few values to NaN and retain the type of that column as 'integer'(not float)?
Thank you
You're welcome
Hi,
I had a small doubt here
The data set that you are using to explain this concept, "missing values treatment", is very small, so I can see the entire data set, visually check whether or not it contains any missing values, and then do the treatment accordingly. But if the data set is too big to be inspected visually, how would I figure out whether it contains any form of missing values?
df.isna().any(axis=1)
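Building on the reply above, a short sketch of a few summaries that scale to large frames (the data is made up). Counting NaN per column finds real missing values; looking at the minimum or at `value_counts()` can expose sentinel values such as -99999 that are not NaN yet:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [32.0, np.nan, 28.0, -99999.0],
    "event": ["Rain", None, "Snow", "Sunny"],
})

missing_per_column = df.isna().sum()            # NaN count for each column
rows_with_missing = df[df.isna().any(axis=1)]   # only the rows that have a NaN
lowest_temperature = df["temperature"].min()    # an odd minimum hints at a sentinel

print(missing_per_column)
print(rows_with_missing)
print(lowest_temperature)
```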
the best
Why do missing values need to be -99999? Is it efficient to do so? Does replacing with 999, 9999, or 99999 make any difference?
Great stuff -- thanks!
Why did you use "np.NaN", and not simply "NaN" in "replace" at 2:05 ?
There is no built-in name NaN in Python. You have to use np.nan from the numpy module.
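A small sketch of what `np.nan` is, for anyone wondering about the `np.` prefix: it is an ordinary float value that NumPy exposes under that name, and pandas treats it as "missing". The spelling `np.nan` is used here because the `np.NaN` alias was removed in NumPy 2.0:

```python
import math

import numpy as np
import pandas as pd

print(type(np.nan))         # a plain Python float
print(math.isnan(np.nan))   # it is the special not-a-number value
print(np.nan == np.nan)     # NaN never compares equal, even to itself

# Replacing a sentinel with np.nan marks the entry as missing.
s = pd.Series([32, -99999, 28]).replace(-99999, np.nan)
print(s.isna())
```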
Can we have cheat sheet for all these pandas tutorials
Let's say that I have a big dataset where a feature has many negative values and I want to replace all of them with 0. Can you tell me how to do that? You have taken a very small dataset here.
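For this case `replace` is probably not the right tool, since it matches specific values rather than a condition. One option is `clip`, sketched here on a made-up `feature` column; it works the same way on a large frame:

```python
import pandas as pd

df = pd.DataFrame({"feature": [5, -3, 0, -10, 7]})

# Every value below 0 is raised to 0; everything else is unchanged.
df["feature"] = df["feature"].clip(lower=0)
print(df)
```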
How to implement the below replace fn only to "Score" column?
df.replace(['poor', 'average', 'good', 'exceptional'], [1,2,3,4])
But 32 F is not equal to 32 C, so how is the data correct? Is there any way to make this right, or do we need to apply the conversion manually?
Great !!!
Hello, if i wanted to replace 0 values lets say with the mean of that one column and -9999 with the mean of the other column.
What i mean is how would i replace values of the column with the mean of that respected column.
df.column1.mean() should give you the mean value, and then you can use that in your replace function.
I have a situation where I need to exchange the column values between two data frames based on some criteria. I have a code ready for that as well using replace () but it is not working in few scenarios.
Can you please help. I can email you my code and data frame details
when doing the below, I also want to replace 'No Event' with 'Sunny':
df_new = df.replace({
'temprature':'[A-Za-z]',
'windspeed':'[A-Za-z]'
},'',regex=True)
:
Is it possible, I tried doing this way
df_new = df.replace({
'temprature':'[A-Za-z]',
'windspeed':'[A-Za-z]'
},'',regex=True, 'No Event', '')
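The second attempt cannot run because Python does not allow positional arguments after a keyword argument such as `regex=True`. One way that should work is to chain two `replace` calls, sketched here on made-up data (the column names are assumed to be `temperature`, `windspeed` and `event`; they must match the frame exactly):

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": ["32 F", "28 C"],
    "windspeed": ["6 mph", "7"],
    "event": ["Rain", "No Event"],
})

df_new = (
    # First call: strip letters, but only in the two numeric columns.
    df.replace({"temperature": "[A-Za-z]", "windspeed": "[A-Za-z]"}, "", regex=True)
    # Second call: plain (non-regex) replacement in the event column.
    .replace({"event": "No Event"}, "Sunny")
)
print(df_new)
```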
Sir, I just coded an ML program with an Excel sheet that contains a small amount of data. When I ran the program it showed the error "no such file or directory". Is there any solution for this?
man hats off.
Buddy can u share your email with me or mail me at jatin97.intruder@gmail.com,coz i have some doubts to be cleared😊
please tell it to me how can i replace this pattern (#R##N##R##N#) this columns contain text too
While using a dictionary for replace, my temperature column is not being replaced.
I have one question.
How to replace all values in a particular column that are greater than a particular value?
Example: x['ApplicantIncome']>5070; every value greater than 5070 has to be replaced with 5000.
Awesome, thanks for your response.
train['ApplicantIncome']=train.ApplicantIncome.apply(lambda x: train.mean if x >5070 else x).
For me this gives TypeError: '>' not supported between instances of 'method' and 'int',
while giving any value instead of train.mean works.
train['ApplicantIncome']=train.ApplicantIncome.apply(lambda x: train.mean() if x >5070 else x)
TypeError: object of type 'int' has no len()
please help me out
df10 = pd.DataFrame({
'score': ['exceptional','average', 'good', 'poor', 'average', 'exceptional'],
'student': ['rob', 'maya', 'parthiv', 'tom', 'julian', 'erica'],
'income': [5071,6000, 6500, 7500, 8000, 3000],
})
df10.income = df10.income.apply(lambda x: 5000 if x >5070 else x)
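A vectorized alternative to `apply`, sketched on the same made-up incomes, is `Series.mask`, which replaces values where a condition is true. It also shows the column mean variant: the mean has to come from the column itself (`df10["income"].mean()`), not from the whole frame as in `train.mean()`:

```python
import pandas as pd

df10 = pd.DataFrame({"income": [5071, 6000, 6500, 7500, 8000, 3000]})

too_high = df10["income"] > 5070

# Variant 1: replace every value above 5070 with the constant 5000.
capped = df10["income"].mask(too_high, 5000)

# Variant 2: replace them with the mean of the income column instead.
col_mean = df10["income"].mean()
replaced_with_mean = df10["income"].mask(too_high, col_mean)

print(capped.tolist())
print(replaced_with_mean.tolist())
```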
How can I download csv file from github which link is given..is it possible?
How to replace values in a particular column with the column mean where some of the values are 0?
Please help me out.
How to replace values under a certain condition? For example, I want to replace all values in the temperature column that are above 32 with a word.
@@codebasics Thank you teacher, your videos are really helpful
the code (new_df=df.replace(-9999,np.NaN)
new_df ) don't work , what i have to do ?
df = pd.read_csv('filename',na_values=(-1))
df
Do converters and replace both act the same?
How to handle data like, say, age=200? How to rectify this?
Why do we write '0' in quotes only for the 0 in the event column?
Because the event type is 'str' you can check it in such a way "print (type(df.event[0]))" hope this helps 😊
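To illustrate the reply above with made-up data: when a column holds strings, the integer 0 never matches the text '0', so `replace` silently does nothing. This may also explain some of the "replace doesn't work" reports in this thread:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"event": ["Rain", "0", "Snow"]})

unchanged = df.replace({"event": 0}, np.nan)    # integer 0: no match in a text column
changed = df.replace({"event": "0"}, np.nan)    # string "0": matches

print(type(df["event"].iloc[1]))
print(unchanged["event"].isna().sum())
print(changed["event"].isna().sum())
```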
How does np.NaN work? Can anybody help me out?
You will have to import numpy as np to use it
You haven't taught how to replace special characters like '?'
For some really really strange reason. My replace function just doesn't seem to work. Like it doesn't show any error. It basically does nothing. I really fail to understand what's wrong.
new_df = df.replace({
'Temperature': -99999,
'Windspeed':[-99999,-88888],
'Event': 0
},np.NaN)
new_df
it should be small letters 'temperature','windspeed','event', also 'event':'0'
new_df = df.replace({
'temperature': -99999,
'windspeed':[-99999,-88888],
'event':'0'
},np.NaN)
new_df
@@prickingpringle5187 In my file I've put like that. Where every first alphabet is capital.
Can you print df just before the replace call and make sure the column names and the values you want to replace are the same as what you are passing to the replace call as parameters? Your code looks correct to me, so I am not sure why it would not work!
@@codebasics I tried each and every thing. Lastly I'll try on some another pc or reinstall my anaconda and try again with Jupyter. Because in PyCharm pandas doesn't work for me.
@@codebasics Hello, if i wanted to replace 0 values lets say with the mean of that one column and -9999 with the mean of the other column.
What i mean how would i replace values of the column with the mean of that respected column.
Why do we use np before NaN in np.NaN?
codebasics, have you made any videos on numpy?
hello,
data.replace({'Dependents':['+']},'', regex=True)
data.replace({'Dependents':'+'},'', regex=True)
data.replace('+',' ', regex=True)
I tried all of these methods and get the same error every time:
error: nothing to repeat at position 0
How do I remove that '+' sign from the data set?
Dependents columns has value ---> 0, 1, 2, 3+, nan
total no of row is 614.
regex=False
it reflects complete dataset---like print(data) and no change in the values of Dependent column
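What seems to happen here: with `regex=True`, `+` is a regex quantifier with nothing before it, hence "nothing to repeat". With `regex=False`, `replace` only matches cells that are exactly `+`, so `3+` is left alone. A sketch of two ways that should remove the sign, on a made-up `Dependents` column:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"Dependents": ["0", "1", "2", "3+", np.nan]})

# Option 1: substring replacement on the column, treating "+" literally.
option1 = data["Dependents"].str.replace("+", "", regex=False)

# Option 2: keep regex=True but escape the plus sign.
option2 = data.replace({"Dependents": r"\+"}, "", regex=True)

print(option1.tolist())
print(option2["Dependents"].tolist())
```

Both leave the NaN entry as it is; it can be filled afterwards if needed.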
codebasics Thanks A Lot... Finally it worked... Thanku
Why did you use np.NAN to replace?
np.nan means NumPy's "Not a Number". It tells pandas to flag the value as missing.
@@tomparatube6506 ok. Got it 👍