Data Cleaning in Pandas | Python Pandas Tutorials

  • Published 9 Jan 2025

COMMENTS • 424

  • @fede77
    @fede77 1 year ago +382

    For those struggling with the regular expression at 14:57, you may need to pass regex=True explicitly (based on the FutureWarning displayed in the video). That is:
    df['Phone_Number'] = df['Phone_Number'].str.replace('[^a-zA-Z0-9]', '', regex=True)

    • @wenkanglee9596
      @wenkanglee9596 1 year ago +9

      gosh you're observant

    • @ronnelsupnet9850
      @ronnelsupnet9850 1 year ago +3

      Thank you!

    • @rhodaime79
      @rhodaime79 1 year ago +7

      My goodness. You saved me. I’ve been at this for about an hour. Thank you 🙏 thank you 🙏

    • @DevanshAsawa
      @DevanshAsawa 1 year ago +3

      Thanks a lot dude !!!!!! Helped a lot !!!!!!!

    • @rnjesus9950
      @rnjesus9950 1 year ago +4

      Legend.
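
A minimal sketch of the regex=True fix from this thread. The three phone numbers are made-up sample values, not the tutorial's file, and regex=False is passed explicitly only to reproduce what pandas 2.x does by default:

```python
import pandas as pd

# Hypothetical sample data; the tutorial's real file is not reproduced here.
df = pd.DataFrame({"Phone_Number": ["123-545-5421", "123/643/9775", "876|678|3469"]})

# With regex=False the pattern is treated as a literal string, so nothing matches
# and the column comes back unchanged.
literal = df["Phone_Number"].str.replace("[^a-zA-Z0-9]", "", regex=False)

# With regex=True the pattern is a character class: every character that is
# not a letter or digit is removed.
cleaned = df["Phone_Number"].str.replace("[^a-zA-Z0-9]", "", regex=True)

print(literal.tolist())
print(cleaned.tolist())
```

If the column looks untouched after running the tutorial's line, this difference is the likely cause.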

  • @rahulraj3855
    @rahulraj3855 1 year ago +270

    Fan from India. I just got 2 offers from very good companies thanks to your videos, which helped me transition from customer success support to Data Analyst.

    • @rozakhan2811
      @rozakhan2811 1 year ago +1

      Hey, tell me how I can do it too. Right now I'm working as a customer support executive. Please help me grow.

    • @dywa_varaprasad
      @dywa_varaprasad 1 year ago +1

      Hey Rahul, how did you learn DA? Can you share your experience? It will be helpful for us!!

    • @sandeepthukral3018
      @sandeepthukral3018 1 year ago +1

      Hi bro, is this course sufficient for a beginner to land a job?

    • @abdullahalmahfuz6700
      @abdullahalmahfuz6700 1 year ago +6

      Is this a spam comment?

    • @KingofWorld1922
      @KingofWorld1922 1 year ago

      @rozakhan2811 Skills are the basic requirement. Whatever you want to do, be strong in that. And the way Alex teaches in his videos is effective.

  • @tomaronson4419
    @tomaronson4419 11 months ago +126

    For splitting the address at 21:29, you may want to pass the value 2 as a named parameter, n=2:
    df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(',', n=2, expand=True)

    • @JayDenton-n1n
      @JayDenton-n1n 11 months ago +2

      This helps! Thank you so much!

    • @nataliarobinson5671
      @nataliarobinson5671 11 months ago

      Thank you very much

    • @OmarRabeh
      @OmarRabeh 10 months ago

      thank you very much

    • @OkallTheAnalyst
      @OkallTheAnalyst 10 months ago

      Thank you!

    • @janrelleelam3628
      @janrelleelam3628 8 months ago

      OMG! Thank you so very much. I have been trying to figure this out for about four days now. I figured out the phone number issue and then how to split the address, but for the life of me, splitting the address into named columns with the changes committed to the df was not working. THANK YOU!
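
A small sketch of the n=2 split discussed in this thread, assuming a recent pandas version in which str.split accepts n only as a keyword. The addresses are invented sample values, and the stripping step at the end is an addition, not something shown in the comments:

```python
import pandas as pd

# Hypothetical addresses with one, two, and three comma-separated parts.
df = pd.DataFrame({"Address": [
    "123 Shire Lane, Shire",
    "93 West Main Street",
    "298 Drugs Driveway, Texas, 78910",
]})

# n=2 means "split at most twice", which yields at most three pieces.
# expand=True spreads the pieces across columns instead of returning lists.
parts = df["Address"].str.split(",", n=2, expand=True)
df[["Street_Address", "State", "Zip_Code"]] = parts

# Pieces after a comma keep their leading space, so strip it off.
df["State"] = df["State"].str.strip()
df["Zip_Code"] = df["Zip_Code"].str.strip()

print(df)
```

Rows with fewer than three parts simply get missing values in the remaining columns.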

  • @bennet5467
    @bennet5467 1 year ago +32

    Thanks for this content, this was so helpful!!
    I think I have some optimizations, correct me if I'm wrong :D
    27:04 Instead of calling the replace function multiple times, you can create a mapping like replace_mapping = {'Yes': 'Y', 'No': 'N'} and call df = df.replace(replace_mapping), so you don't have to specify the mapping for each column and only need to call .replace() once.
    34:16 Instead of the for loop that drops rows one by one, you can use .loc, as in df = df.loc[df["Do_Not_Contact"] == "N"], to filter the rows by a criterion.

    • @ivanovalle9764
      @ivanovalle9764 11 months ago

      Where did you learn that you could use a dictionary format to replace multiple values in one line? this is really useful, thanks!

    • @yanpaucon1043
      @yanpaucon1043 8 months ago

      Thank You. 34:16 is really helpful. I appreciate your kindness.
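
Both suggestions in this thread can be sketched on a tiny frame. The column values below are invented for illustration; only the column names come from the comments:

```python
import pandas as pd

# Hypothetical data with the inconsistent Yes/Y/No/N spellings.
df = pd.DataFrame({
    "Paying Customer": ["Yes", "N", "No", "Y"],
    "Do_Not_Contact": ["No", "Y", "N", "Yes"],
})

# One mapping applied to the whole frame: each cell that exactly equals a
# dictionary key is swapped for the matching value, in every column.
replace_mapping = {"Yes": "Y", "No": "N"}
df = df.replace(replace_mapping)

# Boolean filtering keeps only the rows that may be contacted,
# with no loop and no row-by-row drop.
contactable = df.loc[df["Do_Not_Contact"] == "N"]

print(df)
print(contactable)
```

Because the mapping matches whole cell values, an existing 'Y' is left alone rather than being expanded again.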

  • @ashwanikumarkaushik2531
    @ashwanikumarkaushik2531 1 year ago +53

    This is one of the best videos regarding data cleaning I have ever watched. Really crisp and covers almost all the important steps. It also dives deep into concepts that are really important, but you rarely see anybody applying them.
    A must-watch for everybody who is looking to get into the data field or is already in it.

  • @DreaSimply21
    @DreaSimply21 1 year ago +8

    I like how in some of your videos you show us the long way and then the short cut, instead of just showing the short cut. I think that way gives the person who is learning a better breakdown of what they are doing.

  • @millenniumkitten4107
    @millenniumkitten4107 1 year ago +69

    Some of the phone numbers are removed while doing the formatting. If you look in the Excel file, you'll see that some of the numbers are strings and some are integers. When you run the string method during the formatting, it replaces the numeric values with NaN and they are later removed completely. If you want to avoid losing that data you'll need to use
    df["Phone_Number"] = df["Phone_Number"].astype(str)
    before formatting. You also won't need to convert to string in the lambda after doing this.

    • @millenniumkitten4107
      @millenniumkitten4107 1 year ago +11

      If you want to replace the empty values in Do_Not_Contact you'll need to use
      df["Do_Not_Contact"].astype(str).replace("","N")
      Technically those values are not empty, they are NaNs which is why replace is giving them 'NNN' instead of just the one 'N'. It's treating it as if NaN equals three blank spaces

    • @atomicafk8704
      @atomicafk8704 1 year ago +1

      that's what i've noticed too, great work

    • @jameslindsay4705
      @jameslindsay4705 1 year ago

      You are a genius, thanks :)

    • @jaldaamol46
      @jaldaamol46 9 months ago

      Thanks man, this worked.

    • @guilhermeramon9523
      @guilhermeramon9523 6 months ago

      Thank you! I was seeing this in my dataframe and didn't understand why it was happening!
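
A short sketch of why the numeric phone numbers vanish, assuming a column that mixes strings and integers the way the thread describes. The values are made up:

```python
import pandas as pd

# Hypothetical mixed-type column: two strings and one integer.
df = pd.DataFrame({"Phone_Number": ["123-545-5421", 1236439775, "876|678|3469"]})

# The .str accessor only operates on strings; the integer entry becomes NaN.
lost = df["Phone_Number"].str.replace("[^a-zA-Z0-9]", "", regex=True)

# Converting everything to str first keeps the integer entry.
kept = df["Phone_Number"].astype(str).str.replace("[^a-zA-Z0-9]", "", regex=True)

print(lost.tolist())
print(kept.tolist())
```

One caveat: astype(str) also turns genuinely missing values into the text 'nan', so those still need separate handling afterwards.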

  • @farahandini3799
    @farahandini3799 1 year ago +21

    I really like it when you make mistakes, because it shows that no one is perfect. I sometimes get anxious when I watch tutorials and the presenters seem so good. You also show that the struggles you experience throughout the process are real. Thanks for the tutorial, Alex.

  • @sj1795
    @sj1795 1 year ago +2

    Found this REALLY helpful! I love how you walk us through mistakes as well as explain WHY you do what you do throughout your videos. It adds so much value to each video. As always, THANK YOU ALEX!!

  • @margotonik
    @margotonik 10 months ago +2

    I enjoyed working on this project. Thank you Alex and a huge thank you to those guys who helped in the struggling minutes!

  • @jeanaimegakwerere8591
    @jeanaimegakwerere8591 1 year ago +3

    Thank you sir, you can't imagine how confident I feel in cleaning data after completing this video with real data practice. Thank you once again.

  • @sabuhiasadli6083
    @sabuhiasadli6083 2 months ago +1

    what seems to be a daunting task at the beginning turns out to have an easy explanation with the right tools, thank you Alex !!!

  • @georgekalathoor
    @georgekalathoor 11 months ago +18

    Instead of applying a lambda function to convert the Phone_Number column elements to strings, we can also use
    df['Phone_Number'] = df['Phone_Number'].astype(str)
    and pass a dictionary to the replace method to avoid Yes becoming YYes:
    df['Paying Customer'] = df['Paying Customer'].replace({'Y':'Yes','N':'No'})

  • @L3GAT0Dantes
    @L3GAT0Dantes 1 year ago +21

    If you're getting an error when trying to split the address, this is what worked for me; I had to remove the number of values to look for.
    df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(',', expand=True)

    • @arpandebnath6115
      @arpandebnath6115 1 year ago +4

      Use this; you have to include pat:
      df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(pat=',', n=2, expand=True)

    • @toni_munoz
      @toni_munoz 1 year ago +1

      thank you!

    • @warinside7831
      @warinside7831 1 year ago +1

      What does that do exactly?

  • @stefan5249
    @stefan5249 2 months ago

    Thank you for the data example, now I can connect all the code snippets that I learned individually and can finally use them together in your example!
    Really one of the best exercises I have found so far!
    Thank you so much, Alex!!

  • @morris9973
    @morris9973 1 year ago

    I've been struggling with Pandas a bit and this video cleared some things for me!
    what frustrates me from the way my teachers would teach Pandas, their solutions are sometimes too efficient, in the sense that a student that started from zero who's taking an exam, will never be able to come up with these hyper efficient and elegant one-liners in their code. what I appreciate in your video is how you achieve the same results, but in a way that a beginner can easily remember and apply on an exam. thank you! I'll be checking out more of your videos.

  • @iinph
    @iinph 1 year ago +1

    thank you for your work Alex! I went through the entire video 1 by 1 twice and I can tell I learned a lot from this video , finally understanding why we need to learn Loops etc. and how simple cleaning methods work on Jupyter.

  • @dullfire8140
    @dullfire8140 1 year ago +4

    Man, let's go! You are a hero to those of us who cannot afford paid courses.

  • @A4O_TSL
    @A4O_TSL 1 year ago +3

    Alex, you are the GOAT! For real, thank you for all the tutorials and your help for everyone who wants to become a data analyst!

  • @HunzaFolk
    @HunzaFolk 10 months ago +1

    I am studying Data Collection and Data Visualization at Kings College; your channel is recommended by our lecturers for understanding data cleaning.

  • @drumkick1397
    @drumkick1397 1 year ago +21

    I discovered that replace() has a regex (regular expression) argument. Called on the column directly (without .str), replace() defaults to regex=False and only looks for exact matches of the whole value, meaning it won't change 'Yes' to 'Yeses', only 'Y' to 'Yes'. We can write df["Paying Customer"].replace('Y', 'Yes', regex=False) and it will work as expected.
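
A sketch contrasting the two replace flavours this comment touches on: Series.str.replace works on substrings inside each value, while Series.replace (no .str) matches whole values when regex is off. The four-value Series is invented for illustration:

```python
import pandas as pd

# Hypothetical column with both short and long spellings.
s = pd.Series(["Y", "Yes", "N", "No"])

# Substring replacement: the 'Y' inside 'Yes' is also replaced.
substring = s.str.replace("Y", "Yes", regex=False)

# Whole-value replacement: only cells that are exactly 'Y' or 'N' change.
whole = s.replace({"Y": "Yes", "N": "No"})

print(substring.tolist())
print(whole.tolist())
```

The 'Yeses' artifact comes from the substring version; the whole-value version leaves an existing 'Yes' untouched.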

  • @emmanuelnwachukwu6071
    @emmanuelnwachukwu6071 1 year ago

    This is the best video I have ever watched on data cleaning using pandas.. even the mistakes were good to learn from.

  • @pip9601
    @pip9601 8 months ago +7

    At 15:19 I would like to add something. In newer versions of pandas, if you write the code the way Alex does, the data stays the same. To fix this, pass regex=True after the ''. Code: df['Phone_Number'].str.replace('[^a-zA-Z0-9]', '', regex=True). But overall I can't say anything except thank you, Alex, for this awesome tutorial!

  • @Elly-we9uc
    @Elly-we9uc 1 year ago +4

    Also, to clean the Do_Not_Contact field, one can use: df['Do_Not_Contact'] = df['Do_Not_Contact'].replace({'N': 'No', 'Y': 'Yes'})

  • @anikkantisikder2179
    @anikkantisikder2179 1 year ago +27

    For the address column: df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(",", n=2, expand = True). Defining only 2 was giving me an error. so i had to change it to n=2

    • @DreaSimply21
      @DreaSimply21 1 year ago +3

      This helped me, thank you! However, what does "n" mean?

    • @bobojonkasymov2279
      @bobojonkasymov2279 1 year ago +3

      @DreaSimply21 The n=2 parameter indicates that the split should occur at most two times, producing at most three parts.

    • @championsadiq7411
      @championsadiq7411 1 year ago +1

      Thank you for this. It helped me a great deal

  • @menyajasper4940
    @menyajasper4940 1 year ago +1

    This is really very important to both the beginners and pro. Kudos!!

  • @YR-up8vk
    @YR-up8vk 1 year ago +2

    Thank you Alex for this detailed breakdown. Just a side note for those who don't like to use loops e.g. for, while
    For 31:00, you could use the following code: df.drop(df[df['Do_Not_Contact'] == 'Y'].index, inplace=True)

    • @LuisRivera-oc6xh
      @LuisRivera-oc6xh 1 year ago +3

      I'd say that's complicating the code. You can simply do
      df = df[df['Do_Not_Contact'] != "Y"]

    • @vickygalih5571
      @vickygalih5571 1 year ago

      @LuisRivera-oc6xh i literally use this at the first time learning pandas myself

    • @ghanem87
      @ghanem87 1 year ago

      df = df.drop(df[df['Do_Not_Contact'] == 'Y'].index)
      df = df.drop(df[df['Do_Not_Contact'] == ''].index)
      OR
      df = df[df['Do_Not_Contact'] == 'N']
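
The drop-by-index and boolean-mask approaches in this thread give the same rows. A minimal check on invented data (the names and flags are placeholders):

```python
import pandas as pd

# Hypothetical contact list.
df = pd.DataFrame({
    "First_Name": ["Frodo", "Abed", "Walter", "Dwight"],
    "Do_Not_Contact": ["N", "Y", "", "Y"],
})

# Approach 1: look up the index labels of the unwanted rows and drop them.
dropped = df.drop(df[df["Do_Not_Contact"] == "Y"].index)

# Approach 2: keep the rows where the condition is False.
masked = df[df["Do_Not_Contact"] != "Y"]

# Either way the surviving rows keep their old index labels;
# reset_index(drop=True) renumbers them from 0.
renumbered = masked.reset_index(drop=True)

print(dropped.equals(masked))
print(renumbered)
```

Which one to use is mostly a readability choice; the mask version avoids the intermediate index lookup.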

  • @Elly-we9uc
    @Elly-we9uc 1 year ago +3

    Timestamp 32:42. I simply use
    #Filter out "Do_Not_Contact" == "Yes"
    df[df['Do_Not_Contact']!='Yes']

  • @danielblum5691
    @danielblum5691 1 year ago +1

    Thank you for this video. I just finished this part of the data analytics course and I definitely learned something new and helpful.

  • @MegaDave8520
    @MegaDave8520 1 year ago +8

    And I was already looking for some Pandas tutorial. Thank you, Alex, this was much needed. :)

  • @khaibaromari8178
    @khaibaromari8178 1 year ago +2

    Simply amazing! Well-explained and comprehensive. Loved it!

  • @DataScienceconMilton
    @DataScienceconMilton 1 year ago +2

    Great Pandas data cleaning video. Thank you very much for sharing your knowledge.

  • @22MSHIVASHANKAR
    @22MSHIVASHANKAR 1 year ago +1

    Alex, I loved the video. It has the correct explanation. Thank you so much for your video.
    There is a small mistake where you are typing
    #Another Way to drop null value
    df.dropna(subset='Column_name',inplace = True). I hope you will notice the error.
    Thank you.
    Have a Great day!
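
For reference, a small sketch of dropna with subset, which removes only the rows that are missing a value in the named column. The frame is invented; the column names follow the tutorial's:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row has no phone number, another has no last name.
df = pd.DataFrame({
    "First_Name": ["Frodo", "Abed", "Walter"],
    "Last_Name": ["Baggins", "Nadir", np.nan],
    "Phone_Number": ["123-545-5421", np.nan, "706-695-0392"],
})

# Only Phone_Number is checked, so the row with the missing last name survives.
df = df.dropna(subset=["Phone_Number"])

print(df)
```

Assigning the result back, as here, is an alternative to inplace=True.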

  • @villjack
    @villjack 1 year ago +1

    My fav thing to do in pandas, thanks for making tutorial.

  • @rnjesus9950
    @rnjesus9950 1 year ago

    After making it this far through the course over the last 2 months, looking at these last 4 videos I'm getting strong final exam vibes. Python has not felt intuitive to me at all, but I recognize its value. I guess it feels like taking Spanish 1 and having Spanish 2 tests. I'm definitely looking forward to applying what I've learned here to solidify the lessons more. I'm contracting for a company already and writing a proposal for them to transition to My SQL Server. I guess the fact that I feel overwhelmed with all the info means I'm actually learning how little I actually know, which is a good thing for growth in the long run. Rambling here, but I am incredibly thankful for the course, Alex.

  • @bharatsaraswat
    @bharatsaraswat 1 year ago

    Very well done! Great video. I am working on analyzing and cleaning scraped data from web and this guide is helpful, especially where you mentioned the mistakes.

  • @JK-tk2do
    @JK-tk2do 1 year ago

    Oh my.. I am going to watch every single video you created..

  • @yashjohngaming2928
    @yashjohngaming2928 1 year ago

    Best video available on internet so far for data cleaning in Pandas. Best explanation. 😇😇

  • @jtmoleleki3604
    @jtmoleleki3604 10 months ago

    Thank you Alex. Your videos are very helpful. Now I can resume cleaning my data.

  • @dawewatwese6301
    @dawewatwese6301 1 year ago +5

    Hi Alex, idk if you will see this comment. So I was doing the same codes, and I noticed when you eliminated the characters for the phone numbers at 14:57 you also deleted the phone numbers that did not have any characters in them. You can see that at index 3 for Walter White, before he had a phone number but after he had NaN. If you can tell me how to correct it, it would be very great. I also never commented on your videos, but i like them very much, they are very good, and helpful. Thanks for everything

    • @GlennLee-qz4st
      @GlennLee-qz4st 1 year ago +2

      Not sure if you're still looking for a solution, but after some online searching I found a way to avoid deleting the phone numbers that had no errors or special characters: add .astype(str) before .str.replace. This seems to fix the issue, and the code should look something like this:
      df["Phone_Number"] = df['Phone_Number'].astype(str).str.replace('[^a-zA-Z0-9]','',regex=True)
      Also note you'll have to add regex=True manually.
      Maybe it's deleting them because it somehow interprets a whole number as non-numeric and removes it erroneously. Not 100% sure though, I'm still a beginner, and it might cause issues with other types of data.

    • @TasosKaraiskos
      @TasosKaraiskos 2 months ago

      @GlennLee-qz4st For me, Walter White's telephone number is deleted before the str.replace instruction is even written. It's deleted as soon as I run
      df['Last_Name'] = df['Last_Name'].str.lstrip('...')
      df['Last_Name'] = df['Last_Name'].str.lstrip('/')
      df['Last_Name'] = df['Last_Name'].str.rstrip('_')
      for some reason.

  • @nitinvishwakarma9624
    @nitinvishwakarma9624 6 months ago

    Thank you, this is the most elaborate and simplest video I have seen.

  • @nguyenthikieuoanh8966
    @nguyenthikieuoanh8966 6 months ago

    Thanks for your effort making this amazing video. It helps me a lot. I've been struggling with data cleaning and your video is helpful.

  • @ritwikmukherjee3572
    @ritwikmukherjee3572 5 months ago

    Hello Alex, thank you for such a wonderful tutorial . I have one suggestion regarding the last part where you are filtering
    # Filtering the Data with "Do_Not_Contact" Column with N and " "
    Filter1 = df["Do_Not_Contact"]=="N"
    Filter2= df["Do_Not_Contact"]==""
    df[Filter1 | Filter2]
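
The two-filter approach above can also be written with isin, which is a swapped-in shorthand rather than what the comment uses. A quick sketch on invented data shows both select the same rows:

```python
import pandas as pd

# Hypothetical flags: 'N', empty string, and 'Y'.
df = pd.DataFrame({
    "First_Name": ["Frodo", "Harry", "Abed", "Leslie"],
    "Do_Not_Contact": ["N", "", "Y", "N"],
})

# Two boolean masks combined with | (element-wise OR), as in the comment.
filter1 = df["Do_Not_Contact"] == "N"
filter2 = df["Do_Not_Contact"] == ""
either = df[filter1 | filter2]

# isin tests membership in a list of allowed values in one step.
same = df[df["Do_Not_Contact"].isin(["N", ""])]

print(either.equals(same))
```

isin scales more comfortably when the list of accepted values grows.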

  • @enyinnayajaja
    @enyinnayajaja 1 year ago

    Thank you Alex for this video on data cleaning with pandas. It is very detailed and explanatory

  • @salaimani
    @salaimani 10 months ago +1

    How do you apply the changes at 23:27 and go back to the previous steps in Jupyter Notebook?

  • @traetrae11
    @traetrae11 1 year ago +2

    Thank you Alex. That Lambda example is going to be very useful.

  • @aaspirant5392
    @aaspirant5392 1 year ago +1

    You are great, Alex. Your teaching skills are excellent.

  • @chernobarry6035
    @chernobarry6035 1 year ago +1

    Your explanation was super cool

  • @Mwalimu-wa-Math
    @Mwalimu-wa-Math 9 months ago

    38:36 df[['Street_address','State','Zip_code']]=df['Address'].str.split(" ",n=2, expand=True)

  • @ZeuSonRed
    @ZeuSonRed 1 year ago

    I not only survived! At 20:46 you can pass both values to replace in a list, as in df["Phone_Number"].replace(['nan--', 'Na--'], ''). Thank you 1:1

  • @sumeetkajale3679
    @sumeetkajale3679 1 year ago +2

    Hey alex, we don't need to take any course because you are there 😉
    I am doing your bootcamp of becoming a data analyst

    • @AlexTheAnalyst
      @AlexTheAnalyst 1 year ago +2

      Do it! I try my best to bring the best free content I can :)

  • @Datatalksbro
    @Datatalksbro 4 months ago

    # Step 1: Convert to string and clean non-digit characters
    beta['Phone_Number'] = beta['Phone_Number'].apply(lambda x: ''.join(filter(str.isdigit, str(x))) if pd.notna(x) else x)
    # Step 2: Format the phone number to xxx-xxx-xxxx if it is exactly 10 digits long
    beta['Phone_Number'] = beta['Phone_Number'].apply(lambda x: f'{x[0:3]}-{x[3:6]}-{x[6:10]}' if pd.notna(x) and len(x) == 10 else x)
    print(beta)

  • @alwaysbehappy1337
    @alwaysbehappy1337 1 year ago +2

    Thanks Alex, Please post more videos.

  • @mastermatt6090
    @mastermatt6090 10 months ago

    I was intimidated by the Machine learning module but now I am not. Thanks a lot dude

  • @sdivi6881
    @sdivi6881 11 months ago +3

    If any one is getting an error on df['Address'].str.split(",",2, expand=True), you can omit 2 and use df["Address"].str.split(",", expand=True)

    • @Gratitude-x3g
      @Gratitude-x3g 6 months ago

      @sdivi6881 Thank you so much 😊😊😊

  • @mohitjoshi8984
    @mohitjoshi8984 1 year ago +1

    Hello Alex, when cleaning the Phone_Number column (14:00 to 21:39) the code executes,
    but there are no changes in the table.
    Please help me with this.

    • @fede77
      @fede77 1 year ago

      you might have a newer pandas version, just add regex = True as an extra parameter:
      df['Phone_Number'] = df['Phone_Number'].str.replace('[^a-zA-Z0-9]', '', regex=True)

    • @sasikiran4813
      @sasikiran4813 3 months ago

      df['Phone_Number']= df['Phone_Number'].astype(str)

  • @balajijadhav6080
    @balajijadhav6080 5 months ago

    Thank you so much, sir. I have started my data cleaning journey with you. From India 💌

  • @fitnessfreak984
    @fitnessfreak984 1 year ago +1

    Hey Alex, I just started your Pandas tutorial and was waiting for the data cleaning video. When I opened YouTube, your video was the first one I saw. This is a boon for me 😇🥺 Thanks, I hope you will upload videos on Matplotlib, NumPy and many more libraries ❤🤗

  • @jamilsonedu917
    @jamilsonedu917 1 year ago

    Using regular expressions for manipulating data is beneficial because it allows you to change strings as needed, especially when dealing with different types of strings.

  • @omkar8101
    @omkar8101 1 year ago +3

    Thanks a lot Alex for the video ! This was exactly what I was looking for. May I request you to try and upload video on how to write Python ETL code which uses table in a cloud database like snowflake, saves it in a csv format, transforms it and then again uploads it on snowflake. And all these steps are being captured in a log file which is in txt format !

    • @MehmoodAyazKhan
      @MehmoodAyazKhan 1 year ago

      vouching for this @Alex. It'd be really appreciated TIA

  • @wenkanglee9596
    @wenkanglee9596 1 year ago +1

    29:42
    Just sharing my approach to remove the "don't call" rows
    df = df[df['Do_Not_Contact'] != 'Y']
    You can apply this to the missing phone number and the rest as well.

    • @Charlay_Charlay
      @Charlay_Charlay 11 months ago +1

      Man i love the comments section. Thank you for sharing this. This is a very simple method.

    • @wenkanglee9596
      @wenkanglee9596 11 months ago

      @Charlay_Charlay glad that helped! You're welcome!

  • @50cent10891
    @50cent10891 1 year ago

    Great video! I enjoyed learning from you! Thanks for making things easier to understand

  • @sauravsubedi7089
    @sauravsubedi7089 10 months ago +1

    Instead of stripping each symbol one by one at 9:11, I think it's better to use
    characters_to_remove = ['/','...','_']
    for x in characters_to_remove:
        df["Last_Name"] = df["Last_Name"].str.strip(x)
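
Since str.strip treats its argument as a set of characters to trim from both ends, the loop above can probably be collapsed into a single call. A sketch on invented last names:

```python
import pandas as pd

# Hypothetical names with stray symbols at the start or end.
df = pd.DataFrame({"Last_Name": ["Baggins", "/White", "...Potter", "Swanson_"]})

# "/._" is read as "any of the characters /, ., _", not as a literal sequence,
# so one call trims all three kinds of leading and trailing symbols.
df["Last_Name"] = df["Last_Name"].str.strip("/._")

print(df["Last_Name"].tolist())
```

This is also why passing '...' behaves the same as passing '.': both describe the same one-character set.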

  • @ramakrishnaraolakkaraju3750

    Thanks for the video. Helped a lot in understanding Pandas.

  • @wintur2856
    @wintur2856 7 months ago

    9:45 What if your data set is larger and you can't look throughout the entire list to see what you need to clean? 😔

  • @Niranga.555
    @Niranga.555 1 year ago +1

    Hey Alex, Thanks for the super content ...!

  • @sarthakmehta5418
    @sarthakmehta5418 5 months ago

    At 14:53 we can see that Walter White (index 02) has a phone number, but it disappears after the replace command. Why??

  • @bolajiogunfowote8603
    @bolajiogunfowote8603 1 year ago

    The video I needed to have a realistic practice in data cleaning.thanks

  • @gudiatoka
    @gudiatoka 1 year ago

    Great video mam, need more this type of tutorials

  • @FarizDarari
    @FarizDarari 10 months ago

    Many thanks for the dataset+code+video!!! 🔥🔥

  • @pewolo_nyenh
    @pewolo_nyenh 1 year ago

    For explanation purposes, it is great.
    For getting the final result, I would have done differently though

  • @hamzaabdullahmoh
    @hamzaabdullahmoh 1 year ago

    A Glorious Thank You!! Please Keep This UP!!!!

  • @selimc3347
    @selimc3347 1 year ago +1

    Your work is amazing. Thank you so much.

  • @kostas_alexiou
    @kostas_alexiou 1 year ago

    Alex i have a question regarding the part in 18:50 where you change the phone number column into string using the str() inside the lambda , can i get the same result using first df["Phone_Number"].astype() and then do the lambda ? or is there a nuance and it works only using str() ? Thanks for the great work !

  • @vasavipasumarthi9601
    @vasavipasumarthi9601 10 months ago

    You really have done a good job. I became a big fan of yours. Thank you so much for doing this.

  • @markobe08
    @markobe08 8 months ago

    For those struggling at 33:55:
    df['Do_Not_Contact'] = df['Do_Not_Contact'].replace('', pd.NA)
    df['Do_Not_Contact'] = df['Do_Not_Contact'].fillna('N')

    • @AgathaMenc-uv3ob
      @AgathaMenc-uv3ob 6 months ago

      Half a million zlotys and a public apology

    • @AgathaMenc-uv3ob
      @AgathaMenc-uv3ob 6 months ago

      The lawyer will do the talking, not me

    • @AgathaMenc-uv3ob
      @AgathaMenc-uv3ob 6 months ago

      Is it that easy to wreck someone's life?? That easy??? Then an eye for an eye..
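
Returning to the 33:55 tip at the top of this thread, here is a minimal sketch of filling blank Do_Not_Contact values. It assumes blanks may appear either as empty strings or as real missing values, and it assigns the result back instead of using inplace=True on a selected column, which recent pandas versions warn about. The data is invented:

```python
import pandas as pd

# Hypothetical flags: an empty string and a true missing value among Y/N.
df = pd.DataFrame({"Do_Not_Contact": ["Y", "", "N", None]})

# Step 1: turn empty strings into proper missing values.
# Step 2: fill every missing value with 'N'.
df["Do_Not_Contact"] = df["Do_Not_Contact"].replace("", pd.NA).fillna("N")

print(df["Do_Not_Contact"].tolist())
```

Whether a blank should really be treated as 'N' is a business decision, so this only shows the mechanics.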

  • @Legomancer
    @Legomancer 9 months ago

    at about 33:54, whoa! unless you were specifically told to do this, you are altering the data! Changing no value to 'N' is a no-no unless you have been told to do so. Otherwise you're adding information that was not there. We don't know if Harry Potter wants to be contacted or not and that's probably for someone above our pay grade to decide! :D

  • @shotihoch
    @shotihoch 9 months ago

    Not an analyst (never wanted to be), but it was very interesting. Thanks!

  • @neildelacruz6059
    @neildelacruz6059 1 year ago +1

    Thanks for this absolutely great video.

  • @yanpaucon1043
    @yanpaucon1043 8 months ago

    Thank you so much, Alex. You are the Best

  • @yvonnemukhono3566
    @yvonnemukhono3566 8 months ago

    Very helpful, and well explained.

  • @md.shahriarabidswapnil604
    @md.shahriarabidswapnil604 1 year ago

    thank you very much. your video helped me a lot. good luck

  • @avinashparchake7935
    @avinashparchake7935 1 year ago +5

    In the Last_Name column we can use the replace function with a regular expression to remove characters like (. / _)
    code:
    df["Last_Name"] = df["Last_Name"].str.replace("[./_]", "", regex=True)

    • @DreaSimply21
      @DreaSimply21 1 year ago +1

      OMG Thank youuuu!!! I knew someone on here had to know the answer to how to use regex lol.

    • @bolajiawofuwa8116
      @bolajiawofuwa8116 1 year ago

      Thanks
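
One thing to keep in mind with the character-class pattern above: it removes the symbols anywhere in the value, not only at the edges. The sketch below compares it with an anchored variant, which is an assumption added here rather than something from the thread. The names, including 'O_Neil', are made up:

```python
import pandas as pd

# Hypothetical names; the last one has an underscore in the middle.
s = pd.Series(["...Potter", "/White", "Swanson_", "O_Neil"])

# Removes '.', '/', and '_' wherever they occur.
everywhere = s.str.replace("[./_]", "", regex=True)

# Removes them only at the start (^...) or at the end (...$) of the value.
edges_only = s.str.replace(r"^[./_]+|[./_]+$", "", regex=True)

print(everywhere.tolist())
print(edges_only.tolist())
```

For the tutorial's data the two may well give identical results; the anchored form just protects symbols that legitimately sit inside a name.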

  • @meryemOuyouss2002
    @meryemOuyouss2002 1 year ago

    Thank you soo much sir you're really a great professor 👏❤

  • @modern_jacob
    @modern_jacob 1 year ago +19

    If df["Phone_Number"].replace('[^a-zA-Z0-9]', '') is not working for you, try df["Phone_Number"].replace('[^a-zA-Z0-9]', '', regex=True)

  • @selvas5043
    @selvas5043 1 year ago +1

    Super Explanation Thanks

  • @nirmalpandey600
    @nirmalpandey600 8 months ago

    Amazing explanations!

  • @higiniofuentes2551
    @higiniofuentes2551 1 year ago +1

    For the Phone_Number column with all the variants of NaN: first convert the column to string, then do the formatting, and then replace the content with nothing wherever it contains the 2 dashes (--).
    Thank you!

    • @juanlora5609
      @juanlora5609 1 year ago

      df["Phone_Number"].str.replace('[^A-Za-z0-9]', '', regex=True)

  • @hueytemplar
    @hueytemplar 8 months ago

    For those getting an error at 23:24, use the code below.
    df["Address"].str.split(',', n=1, expand=True)

  • @maryemmdini9408
    @maryemmdini9408 1 year ago

    very well explained video thank youuuu

  • @Insightss.....
    @Insightss..... 7 months ago

    I'm in love with ur videos

  • @rahulkhanvilkar290
    @rahulkhanvilkar290 1 month ago

    Here, Instead of looping two times 31:00 , just filter df. df1 = df[df['Do_Not_Contact'] != "Y"]. Then df1 = df1[df1['Do_Not_Contact'] != "Y"] ✌

  • @higiniofuentes2551
    @higiniofuentes2551 1 year ago

    Thank you for this very useful video!

  • @avocado23474
    @avocado23474 4 months ago

    Thank you a lot, Alex! ^^

  • @adeolaa.366
    @adeolaa.366 11 months ago

    great video thank you. when we did the first lambda, the reason was because lambda is faster. so why did we go against using a lambda when it was time to check if the customer can be called or not?

  • @SurendraSingh-bd5wc
    @SurendraSingh-bd5wc 11 months ago

    Really enjoyed the video

  • @G2Chanakya
    @G2Chanakya 1 year ago +1

    My only doubt is, you saw the first 20 rows and decide only \ or .. or _ could be preceding, or only "Nan" or "N/A" is only there in that row, while replacing it. What if the 50th row has "%Mike" as a name or what if "Null" is there one of the columns?? How do we deal with it. Great recap for me other than this. Thank you.

  • @17art3an
    @17art3an 1 year ago

    Thank you, great video!

  • @SoggyBagelz
    @SoggyBagelz 1 year ago +1

    Yesss love these vids

  • @alexandermackintosh1755
    @alexandermackintosh1755 1 year ago

    Great video thanks! Can’t help thinking that tools like chatGPT, github copilot al, GPT engineer can pretty much tell you how to/do this all for you so maybe I am wasting my time learning this 😅