How do I find and remove duplicate rows in pandas?

  • Published 7 Jan 2025

COMMENTS • 233

  • @fredcalo
    @fredcalo 8 years ago +3

    I spent hours trying to figure this stuff out through reading chapters and chapters in Python books. Then I come here, and everything I was trying to figure out was explained in 9 minutes. This was IMMENSELY helpful, thanks!

    • @dataschool
      @dataschool  8 years ago

      Awesome!! That's so great to hear!

  • @mea97905
    @mea97905 8 years ago +26

    I like your concise and precise videos. I really appreciate your efforts.

    • @dataschool
      @dataschool  8 years ago +3

      Thanks, I appreciate your comment!

  • @reubenwyoung
    @reubenwyoung 5 years ago +3

    Thanks so much for this! You helped me combine 629 files and remove 250k duplicate rows!
    You're the man! *Subscribed*

  • @jordyleffers9244
    @jordyleffers9244 4 years ago +5

    lol, just when I felt you wouldn't handle the exact subject I was looking for: there came the bonus! Thanks!

  • @hongyeegan733
    @hongyeegan733 4 years ago

    Wow! You were already teaching data science in 2014, when it wasn't even popular! Btw, your videos are really good: you speak slowly and clearly, easy to understand and follow. Kudos to you!

    • @dataschool
      @dataschool  4 years ago

      Thanks very much for your kind words!

  • @shashwatpaul3330
    @shashwatpaul3330 4 years ago +1

    I have watched a lot of your videos, and I must say that the way you explain is really good. Just to inform you, I am new to programming, let alone Python.
    I want to learn a new thing from you. Let me give you a brief. I am working on a dataset to predict App Rating from Google Play Store. There is an attribute by name "Rating" which has a lot of null values. I want to replace those null values using a median from another attribute by name "Reviews". But I want to categorize the attribute "Reviews" in multiple categories like:
    1st category would be for the reviews less than 100,000,
    2nd category would be for the reviews between 100,001 and 1,000,000,
    3rd category would be for the reviews between 1,000,001 and 5,000,000 and
    4th category would be for the reviews anything more than 5,000,000.
    Although, I tried a lot, I failed to create multiple categories. I was able to create only 2 categories using the below command:
    gps['Reviews Group'] = [1 if x
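For anyone stuck on the same binning step: `pd.cut` builds all four categories in one go, instead of chaining conditionals. A minimal sketch with made-up review counts (the `gps` frame and column names are assumptions taken from the comment; the thresholds follow its four categories):

```python
import pandas as pd

# Hypothetical stand-in for the Google Play Store dataset from the comment
gps = pd.DataFrame({"Reviews": [50_000, 300_000, 2_000_000, 9_000_000]})

# One bin edge per category boundary: <=100k, 100k-1M, 1M-5M, >5M
bins = [0, 100_000, 1_000_000, 5_000_000, float("inf")]
gps["Reviews Group"] = pd.cut(gps["Reviews"], bins=bins, labels=[1, 2, 3, 4])

print(gps["Reviews Group"].tolist())  # [1, 2, 3, 4]
```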

  • @minaha9213
    @minaha9213 2 years ago +1

    Just found your channel, watched this as my first of your videos, and pressed subscribe!!! Your explanation of the idea as a whole is very remarkable 😃 Thanks a lot.

  • @emanueleco7363
    @emanueleco7363 4 years ago

    You are the greatest teacher in the world

  • @rashayahya
    @rashayahya 5 years ago +2

    I always find what I need in your channel.. and more... Thank you

  • @dhananjaykansal8097
    @dhananjaykansal8097 5 years ago

    I didn't find much in Duplicates. Thanks so much sir. I can't thank u enough.

  • @MrTheAnthonyBielecki
    @MrTheAnthonyBielecki 7 years ago +1

    Exactly what I needed! Why not set up a Patreon so we can show some love?

    • @dataschool
      @dataschool  7 years ago

      Thanks for the suggestion! I am planning to set one up soon, and will let you know when it's live :)

    • @dataschool
      @dataschool  6 years ago +1

      I just launched my Patreon campaign! I'd love to have your support: www.patreon.com/dataschool/overview

  • @Beny123
    @Beny123 6 years ago +3

    Thank you! Here is a way to extract the non-duplicated rows: df = df.loc[~df.A.duplicated(keep='first')].reset_index(drop=True)

  • @ranveersharma1666
    @ranveersharma1666 4 years ago

    love u brother . u r changing so many lives, thanku ....the best teacher award goes to Data school.

    • @dataschool
      @dataschool  4 years ago

      Thanks very much for your kind words!

  • @oeb5542
    @oeb5542 5 years ago +2

    A very much appreciated efforts. Thanks a million for sharing with us your python knowledge. It has been a wonderful journey with your precise explanation. keep the hard work! Warm regards.

  • @tushargoyaliit
    @tushargoyaliit 6 years ago +1

    I'm from Punjab, studying at IIT, and even then I got my understanding of pandas from your videos only. Thanks!
    Please provide everything you've covered in text format, like a tutorial.

    • @dataschool
      @dataschool  6 years ago +1

      Is this what you are looking for?
      nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/pandas.ipynb

  • @cablemaster8874
    @cablemaster8874 4 years ago +1

    Really, your teaching method is very good and your videos give a lot of knowledge. Thanks, Data School!

  • @supa.scoopa
    @supa.scoopa 9 months ago +1

    THANK YOU for the keep tip, that's exactly what I was looking for!

  • @Kristina_Tsoy
    @Kristina_Tsoy 1 year ago +1

    Kevin your videos are super helpful! thank you!!!

  • @cradleofrelaxation6473
    @cradleofrelaxation6473 2 years ago

    This is so helpful!
    Pandas has the best duplicates handling. Better than spreadsheets and SQL.

  • @jeffhale739
    @jeffhale739 6 years ago +1

    Great video, Kevin! Super useful!

  • @narbigogul5723
    @narbigogul5723 6 years ago

    That's exactly what I was looking for, great explanation, thanks for sharing!

  • @randyle2511
    @randyle2511 7 years ago

    I like the way you explain things... it's very clear and precise. My problem is a little more complex: I want to remove an entire row when it meets the following conditions.
    If a row's Latitude value is the same as the previous row's, AND the same row's Longitude value is the same as the previous row's, THEN remove the duplicated row. Basically we have to compare two consecutive ROWS across both COLUMNS, and IF both conditions are met, remove the entire row. Say there are 15 rows with the same values in both the Latitude and Longitude columns (i.e. if Lat[1,1] == Lat[0,1] & Lon[1,2] == Lon[0,2] then remove, else skip; # Lat = Col1, Lon = Col2): remove them all except one.
    Hope you got my point... :-). Looking forward to seeing your code.

    • @dataschool
      @dataschool  7 years ago

      Glad you like the videos! It's not immediately obvious to me how I would approach this problem, but I think that the 'shift' function from pandas might be useful. Good luck! Sorry that I can't provide any code.
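Building on the `shift` suggestion in the reply above, here is one possible sketch (with toy data, not the commenter's actual file): a row is dropped only when both its Latitude and Longitude equal the previous row's values.

```python
import pandas as pd

# Toy data: rows 1 and 2 repeat the lat/lon of the row before them
df = pd.DataFrame({
    "Latitude":  [10.0, 10.0, 10.0, 11.0, 11.0],
    "Longitude": [20.0, 20.0, 20.0, 21.0, 22.0],
})

# A row is a consecutive duplicate only if BOTH columns equal the previous row
same_as_prev = (df["Latitude"].eq(df["Latitude"].shift())
                & df["Longitude"].eq(df["Longitude"].shift()))
deduped = df[~same_as_prev].reset_index(drop=True)

print(deduped)
# Keeps rows 0, 3, 4 (row 4 repeats the latitude but not the longitude)
```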

  • @harneetlamba9512
    @harneetlamba9512 5 years ago

    Hi, In the above video, at 1:12 minutes - the pandas DataFrame is displayed in Tabular form, with all the variables separated by vertical line. But in latest jupyter notebook, we get a single line below variable name. Can we get the same display as earlier, with new Jupyter version ?

    • @dataschool
      @dataschool  5 years ago

      There's probably a way, but it's probably not easy. I'm sorry!

  • @cyl1040
    @cyl1040 4 years ago +1

    I can solve the duplicate data from my CSV file~~~ Thank you.
    However, I suggest you can do more in this video. I think you can show after the delete result list. Such as:
    >> new_data=df.drop_duplicates(keep='first')
    >> new_data.head(24898)
    If you have to add it, I think this video will be more perfect~~~

  • @mariusnorheim
    @mariusnorheim 6 years ago

    How can I remove duplicate rows based on 2 column values?
    I want to drop a row if two column values are the same. E.g. I have one column with Country = [USA, USA, Canada, USA] and an income column with values = [1000, 900, 900, 900]. I only want to drop the duplicate where both the country AND the income is 900. While if one row has country = Canada and income = 900 and second row has USA with income 900 I want to keep them both. Answers appreciated!
    Your videos are really helpful for learning pandas. Keep up the good work!

    • @dataschool
      @dataschool  6 years ago

      Sorry, I'm not quite clear on what the rules are for when a row should be kept and when it should be dropped.
      Perhaps you could think of this task in terms of filtering the DataFrame, rather than using the drop duplicates functionality?
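For reference, `drop_duplicates(subset=...)` already treats the listed columns as a single combined key, which appears to be what the question describes. A sketch using the Country/Income data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["USA", "USA", "Canada", "USA"],
    "Income":  [1000, 900, 900, 900],
})

# A row is dropped only when BOTH Country and Income repeat an earlier row
deduped = df.drop_duplicates(subset=["Country", "Income"])

print(deduped)
# Keeps (USA, 1000), (USA, 900), (Canada, 900); drops the second (USA, 900)
```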

    • @mariusnorheim
      @mariusnorheim 6 years ago

      Thanks for the reply! I managed to improve my code to avoid the duplicates in the first place. Keep up your great work with the videos, really helpful for improving my skills!

    • @dataschool
      @dataschool  6 years ago

      Great to hear! :)

  • @balajibhaskarraokondhekar1823
    @balajibhaskarraokondhekar1823 3 years ago

    You have done a very good job of making DataFrames easy to understand, especially for people who work in Excel.
    Best wishes from me

  • @jessicafletcher0610
    @jessicafletcher0610 2 years ago

    OMG I WANT TO THANK YOU SOOOO MUCH 😊 I've been stuck on this problem for days, and the way you explain it makes it so much easier than how I learned it in class. I was so happy not to see that error message 😂 Thank you

    • @dataschool
      @dataschool  1 year ago

      You're so very welcome! Glad I could help!

  • @deki90to
    @deki90to 3 years ago +1

    HOW DO YOU KNOW WHAT I NEED? YOU ARE MY FAV TEACHER FROM NOW

  • @imad_uddin
    @imad_uddin 3 years ago +1

    Thanks a lot. It was a great help. Much appreciated!

  • @mahdibouaziz5353
    @mahdibouaziz5353 4 years ago

    you're amazing we need more videos in your channel

    • @dataschool
      @dataschool  4 years ago

      I do my best! I've got 20+ hours of additional videos available to Data School Insiders at various levels: www.patreon.com/dataschool

  • @goldensleeves
    @goldensleeves 4 years ago

    At the end are you saying that "age" + "zip code" must TOGETHER be duplicates? Or are you saying "age" duplicates and "zip code" duplicates must remove their individual duplicates from their respective columns? Thanks
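The answer to this can be checked on a tiny made-up frame: the columns listed in `subset` are treated TOGETHER as one combined key, not column by column.

```python
import pandas as pd

df = pd.DataFrame({
    "age":      [25, 25, 25],
    "zip_code": ["111", "222", "111"],
})

# Only row 2 repeats BOTH the age AND the zip_code of an earlier row;
# row 1 shares an age with row 0 but is NOT flagged
flags = df.duplicated(subset=["age", "zip_code"]).tolist()
print(flags)  # [False, False, True]
```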

  • @halildurmaz7827
    @halildurmaz7827 3 years ago

    Clean and informative !

  • @alishbakhan1084
    @alishbakhan1084 2 years ago

    Thank you so much💕 your videos are really amazing...can you tell how to read any csv(without header on first line) and set first row with non null values as header...

  • @rajoptional
    @rajoptional 4 years ago

    Amazing and thanks bro , the right place for data queries

  • @chandrapatibhanuprakashap1862
    @chandrapatibhanuprakashap1862 2 years ago

    It helps me a lot. Can you explain how do we get the count of each duplicated value.
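One way to get the count of each duplicated value is `value_counts`, then filtering to the values that occur more than once (a sketch with made-up data):

```python
import pandas as pd

df = pd.DataFrame({"city": ["NY", "NY", "LA", "NY", "LA", "SF"]})

# value_counts counts every value; keep only those appearing more than once
counts = df["city"].value_counts()
dupe_counts = counts[counts > 1]

print(dupe_counts.to_dict())  # {'NY': 3, 'LA': 2}
```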

  • @robind999
    @robind999 6 years ago

    simple and useful. thanks Kevin.

  • @jatinshetty
    @jatinshetty 4 years ago

    Yo! You are a superb teacher!

  • @dandixon9466
    @dandixon9466 8 years ago

    Great work man!

  • @anthonygonsalvis121
    @anthonygonsalvis121 3 years ago

    Very methodical explanation

  • @abdulazizalsuayri4908
    @abdulazizalsuayri4908 7 years ago

    full of useful info. Thanx man

  • @ravinduabeygunasekara833
    @ravinduabeygunasekara833 6 years ago +1

    Great video! Btw, how do you know all these stuff? Do you take classes or read books?

    • @dataschool
      @dataschool  6 years ago +6

      Work experience, reading documentation, trying things out, teaching, reading tutorials, etc.

  • @cafdo
    @cafdo 4 years ago

    Great video. This helped me tremendously.
    How would you go about finding duplicates "case insensitive" with a certain field?
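A common approach to case-insensitive duplicate detection is to lowercase the column into a throwaway key and test duplicates on that key (a sketch with invented names):

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "ALICE", "Bob"], "score": [1, 2, 3]})

# Normalize case first, then test for duplicates on the normalized values
is_dupe = df["name"].str.lower().duplicated()
print(df[~is_dupe])
# Keeps the first "Alice" and "Bob"; drops "ALICE"
```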

  • @peekayji
    @peekayji 7 years ago

    Great! Very well explained.

  • @prakmyl
    @prakmyl 4 years ago +1

    Awesome videos Kevin. Thanks a to for the knowledge share.

  • @KaiZergTV
    @KaiZergTV 2 years ago

    Thank you so much, you made my day. Finally i found the row of code, that i really needed to finish my task:)(Code Line 17)

  • @lindafl2528
    @lindafl2528 3 years ago

    hello, thank you for the video, I'm wondering if you can make some tutorials about the API requests

    • @dataschool
      @dataschool  3 years ago

      Thanks for your suggestion!

  • @ItsWithinYou
    @ItsWithinYou 3 years ago

    If I have a dataframe with a million rows and 15 columns, how do I figure out if any column in my dataframe has a mixed data type?
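One way to hunt for mixed-type columns is to count the distinct Python types held in each column; a column with more than one type is a suspect. A sketch (the column names and data are invented):

```python
import pandas as pd

df = pd.DataFrame({
    "clean": [1, 2, 3],
    "mixed": [1, "2", 3.0],   # an int, a string, and a float in one column
})

# map(type) replaces each value with its Python type; nunique counts distinct types
mixed_cols = [col for col in df.columns
              if df[col].map(type).nunique() > 1]

print(mixed_cols)  # ['mixed']
```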

  • @brianwaweru9764
    @brianwaweru9764 3 years ago

    Wait Kevin, keep='first' means the rows marked as duplicated are the ones towards the bottom, i.e. the ones with a higher index. So what does keep='last' mean?? Oh man, I'm getting mixed up. Could someone please explain it to me? Kevin, please?
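The three `keep` options can be compared side by side on a tiny Series: the occurrence that is kept is the one marked False.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "a"])

# keep='first': the FIRST occurrence is kept (False); later copies are marked True
print(s.duplicated(keep="first").tolist())  # [False, False, True, True]

# keep='last': the LAST occurrence is kept; earlier copies are marked True
print(s.duplicated(keep="last").tolist())   # [True, False, True, False]

# keep=False: EVERY member of a duplicate group is marked True
print(s.duplicated(keep=False).tolist())    # [True, False, True, True]
```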

  • @emilyyyjw
    @emilyyyjw 4 years ago +1

    Hi, I am wondering whether you could identify an issue that I am having whilst cleaning a dataset with the help of your tutorials. I will post the commands that I have used below:
    df["is_duplicate"]= df.duplicated() # make a new column with a mark of if row is a duplicate or not
    df.is_duplicate.value_counts()
    -> False 25804
    True 1591
    df.drop_duplicates(keep='first', inplace=True) #attempt to drop all duplicates, other than the first instance
    df.is_duplicate.value_counts() #
    -> False 25804
    True 728
    I am struggling to identify why there are still some duplicates that are marked 'True'?
    Kind regards,

    • @dataschool
      @dataschool  4 years ago +1

      That's an excellent question! The problem is that by adding a new column called "is_duplicate", you actually reduce the number of rows which are duplicates of one another! Instead of adding that column, you should first check the number of duplicates with df.duplicated().sum(), then drop the duplicates, then check the number of duplicates again. Hope that helps!
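The workflow described in that reply, sketched end to end on toy data (check the count, drop, check again, with no extra marker column):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 3], "b": ["x", "x", "y", "y", "z"]})

# 1. Count duplicates BEFORE touching the frame
assert df.duplicated().sum() == 2

# 2. Drop them (keep='first' is the default)
df = df.drop_duplicates()

# 3. Verify nothing is duplicated any more
assert df.duplicated().sum() == 0
print(df.shape)  # (3, 2)
```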

  • @deltatv9335
    @deltatv9335 6 years ago

    Hey Buddy, You are amazing and you remind me of Sheldon Cooper (BBT) because of the way you talk and also both of you are super smart. :-)
    One request- Please cover outliers sometime. Thanks.

    • @dataschool
      @dataschool  6 years ago

      Ha! Many people have commented something similar :) And, thanks for your topic suggestion!

  • @JoshKelson
    @JoshKelson 5 years ago

    Trying to figure out how to replace values above/below a threshold with the mean or median. If I find values that are skewing the data from a column, but don't want to exclude the whole row and drop the row, I just want to replace the value in one of the columns with a mean/median value. Can't figure out how to do this! IE: I want to replace all values in column 'age' that are above 130 (erroneous data), with the mean age of all the other values in 'age' column.

    • @dataschool
      @dataschool  5 years ago

      I'm sorry, I don't know the code for this off-hand. However, this would be a great question to ask during one of my monthly live webcasts with Data School Insiders: www.patreon.com/dataschool (join at the "Classroom Crew" level to participate)
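One possible approach to the question above (a sketch, not from the video): compute the mean of the plausible values first, then overwrite the outliers with `.loc`. The age-above-130 threshold follows the comment; the data is made up.

```python
import pandas as pd

df = pd.DataFrame({"age": [20.0, 30.0, 40.0, 999.0]})  # 999 is clearly erroneous

# Mean of the plausible values only, computed BEFORE overwriting anything
valid_mean = df.loc[df["age"] <= 130, "age"].mean()

# Replace the out-of-range values in place
df.loc[df["age"] > 130, "age"] = valid_mean

print(df["age"].tolist())  # [20.0, 30.0, 40.0, 30.0]
```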

  • @somantalha4888
    @somantalha4888 2 years ago +1

    beneficial videos. ❤

  • @jamesdoone3516
    @jamesdoone3516 8 years ago

    Really great job. Thank you very much!!

  • @ayatbadayatbad7688
    @ayatbadayatbad7688 5 years ago

    Thank you for this useful tutorial. Quick question, how do you check whether a value in column A is present in column B or not; not necessarily on the same row. It is like the samething that VLOOKUP function looks for in Excel. Many thanks for your feed-back!

    • @dataschool
      @dataschool  5 years ago

      I'm not sure I understand your question, I'm sorry!
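If the question above is the VLOOKUP-style membership test ("does this value in column A appear anywhere in column B, regardless of row?"), `isin` handles it directly (a sketch with invented values):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["apple", "pear", "kiwi"],
    "B": ["kiwi", "apple", "mango"],
})

# True where the value in A appears ANYWHERE in B (row position is ignored)
matches = df["A"].isin(df["B"]).tolist()
print(matches)  # [True, False, True]
```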

  • @engineeringlife2775
    @engineeringlife2775 1 year ago

    Bonus Question 7:55

  • @benogidan
    @benogidan 7 years ago

    cheers for this :) will definitely consider purchasing the package

    • @dataschool
      @dataschool  7 years ago

      You're very welcome! The pandas library is open source, so it's free!

    • @benogidan
      @benogidan 7 years ago

      sorry i meant on your website, the course ;)

    • @dataschool
      @dataschool  7 years ago

      Awesome! Let me know if you have any questions about the course. More information is here: www.dataschool.io/learn/

  • @rationalindian5452
    @rationalindian5452 3 years ago +1

    Brilliant video .

  • @MrMukulpandey
    @MrMukulpandey 2 years ago

    love to have more videos like this

    • @dataschool
      @dataschool  2 years ago +1

      Thanks for your support!

  • @chandramohanbettadpura4993
    @chandramohanbettadpura4993 5 years ago

    I have some missing dates in my dataset and want to add the missing dates to the dataset. I used isnull() to track these dates but I don't know how to add those dates into my dataset..Can you please help.Thanks

    • @dataschool
      @dataschool  5 years ago

      You might be able to use fillna and specify a method: pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html

  • @asadghnaim2332
    @asadghnaim2332 3 years ago

    When I use the parameter keep=False, I get a number of rows less than 'first' and 'last' combined. What is the reason for that??

  • @omgthisana10
    @omgthisana10 8 months ago

    very well explained ty !

    • @dataschool
      @dataschool  8 months ago

      You're very welcome!

  • @DimasAnggaFM
    @DimasAnggaFM 5 years ago +1

    great video!!

  • @mansoormujawar1279
    @mansoormujawar1279 7 years ago

    Because of your quality panda series I started following you. @duplicate - in my use case instead of drop duplicate I would like to keep 1st instance and just remove other duplicate values from specific column, so shape will remain same after removing duplicate values from column. Really appreciate if you got some time to answer this, thanks.

    • @dataschool
      @dataschool  7 years ago

      Glad you like the series! I'm not sure I understand your question - perhaps the documentation for drop_duplicates will help? pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html

  • @reazahmed7004
    @reazahmed7004 3 years ago

    How do I access iPython Jupyter Notebook link? it is not available in the github repository.

    • @dataschool
      @dataschool  3 years ago

      Is this what you were looking for? nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/pandas.ipynb

  • @sherlocksu1131
    @sherlocksu1131 8 years ago

    Hi, when you mention 'inplace' in the video, I'm happy that pandas has this parameter to experiment with, but a problem comes up: should I remember all the methods that have the inplace parameter, and remember which methods affect the original DataFrame, in case I use a DataFrame that has already changed when doing a calculation?
    That is a huge job, remembering which methods have an 'inplace' parameter and which don't, isn't it..... TOT

    • @sherlocksu1131
      @sherlocksu1131 8 years ago

      That is a huge

    • @dataschool
      @dataschool  8 years ago

      The 'inplace' parameter is just for convenience. I do recommend trying to memorize when that parameter is available. But if you forget, that's fine, because you can always write code like this:
      ufo = ufo.drop('Colors Reported', axis=1)
      ...instead of this:
      ufo.drop('Colors Reported', axis=1, inplace=True)

    • @sherlocksu1131
      @sherlocksu1131 8 years ago

      Is the 'inplace' argument False by default in every method?
      My problem is that sometimes a method changes the original DataFrame via the 'inplace' parameter, and sometimes it does not,
      so I get confused about when it affects the original DataFrame, since a wrong judgement might lead to a bad conclusion.

    • @dataschool
      @dataschool  8 years ago

      I think that 'inplace' is always False (by default) for all pandas functions.

  • @ajithtolroy5441
    @ajithtolroy5441 6 years ago

    This is what I want, thanks for sharing :)

  • @mmarva3597
    @mmarva3597 3 years ago

    Thank you for this content! I have a question : how can we handle quasi redundant values in different columns ? (Imagine two different columns each containing similar values ​​at 80%). Thanks a lot

    • @dataschool
      @dataschool  3 years ago

      When you say "handle", what is your goal? If you want to identify close matches, you can do what is called "fuzzy matching". Here's an example: pbpython.com/record-linking.html Hope that helps!

    • @mmarva3597
      @mmarva3597 3 years ago

      @dataschool Thank you very much for the reply. Let me explain my question: I have two variables/features named categories (milk, snack, pasta, oil, etc.) and categories_en (en:milk, en:snack, en:pasta). My goal is to keep only one feature since both features share the same information. It was suggested that running a chi-square test would help me decide which feature to keep, but it seems silly to me :( (I have almost 2 million records)

    • @dataschool
      @dataschool  3 years ago +1

      It probably doesn't matter which feature you keep, if they contain roughly the same information.

  • @zhaoqilong1994
    @zhaoqilong1994 8 years ago

    is that any simple regular expression on python tutorial available?

    • @dataschool
      @dataschool  8 years ago +1

      For learning regular expressions, I like these two resources:
      developers.google.com/edu/python/regular-expressions
      www.pythonlearn.com/html-270/book012.html

  • @subuktageenshaikh2041
    @subuktageenshaikh2041 7 years ago

    Hi, I have a doubt: how do I remove duplicates from rows which are text or sentences, like in the RCV1 data set?

    • @dataschool
      @dataschool  7 years ago

      The same process shown in the video will work for text data, as long as the duplicates are exact matches. Does that answer your question?

  • @artistz1831
    @artistz1831 6 years ago

    Hey Kevin, I am confused for the drop duplicates here: the number of duplicated age and zipcode is 14; but after your drop the duplicates, the shape is 927. The total shape is 943, so the correct shape should be 943 - 14 = 929? Thanks a lot for your help!!!

    • @dataschool
      @dataschool  6 years ago

      I disagree with your statement "the number of duplicated age and zipcode is 14"... could you explain how you came to that conclusion? Thanks!

  • @srincrivel1
    @srincrivel1 6 years ago +1

    you're doing god's work son!

  • @sagarbhadani1932
    @sagarbhadani1932 6 years ago

    Hi, I need help. Suppose we have a table where each transaction contains at least 1 item in the Item column. How do I find the transactions that contain Coffee?
    Transaction  Item
    1            Tea
    2            Cookies
    2            Coffee
    3            cookies
    4            Bread
    4            Cookies
    4            Coffee

    • @dataschool
      @dataschool  6 years ago

      I'm not sure off-hand, good luck!
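One possible approach for the question above (a sketch using the table from the comment): collect the transaction ids that contain coffee, then keep every row belonging to those transactions.

```python
import pandas as pd

df = pd.DataFrame({
    "Transaction": [1, 2, 2, 3, 4, 4, 4],
    "Item": ["Tea", "Cookies", "Coffee", "cookies", "Bread", "Cookies", "Coffee"],
})

# Transaction ids that contain coffee (case-insensitive)
coffee_tx = df.loc[df["Item"].str.lower() == "coffee", "Transaction"]

# Keep ALL rows of those transactions, not just the coffee rows
result = df[df["Transaction"].isin(coffee_tx)]
print(result["Transaction"].tolist())  # [2, 2, 4, 4, 4]
```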

  • @antonyjoy5494
    @antonyjoy5494 3 years ago

    This is a case of complete duplicates. So what should we do when we have to deal with incomplete duplicates? E.g. age, gender and occupation are the same but the zip is different.
    Could you also make a video on that please?

  • @prakmyl
    @prakmyl 4 years ago

    I get an error when I run users.drop_duplicates(subset=['age','zip_code']).shape: "'bool' object is not callable". I get the same error even if I run users.duplicated().sum()

    • @dataschool
      @dataschool  4 years ago +1

      Remove the .shape, and see what the results look like. Also, compare your code against mine in this notebook: nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/pandas.ipynb

  • @anantgosai8884
    @anantgosai8884 3 years ago +1

    That was so accurate, thanks a lot genius!

  • @asifsohail5900
    @asifsohail5900 3 years ago

    How can we efficiently find near duplicates from a dataset?

  • @oasisgod1421
    @oasisgod1421 3 years ago

    Great video. But I'd like just to find a duplicate column and then go to another column and find the duplicate and go to another column and find the duplicate and remain only one row with certain information.

  • @krzysztofszeremeta1125
    @krzysztofszeremeta1125 6 years ago

    What is the best way to compare data from two files (with the same schema)?

    • @dataschool
      @dataschool  6 years ago

      I don't know if there's one right way to do this... it depends on the details. Sorry I can't give you a better answer!

  • @KimmoHintikka
    @KimmoHintikka 7 years ago

    I had weird error with this one. Setting index col with index_col='user_id' does not work for me it raises KeyError: 'user_id' error. Instead I had to run users = pd.read_table('bit.ly/movieusers', sep='|', header=None, names=user_cols) first and then users.set_index('user_id') for this tutorial to work

    • @dataschool
      @dataschool  7 years ago

      Interesting! I'm not sure why that would be. But thanks for mentioning the workaround!

  • @da_ta
    @da_ta 6 years ago

    thanks for tips and bonus ideas

  • @hiericzhu
    @hiericzhu 7 years ago

    Hi, I have a question. I want to mark the consecutive duplicate values in a series like [1,1,1,0,2,3,2,4,2]; my expected result is [True,True,True,False,False,False,False,...].
    But pandas duplicated(keep=False) returns
    [True,True,True,False,True,False,True,False,True]: the function treats the '2's in the 2,x,2,y,2,z,2 sequence as duplicated, but that is not what I want. How do I avoid that? I just want to mark the 1,1,1 run as True. Thanks.

    • @dataschool
      @dataschool  7 years ago

      How about just using code like this:
      df.columnname == 1
      Does that help?
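A shift-based sketch that marks only consecutive runs, using the exact series from the question: a value is flagged when it equals its immediate neighbour, so repeats separated by other values are ignored.

```python
import pandas as pd

s = pd.Series([1, 1, 1, 0, 2, 3, 2, 4, 2])

# In a run when equal to the previous value OR equal to the next value
in_run = s.eq(s.shift()) | s.eq(s.shift(-1))

print(in_run.tolist())
# [True, True, True, False, False, False, False, False, False]
```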

  • @arpitmittal7865
    @arpitmittal7865 4 years ago

    very useful videos.. can you please tell me how to find duplicate of just one specific row?

    • @dataschool
      @dataschool  4 years ago

      Sorry, I don't fully understand. Good luck!

  • @SahibzadaIrfanUllahNaqshbandi
    @SahibzadaIrfanUllahNaqshbandi 7 years ago

    Thanks for good channel. I like it very much.
    I have a query.
    I am working on tweets, I have to remove duplicate tweets as well as tweets which are different in at most one word.
    I can do first part, Will you please guide me how can I do the second part?? Thanks

    • @dataschool
      @dataschool  7 years ago

      That's probably beyond the scope of what you can do with pandas. Perhaps you can take advantage of a fuzzy string matching library.

    • @SahibzadaIrfanUllahNaqshbandi
      @SahibzadaIrfanUllahNaqshbandi 7 years ago

      Thanks...I will look into it.

  • @VNTHOTA
    @VNTHOTA 5 years ago

    You should have used sort_values option with users.loc[users.duplicated(keep=False)].sort_values(by='age')

    • @dataschool
      @dataschool  5 years ago

      Thanks for your suggestion!

  • @killaboody7889
    @killaboody7889 5 years ago +3

    you are amazing.
    thank you ever much

  • @duckthatgivesafuk8471
    @duckthatgivesafuk8471 5 years ago

    I really need help guys.
    I have a table that has a column : Column name - " Neighbourhood"
    This Column has A LOT of names repeated MANY times.
    To be specific, the column "Neighbourhood" has 10 Names that are repeated ALOT of times.
    My question is :
    I NEED HELP IN CREATING A SEPARATE COLUMN SPECIFYING HOW MANY TIMES EACH ELEMENT IN "NEIGHBORHOOD" HAS BEEN COUNTED.
    If anyone help me please.

    • @dataschool
      @dataschool  5 years ago

      I'm not positive this would work, but I might start by creating a dictionary out of value_counts, and then use that as a mapping for the new column. Anyway, I hope you were able to figure out a solution!
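The value_counts-as-mapping idea from the reply can also be written with `groupby(...).transform('count')`, which broadcasts each group's size back onto every row (a sketch with invented neighbourhood names):

```python
import pandas as pd

df = pd.DataFrame({"Neighbourhood": ["Soho", "Soho", "Chelsea", "Soho"]})

# transform('count') returns one count per ROW, aligned with the original frame
df["NeighbourhoodCount"] = (
    df.groupby("Neighbourhood")["Neighbourhood"].transform("count")
)

print(df["NeighbourhoodCount"].tolist())  # [3, 3, 1, 3]
```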

  • @Drivebyeasy
    @Drivebyeasy 7 years ago

    Hello I want to know the concept of ReSampling please help

    • @dataschool
      @dataschool  7 years ago

      I'm sorry, I don't have any resources to offer you. Good luck!

  • @syyamnoor9792
    @syyamnoor9792 6 years ago +1

    you are a hero...

    • @dataschool
      @dataschool  6 years ago

      That's very kind of you! :)

  • @muhammadbashar572
    @muhammadbashar572 7 years ago

    Hi, good afternoon. How do I remove the extra letters from values? For example, I have a column containing customer income like J:10,000 and P:50,000, and I want to make it 10000 and 50000.

    • @dataschool
      @dataschool  7 years ago

      You can use string methods to strip the first two characters, and then the astype function to change the type from string to integer. These videos might be helpful to you:
      ua-cam.com/video/bofaC0IckHo/v-deo.html
      ua-cam.com/video/V0AWyzVMf54/v-deo.html
      Good luck!
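The string-methods-then-astype recipe from that reply, sketched on the two example values from the question:

```python
import pandas as pd

df = pd.DataFrame({"income": ["J:10,000", "P:50,000"]})

# Drop the "X:" prefix and the thousands separators, then convert to int
df["income"] = (df["income"].str.slice(2)          # remove the first two characters
                            .str.replace(",", "")  # remove the commas
                            .astype(int))

print(df["income"].tolist())  # [10000, 50000]
```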

  • @Animesh19007
    @Animesh19007 5 years ago

    How do I keep the rows that contain null values in any column, and remove the complete rows?

    • @dataschool
      @dataschool  4 years ago

      Does this help? ua-cam.com/video/fCMrO_VzeL8/v-deo.html

  • @ashishacharya8427
    @ashishacharya8427 7 years ago

    How do I replace similar duplicate values with one of those values? How do I solve this??

    • @dataschool
      @dataschool  7 years ago

      I think the process would depend a lot on the particular details of the problem you are trying to solve.

  • @harshitagrwal9975
    @harshitagrwal9975 1 year ago

    The user ids are not the same, so how can the rows be duplicates?

  • @maheshaknur
    @maheshaknur 7 years ago

    Thanks for this video :)
    How can we remove duplicates, delete columns, delete rows and insert new columns using a Python script?

    • @dataschool
      @dataschool  7 years ago

      Glad you liked the video! This video shows how to remove rows or columns: ua-cam.com/video/gnUKkS964WQ/v-deo.html
      Does that help to answer your question?

  • @sasa4840
    @sasa4840 6 years ago

    Thanks! My question: how can we sort month names?

    • @dataschool
      @dataschool  6 years ago

      This video might be helpful to you: ua-cam.com/video/yCgJGsg0Xa4/v-deo.html

  • @muralikrishnapolipallivenk2572
    @muralikrishnapolipallivenk2572 7 years ago

    Hi, I am a big fan of your work and I have learned a lot from the videos. Can you please help me with how to use Excel's VLOOKUP in pandas?

    • @dataschool
      @dataschool  7 years ago

      This might help: medium.com/importexcel/common-excel-task-in-python-vlookup-with-pandas-merge-c99d4e108988
      Good luck!

  • @oszi7058
    @oszi7058 5 years ago

    You are amazing!

  • @bharatin1331
    @bharatin1331 4 years ago

    How to Remove Leading and Trailing space in data frame
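A sketch with invented data: `str.strip` removes leading and trailing whitespace from a string column, applied per column.

```python
import pandas as pd

df = pd.DataFrame({"city": ["  New York ", "Boston  ", " LA"]})

# str.strip trims whitespace from both ends of every string in the column
df["city"] = df["city"].str.strip()

print(df["city"].tolist())  # ['New York', 'Boston', 'LA']
```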

  • @sujaysonar8425
    @sujaysonar8425 5 years ago

    Thanks for the video

  • @Ishkatan
    @Ishkatan 2 years ago

    Good lesson, but the datatype has to match. I found I had to process my pandas tables with .astype(str) before this worked.

  • @grijeshmnit
    @grijeshmnit 5 years ago +1

    💯+ like. Thank you very much sir.