How to use groupby() to group categories in a pandas DataFrame

  • Published 17 Nov 2024

COMMENTS • 147

  • @ShiladityaBiswasNow 3 years ago +38

    Thanks a lot! You saved me days! I'm literally crying rn. So precise and to the point. Love the content

    • @ChartExplorers 3 years ago +3

      I'm glad it helped! Groupby was always a sore spot for me learning, but now that I know it I use it all the time.

  • @DuniyaJahan1 2 years ago +2

    🙏🙏🚩🚩🙏🙏Truly sir great lecture I had been trying to understand group by in pandas since last 25 days, but no-one was able to clear my confusion. But you sir explained me brilliantly and I am really so obliged of you. Thanks and I subscribed you and share on Facebook page, from Banaras City, India 😄😄😄🙏🙏🙏🙏🙏🙏

  • @rashadm.sadigov4366 1 year ago

    Dude thank you sooo much. Finally someone with proper english explained things properly

  • @lightningmi 2 years ago +3

    Good step-by-step tutorial. But one thing you missed is grouping with multiple columns and applying a different aggregate function to each, for example [column A, column B] with A=sum, B=average. Something like that.
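
A minimal sketch of the "A=sum, B=average" case raised in this comment, using pandas named aggregation. The rows are invented; only the column names follow the Titanic data from the video.

```python
import pandas as pd

# Invented rows; only the column names follow the Titanic data from the video.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3, 3],
    "survived": [1, 0, 1, 0, 1, 0],
    "fare": [80.0, 60.0, 30.0, 20.0, 10.0, 8.0],
})

# Named aggregation: new_column=(source_column, function).
summary = df.groupby("pclass").agg(
    survivors=("survived", "sum"),  # column A gets a sum
    mean_fare=("fare", "mean"),     # column B gets an average
)
print(summary)
```

Passing a dictionary such as agg({"survived": "sum", "fare": "mean"}) gives the same numbers but keeps the original column names.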

  • @crystalchaung1576 2 years ago

    I had to watch this a couple of times to hear that part around 4:18 about why groupby will only return those who survived. It is good you added that. Now that I understand that, I can take a shot at age groups for the Titanic.

  • @athief 2 years ago

    It's great to have a 5-min quick & dirty dive, but a couple more seconds here and there to say that "agg" means "aggregate", that if we want more than one column summarised we must provide a list (hence the double brackets), etc. It provides a simple explanation that facilitates memory.
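
The point about "agg" and the double brackets can be illustrated with a small sketch. The frame is made up for illustration and is not the video's dataset.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 2],
    "fare": [80.0, 60.0, 30.0, 20.0],
    "age": [40.0, 30.0, 25.0, 35.0],
})

grouped = df.groupby("pclass")

# Single brackets with one column name select one column: the result is a Series.
one_col = grouped["fare"].mean()

# Double brackets are an index operation holding a list of names: the result is a DataFrame.
two_cols = grouped[["fare", "age"]].mean()

# agg is short for "aggregate" and accepts a list of function names.
stats = grouped["fare"].agg(["mean", "max"])
print(stats)
```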

  • @Aleqsie 10 months ago

    ok this is a mad comprehensive information that is explained amazingly briefly and clearly within just 7 min.

  • @imad_uddin 3 years ago +4

    I have seen three of your videos so far, all were very well thought out. Really helpful. You deserve many more subscribers!

  • @sgerodes 3 years ago +3

    Brilliant. It had exactly what i needed. Multiple groups and the splitting trick

  • @skye5107 11 months ago

    Thanks a lot i am searching this in entire weeks on articles.

  • @jackfarah7494 10 months ago

    Simple and informative i love this video and am saving it for future references! Thank you!

  • @mrb7931 1 year ago

    Thanks a lot! You saved me day , now i can calculate mean by categorizing datasets

  • @blueciel_03 10 months ago

    Thanks a lot, it's really informative for my upcoming exam.

  • @lawngreenlyp 3 years ago

    This is a very good video for explanation. Thanks so much from Hong Kong.

  • @Monkeysal07 3 years ago

    THANK YOU!!! that last tip is a life saver

  • @saisarath623 2 years ago +1

    Really helpful tricks. Thank you!

  • @tonianibal7585 2 years ago

    Thank you very much for sharing! It really helped me, it was exactly what I was looking for. People like you are blessed and good people, helping to develop this world! I just subscribed and followed, and will share in my groups!

  • @rohitekka2674 3 years ago +1

    concise, short , illustrious!! Thanks alot!!!

  • @carolinamalosabastos2648 11 months ago

    Great video! so clear... It helps me a lot! Tks from Brazil!)

  • @zebramc3693 1 year ago

    Thank you for your detailed demonstrations.

  • @InteligenciadeNegocios 2 years ago

    This is one of the best videos EVER! Really helpful! Thanks a LOT!

  • @XuanTran-ri1hn 2 years ago +4

    Hi. Thank you for your video. May I ask how do you know exactly that which age group is divided to which bin? Although these ages are put into 3 bins but I am unclear which exact age which bin contains? For example: what age range for 'young' in this case?

    • @JopieSchaft 2 years ago +1

      @Adeel Khan I can think of 3 approaches to this:
      - Group by age_bins, then take the minimum and maximum age: df.groupby('age_bins')['age'].agg(['min', 'max'])
      - Use retbins=True in the pd.cut() function; it returns the bin edges along with the binned values.
      - Define the bins yourself, i.e. bins=[0, 20, 60, 120] (instead of bins=3 as in the video) will divide the passengers into (0, 20], (20, 60] and (60, 120] bins.
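
The second and third approaches in this reply can be sketched as follows. The ages are invented; the labels match the ones used in the video.

```python
import pandas as pd

# Invented ages; the labels match the ones used in the video.
ages = pd.Series([2, 15, 29, 41, 50, 62, 80], name="age")
labels = ["young", "middle_age", "old"]

# retbins=True returns the bin edges alongside the binned values.
binned, edges = pd.cut(ages, bins=3, labels=labels, retbins=True)
print(edges)  # four edges delimit three equal-width bins

# Explicit edges make the age ranges obvious: (0, 20], (20, 60], (60, 120].
explicit = pd.cut(ages, bins=[0, 20, 60, 120], labels=labels)
print(explicit.tolist())
```

With bins=3 the lowest edge is nudged slightly below the minimum age so that the youngest passenger falls inside the first interval.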

  • @youknownothing_ 1 year ago

    great video. it would be great if you also provide the link for the notebook

  • @afonsoosorio2099 2 years ago

    Awesome 👌. Clear crystal 🔮.
    I specially like the bin trick, straightforward. That is really amazing 👏 😍. I had to break into intervals using numpy select ( ) or user defined function with apply ( ) to get the same result with the bin method.
    Keep it up.

  • @aishwaryapattnaik3082 2 years ago +1

    Just what we needed . Awesome content 🙌🏼

  • @andrenevares7543 2 years ago

    Great explanation! Good JOB! Thumbs up!

  • @ssrwarrior7978 3 years ago

    wow, u made it easy for me and saved lot of time.. THANK YOU

  • @fashaikh5339 3 years ago +1

    VERY CLEAR. PLEASE, CAN YOU EXPLAIN HOW TO DO AN INTERSECTION WHEN WE HAVE A (ONE-TO-MANY) RELATIONAL DATABASE? THANKS

  • @denisml42 2 years ago +4

    Thanks for the great video. Im wondering about how you could group the ages in intervals of 10 years. I feel like you probably wouldnt use cut for that since you would need to know the highest / lowest age in order to determine how many cuts you need. Do you have a recommendation on how to do that?
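
Two possible ways to get 10-year groups, sketched on invented data. The first needs no knowledge of the minimum or maximum age; the second still uses pd.cut but derives the edges from the data.

```python
import pandas as pd

# Invented passengers for illustration.
df = pd.DataFrame({
    "age": [2, 9, 10, 29, 35, 71],
    "survived": [1, 0, 1, 1, 0, 0],
})

# Option 1: floor every age to the start of its decade (0, 10, 20, ...).
df["decade"] = (df["age"] // 10) * 10
by_decade = df.groupby("decade")["survived"].mean()

# Option 2: build 10-year edges from the observed maximum and hand them to pd.cut.
edges = list(range(0, int(df["age"].max()) + 11, 10))
df["age_bin"] = pd.cut(df["age"], bins=edges, right=False)  # [0, 10), [10, 20), ...
by_bin = df.groupby("age_bin", observed=True)["survived"].mean()
print(by_decade)
print(by_bin)
```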

  • @MohsinAli-yd9js 3 years ago +2

    at 5:39. in setting labels for 'age_bins' how did it get to know that from which age group is young, which one is middle and old. like you did not set the parameters from 0 to 20 for young, 21 to 60 for middle and above 60 for old. or either it does it implicitly.

    • @JopieSchaft 2 years ago

      Using bins=3 as a parameter to the pd.cut() function automatically divides the age range into 3 equal-width bins (equal in age span, not in number of passengers). See my reply to Xuan Tran for an explanation of how you can find out what it does or what you could do differently.

  • @vitorribeirosa 1 year ago

    Neat and objective!!!
    Thanks for sharing. I do appreciate your content.

  • @ThanhVo-zs7ns 2 years ago

    Very good and funny videos bring a great sense of entertainment!

  • @ericc1317 2 years ago +1

    The as_index=False tip is great! When doing this with .count() instead of sum, for example in a project where my code has the form df.groupby(['x', 'y'], as_index=False)['y'].count(), is there any way to keep the original y column along with the new y "count" column in the resulting DataFrame? With this method it replaces the original y with the count of y.

  • @VRUNO 2 years ago

    you got a new follower Sir!
    really clear, really good explained, God, finally I understand :D thanks so much!

  • @Jitendrakumar-du1ng 2 years ago +1

    thanks for the great video, it really helped me.

  • @onurkoc6869 2 years ago

    you are telling very well proffessor:))

  • @mohamedfawzy5453 1 year ago

    Great explanation! Thank you.

  • @TheShrikhande 3 years ago +1

    What if I have a dataframe with two date columns (start-date, end-date) along with other attributes and I wish to create bins for each year incorporating both those date columns.
    How do you think I can manage to do that?

  • @pazenriqueguillermo 2 years ago +1

    Great video! One question: let's say you do it like the first example, grouping survivors by class with sum(), but I want the result sorted in descending order (from the class with the most survivors to the least). How would you do that?

    • @coledd9487 2 years ago +1

      .sort_values(ascending=False)
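
In context, the reply above might look like this sketch (invented rows, column names as in the video). The second form assumes you want a regular DataFrame rather than a Series.

```python
import pandas as pd

# Invented rows; column names as in the video.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 2, 3, 3, 3, 3],
    "survived": [1, 0, 1, 1, 0, 1, 1, 1, 0],
})

# Series result: sort the summed values directly.
survivors = df.groupby("pclass")["survived"].sum().sort_values(ascending=False)

# DataFrame result: name the column to sort by.
table = (
    df.groupby("pclass", as_index=False)["survived"]
    .sum()
    .sort_values("survived", ascending=False)
)
print(survivors)
print(table)
```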

  • @coledd9487 2 years ago +1

    Hey there, for some reason when i try doing Single Group, Multiple Columns (like in 2:19), I keep getting an error basically stating that it thinks my 'fare' column is filled with strings - as opposed to floats. As such, I can't do sum/mean/numeric methods on that data.
    I can't seem to get around it.

    • @ChartExplorers 2 years ago

      Hey Cole DD, sometimes when you read in your data pandas thinks the data is a string even though it should be integers or floats. This video here ua-cam.com/video/evKYySLSzyk/v-deo.html discusses how to convert datatypes of columns and some common problems that you may run into when doing so. Let me know if that works.
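
One common way to do the conversion mentioned in this reply is pd.to_numeric. This is a sketch on invented data; it assumes the 'fare' column was read as text because of a stray placeholder such as '?'.

```python
import pandas as pd

# Invented data: 'fare' was read as text because of the '?' placeholder (an assumption).
df = pd.DataFrame({
    "pclass": [1, 1, 2, 2],
    "fare": ["80.0", "?", "30.5", "19.5"],
})

# errors="coerce" turns anything that is not a number into NaN instead of raising.
df["fare"] = pd.to_numeric(df["fare"], errors="coerce")

# Numeric aggregations now work; NaN values are skipped by mean().
mean_fare = df.groupby("pclass")["fare"].mean()
print(mean_fare)
```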

  • @ZirothTech 2 years ago

    Great video, thanks!

  • @hansrc4469 2 years ago

    When I use groupby for multiple columns like you did, it shows me a message saying to use a list instead of square brackets.

  • @tinayesibanda3070 1 year ago

    How can I combine groupby then do distinct count on one of the cat column then sum on some of the numeric column
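
A possible answer, sketched on invented data: "nunique" gives the distinct count of a categorical column and "sum" totals a numeric one, both in a single agg call.

```python
import pandas as pd

# Invented data; 'embarked' stands in for the categorical column.
df = pd.DataFrame({
    "pclass": [1, 1, 1, 2, 2],
    "embarked": ["S", "C", "S", "S", "S"],
    "fare": [80.0, 60.0, 70.0, 30.0, 20.0],
})

result = df.groupby("pclass").agg(
    ports=("embarked", "nunique"),  # distinct count of the categorical column
    total_fare=("fare", "sum"),     # sum of the numeric column
)
print(result)
```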

  • @rohanbangash5827 2 years ago

    How would we put the result of a groupby function as a column in our dataframe?
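
One way to do this is transform, which returns a result aligned with the original rows so it can be assigned straight back as a column. A sketch on invented data:

```python
import pandas as pd

# Invented data for illustration.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3, 3],
    "fare": [80.0, 60.0, 30.0, 10.0, 8.0, 9.0],
})

# transform broadcasts each group's result back to every row of that group.
df["class_mean_fare"] = df.groupby("pclass")["fare"].transform("mean")
df["class_size"] = df.groupby("pclass")["fare"].transform("count")
print(df)
```

The original 'fare' column is kept, so the same pattern also covers the case of wanting a count next to the values it was computed from.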

  • @bnadir3930 2 years ago

    Great video! How can I get the max() value grouped by a column and still have all of the DataFrame's columns shown?

  • @nivviyer_ 2 years ago +1

    Thank you so much sir !!

  • @osoriomatucurane9511 1 year ago

    Hi Bradon, awesome tutorial. At 4:41, survived by class, mean and sum: a proportion would have been more meaningful. How do I get a percentage there, I mean the proportion of survivors (survival rate) by class? Using transform?
    Or does aggregation only allow sum, mean, count, ...?
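
Because 'survived' is coded as 0/1, its mean within a group already is the survival rate, so no transform is needed. A sketch on invented rows:

```python
import pandas as pd

# Invented rows; 'survived' is coded 0/1 as in the Titanic data.
df = pd.DataFrame({
    "pclass": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

rates = df.groupby("pclass")["survived"].agg(
    survivors="sum",       # number of survivors
    passengers="count",    # group size
    survival_rate="mean",  # mean of a 0/1 column is the proportion of ones
)
rates["survival_pct"] = rates["survival_rate"] * 100
print(rates)
```

agg is not limited to sum, mean and count: it also accepts other built-in names such as "min", "max" and "nunique", as well as your own functions.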

  • @maxons.e4643 2 years ago

    How do you sort the data when different conditions are involved in the groupby?

  • @gabriellopes0 1 year ago

    Great explanation!

  • @febriannuralam4760 26 days ago

    i love it, keep it up mate

  • @ibrar6121 1 year ago

    In the Quick Tip Section, How did the program know that 29 is Middle_age, 2 is Young_age and 50 is old???

  • @michaelcruz1322 3 years ago +1

    How did python determine which age_bin to place the individual into? You never specified the age-ranges associated with the categories?

    • @ChartExplorers 3 years ago

      Hi Michael, good question. The age bins were created with the pandas cut method, which turns continuous data into categorical data. Passing bins=3, as in the video, splits the range of ages into three equal-width intervals, so each bin spans the same number of years rather than holding the same number of passengers (pd.qcut is the method that aims for equal counts). The labels are then assigned to those intervals from lowest to highest. pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html

    • @Monkeysal07 3 years ago

      Maybe this will allow you to specify the ranges of the bins. There must be exactly one fewer label than there are bin edges.
      df['age_cat'] = pd.cut(df['age'],
                             bins=[x for x in range(0, 100, 5)],
                             labels=[x for x in range(5, 100, 5)],
                             right=True)

  • @AIdevel 2 years ago

    I have a problem it keeps giving me keyError it doesn’t identify the name of the columns how can I solve it ? Please help me

  • @mohamedkhaled902 1 year ago

    Very helpful , keep it up ❤

  • @kiko1955 2 years ago

    How do I make a graph with the result of a groupby?

  • @nurshibumi 2 years ago

    thank u for your time and exertion!
    i have a question, i have a dataset, there are a few columns in it including "Fuel_Type". Fuel types are petrol, diesel and CNG. all i want is to group by the fuel_type and store the copy of datasets in variables both petrol and diesel. how can I do that, i have been searching for hours :))) pls answer me
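
One possible approach: iterate over the groupby object and keep a copy of each group in a dictionary. The rows are invented; "Fuel_Type" and its values follow the comment, while "price" is a hypothetical extra column.

```python
import pandas as pd

# Invented rows; "price" is a hypothetical column added for illustration.
cars = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "price": [5, 7, 3, 6],
})

# Iterating over a groupby yields (key, sub-DataFrame) pairs.
by_fuel = {fuel: group.copy() for fuel, group in cars.groupby("Fuel_Type")}

petrol = by_fuel["Petrol"]
diesel = by_fuel["Diesel"]
print(petrol)
print(diesel)
```

For a single group, cars.groupby("Fuel_Type").get_group("Petrol") returns the same rows without building the dictionary.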

  • @AimarZayyan 3 years ago

    Hi, how do i get with specific value column pclass sum for ex : 1 only

    • @ChartExplorers 3 years ago

      I'm not sure I understand your question. Are you looking to filter the dataframe so that only pclass = 1 is contained in the dataframe? You could use a boolean mask pclass1 = df[df['pclass'] == 1]. If that's what you are looking for you can check out this video on filtering which I think you will find helpful ua-cam.com/video/ni9ng4Jy3Z8/v-deo.html

  • @govindrajput8503 2 years ago

    Hi, thanks for this. How do I show groupby results for more than one variable with more than one aggregate function, without the index? So basically multiple groups as columns, aggregated with more than one function.

  • @ahovebismark4001 2 years ago

    so please, I need a personal favor, I need to make labels for a plot I generated from a groupby method, any help with that?

  • @jakobstigsson9687 2 years ago

    Hey, thanks for the video. I have a dataframe that has a column with 0-4 in value, but I wish to group it by 0 and then 1-4. How would that be possible? Is it a big difference?
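
groupby also accepts a Series of labels instead of a column name, which makes the "0 versus 1-4" split fairly direct. A sketch with hypothetical column names "score" and "value":

```python
import pandas as pd

# Hypothetical column names and invented values.
df = pd.DataFrame({
    "score": [0, 1, 4, 0, 3, 2],
    "value": [10, 20, 30, 40, 50, 60],
})

# Build one label per row: "0" when score is 0, otherwise "1-4".
label = df["score"].eq(0).map({True: "0", False: "1-4"})

# Group by the label Series rather than by an existing column.
totals = df.groupby(label)["value"].sum()
print(totals)
```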

  • @pritisingh2432 3 years ago +1

    Hey I'm having problem in groupby as it is giving Data error and No numeric type to aggregate. Could you please help ?

    • @ChartExplorers 3 years ago

      Hi Priti, will you run df.dtypes and let me know if there are any numeric (float or int) datatypes in your dataframe? If they are all objects, check out this video on how to convert objects into numeric values: ua-cam.com/video/evKYySLSzyk/v-deo.html (hopefully that will solve your problem). If this doesn't solve your problem, will you copy and paste your groupby statement and send it to me, please?

    • @pritisingh2432 3 years ago

      @ChartExplorers
      # Visualize Churn Rate by Gender
      plot_by_gender = churn_dataset.groupby('gender').Churn.mean().reset_index()
      plot_data = [
          go.Bar(
              x=plot_by_gender['gender'],
              y=plot_by_gender['Churn'],
              width=[0.3, 0.3],
              marker=dict(color=['orange', 'green'])
          )
      ]
      plot_layout = go.Layout(
          xaxis={"type": "category"},
          yaxis={"title": "Churn Rate"},
          title='Churn Rate by Gender',
          plot_bgcolor='rgb(243,243,243)',
          paper_bgcolor='rgb(243,243,243)',
      )
      fig = go.Figure(data=plot_data, layout=plot_layout)
      po.iplot(fig)
      This is giving me the error. Can you suggest an alternative?
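
A likely cause, offered as an assumption since the data is not shown: the Churn column holds text such as "Yes"/"No", so mean() has nothing numeric to aggregate. Under that assumption, mapping the text to 1/0 before grouping might look like this (rows invented):

```python
import pandas as pd

# Invented rows. ASSUMPTION: Churn is stored as the strings "Yes" and "No".
churn_dataset = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male"],
    "Churn": ["Yes", "No", "No", "No"],
})

# Convert the text labels to numbers so that mean() can be computed.
churn_dataset["Churn"] = churn_dataset["Churn"].map({"Yes": 1, "No": 0})

# The mean of a 0/1 column is the churn rate per gender.
plot_by_gender = churn_dataset.groupby("gender")["Churn"].mean().reset_index()
print(plot_by_gender)
```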

  • @rajibroy1170 1 year ago

    You are a savior

  • @premprakash6863 2 years ago

    I want to group by on mobile number and want to merge messages received, how can i do that?
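
A sketch of one way to do this, with hypothetical column names "mobile" and "message": aggregate each group's text with a string join.

```python
import pandas as pd

# Hypothetical column names and invented messages.
sms = pd.DataFrame({
    "mobile": ["111", "222", "111"],
    "message": ["hi", "ok", "bye"],
})

# Join all messages of each number into one string, keeping their original order.
merged = sms.groupby("mobile")["message"].agg(" | ".join).reset_index()
print(merged)
```

Replacing " | ".join with list keeps the messages as a list per number instead of one string.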

  • @czr372 1 year ago

    Saved me looots of hours haha! thanx!

  • @pramishprakash 1 year ago

    Great video sir

  • @ericzheng4815 2 years ago

    When trying out this example: df['age_bins'] = pd.cut(df['age'], 3, labels=('young','middle_age', 'old')), I got an error returned: TypeError: can only concatenate str (not "float") to str. I don't know why. I looked at the manual, and the code seems good to me.

  • @javierclement3047 1 year ago

    It seems to me like this function doesn’t really need to exist. I feel like I could make all of these manipulations relatively easily with Boolean operations.
    Can someone explain the advantage of using groupby()? Because it’s easier? Or is there something I’m missing?

  • @fashaikh5339 3 years ago +1

    I have data frame contains three columns, one for restaurants_id , the second for his categories (one or plus categories) and the third column is for his zone. I need to calculate for each restaurant how many restaurants in his zone that share this restaurant in one category at least, and put the result in a new column ?

    • @ChartExplorers 3 years ago +1

      Hi F Ashaikh, is it possible for you to email me your data (or provide me with some made up data that is similar to the data you have). That will help me see what is going on a little better. My email is bradonvalgardson@gmail.com

    • @fashaikh5339 3 years ago

      I did , thank you very much for your help.

  • @aliyananwar3727 2 years ago

    I came here to understand concept of groupby but left with emotions we men sacrificed. 🥺

  • @sebastianperalta4775 3 years ago

    Thanks for the video.

  • @MagnusAnand 3 years ago

    excellent tutorial

  • @danielrico3352 2 years ago

    Thanks for the video! I have a question. If you want to select one specific biological sex, How could I write that code? For example just females.
    df.groupby(["pclass", [sex] == female])["survived].sum()
    Would it be right to write it like this?
    Thanks in advance!
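
The condition cannot go inside the groupby list. A common pattern is to filter the rows first and then group what is left, sketched here on invented rows:

```python
import pandas as pd

# Invented rows; column names as in the video.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3],
    "sex": ["female", "male", "female", "female", "male"],
    "survived": [1, 1, 1, 0, 1],
})

# Step 1: keep only the rows for females with a boolean mask.
females = df[df["sex"] == "female"]

# Step 2: group the filtered rows by class and count survivors.
female_survivors = females.groupby("pclass")["survived"].sum()
print(female_survivors)
```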

  • @ainahannani4489 3 years ago

    How do I make a poisson distribution of a groupby column?

    • @ChartExplorers 3 years ago

      I'm not sure. I would need to see your data and know more context to better understand what you are trying to accomplish.

  • @houndofjustice5 3 years ago +1

    Hello is there any way to put all values in their column depending on their index if value i m trying to group by is lets say Switzerland and it has multiple Happiness ratings for each year how do i put all ratings in same column for each year but just seperate them by comma without summing them up?

    • @ChartExplorers 3 years ago +1

      Great question Ivan. Try this out and see if it works for you.
      First I create a dictionary of data with 3 different countries and some happiness scores.
      Then I create a DataFrame with this data.
      Then I use the groupby function to group each country and use apply(list) to create a list of all the values in each group.
      data_dict = {'country': ['country_1', 'country_2', 'country_3', 'country_1', 'country_1',
                               'country_2', 'country_3', 'country_2', 'country_3', 'country_1'],
                   'happiness': [3, 1, 3, 5, 7, 4, 1, 2, 3, 4]}
      df = pd.DataFrame(data_dict)
      df_grouped = df.groupby('country')['happiness'].apply(list)

    • @houndofjustice5 3 years ago

      @ChartExplorers Thank you for the swift answer. I managed to do it for one column, but I'm trying to do it for multiple columns: basically uniting rows with the same country value and separating the values with commas. It works when I do it for the happiness score, but if I try to add the happiness rank it just outputs the strings "happiness score" and "happiness rank" instead of the values. I tried it as a list too, but it's still not working.
      I did it with this code which works for Happiness Score:
      frame.groupby(['Country'])['Happiness Score'].apply(lambda x:' , '.join(x.astype(str))).reset_index()

    • @ChartExplorers 3 years ago

      @houndofjustice5 I think I see what you are asking. So you want to groupby country and then list out all the values for that country in the happiness and rank columns.
      Let me know if this works. If not, I am setting up a discord server for Chart Explorers. That might be a better medium for problem solving.
      # Example Data
      data_dict = {'country': ['country_1', 'country_2', 'country_3', 'country_1', 'country_1',
                               'country_2', 'country_3', 'country_2', 'country_3', 'country_1'],
                   'happiness': [3, 1, 3, 5, 7, 4, 1, 2, 3, 4],
                   'rank': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
      df = pd.DataFrame(data_dict)
      # groupby with list for multiple columns
      df_grouped = df.groupby('country')[['happiness', 'rank']].agg(lambda x: list(x))

    • @SudhirKumar-ry4gk 3 years ago

      Please help. I have employee data in which each employee made multiple sales. If an employee made more than 50000 in sales, I want to print "Excellent" against every row for that employee ID, and "Low" for the rest.
      Like:
      Emp ID    Sale    Status
      Emp1001   5000    Excellent
      Emp1001   45000   Excellent
      Emp1001   2000    Excellent
      Emp1002   5000    Low
      Emp1003   2500    Low

    • @ChartExplorers 3 years ago +1

      Hi @SudhirKumar-ry4gk, so you want to group by employee ID and mark employees that had sales greater than $50,000 as excellent, and otherwise mark them as low? Is that correct?
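
If that reading is right, one possible sketch is below. It assumes the threshold applies to each employee's total sales, which is what the sample table suggests (5000 + 45000 + 2000 = 52000 for Emp1001); the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Values from the comment's table; column names are hypothetical.
sales = pd.DataFrame({
    "emp_id": ["Emp1001", "Emp1001", "Emp1001", "Emp1002", "Emp1003"],
    "sale": [5000, 45000, 2000, 5000, 2500],
})

# ASSUMPTION: the 50000 threshold is compared with each employee's total sales.
total = sales.groupby("emp_id")["sale"].transform("sum")

# Label every row according to its employee's total.
sales["status"] = np.where(total > 50000, "Excellent", "Low")
print(sales)
```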

  • @russellmubaya2662 3 years ago

    Can we then plot a graph of any sort using the generated table we've just grouped ?
    @Chat Explorers

  • @shaikhjunaid8693 2 years ago

    Sir how will you solve the problem when you have to determine who are the top5 highest rated players for every position in fifa dataset?

    • @YoungerLei 1 year ago

      Hi, it might be fifa.groupby(by='position').apply(lambda group: group.sort_values(by='rate', ascending=False).head(n=5))
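
An alternative sketch that avoids apply: sort once, then take the first rows of each group. The rows are invented, the column names follow the reply above, and n is set to 2 only because the sample is tiny.

```python
import pandas as pd

# Invented players; column names follow the reply above.
fifa = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "position": ["GK", "GK", "GK", "ST", "ST", "ST"],
    "rate": [80, 90, 85, 70, 95, 75],
})

n = 2  # use 5 for a top-5 on real data

# Sort by rating, then keep the first n rows within each position.
top = fifa.sort_values("rate", ascending=False).groupby("position").head(n)
print(top.sort_values(["position", "rate"], ascending=[True, False]))
```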

  • @MachineLearningPro 11 months ago

    Great video

  • @paar6128 1 year ago

    Wow, you're amazing, man :))

  • @yili6498 2 years ago

    very clear, thxxx

  • @MatthieuKhairallah 1 year ago

    Thanks a lot!

  • @VKRealsta 2 years ago

    Thanks by heart

  • @crunchnos 3 years ago

    Thank you so f much!

  • @shoaibsoomro 2 years ago

    At 5:54, applying pd.cut did not work for me; it gives this error:
    TypeError: can only concatenate str (not "float") to str
    Solution: these two lines solved the issue.
    df['age'] = df['age'].replace('?',0) #clean data
    df['age']=df.age.astype('float64') #convert data type to float

  • @srideviponmalarp 1 year ago

    Can you send dataset

  • @jaskaransingh3200 1 year ago

    Nice. helpful

  • @richarda1630 3 years ago +1

    nice ! thanks :)

  • @pursh2002 3 years ago

    # function that groups data by attribute1 and calculates per-group statistics for attribute2
    mean and count , how do we make a function for this
    def get(data, attr1, attr2, statistic):

    • @ChartExplorers 3 years ago

      Hi Pursh, I'm not sure if I understand exactly what you are trying to accomplish.
      Are you trying to obtain the mean and count on groups based on multiple columns/attributes?
      df.groupby(['pclass','sex'], as_index=False)['survived'].agg(['mean','count'])
      If this is the case, I'm not sure what the purpose of creating a function for this would be.
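
If a reusable function is still wanted, a thin wrapper could look like the sketch below. The name get_group_stat is hypothetical; the parameters follow the signature in the question.

```python
import pandas as pd


def get_group_stat(data, attr1, attr2, statistic):
    """Group `data` by `attr1` and apply `statistic` to `attr2` within each group.

    `statistic` may be a function name such as "mean" or a list such as ["mean", "count"].
    """
    return data.groupby(attr1)[attr2].agg(statistic)


# Invented rows to try the function on.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 2],
    "survived": [1, 0, 1, 1, 0],
})

means = get_group_stat(df, "pclass", "survived", "mean")
both = get_group_stat(df, "pclass", "survived", ["mean", "count"])
print(means)
print(both)
```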

  • @isaacenobun6370 3 years ago

    Thanks man

  • @apz9022 3 years ago

    I have a dataframe that has around 20 columns and 800 rows. One column contains multiple duplicate information that I am using as the group, and based on one of the other columns I want to filter the dataframe to show unique values based on the highest number of this column using max(). I still want to retain all of the other columns and end up with a dataframe that contains these unique values including the original columns.
    group = df_UE5_Compatability_info.groupby('lookup')['Function Count'].max()
    where "lookup" is the column I want to group by (containing multiples of the same value) and filter to show the rows with the highest number for "Function Count", how do I make the dataframe contain the other remaining columns associated with the resultant rows determined by the groupby? I am struggling. Difficult to describe in words.. sorry

    • @ChartExplorers 3 years ago +1

      Hi Alan, you did a great job explaining, and thanks for providing an example of what you have done. 😀 If I'm understanding correctly (please correct me if I'm wrong), you have 1 column that contains categories and you want to get the max value for each of those categories in every column that you have (using groupby).
      Here is a simple example I made that will get the max value for every column in the dataframe based on the groups in Col_4.
      import pandas as pd
      # Create practice df
      df = pd.DataFrame({'Col_1': [1, 2, 3, 4, 5],
                         'Col_2': [6, 7, 8, 9, 10],
                         'Col_3': [11, 12, 13, 14, 15],
                         'Col_4': ['Group_1', 'Group_2', 'Group_1', 'Group_1', 'Group_2']
                         })
      # groupby Col_4 (in your case use lookup)
      group = df.groupby('Col_4').max()
      group.head()
      You will notice here, instead of adding a list of columns to perform the groupby function on I excluded it. This will perform the operation on all the columns. In your example, you should be able to do the following to get your answer:
      group = df_UE5_Compatability_info.groupby('lookup').max()

    • @apz9022 3 years ago

      @ChartExplorers Thanks for the reply. Below is a made-up sample dataset that better explains the problem and is more representative of my actual dataset.
      df = pd.DataFrame({'lookup': ['abc123', 'abc124', 'abc123', 'abc125', 'abc125'],
                         'Supported': ['no', 'yes', 'no', 'yes', 'yes'],
                         'Percentage': [0.9, 0.6, 0.6, 0.7, 0.6],
                         'Number of features': [1, 6, 10, 8, 11],
                         'Platform': ['Release 1.0', 'Release 1.0', 'Release 2.0', 'Release 1.0', 'Release 2.0']
                         })
      The output should look like the following:
         lookup  Supported  Percentage  Number of features  Platform
      0  abc123  no         0.9         1                   Release 1.0
      1  abc124  yes        0.6         6                   Release 1.0
      2  abc123  no         0.6         10                  Release 2.0
      3  abc125  yes        0.7         8                   Release 1.0
      4  abc125  yes        0.6         11                  Release 2.0
      Column "lookup", Row 0 and 2 are common values, as are rows 3 and 4.
      My goal is to have one row per value in column "lookup", filtered on the highest value in column "Number of features" and all other columns values for the selected row should be shown in the output data frame.
      Using the following group = df.groupby('lookup').max() creates:
              Supported  Percentage  Number of features  Platform
      lookup
      abc123  no         0.9         10                  Release 2.0
      abc124  yes        0.6         6                   Release 1.0
      abc125  yes        0.7         11                  Release 2.0
      But the percentage is wrong for rows abc123 and abc125, as it has included the highest percentage in each of the groups. My desired result is as follows:
      abc123  no         0.6         10                  Release 2.0
      abc124  yes        0.6         6                   Release 1.0
      abc125  yes        0.6         11                  Release 2.0
      where values for columns "Supported', 'Percentage' are taken "as-is' from the dataframe row that contains the row with the highest "Number of features'
      In my script I am using group = df.groupby('lookup')['Number of features'].max() which returns the following, but I am missing the other columns, in this example Supported, Percentage and Platform.
      lookup
      abc123 10
      abc124 6
      abc125 11
      Also, if I try to save the dataframe to csv, I only get the following
      Number of features
      10
      6
      11
      I would have expected to have this csv output?
      lookup Number of features
      abc123 10
      abc124 6
      abc125 11
      Thanks again.. and I hope this is more descriptive?

    • @ChartExplorers 3 years ago +1

      @apz9022 thanks for providing the example, that clarifies things a lot. If you use the same dataframe you created in your example you should be able to use the following code:
      new_df = pd.DataFrame(columns=df.columns)
      for item in df['lookup'].unique():
          temp_df = df[df['lookup']==item]
          row = temp_df[temp_df['Number of features'] == temp_df['Number of features'].max()]
          alist.append(row)
          new_df = pd.concat([new_df, row], ignore_index=True)
      new_df
      Sadly, this uses a for loop. There might be another way to do this that would avoid the for loop (I need to work on it a little more to get it to work; I'll let you know if I do). I'm also going to look into groupby a little more. There are some cool things you can do with groupby, but this has several constraints that I do not think groupby will support. With 800 rows and 20 columns performance should not be an issue (but it's always nice to squeeze as much performance out as possible just for fun!).
      Hope this works. Let me know.

    • @apz9022 3 years ago

      @ChartExplorers Thanks. What is "alist.append"? I get an error stating "alist" is not defined.

    • @apz9022 3 years ago +1

      @ChartExplorers Thanks, I updated my code and it's working like a charm! One point: alist.append(row) did not work for me. I have left it out and it still seems to work. What does this do?
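
A loop-free alternative that does use groupby, offered as one possible sketch rather than as the approach from the thread: idxmax gives the row label of the largest value in each group, and .loc pulls those complete rows. The data is the sample frame posted earlier in this thread. In the loop version, the alist.append(row) line looks like a leftover, since alist is never defined and nothing reads it, which would explain why removing it changes nothing.

```python
import pandas as pd

# Sample data as posted earlier in this thread.
df = pd.DataFrame({
    "lookup": ["abc123", "abc124", "abc123", "abc125", "abc125"],
    "Supported": ["no", "yes", "no", "yes", "yes"],
    "Percentage": [0.9, 0.6, 0.6, 0.7, 0.6],
    "Number of features": [1, 6, 10, 8, 11],
    "Platform": ["Release 1.0", "Release 1.0", "Release 2.0", "Release 1.0", "Release 2.0"],
})

# Row label of the maximum "Number of features" within each lookup group.
best_index = df.groupby("lookup")["Number of features"].idxmax()

# Select those whole rows, so every other column comes along unchanged.
best_rows = df.loc[best_index]
print(best_rows)
```

If several rows tie for the maximum, idxmax keeps only the first of them, whereas the loop above keeps all tied rows.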

  • @brainwaves2389 3 years ago +1

    thanks

  • @souravde2283 3 years ago +1

    Awesome.

  • @azrflourish9032 3 years ago

    why '?' is needed while reading a csv file??

    • @ChartExplorers 3 years ago +1

      Good question, I should have explained this in the video. In the csv file, missing data is represented with '?'. When we read the data into pandas we can tell it that missing data is represented by '?'; pandas will then treat it as a missing value rather than getting confused.

    • @azrflourish9032 3 years ago

      @ChartExplorers oh, thank you (^ ^)
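
The effect described in this reply can be seen with the na_values parameter of pd.read_csv. The CSV text below is invented and read from memory so that the sketch is self-contained.

```python
import io

import pandas as pd

# Invented CSV text in which '?' marks missing values.
csv_text = "age,fare\n29,211.3\n?,151.5\n2,?\n"

# Without na_values, the '?' forces the whole column to be read as text.
raw = pd.read_csv(io.StringIO(csv_text))

# With na_values="?", the placeholder becomes NaN and the columns stay numeric.
df = pd.read_csv(io.StringIO(csv_text), na_values="?")

print(raw.dtypes)
print(df.dtypes)
print(df.isna().sum())
```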

  • @marchanselthomas 1 year ago

    to the point!

  • @jha6783 1 year ago

    how do you know what is young, middle_age or old. This is not defined.

  • @laychansethaaerd 3 years ago +1

    Perfect

  • @mohammadmfd682 3 years ago

    very good

  • @shekharmandal4569 1 year ago

    goat

  • @Abdullah_Alhathloul 8 months ago

    nice

  • @abhishekpanda85 8 months ago

    simpler way to explain things...

  • @xowp. 2 years ago

    i love u