When should I use a "groupby" in pandas?

Поділитися
Вставка
  • Опубліковано 30 вер 2024
  • The pandas "groupby" method allows you to split a DataFrame into groups, apply a function to each group independently, and then combine the results back together. This is called the "split-apply-combine" pattern, and is a powerful tool for analyzing data across different categories. In this video, I'll explain when you should use a groupby and then demonstrate its flexibility using four different examples.
    SUBSCRIBE to learn data science with Python:
    www.youtube.co...
    JOIN the "Data School Insiders" community and receive exclusive rewards:
    / dataschool
    == RESOURCES ==
    GitHub repository for the series: github.com/jus...
    "groupby" documentation: pandas.pydata.o...
    "agg" documentation: pandas.pydata.o...
    "plot" documentation: pandas.pydata.o...
    == LET'S CONNECT! ==
    Newsletter: www.dataschool...
    Twitter: / justmarkham
    Facebook: / datascienceschool
    LinkedIn: / justmarkham

КОМЕНТАРІ • 637

  • @coolfooly
    @coolfooly 6 років тому +108

    Seriously this was an amazing explanation. I cant like it enough.

  • @krishnendusaha5940
    @krishnendusaha5940 6 років тому +46

    Great video. It's better than many of the so- called great courses on the internet! Cheers!

  • @zohaibhasan5061
    @zohaibhasan5061 9 місяців тому +2

    I am a beginner to pandas. Amazing videos that were suggested by a friend. I have a doubt where I am stuck. For me in Python 3 drinks.groupby('continent').mean() does not work. min, max and count works but this does not (not without specifying 1 or more column names)

    • @dataschool
      @dataschool  8 місяців тому +2

      Excellent question! In the current version of pandas, if a DataFrame contains non-numeric data and you want to calculate the mean of all numeric columns after a groupby, you have to include the argument numeric_only=True. Thus: drinks.groupby('continent').mean(numeric_only=True). Hope that helps!

    • @zohaibhasan5061
      @zohaibhasan5061 8 місяців тому

      @@dataschool Yes it sure helps!!! Thanks for pointing out this argument

  • @omkarkulkarni9202
    @omkarkulkarni9202 8 років тому +6

    %matplotlib inline was answer to my 1 hour of headache to search why the hell plot() is not working! :P

    • @dataschool
      @dataschool  8 років тому +2

      Ah yes! That is very important, and easy to miss :)

  • @shiyunjiang8085
    @shiyunjiang8085 5 років тому +2

    Can I ask how to calculate the group by aggregation and only count unique number?Thanks ( BTW YOU ARE AMAZING)

    • @dataschool
      @dataschool  5 років тому

      Thanks for your kind words! As for your question, could you clarify? I don't quite understand. Thanks!

  • @shahzan525
    @shahzan525 4 роки тому +3

    Your content is amazing , I visited some others channel , they are not taught like you,
    I see this video in suggestions & I learn perfectly how to use groupsby and where to use ...
    Thanks man........

  • @mitchellyula4447
    @mitchellyula4447 2 місяці тому +1

    Thank you so much! I never used the groupby function until now and this helped me complete my pandas assignment! very clear explanation and your videos have helped me a ton!

  • @soumenhalder4831
    @soumenhalder4831 4 роки тому +3

    Me a Ph.D. student in experimental high energy physics getting spirit from all these tricks, and avoiding some boring tasks. Thank you very much, sir.

  • @jakeb2553
    @jakeb2553 4 роки тому +2

    What do you recommend if I want to group by conditionals? Let's say I have a column "Blood Pressure" and I want to group by values 120.

  • @bobvance9519
    @bobvance9519 Рік тому +1

    Could you use the "for each" formulation when you're grouping by multiple categories?

    • @dataschool
      @dataschool  Рік тому

      Great question! Yes, you would say "for each combination of these categories".

  • @sajeeb4
    @sajeeb4 5 років тому +2

    Group By multiple column
    data_frame.groupby(by=['continent','country']).beer_servings.mean()

  • @Jettiesburg
    @Jettiesburg 7 років тому +5

    Thanks, top video!
    Only suggestion would be to expand on this with some strategies for creating a new DF with the grouped/agg values.
    Thanks again

    • @dataschool
      @dataschool  7 років тому

      Thanks for your suggestion!

  • @rafaelzanfolin7745
    @rafaelzanfolin7745 2 роки тому +1

    Thank you sir! Eu really saved my life, hugs from Brazil!!

  • @raghavendra4955
    @raghavendra4955 4 роки тому +1

    Y there are less views🙃🙃.. Its an amazing explanation..

  • @mkaberli
    @mkaberli 5 років тому +3

    You’re my go to person for informative videos on pandas, Numpy, and Matplotlib.

  • @fet1612
    @fet1612 5 років тому

    Q&A Series: Video #14 _06(TC02:10)
    ==========================================
    If you are trying to figure out how this actually works, this might help:
    Let's filter the dataframe by Africa for a moment.
    ===========================
    drinks[drinks.continent == 'Africa'] > drinks (A) [ ] (B) drinks.continent
    ===========================
    drinks[]

  • @fet1612
    @fet1612 5 років тому

    Q&A Series: Video #14 _04(TC01:14)
    ==========================================
    drinks.beer_servings.mean()
    What if you wanted to know beer_servings across continents? You wanted to know, how it varied from Europe to Asia to Africa.
    So this is where a GROUPBY (groupby) is USEFUL. (TC01:34)

  • @marcinzaremba1811
    @marcinzaremba1811 3 роки тому +2

    That is exactly what I was looking for and couldn't find at the same time. Thank you!!!

  • @maggietang1369
    @maggietang1369 4 роки тому +1

    Great videos,helped a lot.Thanks!

  • @KimmoHintikka
    @KimmoHintikka 7 років тому +9

    On the agg method. Is there a way to simply return which countries the min and max belong to? I know that now that, min 0 for specific continent I could just filter dataframe based on continent and beer_servings value of 0, but could I somehow do that in the initial agg call to also print out the biggest and smallest consumers?

    • @dataschool
      @dataschool  7 років тому +3

      Great question! If there's an easy way to do this, I can't think of it at the moment. Let me know if you come up with something!

    • @ИгорьБелов-э6е
      @ИгорьБелов-э6е 7 років тому

      Any update on that? Thanks!

    • @adelsaleh5769
      @adelsaleh5769 7 років тому +1

      import pandas as pd
      drinks =pd.read_csv('data/drinks.csv')
      print('*****************************')
      max_bear= drinks['beer_servings'].max()
      min_bear= drinks['beer_servings'].min()
      print ('max_bear:',max_bear,',min_bear',min_bear)
      print('*****************************')
      print(drinks[drinks['beer_servings']==max_bear])
      print('+++++++++++++++++++++++++')
      print(drinks[drinks['beer_servings']==min_bear])

    • @BNOOOOGERS
      @BNOOOOGERS 7 років тому

      You can get the corresponding index values of a maximum with groupby().idxmax(). But if the country is in a column rather than the index, I think this is a 2-step process where you then index the country column by the results of the groupby.idxmax.

    • @derdere7803
      @derdere7803 6 років тому

      df.loc[df.reset_index().groupby('continent').beer_servings.idxmax()]
      df.loc[df.reset_index().groupby('continent').beer_servings.idxmin()]

  • @fet1612
    @fet1612 5 років тому

    Q&A Series: Video #14 _03(TC01:10)
    ==========================================
    drinks.beer_servings.mean()
    The average beer_servings is 106

  • @rin_645
    @rin_645 3 роки тому +2

    4:49 and dot aggggggggg~ hahaha 😂
    But seriously tho, what an absurdly good explanation of groupby function. I wish you could be a professor in my university🤪

  • @mmmmmmm12828
    @mmmmmmm12828 11 місяців тому +1

    drinks.groupby('continent').mean() does not work. but drinks.groupby('continent').min() works well

    • @dataschool
      @dataschool  8 місяців тому

      In the current version of pandas, if a DataFrame contains non-numeric data and you want to calculate the mean of all numeric columns after a groupby, you have to include the argument numeric_only=True. Thus: drinks.groupby('continent').mean(numeric_only=True). Hope that helps!

  • @sarandam5125
    @sarandam5125 5 років тому +2

    Thank you very much for an amazing explanation. I doubted .groupby method for a long time and finally found this video, This video totally clear my doubt out :D

    • @dataschool
      @dataschool  5 років тому

      You are very welcome! So glad it was helpful to you :)
      You might be interested in my latest video, which has some usage examples of groupby: ua-cam.com/video/dPwLlJkSHLo/v-deo.html

  • @raveeshmalhotra7347
    @raveeshmalhotra7347 4 роки тому

    greetings BROTHER,
    at 1:47 you have written command drinks.groupby('continent').beer_servings.mean() , is there another way of writing which includes df['beer_servings'] instead of writing in attribute format (.beer_servings)??

  • @T779298RA1
    @T779298RA1 4 роки тому

    drinks. groupby('continent').mean().plot(kinds='bar')
    OR
    d.c.plot()
    Just sayin'

  • @Tony-rx2zj
    @Tony-rx2zj 4 роки тому

    i recently noticed someone use this approach, drinks.groupby("continent").agg({"beer_servings":"mean"}) instead of drinks.groupby("continent").beer_servings.agg("mean") . Is it more preferable to use the first solution as it returns "beer_serving" label on column?

  • @robertobarriosduran2622
    @robertobarriosduran2622 4 роки тому +2

    Even for my bad english level, this explanation was so clear. Big Thanks!

  • @339059331
    @339059331 3 роки тому +1

    Awesome as usual! thanks very much!

  • @naveens696
    @naveens696 2 роки тому

    What if in groupby function....i want continents grouped with only 2 columns ( beer_servings and spirit _servings)

  • @saifeddinenasralli5656
    @saifeddinenasralli5656 4 роки тому

    Hello,
    what if i want to group by a string type of columns
    i mean i could use groupby('first_column).second_column.sum() but this will give me the strings attached to each others without a space between them :/
    so how can i add the space ??

  • @weskitlasten417
    @weskitlasten417 6 років тому +2

    Thanks! I realize this is years old, but worth a shot...
    One thing I would love to see clearly explained is how to change values in one column based on a condition on values in another column. For example, if drinks.spirits > 100 and drinks.total_liters_of pure_alcohol >10, change drinks.continent (or other column) to "lush". As an addition, if values in one column (say drinks.name) contain the word "dirty" change drinks.briny to True. Etc

    • @dataschool
      @dataschool  6 років тому +5

      For your first example, the code would look something like this:
      drinks.loc[(drinks.spirits > 100) & (drinks.total_liters_of pure_alcohol >10), 'continent'] = 'lush'
      loc is super useful! Here's more information on loc: ua-cam.com/video/xvpNA7bC8cs/v-deo.html

  • @ashwinghosalkar8307
    @ashwinghosalkar8307 4 роки тому

    Newbie in python. Can anyone tell me if this group by will work for a large dataset having 1 lakh rows and 500 columns?

  • @tomrhee1
    @tomrhee1 5 місяців тому

    Unfortunately, some of the values in the drinks data frame turned out to be NaN. I wonder if you could show us how to handle missing values to this. Thanks.

  • @guidogaggl4020
    @guidogaggl4020 5 років тому +1

    1:01 I know this is pedantic, but the values aren't weighted by the population size of teh countries so it's not actually a meaningful mean

    • @dataschool
      @dataschool  5 років тому

      Whether or not that is the "correct" calculation depends on the question you are trying to answer.

  • @ranger52289
    @ranger52289 Рік тому

    Great explanation! How could you show the information for the country in each continent with, say, the highest beer servings?

  • @aditimohapatra312
    @aditimohapatra312 3 місяці тому

    sir why in the last 2 cases where we didn't specify, in there with mean it is not executed but with min, max and count, it is being executed without showing any error? same for the visual form also?? help

  • @ckt991
    @ckt991 2 роки тому

    I really like your videos, thanks. What if you wanted to check for condition before doing any math? Life if the min > 0, then do some math operation?

  • @uguree
    @uguree 3 роки тому +1

    I will start your Sci-kit series too as I enjoyed so much watching all these pandas series as very clear.

  • @johnrogers3315
    @johnrogers3315 4 роки тому +1

    succinct and effective, many thanks

  • @lanpham7560
    @lanpham7560 4 роки тому

    Could you explain the difference between count() vs nunique()

  • @AhmedThahir2002
    @AhmedThahir2002 2 роки тому

    For the multiple aggregation, is there a way to round the result to 2 decimal places

  • @MrCraptakular
    @MrCraptakular 7 років тому +1

    Hello. Firstly, thank you for this magnificent video series. I really mean that! I have a question, continuing with your beer example. If you didn't have a continent column in the source data how would you accomplish the same output?

    • @dataschool
      @dataschool  7 років тому

      Glad you like the series! Regarding your question, the continent column has to be part of the DataFrame, otherwise you can't group by it. Hope that answers your question!

  • @hariaiyar1119
    @hariaiyar1119 4 роки тому +1

    Amazing explanation bro.

  • @RandomStuff-dw2bj
    @RandomStuff-dw2bj 3 роки тому

    how can we do different functions for multiple columns - mean for 2 columns and sum for other three columns? 2 question: how can we do linear regression for each group after group by ?

  • @BigAC7
    @BigAC7 3 роки тому +1

    This is great! I'm taking an absolutely hateful pandas course that's basically a needle-in-a-haystack scavenger hunt. Faced with a .groupby() problem where the course materials in no way logically lead to the answer, I wound up here. The problem doesn't precisely conform to your tutorial, but just three of your lines were enough to convey the basic mechanism of action that led to an answer. Thanks!

  • @ΕλένηΓκούμα-ν2λ
    @ΕλένηΓκούμα-ν2λ 4 роки тому

    nice presentation. if it is possible can you help me with error by using get_group()? it gives an error.thnx in advance

  • @mustafagunes5740
    @mustafagunes5740 4 роки тому

    How about multiple inner connected groups?
    Countries has cities, cities has streets, street has houses...

  • @Kristina_Tsoy
    @Kristina_Tsoy 2 роки тому +1

    Amazing video as always, Kevin! Thank you for great tip at 03:48

  • @dipakshah1008
    @dipakshah1008 3 роки тому +1

    You explain pandas very well! Pls keep posting ...would love to see something on how to apply custom functions to tables (avoiding iteration use)

    • @ChartExplorers
      @ChartExplorers 3 роки тому

      Hi, it sounds like what you are looking for is vectorization. I would recommend reading up on numpy where() and numpy select(). I looked through Kevin's (data school's) videos and didn't see anything on this topic. He is amazing at explaining things, in the mean time you many be interested in this video where I talk about vectorizing a little bit at the end of the video here:
      ua-cam.com/video/CG3EV7UBELA/v-deo.html

  • @law5
    @law5 Місяць тому +1

    Fantastic explanation :D

  • @arashchitgar7445
    @arashchitgar7445 5 років тому +3

    WOW... I wish all the tutorials were that clear. Thanks man!

    • @dataschool
      @dataschool  5 років тому +1

      You are too kind! 😄

    • @uguree
      @uguree 3 роки тому

      Corey and Kevin you are brilliant. Thanks Corey as he advertised your channel, so pleased. It's like here go how to make a you tube content to teach

  • @davebeckham5429
    @davebeckham5429 4 роки тому +1

    Excellent tutorials as always Kevin. Thanks for sharing.

  • @sulaimankhan8033
    @sulaimankhan8033 4 роки тому +1

    Your'e rate of speech is inline with your coding..., I appreciate it.. thanks

  • @adrianadominguezp.1553
    @adrianadominguezp.1553 6 років тому

    Does this mean that these are equivalent? Assuming dplyr in R:
    ## Python
    drinks.groupby('continent').beer_servings.agg(['count', 'min', 'max', 'mean'])
    ## R
    drinks %>%
    group_by(continent) %>%
    summarise(count = n(), min = min(beer_servings), max = max(beer_servings), mean = mean(beer_servings))

    • @dataschool
      @dataschool  6 років тому

      I don't keep up with dplyr any more, I'm sorry! Perhaps someone else can answer.

  • @wardomedet
    @wardomedet 7 років тому +3

    thank you for this tutorial series you are really a good teacher it helped me a lot
    I have a question would you consider making a tutorial video for pandas memory handling methods(I am not sure if the terminology is right ) how can we remove objects from memory when we are done or is there a way to control these processes
    thank you!!

    • @dataschool
      @dataschool  7 років тому

      Thanks for the suggestion! I'll consider it for the future!

    • @brianstampe7056
      @brianstampe7056 6 років тому

      I second this request!

    • @rajeshd3574
      @rajeshd3574 5 років тому

      I second this request even

  • @alanhernandez7201
    @alanhernandez7201 3 роки тому

    What happens if i have more that 264000 rows, and they are strings ?

  • @nevilparekh6400
    @nevilparekh6400 5 років тому +1

    Great Tutorials. :)
    Thanks for such a nice series.
    Could you please come up with videos on Heat Map & Correlation Table?
    thanks in advance.

    • @dataschool
      @dataschool  5 років тому

      Thanks for your suggestion!

  • @johnnyrotten43
    @johnnyrotten43 5 років тому +1

    You made that so easy. Thank you.

  • @BillyNawa
    @BillyNawa 6 років тому

    Hiya,
    The following code:
    topcommenters = pd.DataFrame(dfcom.groupby("author")["score"].sum().sort_values(ascending = False))
    topcommenters.head(5)
    Gives me a long error saying "TypeError: '

    • @dataschool
      @dataschool  5 років тому

      It's hard to say, without more details. Good luck!

  • @KirillBezzubkine
    @KirillBezzubkine 5 років тому +1

    How to SUM across columns (horizontaly) filtered by parameter?

    • @dataschool
      @dataschool  5 років тому

      I think this video will help you to understand: ua-cam.com/video/PtO3t6ynH-8/v-deo.html

  • @syntel13764
    @syntel13764 5 років тому +1

    i am a great fan of you .. you rocks !!

  • @rajeshd3574
    @rajeshd3574 5 років тому

    Hi Kevin,
    Thanks for the wonderful Video and i should really thank you for the service you are rendering for Data sciences community ....
    ​@UCnVzApLJE2ljPZSeQylSEyg in the below Code Snippet, i would like to apply aggression on two columns i.e, say beer.servings & wine_servings. How i can do this? is any way we can group by on one column and apply aggression on two column?
    drinks.groupby('continent').beer_servings.agg(['max','mean'])

    • @dataschool
      @dataschool  5 років тому

      Not sure I understand, sorry! But something on this page might help: www.dataschool.io/python-pandas-tips-and-tricks/

  • @mshparber
    @mshparber 4 роки тому +1

    Great! What about grouping by several columns?

    • @dataschool
      @dataschool  4 роки тому

      df.groupby(['col1', 'col2'])...

  • @legiegrieve99
    @legiegrieve99 7 років тому

    This is very helpful video. How can I apply this to the dataframe and want to keep all columns?
    I have a dataframe called Dataprep with 19 columns and 18484 raws. I have aggregated the columns I wanted with the code below:
    Dataprep.groupby('CustomerKey').OrderQuantity.agg('count')
    CustomerKey is the ID number for customers and orderquantity is the number of orders by each customer. The data look like(Sorry I only include the columns required):
    11000 - 1
    11000 - 1
    11000 - 1
    11000 - 1
    11001 - 1
    11001 - 1
    11001 - 1
    11001 - 1
    11002 - 1
    11002 - 1
    11002 - 1
    11002 - 1
    11002 - 1
    11003 - 1
    11003 - 1
    11003 - 1
    11003 - 1
    11003 - 1
    11004 - 1
    11004 - 1
    11004 - 1
    11004 - 1
    First column is customerkey and other one is orderquantity.

    • @dataschool
      @dataschool  7 років тому

      I'm sorry, I don't understand this dataset well enough in order to help. Good luck!

  • @ahowl7mx
    @ahowl7mx 3 роки тому

    I have a groupby question that I can't find on the internet. What if my dataframe columns are dates: 1/1/2021, 1/15/2021, 2/1/2021...5/15. Can I groupby by month? The dates run along the x axis as columns and not the Y axis as rows.

    • @dataschool
      @dataschool  3 роки тому

      I think you will need to convert to datetime format and then use resample. Hope that helps!

  • @panchopaulo111
    @panchopaulo111 4 роки тому +1

    This video was just amazing!! You just got a new subscriber.

  • @muyideenjimoh6672
    @muyideenjimoh6672 4 роки тому +1

    First video I clicked on and it answered all my confusion. Many thanks.

  • @cyruswpl
    @cyruswpl 8 років тому +2

    This is really helpful and easy to understand! Looking forward to your future great works, keep it up.

  • @Olly434
    @Olly434 4 роки тому

    is there a way to sort the results of a groupby mean on the servings? and filter by top 'n' per continent?
    i.e. I'd like to see the the top 3 results for beer servings sorted in descending order per continent instead of an alphabetized list
    Continent, beer servings mean
    Europe, 193.777778
    South America, 175.083333
    N America, 145.434783
    thanks in advanced, your videos are great!

    • @dataschool
      @dataschool  4 роки тому

      It's likely possible, though I can't think how to do it off-hand, sorry!

  • @brianbii1104
    @brianbii1104 3 роки тому

    I really enjoy ur video. i was wondering if you had any videos on how to do a report and if not then can u point to some resources that may have a way to explain it to me.

  • @252029ful
    @252029ful 3 роки тому

    Very clear explanation..Thank you very much for your contribution!! I am really enjoying while watch your video!!
    Why don't you publish a Course in Udemy?

    • @dataschool
      @dataschool  3 роки тому +1

      Thanks for your kind words and your suggestion! I know that many students like Udemy courses, but my values as a course creator don't align with their business model, and so I'm not currently interested in publishing a course there. I prefer to offer courses directly to interested students. Thanks for understanding!

  • @Leo-wd8vq
    @Leo-wd8vq 5 років тому

    7:50 drinks.groupby('continent').mean().plot(kind='bar'), how to show number for each bar?

    • @dataschool
      @dataschool  5 років тому

      Not sure off-hand, sorry!

  • @ryadbelkacemi5375
    @ryadbelkacemi5375 5 років тому

    Hello guys,
    First of all, great videos, tahnks man, secondly i wanted to filter the dataframe (drinks) by continent and only print those who are min or max of their continent :
    drinks[(drinks.beer_servings.isin(drinks.groupby('continent').beer_servings.min()) | (drinks.groupby('continent').beer_servings.max()) )]
    I find this line of code a little bit long, and i was wondering if i could make it easier to write (optimize it) ?

    • @dataschool
      @dataschool  5 років тому

      I'd have to play around with it to figure this out... let me know if you come up with something!

  • @rohan2441139
    @rohan2441139 7 років тому

    Hey! I was trying to run this snippet from a Notebook
    mean_tissue = {chunk.Taxon[0]:chunk.Tissue.mean() for chunk in data_chunks} # This part is returning a Key Error
    Can we do this using the groupby method or any fix to get this running?

    • @dataschool
      @dataschool  7 років тому

      I'm sorry, it's hard for me to say without knowing a lot more about the data and data types. Good luck!

  • @wadabulldog
    @wadabulldog 7 років тому

    Hi, can i possibly use groupby to assign items in a dataframe as keys in a dictionary, and then reference other items as their values? For instance, in a two column DF and you have name,age,sex,country in one column and the corresponding answers in the next column, can you use groupby to make a nested dictionary that looks like this
    biodata={name:{age:20,sex:male,country:latvia}}
    If it isn't possible, kindly advise on which pandas function is more appropriate for this.
    Thanks

    • @dataschool
      @dataschool  7 років тому

      I'm not sure how to do what you are suggesting. Sorry!

  • @magelauditore333
    @magelauditore333 4 роки тому

    U r awesome man. Pls Pls make numpy series for data science, stats

  • @ruslanzakharov4218
    @ruslanzakharov4218 5 років тому +1

    It was really useful for me. I don't know english well, but I understood almost all words in this video

    • @dataschool
      @dataschool  5 років тому

      That's awesome to hear! 🙌

  • @creekielappy
    @creekielappy 8 років тому

    Well hopefully you could help me out, I'm currently working with a dataset that has a lot of redundant description values in a certain column, can a groupby be used to find the maximum number value for each of those redundant values? Say I have 4 rows that have the ACC value, but each of those rows have the number value as follows: 1, 4, 6, 3.... can a groupyby be used to find the one row that has the max number value?

    • @dataschool
      @dataschool  8 років тому

      I'm sorry, I don't quite understand. Could you provide a short example to explain what you are trying to do? This link might help you to create a reproducible example: stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples

  • @mohammadabusitta9880
    @mohammadabusitta9880 6 років тому

    great video
    I have a question >> how to specify two columns ?? beer_servings and wine_servings for example
    thanx in advance

    • @dataschool
      @dataschool  6 років тому +1

      drinks[['beer_servings', 'wine_servings']]

  • @fet1612
    @fet1612 5 років тому

    Q&A Series: Video #14 _01(TC00:05)
    The question for today is: When should I use a "groupby" in pandas?
    ================
    Kevin says, "Let's jump right in and pull up an example dataset."
    import pandas as pd
    --------------------------------
    and our dataset is ALCOHOL consumption by COUNTRY.

  • @limjiahe531
    @limjiahe531 5 років тому

    Really cool vid but -
    Cant I use a pivot_table for all group_by operations? I'm a little confused when to use each...can anyone help?

    • @dataschool
      @dataschool  5 років тому

      This video might help: ua-cam.com/video/tcRGa2soc-c/v-deo.html

  • @shashanksaurabh9041
    @shashanksaurabh9041 8 років тому

    Hi, Can you please explain transform,filter and apply methods with groupby? Those are becoming headache for me

    • @dataschool
      @dataschool  8 років тому

      Thanks for the suggestion - I'll consider it for a future video!
      I do have one video on using 'apply', but not as it relates to a groupby: ua-cam.com/video/P_q0tkYqvSk/v-deo.html

  • @asadtanvir4065
    @asadtanvir4065 6 років тому +2

    Thank you Man! You are doing a great job here by showing Pandas tutorials in such easy way. Pray that you can carry out this benevolent work more

    • @dataschool
      @dataschool  6 років тому

      Thanks very much for your kind words!

  • @Dmitrycheburirtrie
    @Dmitrycheburirtrie 3 роки тому +1

    Best explanation !

  • @bryanortiz2098
    @bryanortiz2098 2 роки тому +1

    I love your videos bro

  • @RichardGreco
    @RichardGreco 4 роки тому

    Excellent tutorial! If possible maybe another video to refresh and/or to address Pandas 1.0.x?

    • @dataschool
      @dataschool  4 роки тому

      Thanks for your suggestion!

  • @joshsfox6266
    @joshsfox6266 2 роки тому

    How could I show the names of each country in each continent?

  • @khalidhassani6173
    @khalidhassani6173 4 роки тому +1

    agg is the best. thanks a lot

  • @noo-sho8500
    @noo-sho8500 5 років тому +1

    Great, clear tutorial. Plus now I know Europe is the most drunk continent

  • @All_Things_Trucking
    @All_Things_Trucking Рік тому

    Hey, thanks for the video on GroupBy. I am curious how you can use GroupBy in the context of quicker data processing of slicing and dicing. Rather than use a full dataset, use first groupby function to condense dataset and then work with that condensed dataset. Can you think about that scenario and elaborate how to do it in the most efficient way?

    • @dataschool
      @dataschool  Рік тому

      Sure, you can save the groupby object as then use that for future calculations!

  • @ainahannani4489
    @ainahannani4489 3 роки тому

    Hello. using your example, id like to make histograms for each continents. How do I make a poisson distribution out of the groupby mean? & fit the distribution to the histograms

    • @dataschool
      @dataschool  3 роки тому

      Sorry, I won't be able to assist you with this!

  • @scorpsallday33
    @scorpsallday33 7 років тому

    Are you fucking kidding me.... you sir are the man. These tutorials have been blessing me. Where have you been all my life 😂

    • @dataschool
      @dataschool  7 років тому

      Awesome! Thanks so much for your very kind comments :)

  • @manishmathuriya1832
    @manishmathuriya1832 3 роки тому

    Can Sir we get some practice problem based on that series

  • @zulqurnainahmad1130
    @zulqurnainahmad1130 3 роки тому

    one quick question, do i need to verify that country fiedl type is a string or the groupBy works with Objects. i have a lot of fields in my dataset which are Objects and can't use the groupby. it that is the case then how to convert Object to string?

    • @ChartExplorers
      @ChartExplorers 3 роки тому

      Strings are objects in Pandas (so you don't need to convert objects to strings). Groupby will work with object/string columns.

  • @robhubert8350
    @robhubert8350 6 років тому +1

    Nice job with the explanation. I combed through dozens of videos and articles online specific to leveraging groupby and your's by far is the best.

    • @dataschool
      @dataschool  6 років тому

      Thanks very much for your kind words!

  • @venkateshwaranp9517
    @venkateshwaranp9517 4 роки тому

    Can anybody xplain me..that how to see which country in africa having maximum beer serving

    • @ChartExplorers
      @ChartExplorers 3 роки тому +1

      You can see this video ua-cam.com/video/ni9ng4Jy3Z8/v-deo.html if for an explanation on how the first part of this code works.
      First we filter the df to only include countries in Africa.
      Then we look at only country and beer servings and sort it from largest to smallest number of beer servings
      Then we get the answer Namibia with 376 beers served
      africa = drinks[drinks['continent']=="Africa"]
      africa[['country','beer_servings']].sort_values(by='beer_servings', ascending=False)

  • @freeandeasy9795
    @freeandeasy9795 3 роки тому

    Nailed it! Exactly what I was looking for. Thanks

  • @iampujan
    @iampujan 6 років тому +1

    Easily understandable. Thanks for the video and keep making such videos.

  • @amadoum.jallow620
    @amadoum.jallow620 4 роки тому

    Let's say for instance, you have a column for mobile numbers and you want to find out the top 10 numbers that appeared more. How can i solve that?

    • @dataschool
      @dataschool  4 роки тому

      Try df.column_name.value_counts()