Python Pandas Tutorial (Part 8): Grouping and Aggregating - Analyzing and Exploring Your Data

Corey Schafer

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 3 гру 2024
In this video, we will be learning how to group and aggregate our data.
This video is sponsored by Brilliant. Go to brilliant.org/cms to sign up for free. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription.
In this Python Programming video, we will be learning how to group and aggregate our data. This will allow us to explore our data in ways we have not yet done in this series. We will be able to answer questions such as: "What is the most popular social media site for each country?" We will be using the groupby method, and also some aggregate functions such as mean, median, value_counts, etc. Let's get started...
Video Timestamps:
Aggregate Column - 2:00
Aggregate DataFrame - 3:55
Value Counts - 7:51
Grouping - 12:30
Multiple Aggregates on Group - 26:00
People Who Know Python By Country - 27:20
Practice Question - 34:20
Concat Series - 37:27
The code for this video can be found at:
bit.ly/Pandas-08
StackOverflow Survey Download Page - bit.ly/SO-Surve...
✅ Support My Channel Through Patreon:
/ coreyms
✅ Become a Channel Member:
/ @coreyms
✅ One-Time Contribution Through PayPal:
goo.gl/649HFY
✅ Cryptocurrency Donations:
Bitcoin Wallet - 3MPH8oY2EAgbLVy7RBMinwcBntggi7qeG3
Ethereum Wallet - 0x151649418616068fB46C3598083817101d3bCD33
Litecoin Wallet - MPvEBY5fxGkmPQgocfJbxP6EmTo5UUXMot
✅ Corey's Public Amazon Wishlist
a.co/inIyro1
✅ Equipment I Use and Books I Recommend:
www.amazon.com...
▶️ You Can Find Me On:
My Website - coreyms.com/
My Second Channel - / coreymschafer
Facebook - / coreymschafer
Twitter - / coreymschafer
Instagram - / coreymschafer
#Python #Pandas

КОМЕНТАРІ • 727

@coreyms 4 роки тому ⁺²⁰⁴
I hope everyone had a great week! We've got a long video this week, but we go over a lot of important topics about how to analyze data in Pandas. We will learn how to answer very interesting questions such as "What is the most popular social media site by country?". I put timestamps together for this video so that you all can skip around if you need to go back and watch a specific section. Here are those timestamps:
Aggregate Column - 2:00
Aggregate DataFrame - 3:55
Value Counts - 7:51
Grouping - 12:30
Multiple Aggregates on Group - 26:00
People Who Know Python By Country - 27:20
Practice Question - 34:20
Concat Series - 37:27
Have a great weekend everybody!
@calebmbugua745 4 роки тому ⁺⁶
Thanks so much bro,,,,much love from kenya
@anonymous-kl1un 4 роки тому
Hey, is this series gonna continue?
@anonymous-kl1un 4 роки тому ⁺¹
Can you explain all the types of joins
@anonymous-kl1un 4 роки тому ⁺¹
And if possible please explain multi-level Indexing as well
@JoshuaDHarvey 4 роки тому
Corey, is it safe to assume if your coming from a SQL background, that you can effectively use things like the 'pd.concat()' to replace the various joins (left, right, inner etc) workflows in SQL and just use SQLAlchemy or pyodbc libs to load the data and then do all the calculations with python that you would normally do in whatever SQL dialect?
@anubhavtomar1384 4 роки тому ⁺⁵⁴⁷
3:10 median function
5:00 describe function
7:20 count()
8:05 value_counts()
12:51 grouping the data
14:39 groupby() function
16:07 get_group(), grabbing a specific group by name
17:30 doing same by using the filters
18:40 using value_counts on filters
20:20 value_counts() for groups
21:49 using loc to find for one country
23:40 percentage by using normalize
25:00 median by country group
26:13 agg function for multiple functions
27:30 using filtering to get python users by country
30:20 error on using same approach for groups
31:40 apply method to run that on group
35:40 finding the percentage of people using python in each country(group)
37:40 using concat for combining series in a dataframe
45:30 adding percentage column
@afdqwfqwqwdfqwdawdas 4 роки тому ⁺²
thx, this is very useful. The videos already are very concise and to the point, but if I am just looking for how to do a proper groupby quickly on my own dataset....
@umutdemir2762 4 роки тому
thanks a lot.
@ravishekharprakash4172 4 роки тому
@@afdqwfqwqwdfqwdawdas sure
@80expertube 4 роки тому ⁺³
FYI, the percentage problem can be solved alternatively as follows: country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum()/x.count())
@sayarmandal1885 4 роки тому
@@80expertube It throws a RuntimeWarning
@parthrawri3001 4 роки тому ⁺²⁹⁷
I love the fact that there are no ads interrupting in the middle. So thoughtful. ❤️
@coreyms 4 роки тому ⁺²⁰¹
Yeah, I didn’t want the to ruin the flow of the videos. Glad you noticed :)
@parthrawri3001 4 роки тому ⁺¹⁸
Corey Schafer OMG! Your reply just made my day!
@livingwithlinlin3122 4 роки тому ⁺¹⁰
@@coreyms Thank you so much for doing this. You are such a considerable person with a big heart.
@JoshKonoff1 3 роки тому ⁺⁵
Corey, do you have a Patreon page? Thank you for your exceptional videos; a huge help for me and so many people!
@kylebeckhorn885 4 роки тому ⁺²⁰⁵
Yes please, do a video on the topic of MULTIPLE INDEXING!!
@j3553hh 6 місяців тому ⁺¹
I would pay to see Corey's tutorial on this. Every time I encounter a multi-index, I'm on stack overflow. It just doesn't seem to stick.
@pewolo 3 роки тому ⁺⁴²
Let's all admit that this dude is a hard working man and his work is just a wow!
I've been following him for quite some time now and I am always impressed by how thoughtful, tactical and clear his explanation is in every tutorial he makes.
Hat off to you, dude!
@felipegomez3047 3 роки тому ⁺⁸⁰
I'd like to share my solution to the practice question:
country_grp['LanguageWorkedWith'].apply( lambda x: x.str.contains('Python').sum() / len(x) * 100 )
As you can see it's just as symple as adding " / len(x) * 100 " in the lambda function, where len(x) is the total number of users for each country.
@ironpolux 3 роки тому
como se te ocurrio esto? ahem I mean, How did u come up with this? well played
@BCS_FahadAhmad 2 роки тому ⁺⁵
I guess x.count() in place len(x) makes more sense, since there can be people who did not answer language(I highly doubt XD)
@nicocilia5871 2 роки тому
@@BCS_FahadAhmad I think x.count() will not count NaN so I think len is better if you want to include people that skipped that question. I am assuming that was an option.
@gurjotsingh8631 2 роки тому ⁺³
I was thinking the same, so i downloaded his repository and tried it and it works. Came here to comment and saw your comment. so , i just wasted 5-10 minutes of my day. whatever.hallelujah.
@kingler199 2 роки тому
Damn well played
@jongyoonsohn8559 4 роки тому ⁺⁸⁷
I'd like to share my solution to the practice question.
ctr_knows_python = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python', na=False).value_counts(normalize=True))
ctr_knows_python.rename({False:'Don\'t know', True:'I know'}, inplace=True)
ctr_knows_python
Hope this helps too!
@coreyms 4 роки тому ⁺¹³
Nice!
@moushumitamanna 4 роки тому ⁺¹
Hi, can you please explain what "na=False" means here and why do we have to put this in the code? Thanks in advance
@tplano3794 4 роки тому ⁺¹
@@moushumitamanna not applicable
@moushumitamanna 4 роки тому
@@tplano3794 thanks. But why should we put na=false in this code
@tplano3794 4 роки тому ⁺³
@@moushumitamanna in a column which is expected to have numbers, na does not make sense so we filter out these values. also if you run any functions (mean, median) then you may run into syntax errors
@zhenpan2048 11 місяців тому ⁺⁹
numeric_columns = df.select_dtypes(include="number")
medians=numeric_columns.median()
print(medians)
# this is a way of getting the medians of numerical values as I use df.median(), it gave me value error that says could not convert string to float"I am not a student who is learning to code" thanks for great work. I learn more from you than from my professors. Thank you so much for great efforts!😎
@heretolearndshare 11 місяців тому
You saved my learning session, thanks!
@giovannimantovani795 9 місяців тому
Thank you bro
@salehabdullahi9356 9 місяців тому
Thank you, you save me alot of time,
@mn4769 8 місяців тому
I found that you can shorten it by writing numeric_columns.median()
@sick-ol3jd 6 місяців тому
Thanks man
@merajajam425 4 роки тому ⁺⁸⁹
The level of my programming in Python has been substantially improved since I have started watching your great videos. Many thanks, Corey. Would you please prepare some videos regarding the networkx module as well?
@valerioharvey7289 2 місяці тому
other gurus are just like "here's the code for this, copy it and don't ask why" but you are the only one who shows how things work. Thank you very much
@milrione8425 4 роки тому ⁺³
I love how you are just using the same data throughout the whole series. Thank you so much, Corey!
@jiangxu3895 4 роки тому ⁺⁶
I just discover that your way of teaching is to tell not only how to do it but why this is how to do it. thumb up!!
@amir_forooghi 4 роки тому ⁺⁷
YESSSS !!! Corey`s video for groupby. I press like before I watch it. Groupby is just a superpower. Thank you for this awesome series Corey. You are the best.
@diegoalarcon6062 4 роки тому ⁺⁹
I don't care if some of your videos are long, in other channels they're just redundant but that's not your case! If you start doing short videos we may be losing all that valuable information that you provide to us. So far, this is the best Python channel I've seen. Greetings from Medellín, Colombia.
@Schmidt3k 4 роки тому ⁺³¹
For your practise question, use .mean() instead of .sum()
.mean() on a Series of bool will give you the fractions in a quick and easy way. Multiply by 100 for %.
edit: As per discussion below, .mean() ignores NA values whereas Corey's approach treats NA as '0'. An alternative is thus:
mygroups['LanguageWorkedWith'].apply(lambda x:x.str.contains('Python').fillna(0).mean())
Now, the results should be equal to Corey's.
@davidsp7949 4 роки тому
It looks like a nice solution but numbers from Corey's video are slightly different than those with .mean() and I do not know why.
For example:
for Afghanistan PctKnowsPython 18.181818, with mean is 20.512821
for Albania PctKnowsPython 26.744186, with mean 27.710843
Does anyone know why?
@sunramaroc 4 роки тому ⁺²
@@davidsp7949 yeah i have the same doubt,,i guess that s due to the fact that mean() take in consideration only the respondents who effectively answered the question,,and sum() take all respondents even the ones with NaN for the question.so corey solution is Pct over all respondents,, and the mean() is over only the ones who answered this Q.
@jasleung2932 4 роки тому ⁺²
.mean() neglects those "NAN" responses while if u use x.str.contains('Python').sum()/x.size instead, it would count those "NaN" as "no pythoner" which is what Corey was doing
@fjramons 2 роки тому ⁺⁴
Well played. For me your solution is quite elegant.
BTW, in case you wanted to treat NA as zeros (to get the same results from the video), you can simply use .mean() with its 'skipna' option disabled. This would make:
mygroups['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').mean(skipna = False)
@mahanansari6152 2 роки тому ⁺³
I just changed sum to value_count(normalize=True) and it worked
@elnazdehkharghani6121 4 роки тому ⁺⁴
You make all your subscribers happy with just uploading your videos !!! Thanks, Corey
@coreyms 4 роки тому ⁺³
Thank you all for watching!
@jorgetiz99 4 роки тому ⁺⁵
This has to be one of the best videos on youtube about Pandas, thank you so much. Greetings from Perú.
@antonyjohne 4 роки тому ⁺³⁷
Hey Corey! Thanks a million for the Pandas Series. As always, very intuitive and easy to follow.
Now that you've taught Matplotlib and Pandas, would love to see a new Numpy series in order to complete the Data Science trinity. Please consider adding a Numpy Series.
@YeekyYeeky 4 роки тому ⁺⁵
one of the best thing that happened to me when I woke up (I am on the opposite side of the world to Corey Schafer) is finding that Corey just upload another Pandas tutorial video , thank you !
@saiakhil4751 3 роки тому
I signed up for brilliant org just for Corey Schafer. Thanks for sponsoring him.
@Davidkiania 4 роки тому ⁺¹⁹
Best video in the series loving them and normally can’t wait for the next.
@walternyc 3 роки тому
Working on a project evaluating an employee survey and this is just what the doctor ordered. Thanks! One of the best channels in UA-cam for data analysis hands down
@shikharsaxena9989 4 роки тому ⁺³
after this lecture i started loving the complex coding of pandas and matplotlib. really you are an amazing teacher
@panpan4433 3 роки тому ⁺¹
For your exercise (What % knows Python) , I divided the sum in the lambda function by x.count() then multiplied by 100 :
country_group['LanguageWorkedWith'].apply(lambda x: 100 * x.str.contains('Python', ).sum() / x.count())
Thanks for the free content, awesome
@priyavratchaudhary9211 9 місяців тому ⁺¹
use len(x) instead of x.count() because count() function exclude respondents who does not know any language.
@brewtalxxx 2 роки тому ⁺¹
Thank you so much for this video. I learnt way more from this than the many hours I spent sitting in class listening to a teacher who just wanted to end the lesson early or have long lunch breaks. This is really precious. And thanks for the reassurance that if I find this difficult, there's nothing wrong with me LOL.
@codewithluq 4 роки тому ⁺²
Corey Again. Very fantastic tutor. I press the like button before I watch.
@vagelisilias 3 роки тому
I am a GIS student and I want to thank you because I'm doing my last assignment for university and I'm using Geopandas, matplotlib, pandas, cartopy and forth on and you helped so much with your videos, I have build a nice map and I have produced different tables with my data. Thanks god you are out there and sharing your knowledge free
@Prasanna_Rahavendra 4 роки тому ⁺⁵
Hey Corey! For the question you gave: The percentage of people by country who use python. There is an efficient solution too (Without creating a separate dataframe).
country_grp["LanguageWorkedWith"].apply(lambda x:(x.str.contains("Python").sum()/len(x))*100)
Actually what I am doing here is, in the lambda function, at the return part, I divide the No. of python users by the length of the given series and then multiply it by 100. This gives the percent of python users in each country. This approach might be a bit code efficient but can be a bit confusing for some.
@maheshmmmec 4 роки тому ⁺²
len(x) might not give u total respondent since it is series on LanguageWorkedWith and people might have skipped.
@gonzalezgenaro 4 роки тому ⁺¹
@@maheshmmmec Correct
@stevensukenik254 2 роки тому ⁺¹
You can use len(x) in the lambda, it will include the na in the series.
You cannot use count(x) because it skips na. But you can use value_counts(x).
If you run the following code, it will verify that Prasanna solution is corret:
len_total = country_grp["LanguageWorkedWith"].apply(lambda x: len(x)).sort_values(ascending=False)
us_no_answer = country_grp["LanguageWorkedWith"].apply(lambda x: x.isna().sum())
us_answer = country_grp["LanguageWorkedWith"].apply(lambda x: x.notna().sum())
df_counts = df['Country'].value_counts()
df_counts = pd.concat([df_counts,len_total,us_answer,us_no_answer],axis='columns')
df_counts.columns=['df_value_counts','lambda_len','user_respond','user_did_not_respond']
df_counts
produces:
df_value_counts lambda_len user_respond user_did_not_respond
United States 20949 20949 20769 180
India 9061 9061 8844 217
@fvdvhome Рік тому
Mr. Schafer, I am so happy I found your teaching. I have been on a journey to become a data analyst, and after completing the Google Analytics Course , I realized that I needed to learn much more. I am currently finishing a Python Course through Coursera offered by IBM.
Not every professional, no matter how good they are, have the natural ability to teach. Your method and technique are so amazing and helped me to overcome some of the confusions I had with coding in Python. I learned so much from just this video alone.
I will definitely visit the site you referenced, and look forward to learning more from your videos.
Thank you so much!
@gregoryogunna9527 Рік тому
American?
@prakhararora8981 Рік тому ⁺²³
hey if ur df.median() doesn't work and ur getting typeerror and valueerror u can do df.median(numeric_only=True)
@DilpreetSingh02 7 місяців тому
Thanks man
@atienograce2520 7 місяців тому
Thanks a bunch!
@anre3821 4 місяці тому
was looking for an advice on this, thanks a lot!
@greentree9751 2 місяці тому
thanks a lot
@WrongSmth 4 роки тому
Hey, Corey. I'm a network engineer and I'm learning pandas to be able to do some packet analysis and your videos really help me a bunch!
This is my solution for the coding problem from the video. Hope it helps!
know_python = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
total_respondents = country_grp['LanguageWorkedWith'].apply(len)
know_python / total_respondents
@Blueshockful 4 роки тому ⁺³
Im browsing thru some of the videos to brush up on Python, and this is the first python video that didnt get me bored. Concise and brillliant. Love your videos! keep up the good work :)
@LibardoLambrano 4 роки тому ⁺⁴
Thanks Corey for sharing these videos. Pretty clear explanations. You are a great teacher.
@WillMoody-crmstorm 2 місяці тому
Perfect timing. Thank you for all the effort put into these videos. I've done that last jibe crash so many times, with the feet just off centre
@deniscampana8345 4 роки тому
Thanks so much Corey ! It's clearly impossible not to understand what you explain on all your videos : It's fluid, straightforward, crystal clear ! And more over your english : Whaoooo ... Congratulations !! I wonder if I've learned more Pandas or english !!
200% great !!
@finncollins5696 Рік тому
thanks so much for this series. started from the first video two weeks ago, now in the 8th. this series so far made a lot progress in me,. thanks so much, .May God Bless You. Love from Sri Lanka...
@stressfreetrading1341 4 роки тому ⁺¹¹
NAMASTE!! Corey Schafer.. Love From India
@yosephkurabachew6539 10 місяців тому ⁺¹
perfect content. one flaw is that , you never explained what a "lamda" function is and went straight to using the function in your previous videos. you did the same thing here. now i have to first study lamda.
@lvcas9313 10 місяців тому
Yeah I had to take a look into it. For anyone curious, lambda and "def return" is the same thing, but lambda are throway functions while "def return" assign a name to be applid further in the code. The "def return" syntax is easier to read and more clean than lambda
@bhargav1811 3 роки тому
Every second of your python video are really worth it!!!
@sayantanchakraborty75 4 роки тому ⁺¹
Best videos on pandas on UA-cam by Corey Sir. Loving them and normally wait for the next videos. Lots of love for you from India.
@MatthewFoulk 4 роки тому ⁺²
Really appreciate the addition of practice problems. It helps me to grasp the material
@ahmedhawater7522 4 роки тому ⁺³
Man you are one of the best teachers who ever learned me something, much love and support ♥️
@yuewang9623 4 роки тому ⁺³
Every time I saw a new post, I click the 'like' button before watching:D
@rukhan8900 3 роки тому ⁺¹
The reassurance at the end was so appreciated as a beginner. Thank you for your help !!!
@yuanchima 4 роки тому ⁺⁴
My solution to the practice question:
def pctKnowsPython(x):
try:
return x.str.contains('Python', na=False).value_counts(normalize=True).loc[True]
except KeyError:
return 0
pctKnowsPython = country_grp['LanguageWorkedWith'].apply(pctKnowsPython)
@sayarmandal1885 4 роки тому
Thanks, Corey. This is one of the most comprehensive pandas tutorials on UA-cam. Love from India.
I also noticed a subtle issue. We are adding the number of respondents who filled their Country and not who filled LanguageWorkedWith. Someone can fill Country and not LanguageWorkedWith.
@ashkanfarahani6532 3 роки тому ⁺²⁶
Hi Corey. I think this might be a relevant simpler approach for getting percentage. I used value_counts(normalize=True) instead of sum.
df.groupby(['Country'])['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True))
This of course return both percentage who know Python and Who don't know. So if we want to get for a specific country, for instance Japan, then:
df.groupby(['Country'])['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True)).loc['Japan'][1]
@RegularDude95 Рік тому ⁺¹
I have a similar approach to,i am happy to see that i am not the only one who always sees the easiest ways =)))
@JahidHasan-Aneek Рік тому
instead of using : value_counts(), use sum() in second line. Then you'll get appropriate answer.
df.groupby(['Country'])['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
@abdulkadirguven1173 Рік тому ⁺¹
Great approach. Thanks for sharing
@thotarohith2060 Рік тому
Here is my approach : filt=df['LanguageWorkedWith'].str.contains('Python',na=False)
python_count=df.loc[filt]['Country'].value_counts()
python_count.rename('p_c',inplace=True)
python_count
--
total_count=country_grp['Country'].value_counts()
total_count.rename('t_c',inplace=True)
total_count
--
result_horizontal = pd.concat([total_count, python_count], axis=1)
import numpy as np
result_horizontal.replace({'p_c':np.nan},0,inplace=True)
result_horizontal['perc']=(result_horizontal['p_c']/result_horizontal['t_c'])*100
result_horizontal
@dennisp5302 2 роки тому
I just went through Part 8 a second time. Thanks a bunch!! I learned a lot.
@qigangdeng8636 4 роки тому
Hey, Corey, I see many people gave their own answer which are wonderful. So want to give my one here, which looks like more a beginner answer:
#create a new column called: 'Python use'
df['Python_use'] = (
df['LanguageWorkedWith'].str.contains('Python')
)
#.value_counts() the 'Python use' column as it is a boolean type:
country_grp = df.groupby(['Country'])
Python = country_grp['Python_use'].value_counts(normalize=True)
Thanks for your great Lectures! I watched from part 1. I am your big fan now.
@mohammadghouse25 4 роки тому
Best Pandas playlist in youtube. One point solution for python learners
@jeremine9259 Рік тому
Just want to share here my solution for the practice question (but with the survey of 2022):
---
country_group['LanguageHaveWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True))
---
And also give thanks to your wonderful videos, Corey!
It's been 3 years and they're still among one of the best tutorials.
@keepcontinue Рік тому ⁺¹
I thought the same thing
@ankitranjan7670 4 роки тому ⁺⁴
We can do this in the lambda function for getting the percentage :
lambda x : x.str.contains ('Python').value_counts(normalize=True)
And I would love to thankyou for all these videos!
@akashshelar4388 3 роки тому
"a = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True))"
this what i did is it correct or not i dont know??
@Imrannaseem818 4 роки тому ⁺¹
Thanks Corey.
I have waited for this video whole week.
Great explaination
@alejandropereyra438 3 роки тому
This video is so useful , the simplification that python does for the problems is so helping. is the best language in the programming of code. And the proffesor of this video is really a genius. !!! thanks.
@way_to_be_analyst6042 3 роки тому
im just diving into pandas and would like to say - GREAT THANK YOU for such nice and detailed explanation.
great job!
@omj7113 Рік тому
Thanks a lot for your teaching! Here is the my solution at the end of the video:
# group object['column'] is a Series object, so the input of the function is a Series, ana the output value of the function is a float
def percent_know_python_each_country(countrySeries):
num_know_python = countrySeries.str.contains('Python').sum()
num_all = len(countrySeries)
percent = round((num_know_python / num_all * 100), 2)
return percent
country_group['LanguageWorkedWith'].apply(percent_know_python_each_country).sort_values(ascending=False).head(30)
@bobchannell3553 4 роки тому ⁺¹
Thanks for doing this video in a detailed way, like you always do. Just under an hour is a good length for a video like this. Thanks!
@nicholaspolino2657 4 роки тому ⁺⁴
LOL @ "If I did this correctly, and it's definitely possible I made a mistake." Happy I found these videos, thanks.
@srich-k 4 роки тому ⁺³
A much simpler way of finding the percentage of people from each country who know python is by using the mean() function
Something like this :
country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').mean())
P.S: Corey's method produces a much detail-oriented version of this result
@akosasuke5128 2 роки тому ⁺¹
Corey Shafer deserves a UA-cam Teacher award
@rauberhozenplotz7009 4 роки тому ⁺¹
Helps me to get into my PhD. Thanks a lot for uploading this!
@markkennedy9767 3 місяці тому
The groupby explanation comparing it to a filter is really good.
@parsahosseini4241 4 роки тому
47 minutes of a pure pandas tutorial from a god in python, man you're a hero🔥🔥
@joncochran9647 3 роки тому
I've watched quite a variety of different data analysis tutorials and this one was easily one of the most engaging for me. Having interesting data really helps.
@aborucu 3 роки тому ⁺¹
Perfect explanation. Making a convoluted yet so important concept crystal clear through step by step explaning and also giving connections to pandas object types. Cheers!
@federicohan1458 3 роки тому
I found amusing explaining what a percentage% is after going over apply & lambda methods, but that's exactly the thoroughness that makes your videos so loved :)
@manish_chandra 2 роки тому
One of the best and most easily understandable vid on Pandas. Thank you for creating this !!
@tolex3 4 роки тому ⁺²
I've been doing data analysis using Python & Pandas for a few years now. Still, I'm picking up new things from your videos. Very clearly explained! Thank you!
@Yasharvl 4 роки тому ⁺¹
Thanks Corey! This is pure gold!
@Am-hsb 4 роки тому ⁺¹
Thanks a lot Corey! Got to learn complex syntax in simple ways. You are amazing teacher.
@bobchannell3553 4 роки тому
This was a lot to learn in one video. That's why I went back and watched it again this week. At the end, I added something I think would be useful in what I do. I added a filter to select records where the number of respondents is >= 5.
filt = python_df['NumRespondents'] >= 5
python_df.loc[filt]
@vijaybhatt4347 3 роки тому
You're Genius
Love From India
You mentioned India in your tutorial
Love this gesture 🇺🇸🇮🇳
@Tigrex281 4 роки тому
Hey Corey,
first of all thank you very much for all those fantastic videos. I also have tried to answer the question of percentage knowing python for each country. I came up with following solution:
PrctKnowPython = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python', na=False).value_counts(normalize=True)).loc[:,True]
One advantage of this approach is, that you can just remove na=False and ignore NaN values in your data.
@AugustoGeografo 4 роки тому ⁺³
Always looking forward for your videos, Corey.
@mggarekar 4 роки тому ⁺²
nice video :) i liked the q/a approach at the end where you left it open.
@denizcicek7333 3 роки тому
You are just wonderful, it makes so much fun watching your tutorials. I finde directly the answer, those I need.
God bless you brother.
@fiefiego2298 2 роки тому ⁺¹
thank you Corey!! this is a wonderful pandas series!! you make the concept so easy that even a python beginner (that is me) without programming background in colleague can understand!!
i'd like to share my solution as well: (since i don't know concat method, i calculate the answer first then convert them into a dataframe by dictionary)
know_py = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
answer = df['Country'].value_counts()
per_cent = know_py/ answer
result = pd.DataFrame({'answer': answer, 'Python': know_py, 'percentage': per_cent})
i find you stop uploading new tutorials for a long time, hope everything goes well with you. and strongly looking forward to hearing from you soooooon!!
thank you & greeting at 2022 sep 7th :)
@buzz.b 2 роки тому
Thank you for the last example (percent that knows python). It was great to see how the different methods learnt can come together in a practical example; this really helped consolidate the knowledge gained.
@saadchaudhry9110 Рік тому
you teach really well, I am learning a lot...Thanks. I have also learned other topics from your videos....I was stuck on opening and reading a csv file in python, till I saw your video and learned it....I am an absolute beginner...😅...Thanks
@pcordeirobr 3 роки тому
Corey, the content of your videos are amazing. This tutorial in special is sensational.
@belleriveblvd 4 роки тому
Corey, I learn a lot from your videos. But this one has been especially helpful. Thanks.
@RavenEX1980 3 роки тому
best tutorials ever, i have read lot of books, but your technique is global and works best...keep on the good word @Corey Schafer
@sumranms 4 роки тому ⁺¹
I really like the way you speak. Your language is clearly understandable and you have a great accent. :)
@AAND8805 3 роки тому
I am following your pandas series since the last 3 days and may complete in 1 or 2 days max, I will come back to the series to revise it, very well made Series and keep up the good work !😀
@charimuvilla8693 2 роки тому
At 35:40 since you want to get the percentage of boolean values that are 1 you can pretty much just replace .sum() with .mean(). It's sum/length.
@СергейФролов-ъ5я 3 роки тому
Corey, thank you very much for your free videos!
@danteawakening4430 2 роки тому
Hello, about the question in the video, I don't know if my attempt is right but here it is:
1) To take into account the NaN values (so they are involved in the calculation):
country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum() / len(x))
2) Not taking into account the NaN values:
country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum() / x.count())
Since the information of how many entries are in country_grp and are grouped by country I think it is inefficient to look for the number of entries elsewhere, thank you for the course !
@yunfengliang6746 4 роки тому ⁺¹
Thank you very much for the great video. for the percentage calculation, I use the following code based on what is learned:
country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True)).loc[:,True]
@Boat-xs8lm 6 місяців тому
Thank you,I am very lucky that I found your tutorials.
@maheribnerahman7783 4 роки тому ⁺¹
Really appreciate your teaching strategy man..have been learning a lot from you since the quarantine started.Love from bangladesh
@satyamgaba 3 роки тому
Best Valentine's Video!!
@turksonmichael1236 Рік тому
Thank you for this. Had clearer understanding of pandas than before. Wish you the very best
@akritisstory850 4 роки тому ⁺¹
you are a great teacher Corey.
@yashkumarsinghpatwa9267 4 роки тому ⁺¹
Hey Corey Schafer Thanks a lot for this amazing series which helped me to upgrade my skills which I was unaware of that.
@ironpolux 3 роки тому
Really enjoying this series, thank u Corey!
@Terence818 4 роки тому
Yes Corey, having a future video on multi-index will be very helpful!
@saraghafelehbashi5808 2 роки тому ⁺¹
Wow! what a fantastic explanation! I saw lot of videos! but you are different! i know why? because you are using pandas with great examples and nice explanation! thank you again!
@matthiashupfer2659 4 роки тому ⁺³
These tutorials are well thought out and really great in explainatory purposes. Greats skills here from Corey! Thank you.
@anantharjun9662 Рік тому
Heyyy coreyyy I got drops in my eyes after watching the way you taught....you made my day❣️✨love you so much corey
@moushumitamanna 4 роки тому ⁺¹
First of all, Thank you so much for all your tutorials. You are a great teacher.
and yes, please make a video on Multiple Indexing.

Наступне

Автоматичне відтворення

Python Pandas Tutorial (Part 9): Cleaning Data - Casting Datatypes and Handling Missing Values