Exploratory Data Analysis in Pandas | Python Pandas Tutorials

Alex The Analyst

Published Nov 21, 2024

COMMENTS • 203

@pbp7 1 year ago ⁺⁴⁷
Man, “Oceania” was so funny 😂, thanks for the class!
@santiagofajardo4949 1 year ago ⁺¹²⁰
Hello,
at minute 24:24, I managed to reverse the range of column names using [5:13][::-1]. The expression [::-1] is used to reverse ranges and it is very useful:
df2 = df.groupby('Continent')[df.columns[5:13][::-1]].mean(numeric_only=True).sort_values(by='2022 Population', ascending=False)
df2
Thank you very much, Mr. Alex, for these tutorials.
@WorkJob-g3o 10 months ago ⁺¹
Thank You!
@renanz21 10 months ago ⁺⁴
Alternatively, start counting columns backwards:
df2 = df.groupby("Continent")[df.columns[-5:-13:-1]].mean().sort_values(by='2022 Population', ascending=False)
df2
@AlexisTeseyra-mj4ft 1 month ago ⁺²
or df3.plot().invert_xaxis()
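The slicing tricks in this thread should all select the same population columns, oldest year first. Here is a minimal sketch to check the index arithmetic. It assumes a 17-column layout with the eight population columns at positions 5 to 12 (newest first), which is the layout the indexes used here imply. The column names are illustrative placeholders and are not read from the real CSV.

```python
import pandas as pd

# Hypothetical layout: 5 leading columns, 8 population columns (newest
# first), then 4 trailing columns, 17 in total. Names are placeholders.
years = [2022, 2020, 2015, 2010, 2000, 1990, 1980, 1970]
columns = (
    ["Rank", "CCA3", "Country", "Capital", "Continent"]
    + [f"{year} Population" for year in years]
    + ["Area", "Density", "Growth Rate", "World Population Percentage"]
)
df = pd.DataFrame(columns=columns)

slice_then_reverse = list(df.columns[5:13][::-1])    # positions 5..12, then flipped
negative_step = list(df.columns[-5:-13:-1])          # -5 is position 12; -13 (position 4) is excluded
positive_step = list(df.columns[12:4:-1])            # 12 down to 5; the stop 4 is excluded
builtin_reversed = list(reversed(df.columns[5:13]))  # Python's reversed() on the slice

print(slice_then_reverse)
```

The negative-step form depends on how many columns follow the population block. The positive forms depend on how many precede it. Either one silently picks the wrong columns if the file gains or loses columns.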
@OkallTheAnalyst 9 months ago ⁺¹⁹
In case you are running into an error at minute 11:12, add numeric_only=True to corr, i.e. df.corr(numeric_only=True).
@mananagrawal4114 8 months ago
thanks man!
@Hamzahnahmad 4 months ago
thank you. really helpful!
@usmanhammed7158 2 months ago
Thank you
@jaymanhire 1 day ago
Thanks! Never seen that before!
@ruthbeaubrun6954 1 day ago
thank you!!
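Why `numeric_only=True` is needed: since pandas 2.0 the default for `DataFrame.corr` is `numeric_only=False`. Text columns such as the country codes are therefore included, and the conversion to float fails. The sketch below uses made-up data (the column names and values are hypothetical) to show the failure and the fix. On older pandas versions the first call may instead succeed or only warn, which the `try` block tolerates.

```python
import pandas as pd

# Made-up data: one text column next to two numeric columns.
df = pd.DataFrame(
    {
        "CCA3": ["AAA", "BBB", "CCC", "DDD"],
        "Population": [10, 20, 30, 40],
        "Area": [1.0, 3.0, 2.0, 5.0],
    }
)

try:
    # With numeric_only=False (the pandas 2.x default) the text column is
    # included and pandas cannot convert "AAA" to a float.
    df.corr()
except ValueError as err:
    print("corr() failed:", err)

# Restrict the calculation to numeric columns explicitly.
corr = df.corr(numeric_only=True)
print(corr)
```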
@JW-pu1uk 1 year ago ⁺³⁴
This is absolutely top tier content. I can't stress this enough to people new, or going into the DA/DS field: you WILL be exploring and cleaning data sets much more than you will be visualizing and building models.
Thanks for this, Alex!
@satrapech6107 1 year ago ⁺⁴⁵
The correction for df.corr() is:
import numpy as np
numeric_columns = df.select_dtypes(include=[np.number])  # keep only numeric columns
correlation_matrix = numeric_columns.corr()
correlation_matrix
@pradiptanugraha6841 1 year ago ⁺¹
Thanks, it works. Why isn't df.corr() working for me?
@rajkumarjadi7061 1 year ago
thanks man.
@francescab1413 1 year ago ⁺⁵⁴
df.corr(numeric_only=True)
worked for me
@arrofifahmi7708 10 months ago ⁺²
@@francescab1413 me too mate! Thanks a lot!
@SDMNKhan 10 months ago
name 'np' not defined?
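The `np` in the snippet above refers to NumPy, which has to be imported first (`import numpy as np`). The `NameError` appears when that import is missing. The minimal sketch below uses made-up data to show that selecting the numeric columns first gives the same matrix as `df.corr(numeric_only=True)`. It assumes the frame has no boolean columns, since the two approaches may treat those differently.

```python
import numpy as np  # this import is what makes the name `np` available
import pandas as pd

# Made-up data: one text column and two numeric ones.
df = pd.DataFrame(
    {
        "Country": ["A", "B", "C", "D"],
        "Population": [10, 20, 30, 40],
        "Area": [1.0, 3.0, 2.0, 5.0],
    }
)

numeric_columns = df.select_dtypes(include=[np.number])  # int and float columns only
correlation_matrix = numeric_columns.corr()              # call the method directly

print(correlation_matrix)
```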
@kartikgupta370 11 months ago ⁺¹¹
We can also write this to save time writing out all the column names in the list:
df2 = df.groupby('Continent')[df.columns[12:4:-1]].mean(numeric_only=True).sort_values(by='2022 Population', ascending=False)
@rafaelmarques5623 11 months ago ⁺¹²
Oceania is one of the 7 continents (North America, South America, Europe, Asia, Africa, Oceania, Antarctica). It's basically Australia and the countries (islands) around it.
Hope that helps!
@AlastorGarcia 1 year ago ⁺¹¹
Thanks Alex! Right now I'm applying for my first DA job and you have no idea how useful your videos have been for me!!
@ermano5586 1 year ago ⁺²
Hey, how is it going? Did you succeed in getting the job you wanted?
@frenamakenson9844 9 months ago ⁺²⁷
Hello,
100000000 thanks for sharing
For the correlation part at 11 min:
df.corr(numeric_only=True)  # pass numeric_only to avoid the error
@matthewchristian9969 3 months ago
Thank you!
@muhammadqasim9749 2 months ago
thanks
@pradiptisimkhada292 1 year ago ⁺⁴
I just finished all the videos in your bootcamp playlist a few hours ago and I'm excited to do this again..
@shankarmidatala2049 4 months ago ⁺¹
Namaste! I found your tutorials "Simple, Easy to follow, and To the point". Thanks.
@toygar8699 11 months ago ⁺²⁶
For those getting an error in the heatmap:
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['figure.figsize'] = (20, 7)  # set before the heatmap creates its figure
numeric_columns = df.select_dtypes(include='number')  # int and float columns
sns.heatmap(numeric_columns.corr(), annot=True)
plt.show()
@asmitaupadhyay4656 8 months ago
thank you
@nointernetnarwhal7615 7 months ago
THANK YOU!!!!!! I almost quit for good.
@nassrmohamed278 7 months ago
I had that error in corr: "could not convert string to float: 'AFG'"
Do you know how to solve this?
@kaleabgirma-x2b 7 months ago
thanks a lot toygar
@yanpaucon1043 6 months ago
@@nassrmohamed278 df.corr(numeric_only=True)
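Whether `select_dtypes` is given `'float'` or `'number'` matters for the heatmap. `'float'` keeps only floating-point columns, so integer columns are silently dropped from the correlation matrix, while `'number'` keeps both. Here is a small sketch on a made-up frame. Which columns of the real dataset are stored as integers is an assumption you can check with `df.dtypes`.

```python
import pandas as pd

# Made-up frame mixing int64, float64 and text columns.
df = pd.DataFrame(
    {
        "Rank": [1, 2, 3],                        # int64
        "2022 Population": [300, 200, 100],       # int64
        "Growth Rate": [1.01, 0.99, 1.02],        # float64
        "Continent": ["Asia", "Europe", "Asia"],  # object (text)
    }
)

floats_only = df.select_dtypes(include=["float"]).columns.tolist()  # drops the int columns
all_numeric = df.select_dtypes(include="number").columns.tolist()   # ints and floats

print(floats_only)
print(all_numeric)
```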
@DuckingDuck-th2lt 10 months ago ⁺¹¹
Hello, Alex!
Once again, thanks a lot for all your hard work!
At 13:10 I got an error: ValueError: 'box_aspect' and 'fig_aspect' must be positive
I solved it by putting plt.rcParams BEFORE sns.heatmap.
The other problem was that some functions didn't work until I added the parameter numeric_only=True, e.g., df.corr(numeric_only=True) or .mean(numeric_only=True).
Hope it can help someone!
@yanpaucon1043 6 months ago
Thank you, you are the best!
@alexishuynh 2 months ago
It certainly helped. Thank you, DuckingDuck.
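Here is one plausible reason the order matters. This is an explanation of the general behavior, not something verified against that exact error. `plt.rcParams['figure.figsize']` is read when a figure is created, so setting it after `sns.heatmap` has already drawn does not resize that figure. The sketch below uses plain matplotlib with the headless `Agg` backend to show the effect. It also shows the alternative of passing `figsize` explicitly; with seaborn you would then draw into that axes via `sns.heatmap(..., ax=ax)`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (6, 4)
fig_before = plt.figure()  # created with the size in effect at this moment

plt.rcParams["figure.figsize"] = (20, 7)
fig_after = plt.figure()   # only figures created from now on pick up (20, 7)

# Alternative that avoids the global setting: give the size to this figure only.
fig_explicit, ax = plt.subplots(figsize=(20, 7))

print(fig_before.get_size_inches(), fig_after.get_size_inches())
plt.close("all")
```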
@sj1795 10 months ago ⁺²
EXCELLENT SUPERB video!! I can't believe it--I'm 6/7 videos away from the end of your FANTASTIC bootcamp series! Wahoo! I learned a lot in this video. :) As for "ending on a low note", hardly Alex lol All your content is uplifting and rewarding! As always, THANK YOU!
@abhishekchaudhary7913 10 months ago ⁺²
At 26:11, where Alex sorts the years manually, you can sort them directly with this command:
df4 = df3.sort_index(ascending=True)
df4
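`sort_index()` works here because, after transposing, the year labels become the row index. Sorting those strings puts them in chronological order because every label starts with a four-digit year. Here is a minimal sketch with made-up averages; the continent names and numbers are placeholders.

```python
import pandas as pd

# Hypothetical averages per continent, columns newest year first
# (the orientation of the grouped frame before transposing).
df2 = pd.DataFrame(
    {
        "2022 Population": [4.0, 1.5],
        "2000 Population": [3.0, 1.2],
        "1970 Population": [2.0, 1.0],
    },
    index=["Asia", "Oceania"],
)

df3 = df2.transpose()                 # years become the row index
df4 = df3.sort_index(ascending=True)  # string sort: "1970 ..." < "2000 ..." < "2022 ..."

print(df4)
```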
@АлександрПокладов-х8т 11 months ago ⁺¹
Hey, just a quick note here: when we plot the populations, the chart only shows absolute values, so its scale is set by the largest populations. In fact (for example) Oceania's population grew around 2.5 times.
Anyway, thanks for the content, it's amazing
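To compare growth on a common scale, one option is to divide each year by a base year. Every continent then starts at 1.0, and the lines are comparable regardless of size. The sketch below uses made-up numbers for a hypothetical large and a hypothetical small continent, not the dataset's values. `indexed.transpose().plot()` would then plot the indexed series.

```python
import pandas as pd

# Hypothetical averages in arbitrary units, continents as rows.
df2 = pd.DataFrame(
    {"1970 Population": [100.0, 2.0], "2022 Population": [230.0, 5.0]},
    index=["Big continent", "Small continent"],
)

# Absolute change is dominated by the big numbers...
absolute_change = df2["2022 Population"] - df2["1970 Population"]
# ...while the ratio shows relative growth.
growth_ratio = df2["2022 Population"] / df2["1970 Population"]

# Index every year to 1970 = 1.0 (row-wise division aligned on the index).
indexed = df2.div(df2["1970 Population"], axis=0)

print(absolute_change)
print(growth_ratio)
```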
@ngwamalfred8151 1 year ago ⁺¹
Where would I have been without this video?
@kogureyoeh 1 year ago ⁺⁶
At 24:00
you can simply add ".sort_index()" to "df3 = df2.transpose()", so that we don't have to rearrange the columns manually.
df3 = df2.transpose().sort_index() worked on my end; hope it works on yours too.
@abisolalumous5505 10 months ago
thank you
@keluargaindo-timordiuk 1 year ago ⁺⁷
For the grouping step I do:
df2 = df.drop(columns=['CCA3', 'Country', 'Capital'])
df3 = df2.groupby('Continent').mean(numeric_only=True).sort_values(by="2022 Population", ascending=False)
df3
to get the same output as seen in the video
@danielmariobuchberger 1 year ago
Me too. This should be explained, because strings can't simply be averaged; that is usually the problem!
@bolajiawofuwa8116 11 months ago
THANK YOU!!!!!!
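Both routes come down to the same thing: `.mean()` cannot average text columns, and in pandas 2.x it raises an error instead of silently skipping them. The sketch below runs on made-up rows and compares dropping the text columns first with passing `numeric_only=True`. The column names are hypothetical.

```python
import pandas as pd

# Made-up rows: text columns plus one numeric column.
df = pd.DataFrame(
    {
        "Country": ["A", "B", "C", "D"],
        "Capital": ["a", "b", "c", "d"],
        "Continent": ["Asia", "Asia", "Europe", "Europe"],
        "2022 Population": [10, 30, 5, 15],
    }
)

# Option 1: drop the text columns (other than the grouping key) first.
by_drop = df.drop(columns=["Country", "Capital"]).groupby("Continent").mean()

# Option 2: let pandas skip the non-numeric columns itself.
by_flag = df.groupby("Continent").mean(numeric_only=True)

print(by_flag)
```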
@MaximKazartsev 1 year ago ⁺⁴
Alex, thank you for this great video and everything you do!
To avoid ordering the population years manually, you can wrap the df.columns slice in reversed(). The whole construction looks like this:
df2 = df.groupby('Continent')[list(reversed(df.columns[5:13]))].mean().sort_values(by='2022 Population', ascending=False)
And it works )
@languagewanderlust 8 months ago
thank you!
@DEDE-ix9lg 1 year ago ⁺¹
I always enjoy a video from Alex. He makes some of the best videos, while some other channels can be a real headache.
@quotesdiary310 1 year ago ⁺²
Hi Alex
Thank you so much for your support for freshers in the field of data analytics.
@tranguyen4462 7 months ago
omg I laughed out loud at the "Oceania" part ;)))) Alex is so funny and brutally honest about the things he didn't know ;)))
@Zenitsu-mq7fq 8 months ago ⁺¹
24:50
df2 = df.groupby('Continent').mean(numeric_only=True).iloc[:, -5:-13:-1].sort_values(by='1970 Population', ascending=False)
df2 = df2.transpose()
df2.plot()
This way we avoid copy-pasting and reordering the columns; we just use reversed indexes)
@Inc0gnit030 1 year ago ⁺¹
I really enjoyed this introduction to Pandas! Keep up the good work!
@aishwaryapattnaik3082 1 year ago ⁺²
Thanks a lot for this clear-cut explanation. Can you make something similar for end-to-end NLP projects?
@LaMeeLifestyle 1 year ago ⁺⁴
Thanks for all you do. I’m loving the bootcamp. I just finished the Excel project. However, could you please make a video on storytelling?
@kevindeschepper8140 5 months ago
Another way to select the columns (think of big datasets where indexing with numbers would be challenging): columns_to_include_2 = df.select_dtypes(include=['number']).filter(like='population').columns
@kevindeschepper8140 5 months ago
columns_to_include_2 = df.select_dtypes(include=['number']).filter(like='Population').columns.difference(["World Population Percentage"]) :P
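Two details about this approach are worth knowing, shown here on a made-up one-row frame with hypothetical columns. First, `filter(like=...)` is a case-sensitive substring match, which is why `'population'` finds nothing while `'Population'` works. Second, `Index.difference` returns its result sorted by default, so the selected columns come back in ascending label order rather than their original order. `Index.drop` removes labels and keeps the original order.

```python
import pandas as pd

# Made-up one-row frame with a column set resembling the dataset's.
df = pd.DataFrame(
    {
        "Rank": [1],
        "Country": ["A"],
        "2022 Population": [300],
        "1970 Population": [100],
        "Area": [50.0],
        "World Population Percentage": [1.5],
    }
)

numeric = df.select_dtypes(include=["number"])

lower = numeric.filter(like="population").columns.tolist()  # case-sensitive: no match
upper = numeric.filter(like="Population").columns.tolist()

# difference() sorts its result; drop() keeps the original column order.
sorted_pop = numeric.filter(like="Population").columns.difference(["World Population Percentage"])
ordered_pop = numeric.filter(like="Population").columns.drop(["World Population Percentage"])

print(list(sorted_pop), list(ordered_pop))
```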
@anuarroho2561 3 months ago ⁺⁴
mean(numeric_only=True)
@moniquebrasilbaptista1989 1 year ago
I am sure I am going to use some of these tips. Thank you!😍❤
@kevindeschepper8140 5 months ago
To exclude Rank from the displayed numeric data: columns_to_include = df.select_dtypes(include=['number']).columns.difference(['Rank'])
@abdulsami6117 1 year ago
Love from Pakistan Alex, really helpful and enjoyable.
I also like the OOPS sound you make 😂😂
@staquatica1607 1 year ago ⁺⁴⁷
I got some errors (using PyCharm) that I solved by using numeric_only=True. For instance: df.corr(numeric_only=True) and df.groupby("Continent").mean(numeric_only=True)
@mohammedshadaabkhan3228 1 year ago ⁺⁶
Hey, use this code instead:
import matplotlib.pyplot as plt
import seaborn as sns
numeric_df = df.select_dtypes(include='number')  # Select only numeric columns
plt.figure(figsize=(20, 7))  # Set the figure size
sns.heatmap(numeric_df.corr(), annot=True)  # Create the heatmap with annotations
plt.show()
@DevanshAsawa 10 months ago ⁺¹
helped a ton thanks
@haley2486 10 months ago ⁺¹
Thanks for posting! I had to press SHIFT+TAB on the corr() function to find out how to get only numeric values.
@nassrmohamed278 7 months ago ⁺¹
thaaaaaaaaaaaaaaank youuuuuuuuuuuuuuuuu
@sarayusemesta6132 5 months ago
26:00
you can just add this to invert the columns:
df2 = df.groupby('Continent')[df.columns[5:13]].mean(numeric_only=True).sort_values('2022 Population', ascending=False)
df2_inverted = df2.iloc[:, ::-1]
df2_inverted
@adityavamsi12 29 days ago
Love from India❤❤
@nadarioferguson6276 7 months ago
Thank you so much for this. I really enjoyed it and learned a lot of what I had forgotten a few years ago.
@jeffrey6124 3 months ago ⁺¹
Hope you also make a PySpark series 🤓
@Charlay_Charlay 10 months ago
Thank you for the Pandas class!
@neildelacruz6059 1 year ago ⁺¹
Thank you Alex, this is very helpful.
@zachary626 10 days ago
Since this was posted, numeric_only defaults to False, so if you are using a newer version of pandas, here is the correction:
df.corr() ❌
df.corr(numeric_only=True) ✅
@sivasagarchakkarai1687 4 months ago ⁺²
If df.corr() doesn't work on the same dataset we're using in this video and throws an error like "could not convert string to float: 'AFG'", try: df.corr(numeric_only=True)
@nitinrawat-g6t 4 months ago
same
@nitinrawat-g6t 4 months ago
import numpy as np
numeric_columns = df.select_dtypes(include=[np.number])  # keep only numeric columns
correlation_matrix = numeric_columns.corr()
correlation_matrix
@haithammontaser7769 1 year ago
Hello Alex. Thanks for the video and content. Is there any video on data pre-processing?
@enix492 1 year ago ⁺²
Hello Alex. I read a few reviews of your recommended course on Udemy. People are saying that it is a bit outdated, especially the last section. Do you think I should still go for it, or does the outdated part not matter? Love your content and thanks for everything you do here.
@AlexTheAnalyst 1 year ago ⁺²
I haven't taken it in a while - worth listening to more recent comments. Could be outdated?
@СергейСтуднев 1 year ago ⁺¹
Thank you for the useful information!
@vitorribeirosa 1 year ago ⁺¹
Neat...
Thanks for sharing this content.
Cheers
@thepasstimevideos7195 3 months ago
superb video sir..
@innocentnduaguba 11 months ago ⁺²
Thank you so much Alex, truly great content you put out there. I have a question, please: when I run df.groupby('Continent').mean() and df.corr() I get errors. What could be the cause and what can I do to remedy it?
@sabithsaqlain1367 11 months ago ⁺¹
use df.corr(numeric_only=True)
@sj1795 10 months ago ⁺¹
@@sabithsaqlain1367 THANK YOU for this!! This was driving me a little nutty. Really appreciate you sharing this. :)
@SDMNKhan 9 months ago
I could not fix the mean() issue.
@chriscurtis95 6 months ago ⁺¹
df.groupby('Continent').mean(numeric_only=True)
@Gratitude-x3g 4 months ago
@@chriscurtis95 🙏 Thank You!
@truthgaming2296 10 months ago
It's spelled 'O-Ce-A-Nia' btw
Btw, thanks for this guidance, Sir Alex :)
@TheRobinCreations 1 year ago
Thank you so much, it was very informative.
@aayushitrivedi3481 1 year ago ⁺²
love your videos alexx ;)
@SoggyBagelz 1 year ago ⁺³
Lets goo!
@gauravpunera3256 1 year ago ⁺¹
Alex, please make a video on how to get an international remote data analyst job
@HarshKumar-ws3wv 8 months ago
Sir, in your opinion: Jupyter vs PyCharm? Which is better for exploratory data analysis?
@elfridhasman4181 1 year ago ⁺¹
Thank you Alex💯🔥
@quotesdiary310 1 year ago ⁺¹
Thank you so much alex
@jjsan1 6 months ago
This is great! Thank you!
@Marcusram 1 year ago
we can do df3 = df3.iloc[::-1] to solve the problem with the date order
@minasghazaryan9344 1 year ago ⁺⁶
Hi, Alex. First of all, thanks for a great video and the explanations in it.
If you could help out with the issue I get running your exact code, I would be more than grateful.
Running the df.corr() line gives me the following error: ValueError: could not convert string to float: 'AFG'.
The same happens for the heatmap, etc. What could it be?
Thanks a lot in advance.
@ReneePieschke 1 year ago
Getting the same errors.
@11zaad 1 year ago ⁺²
try this ==> df.corr(numeric_only=True)
@dustin3320 1 year ago ⁺¹³
Best to use df.corr(numeric_only=True) to get around this
@Batira583 1 year ago
you saved my life thanks so much @@dustin3320
@fede77 11 months ago ⁺²
df.corr(numeric_only=True)
@youssefbekk4453 1 year ago
high level, thanks
@orlumbuseuw5646 1 year ago ⁺¹⁹
Was there really an adult here who didn't know what Oceania is, or is this some inside joke on the channel?
@octaverius762 1 year ago ⁺²
I can't believe this
@litoavila. 1 year ago ⁺¹
Also FYI America is just one continent, in case you doubt it
@MatthewBreithaupt 1 year ago ⁺²
OceanEeeA
@MatthewBreithaupt 1 year ago ⁺¹
FYI Australia is not a *small* island. Oceania doesn't "mean" anything, it's the name of a continent containing the countries listed right in front of you since you already filtered the data 😂😂
@rjk537 1 year ago ⁺¹
I'm a law graduate without any experience or qualifications in data analysis whatsoever, but I want to get into data analysis. Will I be able to get a job in this field? If yes, what skills and certifications will help me achieve that? Please give me some tips and insights; it would be really helpful!
@ermano5586 1 year ago
Yes, you can. For skills, I would prioritize analytical thinking, probability and statistics, and other higher math.
For certifications, Mr. Alex said that Amazon and Tableau certifications, among others, will help; in any case, if it's a long-term learning certificate, I think it is fine to have it on your CV. But what makes you stand out is mostly the projects you have done, and I mean not only portfolio projects but also other ones that show your uniqueness.
@БулатШарафутдинов-р6д 1 year ago
Again, thank you very much!
@philiprhome3824 1 year ago ⁺¹
As an R user, I find the pandas syntax just weird compared to the tidyverse (dplyr and tidyr)
@octaverius762 1 year ago ⁺³
Alex, which continent do you think Australia is in 😮
@AlexTheAnalyst 1 year ago
:D
@chefernandez563 1 year ago
Australia is also a continent tho😂 sometimes ppl will also refer to NZ and Aus as "the Australias", but Oceania includes the other surrounding islands
@octaverius762 1 year ago ⁺¹
@@chefernandez563 Oceania is a continent, Australia is a country. How people often speak is not relevant
@dragoneer121 1 year ago
@@octaverius762 Actually it is relevant, though different countries have different models and it's entirely up to convention. Australia the continent is usually considered to be the 3 islands of mainland Australia, Tasmania and Papua New Guinea
@chgfxghjjkllll 3 months ago
oh-shee-ana ! you killed me ...
@alikoohi8265 1 year ago
Informative video, thanks. I just found an easier way to reverse the order of the rows:
df3 = df2.transpose().loc[::-1] 😉
@Chathur732 3 months ago ⁺¹
At 11:12, df.corr() does not work anymore. Instead use:
df_numeric = df.select_dtypes(include=[float, int])
correlation_matrix = df_numeric.corr()
correlation_matrix
@abhi8243 2 months ago
Thank u
@onosemuodeikuesiri7620 1 month ago
This is simpler and more straightforward:
df.corr(numeric_only=True)
@diegomartins7214 1 year ago
Thank you!
@karanvaghela4668 1 year ago
Hey Alex, why should we use Python instead of SQL? SQL is easy.
@dragoneer121 1 year ago ⁺¹
Continents are mostly a social convention. English-speaking countries tend to use 7, while Spanish-speaking countries have a 6-continent model that uses Oceania and combines North and South America.
Australia is the continent, but Oceania is a geopolitical convenience. If it were not included, most of the Pacific island countries would not be associated with a continent. North and South America are another convenience, and Central America is only a region by American standards.
As an example of how ridiculous it is as a continent, Hawaii would be included if it were independent.
@ayoubchouket 9 months ago
thank you
@OazadOMER 1 year ago ⁺¹
Thank you very much Alex, I'm shifting from Ph to Data Analyst with your bootcamp. I had an issue with plt.show(): AttributeError: module 'matplotlib' has no attribute 'show'. It seems deprecated and I couldn't find anything similar. Also, my chart is not showing the numbers (14:10).
Best regards
@dishanbhandari 7 months ago ⁺¹
Hi there, did u find the solution to your problem of the numbers not showing? I ran into the same problem too.
@olaleyeboluwatife949 5 months ago
@@dishanbhandari hey mate, did you find the solution?
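That `AttributeError` is what Python raises if the top-level package was imported under the `plt` name (`import matplotlib as plt`). `show()` is not deprecated; it lives in the `pyplot` submodule. This is a guess based only on the error message. Here is a minimal sketch; it uses the headless `Agg` backend only so it runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; skip this in Jupyter
import matplotlib.pyplot as plt  # show() is defined here, not on the top-level package

has_top_level_show = hasattr(matplotlib, "show")  # the missing attribute behind the error
has_pyplot_show = hasattr(plt, "show")

fig, ax = plt.subplots()
ax.plot([1970, 2022], [1.0, 2.5])  # placeholder values
# plt.show()  # in a notebook or with a GUI backend, this displays the figure
plt.close(fig)

print(has_top_level_show, has_pyplot_show)
```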
@iqraasif3783 1 year ago ⁺¹
Hi, can someone help? When I plot figures that have been grouped, it doesn't show the figure, just says .
@JayDenton-n1n 9 months ago
21:09 I just figured it out. Simply add another line after the plot, like:
df2.plot()
plt.show()
@ajeyarajupadhyaya8287 1 month ago
Hey, please tell me how to get a discount for the Python with pandas course. It is too expensive in Indian currency.
@akademy_performance_digital 10 months ago
great
@nishanths3176 2 months ago
Can I get the dataset for this?
@adminravi 1 year ago ⁺¹
Is it ok if I use:
pd.set_option('display.float_format', '{:.2f}'.format) instead of
pd.set_option('display.float_format', lambda x: '%.2f' % x)
@rohallav 1 year ago
or even better you can do lambda x: f"{x:.2f}"
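Yes, the forms should be interchangeable. `display.float_format` accepts any callable that takes a float and returns a string. `'{:.2f}'.format` (a bound string method) is such a callable, and so is the f-string lambda. The option only changes how floats are printed, not the stored values. Here is a small sketch with made-up numbers; it uses `pd.option_context` so the setting is temporary.

```python
import pandas as pd

# Three equivalent ways to turn a float into a 2-decimal string.
fmt_percent = lambda x: "%.2f" % x   # old-style % formatting
fmt_method = "{:.2f}".format         # bound str.format method, also a callable
fmt_fstring = lambda x: f"{x:.2f}"   # f-string

value = 1234.5678
print(fmt_percent(value), fmt_method(value), fmt_fstring(value))

# display.float_format changes printing only; the data keeps full precision.
df = pd.DataFrame({"Density": [1234.5678, 0.12345]})
with pd.option_context("display.float_format", fmt_method):
    print(df)
print(df["Density"].iloc[0])
```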
@meredithleonor5035 1 year ago
Why use Anaconda instead of Google Colab? Just curious. I'm looking forward to a visual tutorial on Python and statistics, thanks; I really need this type of tutorial since I am studying cohort analysis and RFM analysis.
@peaceandlove8862 1 year ago
Oceania is the continent that includes Australia and New Zealand.
@l7932 5 months ago
thanks sir
@donvious 7 months ago
Hi, where is the link to the CSV file?
@rnjesus9950 10 months ago
This worked for me where df.corr() did not:
# Select numeric columns (excluding any non-numeric columns)
numeric_columns = df.select_dtypes(include=['float64', 'int64'])
# Calculate the correlation matrix
correlation_matrix = numeric_columns.corr()
correlation_matrix
@oluwanifemishittu9586 1 month ago
Where do I get the CSV file from?
@chefernandez563 1 year ago ⁺¹
Am I the only one who knew Oceania was Australia, New Zealand, Samoa and those places😂😂
@r10053506 5 months ago
Why doesn't my program automatically detect the numeric columns when running corr(), instead of running into an error?
@arpitmaheshwari122 11 months ago ⁺¹
Hey, can anyone tell me if the correlation command works in VS Code?
I'm getting a ValueError in this part.
Please share the solution if you have one.
thanks :)
@Shashankkundena 8 months ago
Hey, just use numeric_only=True
@harisahmed7833 9 days ago
I'm getting this error on df.corr(): "could not convert string to float: 'AFG'". Please help.
@taroge5464 1 year ago ⁺¹
No explanation for this line: pd.set_option('display.float_format', lambda x: '%.2f' % x)
@osiomogieasekome8799 1 year ago
I couldn't get seaborn to import... I tried online solutions for installing it, but they didn't work.
@SieanElpidama 6 months ago
My heatmap is broken: it's not showing all the values even though I wrote annot=True. Does anyone have a fix? I tried almost everything I found with Shift+Tab.
@ermano5586 1 year ago
I have one problem: the table does not display the columns starting from "area (km^2)" when we call df to view it; I mean there is no horizontal scrollbar. Can anyone help with this, please?
@ruchirmittal9207 11 months ago ⁺¹
Try another browser. Some browsers don't support that feature.
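Besides the browser, another possible cause is pandas itself. It elides middle columns (shown as `...`) when a frame is wider than the `display.max_columns` option, and setting that option to `None` lifts the limit. Whether this is the cause in that particular notebook is an assumption. Here is a sketch on a made-up 30-column frame.

```python
import pandas as pd

# Made-up wide frame: one row, 30 columns.
df = pd.DataFrame([list(range(30))], columns=[f"col_{i}" for i in range(30)])

# Temporarily lift the column limit (None means show every column)...
with pd.option_context("display.max_columns", None):
    text = repr(df)
print(text)

# ...or change it for the whole session, e.g. at the top of a notebook:
# pd.set_option("display.max_columns", None)
```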
@sandipthepro 1 month ago
Unable to use groupby() on 'Continents'; it's showing an error: agg function failed [how->mean,dtype->object]
Can anyone please help me with a solution?
@srijanrawat4014 1 year ago
I am having a problem downloading the file; can anyone help me out?
@DatabaseAdministration 8 months ago ⁺²
It's funny that Americans don't know the continent of Australia.
@dishanbhandari 7 months ago
My heatmap doesn’t contain the data values inside it as at 14:18; instead it just shows a heatmap with values only in the topmost band. I have written the code just as shown above, df.corr(numeric_only=True) as well as ‘annot’, but still no data values. Can anyone please help?
@NyeinHtutSwe 6 months ago
I also ran into the same problem :). I still can't find the solution.
@jDub997D 5 months ago ⁺¹
Upgrade your seaborn package:
pip install seaborn --upgrade
then restart your kernel and rerun all the cells
@olaleyeboluwatife949 5 months ago
@@jDub997D 1000 thanks bruv... bless you
@BhaskarDial 5 months ago
import matplotlib.pyplot as plt
import seaborn as sns
corr_matrix = df.select_dtypes(include='number').corr()
# Set the figure size first, then create the heatmap
plt.rcParams['figure.figsize'] = (20, 7)
sns.heatmap(corr_matrix, annot=True)
plt.show()
I have used this code for the heatmap, but the notebook doesn't populate it with the individual correlation values, only colored tiles. Can anyone please help?
@pixelsNpositivity 5 months ago
pip install --upgrade seaborn matplotlib
Update seaborn and matplotlib. It worked for me.
@naagarhive6581 9 months ago
OOPs
@roshandhumal1193 1 year ago
Sir Alex,
I am Roshan Dattaram Dhumal.
I live in Mumbai, India.
I want to start my career in data analysis, but I don't know how to start, and I want to know what steps I have to take to become a data analyst.
I would like to request you to please explain them to us and give us some steps. Please, sir, I will definitely work hard.
@hammadahmed7192 1 year ago
Try passing the numeric_only argument. In recent versions, its default value has changed to False, so corr() tries to correlate the string values as well.
df.corr(numeric_only=True)
@Ben-qe8ju 1 year ago
O-she-ana
@gogor8017 10 months ago
You said 'Oceania' so many times, now it sounds like a meaningless word.
@aayushitrivedi3481 1 year ago ⁺²
first
pin me
@ermano5586 1 year ago
pin
