Complete Python Pandas Data Science Tutorial! (2024 Updated Edition)

Keith Galli

Published Jan 8, 2025

COMMENTS • 188

@Kevin-cy2dr 6 months ago ⁺⁹¹
Back when the first iteration was released, I was in college with no idea what a DataFrame was. Now I'm a developer and still watching your videos. Thanks, Keith, for being a part of my learning journey❤
@la-dev 5 months ago ⁺¹
I'm totally new to Python and learned the basics from Corey Schafer. Now I've moved here to learn Pandas. Am I on the right track? My goal is to become a data engineer and then a data scientist.
@Vitali-pe3wg 3 months ago ⁺¹
@@la-dev I have the same goal. I want to become a data engineer. Maybe we can share our paths and push each other?
@la-dev 3 months ago
@@Vitali-pe3wg Sure, Vitali, let's do it, bro.
@la-dev 3 months ago
@@Vitali-pe3wg I'm still watching your way 🧐
@Vitali-pe3wg 3 months ago
@@la-dev How can we connect? Currently I am learning how to build a pipeline with Apache Airflow. The tool is amazing.
@adarshravindran9137 6 months ago ⁺¹³
00:01 Complete Python Pandas Data Science Tutorial
02:12 Setting up virtual environment for data science project
07:03 Exploring DataFrame Functions
09:26 Learn how to load CSV files in pandas
14:12 Accessing and filtering data in Pandas
16:31 Understanding data slicing and indexing in Pandas
21:08 Accessing and manipulating data in Pandas
23:18 Iterating through rows in Pandas can be done but may affect performance.
27:45 Advanced conditional filtering based on string operations
29:59 Filtering data using regular expressions in pandas
34:18 Adding and removing columns in Pandas data frame
36:34 Using inplace parameter in Pandas for modifying data in place
41:22 Extracting specific data fields from a Pandas dataframe
43:40 Convert date objects to datetime type for easy manipulation
48:14 Custom functions using Lambda for data manipulation
50:40 Merging and concatenating data at scale.
55:24 Data frame manipulation for filtering and combining data.
57:52 Merging data frames and handling null values
1:02:26 Handling missing data using pandas dropna method
1:04:44 Analyzing Olympic athlete data using Pandas in Python
1:09:03 Pivot tables convert data into a useful format.
1:11:32 Analyzing Popular Birthdates of Olympic Athletes in Python Pandas
1:16:54 Ranking heights of individuals using Python Pandas
1:19:16 Utilizing rolling functions in Pandas for cumulative sums and other calculations
1:24:22 Using specific data types in Pandas, like PyArrow string types, can optimize performance at scale.
1:27:00 Using Pandas to filter and pivot data in Python
1:32:11 Explore Olympics dataset and pandas functionalities
1:33:52 Wrap up and thank viewers for watching
Crafted by Merlin AI.
@dabunnisher29 6 months ago ⁺³⁰
Your last pandas tutorial helped save me hours and hours of work. Don't ever forget that you are AWESOME!!!!
@caturdayvlogs 4 months ago
Can I have the video link, please?
@aflah7572 6 months ago ⁺¹⁰
Strongly resonating with another comment here.
I recall watching your tutorials in my first year of college. I just graduated and became a research software engineer. Your videos have been pivotal for all the stuff I've done :)
@KeithGalli 6 months ago ⁺³
Awesome stuff! Congrats on the new role. Keep up the good work 😎
@aflah7572 5 months ago ⁺¹
@@KeithGalli Thank You!!
@pierresorel28 6 months ago ⁺³⁴
People wait for new episodes on Netflix but legends wait for Keith's new tutorials 😎
@nithindinesh24 1 month ago ⁺²
Love your approach to explaining things, Keith. It feels like I am learning from my brother. You have a really friendly voice and approach. Love from India :)
@KeithGalli 1 month ago
I appreciate the kind words! I'm very happy that you enjoyed the tutorial :)
@rishabh6339 28 days ago ⁺³
19:15 - just use either of the following to reset the index:
coffee_data = coffee_data.reset_index(drop=True)
coffee_data.reset_index(drop=True, inplace=True)
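A minimal sketch of why the assignment matters: `reset_index(drop=True)` returns a new DataFrame rather than changing the original, so the result has to be assigned back to the variable (or the call made with `inplace=True`). The `coffee_data` values below are made up for illustration and are not the tutorial's dataset.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's coffee data, with a non-default
# index such as one left over after filtering rows
coffee_data = pd.DataFrame(
    {"Coffee Type": ["Latte", "Espresso", "Latte"], "Units Sold": [25, 15, 30]},
    index=[4, 7, 9],
)

# reset_index returns a new DataFrame; drop=True discards the old index
# instead of inserting it as a new "index" column
coffee_data = coffee_data.reset_index(drop=True)

print(coffee_data.index.tolist())  # [0, 1, 2]
```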
@utkarshkapil 4 months ago ⁺³
Bro's content is still the best out there after 5 years
@masonhyde9411 6 months ago ⁺²
1:14:00 Yes, this is true! I analyzed an Olympic dataset for a college final project, and we used the fact that a plurality of NHL players are born in January-March to pitch our analysis proposal.
@KeithGalli 6 months ago ⁺²
Cool to hear that you have validated this with data! 💯
@MingzeLee-z3j 2 months ago ⁺⁵
Bro, I just finished watching your old video about pandas, and then I checked this one out... wow, you really look way more mature now hhahahahaha
@ravikumar-rw5rm 8 days ago
This was brilliant content. One more thing: you didn't use dummy data like other YouTubers; you used CSV files and a real data environment. I love it very much. Thank you so much for making this video. I appreciate it.
@leucam375 1 month ago
What a cool video, Keith! I watched every single second of it while practicing with my own dataset at the same time. I will recommend this to anyone who wants to revise Pandas after a long time away from Python. Thanks very much
@peterh8970 1 month ago ⁺¹
This is a brilliant tutorial; everything is explained really well and is so clear. It's really helping me get to grips with Python and Pandas, thanks a lot.
@RealBenBizman 6 months ago ⁺⁵
No way, I just watched your other video on this the other day! Crazy!
@mikhailbandurist8652 6 months ago ⁺³
It's an honour for me to be among the first viewers of this excellent tutorial!
@rodrigo100kk 6 months ago ⁺⁶
Absolutely amazing! A hint: make a Python Pandas Advanced Tutorial more focused on graphics.
@NewsChannel-y4g 6 months ago ⁺¹
Would love to have a follow-up video on Seaborn from this guy with these same CSV files. The Parquet and Excel files do not seem to copy and paste from the browser when you select Raw.
@skyeshwin 6 months ago ⁺⁴
Hey Keith! Big fan of your work! Keep it going, brother!
@manasmathur173 1 month ago ⁺²
1:07:58
The syntax used there is no longer supported. This will be the new syntax:
coffee.groupby('Coffee Type').agg(
    total_units_sold=('Units Sold', 'sum'),
    price_mean=('new_price', 'mean')
)
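A small runnable sketch of the named-aggregation form quoted above, where each keyword is an output column name mapped to an (input column, aggregation) pair. The `coffee` rows are invented for illustration; only the column names come from the comment.

```python
import pandas as pd

# Hypothetical sample data using the column names from the comment
coffee = pd.DataFrame(
    {
        "Coffee Type": ["Latte", "Espresso", "Latte", "Espresso"],
        "Units Sold": [25, 15, 30, 10],
        "new_price": [4.0, 3.0, 5.0, 3.5],
    }
)

# Named aggregation: output_column=(input_column, aggregation_function)
summary = coffee.groupby("Coffee Type").agg(
    total_units_sold=("Units Sold", "sum"),
    price_mean=("new_price", "mean"),
)
print(summary)
```

Group keys are sorted by default, so `Espresso` comes before `Latte` in the result.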
@symnshah 3 months ago
Your tutorials are always simple and straightforward, with no going off on tangents. Thumbs up.
@PhilosophyOfGreatests 19 days ago
Thanks, man, for this video. It's been a great help for me.
I really appreciate these kinds of video lectures.
It has really improved my skill set.
Thanks again.
@jaideepsingh870 5 months ago
This is honestly the best tutorial I have ever seen; really looking forward to learning more.
@udaynj 5 months ago
Awesome video, right speed and comprehensive. My thanks to you for taking the time to do this. I am sure it was hours and hours of work, and I truly appreciate your effort.
@benjoanc 5 months ago ⁺²
I always love your content because it's so easy to understand ❤
I've been hearing a lot about the Polars library, but there's limited content on it. If possible, please do something on it.
@faugno-1516 4 months ago
I really appreciate your efforts; you deliver such great content on Python and its libraries. I saw your first pandas data-cleaning video and truly loved that live tutorial. Please make more live tutorials on cleaning real-world datasets with pandas; they help junior developers like me a lot. Once again, thanks for sharing this type of content.
@AI_Launchpad_community 18 days ago
Thank you, Keith, that was so helpful. I've only watched 17 minutes so far, but it is very helpful. Thank you again.
@ahillsavio5607 5 months ago ⁺¹
Good stuff man! Keep up the good work!
@bouallaguiali2906 5 months ago ⁺²
Well done, Keith. Please do more videos about Data Analysis.
@alexng5056 3 months ago ⁺²
Thanks a lot, Keith. Your tutorial is superb!
@KeithGalli 3 months ago
Glad you enjoyed!
@rimpan1556 5 months ago
Great tutorial. You keep teaching new things all the time with practical examples, and you talk just enough to keep it from being boring. Good job. I'm waiting for sklearn, np, matplotlib, sns, and streamlit tutorials 😂
@MachineLearning-mv8zb 6 months ago ⁺¹
Great you're back!
@gaumeuvlog2603 5 months ago
Thanks for uploading a new video about Pandas. I learn a lot from you. Can't wait to watch your next videos 🤩
@meeFaizul 6 months ago
Keith, your tutorial is a game-changer!
Your content is top-notch. Can't wait for more!
❤️ from 🇵🇰
@Hoan9duy 6 months ago ⁺²
Awesome content as always 🔥🔥🔥
@ayodejiisarinade857 6 months ago ⁺¹
You are doing a great job. Well done
@olorunfemitunde-adedipe25 3 months ago ⁺¹
The King is back
@ben_tyler5 6 months ago ⁺²
Did anyone notice how our Keith has been sneaking a quick peek to the right at the beginning of the last few videos? 😂 Seriously though, loving the content!
@CesarSantosLopezYolo 6 months ago
Hey, I love these vids... Keep them coming! Love from Mexico, buddy
@VishnuChandran-zj7sq 5 months ago
Thank you for making this video. Keep rocking!
@francisco_ponce 5 months ago ⁺²
I find it incredible how some people can store so much information. Thank you very much for the video!!
@corporate_guyfitness 4 months ago
Thanks, Keith, love from India! It is really helpful to new learners like me.
@asianpizzaguy3108 1 month ago
Great tutorial sir🙌🏼
@Alberto-fj9il 2 months ago
Hey Keith, I just wanna say thanks. I followed this video for 3 days, working through more complex examples using your base ideas. It's so powerful when you combine different kinds of knowledge. I've been programming for less than a month but can still do some analysis. Really appreciate it.
@KeithGalli 2 months ago ⁺¹
Awesome! I like your approach, glad my video was helpful as part of it :)
@adjieaja23 6 months ago
I have been waiting for this. Thank you, teacher.
@KeithGalli 6 months ago
You are very welcome!
@massimo5019 6 months ago
Just WOW. Great tutorial!
@KeithGalli 6 months ago
Glad you liked it!!
@sapienthought1103 3 months ago
I came from the older tutorial. Not only is this one updated, but Keith is way better at explaining now.
@abdous-i8s 4 months ago ⁺¹
AWESOME VIDEO, BTW IT'S SHOWING YOUR CHATGPT HISTORY 😄
@KeithGalli 4 months ago
Lol yeah I know. I thought about blurring it, but I figured people might get a kick out of seeing my ChatGPT history xD.
@omsingh5525 6 months ago
Hey, thanks for the amazing tutorial.
@JJGhostHunters 5 months ago
This is great content! Please make a similar tutorial, or recommend one, on vectorization with NumPy arrays. I have applications that do what I need them to do, but they involve nested loops that iterate over millions of rows of data. I really need to move away from these loops to improve execution time.
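For the vectorization question above, a minimal sketch of the general idea (not something shown in the video): a nested Python loop over pairs of elements can often be replaced by NumPy broadcasting, which computes the whole grid in one compiled operation. The arrays and the "pairwise absolute difference" task are hypothetical placeholders for whatever the nested loops actually compute.

```python
import numpy as np

# Hypothetical inputs
a = np.array([1.0, 4.0, 9.0])
b = np.array([2.0, 3.0])


def pairwise_diff_loops(a, b):
    """Nested-loop version: one Python-level operation per (i, j) pair."""
    out = np.empty((len(a), len(b)))
    for i in range(len(a)):
        for j in range(len(b)):
            out[i, j] = abs(a[i] - b[j])
    return out


def pairwise_diff_vectorized(a, b):
    """Broadcasting an (n, 1) column against a (1, m) row yields the full (n, m) grid."""
    return np.abs(a[:, None] - b[None, :])


print(pairwise_diff_vectorized(a, b))
```

Both functions return the same array; the vectorized one avoids the per-element Python overhead, which is what dominates when there are millions of rows. Note that broadcasting materializes the full (n, m) result in memory, so very large grids may need to be processed in chunks.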
@ahmedbadal3795 6 months ago
Nice 2:00 pm course for me, thanks a lot
@ericwang5126 6 months ago
Amazing video!
@BluesAndWater 6 months ago ⁺¹
Very good, thank you for everything!
@KeithGalli 6 months ago ⁺¹
Of course! I'm happy you liked it 🙂
@elizabethmcinerney4272 1 month ago ⁺¹
I was able to go through all of your pandas instructions and understand everything. However, I cannot figure out how to install and use Visual Studio Code on my PC, or how to clone your Git repository. Is there a resource you can recommend for these things?
@AstroidegitaTech 6 months ago ⁺¹
Well done, man
@FIBONACCIVEGA 6 months ago
Good, it's the update of the old video. Excellent!!!
@Xamy- 3 months ago
He returns
@skyeshwin 6 months ago
At 30:40, the code for athletes whose names start and end with the same letter throws an error. Can anyone suggest the correct solution? I tried str.extract, but I can't include na=False since it throws an error.
Wrong code - bios[bios["name"].str.contains(r'^(.).*\1$', na=False)]
Correct code - ??
@KeithGalli 6 months ago ⁺¹
I just double-checked, and I see that a warning pops up (but it's not actually an error). You can ignore the warning. That being said, you might not see results because the names start with an uppercase letter and end with a lowercase letter. You can fix this by passing case=False into your str.contains() method (see below).
Correct code:
start_end_same = bios[bios['name'].str.contains(r'^(.).*\1$', na=False, case=False)]
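A small sketch that exercises the corrected line on made-up names, plus a regex-free alternative that compares the lowercased first and last characters. The `bios` rows are hypothetical. The column is created with object dtype on purpose: to my understanding, pyarrow-backed string columns hand regexes to the RE2 engine, which does not support backreferences such as `\1`, so this is an assumption worth checking on your pandas version.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the tutorial's bios DataFrame (object dtype, see above)
bios = pd.DataFrame({"name": pd.Series(["Anna", "Bob", "Carl", None, "Otto"], dtype=object)})

# Regex approach from the reply: (.) captures the first character and \1 requires
# the name to end with that same character; case=False makes the comparison
# case-insensitive and na=False treats missing names as non-matches.
with warnings.catch_warnings():
    # str.contains warns when the pattern has capture groups; the mask is still valid
    warnings.simplefilter("ignore", UserWarning)
    mask = bios["name"].str.contains(r"^(.).*\1$", na=False, case=False)
start_end_same = bios[mask]

# Regex-free alternative: compare lowercased first and last characters.
# Missing names produce NaN on both sides, and NaN == NaN is False.
first = bios["name"].str[0].str.lower()
last = bios["name"].str[-1].str.lower()
alt = bios[first == last]

print(start_end_same["name"].tolist())  # ['Anna', 'Bob', 'Otto']
```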
@skyeshwin 6 months ago
@@KeithGalli Hey, thanks for the correction! One more thing I wanted to mention. At 48:22, when the result pops up, I can still see that rows whose height_cm is NaN get a height_category of 'Tall'. So I tweaked your code a little bit:
Existing code: bios['height_category'] = bios['height_cm'].apply(lambda x: 'Short' if x < 165 else ('Average' if x < 185 else 'Tall'))
New code: bios['height_category'] = bios['height_cm'].apply(lambda x: 'Short' if x < 165 else ('Average' if x < 185 else ('Tall' if x >= 185 else 'NA')))
This works because every comparison with NaN evaluates to False, so rows whose height_cm has no information (NaN) fall through to a height_category of 'NA'.
Similarly, this issue occurs again at 50:29.
My intent is not to pinpoint your mistakes but just to educate anyone who's a newbie to Python!
Love your work always!
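Two ways to make the missing-value handling from this fix explicit, sketched on invented heights: a named function that checks `pd.isna` first, and `pd.cut` with left-closed bins that reproduce the same 165/185 cm thresholds. The function name `categorize_height` and the `height_category_cut` column are hypothetical, not from the video.

```python
import numpy as np
import pandas as pd

# Hypothetical heights in cm, including a missing value
bios = pd.DataFrame({"height_cm": [150, 170, 190, np.nan, 165, 185]})


def categorize_height(x):
    """Same thresholds as the lambda, but with an explicit missing-value check."""
    if pd.isna(x):
        return "NA"
    if x < 165:
        return "Short"
    if x < 185:
        return "Average"
    return "Tall"


bios["height_category"] = bios["height_cm"].apply(categorize_height)

# pd.cut with right=False gives bins [-inf, 165), [165, 185), [185, inf);
# NaN heights land in no bin, so they are filled with "NA" afterwards
bios["height_category_cut"] = (
    pd.cut(
        bios["height_cm"],
        bins=[-np.inf, 165, 185, np.inf],
        labels=["Short", "Average", "Tall"],
        right=False,
    )
    .astype(object)
    .fillna("NA")
)

print(bios)
```

The `pd.cut` version avoids calling a Python function per row, which tends to matter on large columns.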
@SyedAbdulrazak-h8e 5 months ago
You mentioned at 15:00 in the video pressing Ctrl + Enter to get a new random sample. I tried it in PyCharm, but it did not work. What should I do?
@KeithGalli 5 months ago
When I said Ctrl + Enter, within a Jupyter notebook that just re-runs my current code cell, producing a new sample row from the DataFrame. In your PyCharm editor you should be able to just re-run your code, and if you print out the sample, you'll see it change.
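A minimal sketch of what re-running does outside Jupyter: each call to `DataFrame.sample()` draws a new random row, while passing `random_state` makes the draw repeatable. The DataFrame contents are hypothetical.

```python
import pandas as pd

# Hypothetical small DataFrame
df = pd.DataFrame({"name": ["Anna", "Bob", "Carl", "Dora"], "age": [23, 31, 27, 45]})

# Each call (or each re-run of the script) can return a different random row
print(df.sample())

# A fixed random_state returns the same "random" rows on every run
print(df.sample(n=2, random_state=42))
```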
@SyedAbdulrazak-h8e 5 months ago
@@KeithGalli I am glad you replied, thank you.
@AtiqurRahman-x2i 6 months ago
Hey Keith, it was a nice promo. From Bangladesh 🇧🇩
@KeithGalli 6 months ago
Glad you liked the promo!!
@JeffGardiner-y8r 17 days ago ⁺²
What IDE does Keith use?
@Architect-u2g 14 days ago
Looks like VS Code with a notebook extension. Sorry that you waited so long.
@ayushsingh7759 3 months ago
Great vid, well done
@stu8924 6 months ago
Brilliant, thank you.
@msbeau5341 5 months ago ⁺¹
What did he say we should click to bring up Copilot, please? I am using Windows.
@random-drops 6 months ago ⁺¹
Thanks. While watching your introduction, I started to wonder if you're going to do a video on NumPy, especially now that a major version has been released. No hurry, please take your time. Thanks in advance.
@KeithGalli 6 months ago ⁺⁴
Good suggestion. I need to do some more research into the new release, but an updated NumPy video is definitely a possibility!
@NewsChannel-y4g 6 months ago
@@KeithGalli Dude, this video was exactly what I was looking for as someone relatively new to Python trying to get into data science. NumPy and Seaborn would be good follow-up videos if you used the same data. The CSV files seemed to copy and paste well from the browser, but the Parquet and Excel files did not and made me load them as .txt. At that point I just crossed my fingers hoping you would use the CSV, and 20 minutes in, you have a great video so far: excellent focus on detail, great beginner-level examples and functions. I tried DataWars and DataCamp before coming here... thank you truly.
@KeithGalli 6 months ago
@@NewsChannel-y4g Happy to hear that!! Yeah, I think that because Excel & Parquet files aren't human-readable in their raw form, it doesn't let you copy & paste the URL in the same way as CSV. It is a good test to be able to read those files, though, so I recommend that you try downloading them (there's a download raw file button on GitHub) and then reading them in locally with your code. You'll probably want to move the files from your downloads folder to the same location as your notebook file, and then you should be able to load them in with commands like pd.read_excel('./olympics-data.xlsx') & pd.read_parquet('./results.parquet') respectively. That being said, I plan to continue using CSV files in most of my videos, so you should be fine with the method you have been using. Not sure if I'll use the same data, but I hope to do some videos that incorporate NumPy & Seaborn in the not-so-distant future. Keep up the good work!
@tristoneyang1255 5 months ago
Very helpful, thx K.
@Divyansh-n3h 5 months ago ⁺¹
Continue from filtering data 24:12
@asfasdfsd8476 6 months ago ⁺¹
Bro, I got a job after your first video!
@KeithGalli 6 months ago
That's awesome!! Nice work 💪
@jeeboi347 27 days ago
Amazing, thankssss
Also, just curious, why do you type dd in some of the code cells?
@KeithGalli 26 days ago
If you've escaped from the code cell (Esc key) and you type "dd", it deletes the code cell. The reason you saw me type it in the cell sometimes was that I hadn't first escaped from the cell, so the text was entered into the cell instead.
@rushikeshkharat4022 5 months ago ⁺¹
I was asked in an interview how to import multiple files at once in pandas instead of importing them one by one when there are many files. Is there a quicker way? How do you accomplish that in pandas?
@GriffithVader 4 months ago
Did you figure it out?
@rushikeshkharat4022 4 months ago ⁺¹
@@GriffithVader No, but I guess it might be done using some loop, or maybe we can import a folder itself, just like in Power BI.
@ayeshavlogsfun 5 days ago
To import multiple files at once in pandas instead of doing it one by one, you can use Python's glob module to get all the file paths matching a pattern, then iterate through them to read them into pandas DataFrames. Here's an example:
Code Example
import pandas as pd
import glob
# Specify the folder path and file extension (e.g., CSV files)
file_paths = glob.glob("path/to/your/files/*.csv")
# Read all files into a list of DataFrames
dataframes = [pd.read_csv(file) for file in file_paths]
# Optionally, combine all DataFrames into one
combined_df = pd.concat(dataframes, ignore_index=True)
# Display the combined DataFrame
print(combined_df)
Key Steps:
1. Use glob.glob() to find all files matching a pattern in a directory.
2. Use a list comprehension to load each file into a pandas DataFrame.
3. Optionally, combine the DataFrames using pd.concat() for a single dataset.
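A variation on the approach above, assuming all the CSVs share the same columns: it uses `pathlib` instead of `glob`, sorts the paths so row order is reproducible (glob order is not guaranteed), and adds a column recording which file each row came from. The `read_all_csvs` helper and the `source_file` column are hypothetical names; the demo writes two tiny CSVs to a temporary folder so the sketch runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_all_csvs(folder):
    """Read every *.csv in folder into one DataFrame, tagging rows with their file name."""
    paths = sorted(Path(folder).glob("*.csv"))  # sorted for a deterministic order
    frames = [pd.read_csv(p).assign(source_file=p.name) for p in paths]
    return pd.concat(frames, ignore_index=True)


# Self-contained demo: create two small CSV files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"x": [1, 2]}).to_csv(Path(tmp) / "a.csv", index=False)
    pd.DataFrame({"x": [3]}).to_csv(Path(tmp) / "b.csv", index=False)
    combined = read_all_csvs(tmp)

print(combined)
```

If no files match the pattern, `pd.concat` receives an empty list and raises a `ValueError`, so it can be worth checking `paths` first.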
@eu_dz8684 6 months ago
Could you please tell me how to teach pandas after this course, what topics should be covered, and what's the best way to teach them?
@djangoworldwide7925 20 days ago
Keith, you're so great.
@aleksandrajovanovic2631 5 months ago
How do I split a DataFrame? For example, I want a separate DataFrame for every sport or country. Great video :)
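One common way to do this split, sketched on invented rows: iterating over `groupby` yields (key, sub-DataFrame) pairs, which can be collected into a dictionary keyed by sport. The `results` DataFrame and its columns are hypothetical; the same pattern would work with a country column if your data has one.

```python
import pandas as pd

# Hypothetical stand-in for the Olympics data
results = pd.DataFrame(
    {
        "athlete": ["A", "B", "C", "D"],
        "sport": ["Swimming", "Athletics", "Swimming", "Rowing"],
    }
)

# groupby yields (key, sub-DataFrame) pairs; collect them into a dict
by_sport = {sport: group for sport, group in results.groupby("sport")}

print(by_sport["Swimming"])
```

Each value keeps the original row index; add `.reset_index(drop=True)` to a group if you want it renumbered from 0.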
@KumR 5 months ago
Very nice, Mr. Galli. Can you please do one on Polars too???
@lifewithrahi_inuk 6 months ago
Amazing!
@KeithGalli 6 months ago
🤠
@Sayied-s7d 6 months ago
pointers 38:16
@DanielValenzuelaPerez 6 months ago
🔥 Thanks!
@shreyalalit1460 6 months ago
Awesome!
@crystalkishore4974 5 months ago
Thanks Man ❤
@lord_voldemort44 6 months ago
awesome video
@ObinnaWGMI 4 months ago
Would've been nice to have mentioned the shortcuts you used
@aidafirouzyar 1 month ago
Why did he reduce the fee in Tonkeeper, but the tokens were not transferred? It just hit you, and support does not answer. What should I do?
@karangoyal8646 6 months ago
Great work bro!! Where in Boston do you live? I am from Boston too.
@ramarisonandry8571 6 months ago
Love from Madagascar
@alisher.m 5 months ago
Can you release a Polars course?
@chiragsoni6990 6 months ago
Bro, axis 0 is the horizontal frame and axis 1 is the vertical frames, but the function works when applied vertically by using axis 1, which is weird, but that's how it works, I guess.
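One way to read the axis argument that may resolve the confusion above (a sketch with made-up numbers): `axis` names the direction you move along while applying the operation, not the shape of the result. `axis=0` moves down the rows and gives one value per column; `axis=1` moves across the columns and gives one value per row, which is why `apply(..., axis=1)` calls your function once for each row.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: aggregate down the rows -> one result per column
print(df.sum(axis=0).tolist())  # [6, 60]

# axis=1: aggregate across the columns -> one result per row
print(df.sum(axis=1).tolist())  # [11, 22, 33]

# apply with axis=1 passes each row (as a Series) to the function
print(df.apply(lambda row: row["a"] * row["b"], axis=1).tolist())  # [10, 40, 90]
```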
@yashkeshorts343 1 day ago
21:11
@GenZdev 6 months ago
Would like to refresh NumPy with you too
@RoshanPradhan2 26 days ago
15:13 -> sample()
@abhinavawasthi1730 5 months ago
Where can I download the CSV file for practice?
@Wilson5150Wilson 5 months ago
What is this workstation called? It seems ideal for experimentation. I'm currently using VS Code and can't test individual lines like you are. Or maybe you can in VS Code, I'm just new!
@KeithGalli 5 months ago ⁺¹
Make sure you use the ".ipynb" file extension, and then in VS Code you will need to install the "Jupyter" extension. Hope this helps you get set up!
@bravocortez4087 2 months ago
How do I clean bad data out of a string, like dollar signs and spaces?
@KeithGalli 2 months ago ⁺¹
One way to do that would be the following line:
# Strip out "$" and spaces, then update the column (replace 'price' with whatever your column is named)
df['price'] = df['price'].str.replace(r'[$ ]', '', regex=True)
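A slightly extended sketch of the same cleanup on invented values. As assumptions beyond the reply above, it also strips thousands separators (commas) and any whitespace (`\s`), then converts the cleaned text to numbers with `pd.to_numeric`; `errors="coerce"` turns anything still unparseable into NaN.

```python
import pandas as pd

# Hypothetical messy price column
df = pd.DataFrame({"price": ["$1,200 ", " $35.50", "$ 7", None]})

# Remove dollar signs, commas and whitespace, then convert to numbers;
# values that still can't be parsed (here, the missing one) become NaN
df["price"] = pd.to_numeric(
    df["price"].str.replace(r"[$,\s]", "", regex=True),
    errors="coerce",
)
print(df)
```

Only strip commas if they are thousands separators in your data; in locales that use a comma as the decimal mark this would corrupt the values.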
@Rayuzimaki 1 month ago
How did you create the virtual environment on Mac? I typed the same thing you did, "python3 -m virtualenv ~/Envs/tutorial", and got this error: "No module named virtualenv"
@SirLeafALot 1 month ago
You may have to install the virtualenv library.
@SirLeafALot 1 month ago
If you use pip, try pip install virtualenv
@mandy6622 4 months ago
Keith, please make a tutorial on PySpark
@Sayied-s7d 6 months ago
46:44 Where is the cheat sheet? Does anyone know?
@KeithGalli 6 months ago
I got you! Here is the cheat sheet: strftime.org/ (this link can also be found in the video description)
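A small sketch of where those strftime codes are used in pandas: the `format` argument of `pd.to_datetime` parses strings into datetimes, and `.dt.strftime` formats datetimes back into strings. The day/month/year date strings are hypothetical.

```python
import pandas as pd

# Hypothetical date strings in day/month/year order
dates = pd.Series(["08/01/2025", "15/03/2024"])

# format uses strftime-style codes: %d day, %m month, %Y four-digit year
parsed = pd.to_datetime(dates, format="%d/%m/%Y")

# .dt.strftime goes the other way: datetime -> formatted string
print(parsed.dt.strftime("%Y-%m-%d").tolist())  # ['2025-01-08', '2024-03-15']
print(parsed.dt.year.tolist())  # [2025, 2024]
```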
@Sayied-s7d 6 months ago
@@KeithGalli thanks
@tobibaby 4 months ago
Thank you for the video.
@tanjumraisa-id4de 5 months ago
Why couldn't I download the raw data file? Kindly tell me; that's why I'm stuck...
@rodrigo100kk 6 months ago
1:16:15 - This was a 20% increase, not a 120% increase.
@KeithGalli 6 months ago
Good catch. It was 120% of the previous day, which is a 20% increase :).
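The distinction in this exchange, worked through with hypothetical numbers: the ratio new/old tells you "what percent of the previous value" (120%), while (new - old)/old is the relative increase (20%). pandas' `pct_change` computes the latter.

```python
import pandas as pd

# Hypothetical values for two consecutive days
old, new = 100, 120

ratio = new / old             # 1.2: the new value is 120% of the old one
increase = (new - old) / old  # 0.2: a 20% increase over the old value

# pandas computes the same relative change (the first entry is NaN)
changes = pd.Series([old, new]).pct_change()
print(ratio, increase, changes.iloc[1])
```

The `pct_change` result may print as a value a hair below 0.2 because of floating-point rounding.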
@thunde7226 6 months ago
Great video, Keith.......:) bye
@TheWayHome-wb1uh 6 months ago
Can you do a course on LangChain?
@KeithGalli 6 months ago ⁺¹
Not a dedicated course, but here's a video I did using LangChain in a real-world project:
ua-cam.com/video/MeyVptCRubI/v-deo.htmlsi=CeGcaKvG6eSAbGpg
@TheWayHome-wb1uh 6 months ago
@@KeithGalli Thank you for responding. I did go through this and it was pretty cool. Just wanted to know if you might consider a full tutorial on LangChain and LLMs.
Also, love your channel.
@AbdulVajid-fz3vs 6 months ago
Please upload an end-to-end machine learning project
@KeithGalli 6 months ago
I recommend checking out this video:
ua-cam.com/video/MeyVptCRubI/v-deo.htmlsi=RqO--khHDJdNRI0a
A real-world project (an actual consulting project of mine) that you can follow along with that uses LLMs.
@NormieDead 6 months ago
Dude just woke up and gave us another PANDA
@KeithGalli 6 months ago
😎
