25 Nooby Pandas Coding Mistakes You Should NEVER make.
- Published Sep 6, 2022
- In this video I go over my list of 25 mistakes commonly made by beginners learning pandas in Python. Pandas is a great tool, but there are some pitfalls to avoid!
Shoutout to mCoding who inspired the idea for this video! / mcodingwithjamesmurphy
Follow me on twitch for live coding streams: / medallionstallion_
My other videos:
Speed Up Your Pandas Code: • Make Your Pandas Code ...
Intro to Pandas video: • A Gentle Introduction ...
Exploratory Data Analysis Video: • Exploratory Data Analy...
Working with Audio data in Python: • Audio Data Processing ...
Efficient Pandas Dataframes: • Speed Up Your Pandas D...
* YouTube: youtube.com/@robmulla?sub_con...
* Discord: / discord
* Twitch: / medallionstallion_
* Twitter: / rob_mulla
* Kaggle: www.kaggle.com/robikscube
#python #pandas #datascience
I need to implement the chaining methods and using functions into what I do, much easier to use and read. Great video as always.
Totally. Just those two things alone are huge! Glad you enjoyed the video.
Usually these videos address REALLY nooby mistakes that any general programmer already avoids. THIS video however ACTUALLY addresses library FUNCTIONALITY and discusses the tools that a programmer may be unaware of to increase readability and efficiency. Rob, my good sir, you just earned a sub.
So happy to have you as a sub. Even happier to read such a kind comment. Looking forward to more videos like this in the future.
I couldn't have said that better myself. I am self taught and definitely learned new tricks here. You have also earned my sub!
Same! I had no idea many of these even existed!!
@@robmulla I am a fairly experienced programmer, not so much with Python, but I have a few things I might want to use Pandas for at some point and this has given me a bit of a taste for features that I look forward to trying.
00:18 #1. Writing into csv with unnecessary index
00:53 #2. Using column names which include spaces
01:25 #3. Filter dataset like a PRO with QUERY method
01:44 #4. query strings with(@ symbol) to easily reach variables
02:07 #5. "inplace" method could be removed in future versions, better explicitly overwrite modifications
02:35 #6. better Vectorization instead of iteration
03:01 #7. Vectorized methods are preferable to the apply method
03:30 #8. df.copy() method
04:08 #9. chaining formulas is better than creating many intermediate dataframes
04:28 #10. properly set column dtypes
05:01 #11. using Boolean instead of Strings
05:25 #12. pandas plot method instead of matplotlib import
05:45 #13. pandas str.upper() instead apply and etc
06:10 #14. use data pipeline once instead of repeating many times
06:41 #15. learn proper way of renaming columns
06:59 #16. learn proper way of grouping values
07:31 #17. proper way of complex grouping values
08:01 #18. percent_change or difference can now be implemented with a function
08:25 #19. save time and space with large datasets with pickle,parquet,feather formats
08:58 #20. conditional format in pandas(like in Microsoft Excel)
09:22 #21. use suffixes while merging TWO dataframes
09:48 #22. check merging is success with validation
10:13 #23. wrapping expression so they are readable
10:33 #24. categorical datatypes use less space
10:55 #25. duplicating columns after concatenating, code snippet
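For anyone skimming the list, tip #1 is easy to check for yourself. This is only a minimal sketch with made-up column names (`athlete_name`, `time_s`), not the code from the video. It writes to an in-memory string instead of a file:

```python
import io

import pandas as pd

# Hypothetical example data, not from the video.
df = pd.DataFrame({"athlete_name": ["Ann", "Bob"], "time_s": [10.5, 11.2]})

# By default to_csv also writes the row index as an extra, unnamed column.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# Reading the first version back produces a stray "Unnamed: 0" column.
round_trip_bad = pd.read_csv(io.StringIO(with_index))
round_trip_good = pd.read_csv(io.StringIO(without_index))

print(list(round_trip_bad.columns))
print(list(round_trip_good.columns))
```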
Thanks for making this!
@@robmulla i wish i commented better as English is not my native language, Thank You for bringing us Valuable Tutorials that saves us our time and energy! I wish i helped and learned from you more
egg bro
thanks, I like no 4
This needs to be pinned
Please keep doing this. No additional jargon, crisp, straight to the point explanations are what are required. No body needs a 10 hour tutorial. Thank you for this.
I'll try my best! I do like trying to cram a ton of information into a short format, but these videos take a while to create. I totally copied the format from mCoding (check out the channel if you haven't already)
I thought I was pretty good in Pandas, but you gave me so many new things to improve. HUGE thank you!
Glad I could help! I'm constantly learning better ways to do things in pandas myself.
I was thinking that I was pretty bad, but surprisingly I usually only make 2 mistakes from the video (which is a cool chance to improve). I just love such videos because not only they help to improve your skills, but also to be realistic about your expectations and ambitions. Thanks for the video, Rob!
Matt Harrison's "Effective Pandas: Patterns for Data Manipulation" is one of the best resources I've read on idiomatic pandas.
I really need to get myself a copy! He knows his stuff for sure.
He has a great video (series?) on effective pandas also!
ty i will look into this book
I have been working 2 years now with pandas and I can strongly affirm that I have made like 70% of those bad practices, appreciate a lot your video!
Thanks for commenting. Honestly I still make many of them to this day.
The pandas query function does not outperform the loc method. In fact, it is sometimes much slower when your query/data is so big. We industry users will utilize the loc method for quick EDA. Query might be useful when you have a scheduled cron
Yea. Query isn’t for speed of processing but speed of writing the code.
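To make the trade-off in this thread concrete, here is a small sketch of the same filter written both ways. The data and the `min_year` variable are invented for illustration, and it says nothing about relative speed, only about how the code reads:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {"year": [2019, 2020, 2021, 2022], "time_s": [10.9, 10.4, 10.7, 10.2]}
)
min_year = 2020

# Boolean mask with .loc: explicit, but "df" is repeated for every condition.
via_loc = df.loc[(df["year"] >= min_year) & (df["time_s"] < 10.5)]

# The same filter as a query string; @ refers to a local Python variable.
via_query = df.query("year >= @min_year and time_s < 10.5")

print(via_query)
```

Both return the same rows, so which one to use is mostly a readability and performance judgment for your own data.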
Rob, thank you for all the time and energy you have put in for us. Would appreciate an updated video on "Exploratory Data Analysis" may be expanding on your year old one. Thank you again!
This can be, some of my first times commenting in youtube after years of usage. This video was INCREDIBLY USEFUL! There's a lot of my previous team members did on scripts and sometimes are complicated to maintain or create new ones following the same logic. This covers exactly what they used and what is the best option to rewrite it and make it more understandable.
Thank you so much for this godly information.
You're very welcome! I really appreciate the positive feedback. I’ll try to keep making helpful videos like this. Share with your friends in the meantime!
I started to watch your videos recently, and from now on I'm doing the chaining and putting each function in "one row" to make the data cleaner, and also, the query method, so powerful and simple, I was used to replicate the dataframe with the column and value searched to filter my df. You are boosting my studies!
Thanks for that!
I can't believe how good this video is. I love your no-nonsense delivery; I don't have time at work to watch a 4-hour "intro" video. Keep it up!
Learned more about Pandas in this video than a whole many videos worth hours combined. Seriously, thank you.
Rob, as always, fantastic video. I have to admit, i get caught on some of those mistakes so it is great to have you point out and make suggestions on how to correct them. Thanks for sharing. Much appreciated.
I fall into these a lot too! We can all get better, glad you found the video helpful.
oh wow the quality and clarity is worth subscribing! thank you !
Wow dude! You are single handedly responsible for my data science growth. PLEASE keep making more of these videos I really appreciate it.
Wow! I love hearing feedback like this. I'll keep making videos if you all keep watching! :D
Rob, amazing video and intuitive. Happy to subscribe!
This video is amazing, I am using pandas for a long time now and still learned so many new good practices thank you
One of the best videos I've seen on Pandas! So glad someone prominent enough is advocating for method chaining and pandas methods!
The 'Query' method in particular is relatively unknown. In conjunction with not using 'snake case' this leads to beginners being very inefficient at code due to not being able to use dot syntax
I am just an intermediate level so I can relate to many of these mistakes. It goes as deep as university however. They do not teach clean, efficient code at all!
Glad you enjoyed it! I confess I don't use chaining nearly as much as I should.
I'm currently working on my first major pandas project and I reckon that I may have done around 15/25 of these 'mistakes'. Looks like I have some optimisation to do over the coming days!
We all have to start somewhere. I didn't learn many of these until I had been using pandas for years.
I feel personally attacked. Thanks so much for releasing this. I knew my code was bad, but not THIS bad.
Haha. With coding we all are learning and getting better every day. Me included. Thanks for watching!
These are fantastic refactoring suggestions.
I can't believe I watched this whole video and only 2 of them were things I didn't know about! Thank you for sharing!
Awesome stuff. I've been using pandas for over 4 years, but it never occurred me to start using the query method instead of loc (despite me finding it tiresome to keep repeating "df" all over the place when using loc).
I also appreciate the quick format. You see YouTubers taking too long to say nothing at all, so congrats on actually going through 25 tips in 10 minutes. You got yourself a sub!
This was great! Just what I needed :)
Great video. Thank you for being so direct and giving us valuable tips ☺
Glad you liked it! Thanks for giving feedback. Share the video with anyone else you think might also like it.
This is awesome, I’ve been wanting to know what are the better ways to write my code and why. Please continue to make these videos.
Wow! Thanks so much Emily. Really appreciate the feedback and super thanks!
Thanks, great tips! I've been using pandas for years, and I've only recently started using some of these (particularly query, and didn't know about the @ operator)
Glad it was helpful! The @ operator is really useful. You can also do stuff like min() or apply operations between columns within the query.
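A rough sketch of what that can look like, with invented column names. Passing `engine="python"` is an assumption made here to keep the method call inside the query string on the safe side. It is not something the video prescribes:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"start_s": [0, 5, 3], "finish_s": [12, 9, 20]})

# Arithmetic between two columns inside the query string.
long_runs = df.query("finish_s - start_s > 10")

# A method call such as min() inside the query string.
earliest_finish = df.query("finish_s == finish_s.min()", engine="python")

print(long_runs)
print(earliest_finish)
```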
Another awesome, useful video, Rob. Thank you.
Thanks for watching Deepak!
1:28 Before I discovered your videos, I'd never considered using the query method. The examples I've previously seen online made it look like a me-too add-on for seasoned SQL users. Using conditionals to mask off rows seemed just as easy and more pythonic. Also, at work, I typically filter with a script when I pull down the data, so by the time I get the data into pandas, I just need to tweak. But, you've shown me the light. Thanks!
I totally understand where you are coming from. It's important to keep in mind query can be slower, but for quick filtering it can be a really quick and clean way to filter data. It really depends on what I'm doing. Glad I showed you something new though!
Very useful! Thank you for sharing in such an easy and agile way.
Hey! Glad you learned something. Appreciate the feedback!
Excellent points! Learned new stuff that a lot of tutorials don't explicitly teach.
Glad it was helpful! Thanks for watching and please share with others.
Hey, Rob! Super video this one. I myself am Sr. DS working each day intensively with pandas, I will implement many of the tips you show! Thanks a million :)
Awesome to hear! I'm still learning new tricks with pandas every day.
Great video. Very helpful. Please keep making more like this
Appreciate that. I plan to!
I used the Pandas lib more then 2 years, but today I learned something new! Thank you, man!
Glad you learned something new! Share with anyone else you think might appreciate it!
I didn't know about suffixes. Amazing!
Thanks Ken, glad you were able to learn something new! Love your videos.
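For readers who have not met `suffixes` either, here is a minimal sketch that combines it with the `validate` check from tip #22. The frames and the suffix labels are made up for illustration:

```python
import pandas as pd

# Two hypothetical frames that share the column name "time_s".
left = pd.DataFrame({"athlete": ["Ann", "Bob"], "time_s": [10.5, 11.2]})
right = pd.DataFrame({"athlete": ["Ann", "Bob"], "time_s": [10.1, 11.0]})

# suffixes replaces the default _x / _y labels on overlapping columns;
# validate="1:1" asks pandas to check that the keys are unique on both sides.
merged = left.merge(
    right, on="athlete", suffixes=("_2021", "_2022"), validate="1:1"
)

# With duplicated keys on one side, the same validated merge raises.
right_dup = pd.concat([right, right], ignore_index=True)
try:
    left.merge(right_dup, on="athlete", validate="1:1")
    merge_failed = False
except pd.errors.MergeError:
    merge_failed = True

print(list(merged.columns), merge_failed)
```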
Really enjoyed how fast this content came. I felt like it was a great speed to keep me engaged. I usually find these types of videos boring.
Love it. Thank you!
This video made me realize i have still a long road ahead in Pandas. Thanks! Just subscribed ;D
Thanks for the sub! We all start somewhere, but you'll pick it up quickly in no time.
Thank you. This video was helpful.
I'm an experienced developer looking to get familiar with Pandas. I found this video very valuable.
found your channels few days ago and man you have some epic content . The noob mistakes here are the exact way most tutorials teach you..just wondering why the hell the non noob ways are not taught as they are easier and shorter and the syntax makes more sense... thank you for this video
Glad you like them! I’m trying to continue to make more stuff like this so keep watching!
Very useful video, thank you for making this !
Glad it was helpful! Share it with anyone you think might also benefit.
Thanks for this great tutorial!!
Great insights, thanks for these important tips
Glad you found them helpful. Share it somewhere on social you think people might learn from!
lots of good info! thank you!
Glad you learned from it!
Super useful! Thanks a lot, mate!
Thanks for watching. Please share with someone you think might also like it.
This video rocked me. I've been using python for a few months and watching this video made me bust out my laptop so I could try all of these items out. Thank you for this.
So glad you found it helpful. Share with a friend!
The space need to be avoid part is so true! But wait a second, every time I face the space but not underscore is from others data, so I think what we actually need is how to deal with the space condition.(Which is a pain of journey)
Maybe rename all the columns with versions without a space. Like, you replace all the spaces with an underscore. df.rename can take dictionaries or even a mapper function so this is easy to do. Using a dictionary is preferable as you can just reverse map it, if you want to use the columns with spaces in them in the end.
Good point. In most cases it can be done with a list comprehension one-liner!
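The two suggestions in this thread could look roughly like the sketch below. The column names are invented, and the exact cleaning rule (strip, lowercase, underscore) is just one reasonable choice:

```python
import pandas as pd

# Hypothetical frame with the kind of headers you inherit from other people.
df = pd.DataFrame({"Athlete Name": ["Ann"], "Time (s)": [10.5]})

# Dictionary approach: build a mapping so it can be reversed later.
mapping = {col: col.strip().lower().replace(" ", "_") for col in df.columns}
clean = df.rename(columns=mapping)

# Reverse the mapping to restore the original headers, e.g. before export.
reverse = {new: old for old, new in mapping.items()}
restored = clean.rename(columns=reverse)

# List comprehension one-liner, applied to a copy to keep df untouched.
quick = df.copy()
quick.columns = [col.replace(" ", "_") for col in quick.columns]

print(list(clean.columns), list(quick.columns))
```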
This is amazing! Thanks a lot.
Glad you like it!
Learned tons with this. Short and succinct. New subscriber.
Thanks for subscribing!
I'm new to Pandas and all tips from this video are gold for me, thank you a lot!
Glad you learned something new. Welcome to the world of pandas!
+1000. I’m brand new to Pandas and still trying to grok the idiom. This video is GOLD.
Great video as always. I will start exploring query method more.
Rob, Can you please make a video on how feature engineering, especially how to create new features using aggregation etc. Thank you
Glad you enjoyed the video. Feature engineering would be a good topic for a future video. I'll add it to the list!
Oh man, that guide is pro! Thanks, gonna apply all of that when refactoring my project!
Glad it helped! Tell a friend!
Very illuminating video! I learned a lot quickly.
Thanks for the feedback Daniel!
I loved this to share with my students. You did a great job in a short video!
Thank you so much! It's hard to make it short but is worth it in the end.
Thank you for creating such an amazing video on pandas. It has even been really helpful for me as a pandas newbie. Learnt a lot! 🎉
Love it!
Awesome video! I work with Pandas for +3 years and learned a lot here! Thanks
Happy to hear it. Tell your friends!
Oh god. I clicked on this video just to confirm that this is one more overly exaggerated self-confident dude trying to teach newbies of 2 weeks experience.
After watching this, this is god damn life changing. As an engineer focusing on fluid dynamics and floater response, I use pandas daily basis. Out of 25, I didn’t know approximately 20. Every single person who has any plan to use pandas must watch this.
Awesome!
OMG! I had to rest after first 10. So huge dose of information. Thanks.
Found lots of favorite annoyances and learned a few new tricks! I'll add a shout-out to the ".pipe()" method to allow for wrapping all your transforms in a single statement when a single .method can't cover the required transform. An added bonus of "pipe()" - since it's using user defined functions to do the transforms, you can add decorators to automatically print out metadata on the resulting transform steps to get a quick insight into potential bugs.
Oh. Great one. I forgot to add pipe and assign in this video but wish I did.
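A minimal sketch of the `.pipe()` plus decorator idea described above. The helper names (`log_shape`, `drop_missing`, `add_speed`) and the data are hypothetical, chosen only to show the pattern:

```python
import functools

import pandas as pd


def log_shape(func):
    """Print the shape before and after a transform step (hypothetical helper)."""

    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        result = func(df, *args, **kwargs)
        print(f"{func.__name__}: {df.shape} -> {result.shape}")
        return result

    return wrapper


@log_shape
def drop_missing(df):
    # Returns a new frame; the input is left untouched.
    return df.dropna()


@log_shape
def add_speed(df, distance_m):
    # assign also returns a new frame with the extra column.
    return df.assign(speed_ms=distance_m / df["time_s"])


raw = pd.DataFrame({"time_s": [10.0, None, 12.5]})

# Each step is a user-defined function, chained in one statement.
processed = raw.pipe(drop_missing).pipe(add_speed, distance_m=100)
```

Each step prints its input and output shape, which makes an unexpected row drop easy to spot.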
This is really useful, thank you!
Glad you found it useful, Juan!
Great video. Lots of operations and procedures that are helpful for effective coding. Would be really helpful to have a cheat sheet linked for easy reference.
some great tips here. i usually chain with \ and i didn't know a query method exists!!
guess you learn everything new all the time!
Glad you learned something new! Cheers.
Hi, I love your videos!!!
Can you please make a video on how to handle missing values and outliers?
Great suggestion! I did have a whole video on this topic on Abhishek Thakur's channel. Check it out here: ua-cam.com/video/EYySNJU8qR0/v-deo.html
Amazing Tips, Many thanks.
Dude, Amazing video apparently clear the concept.
Glad you think so! Share with your friends!
Nice video! I have been using pandas for years and still run into these issues :)
Thanks! Glad you enjoyed the video. I really enjoy your videos too.
Great video, thank you!!
Glad you liked it! Share with a friend!
Great tips!
Thank you! The .diff method is a lifesaver when computing velocities. The advice on not using inplace is excellent i got into various troubles because of it but i thought that's what the "experienced guys" do.
Thanks for watching. inplace is very tricky. Diff method is really powerful, and there are parameters you can use within it depending on your use case.
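As a sketch of the velocity use case mentioned above (positions and timestamps are invented), `diff` on both columns gives a finite-difference estimate, and `periods` is one of the parameters that changes the step size:

```python
import pandas as pd

# Hypothetical position samples at uneven time steps.
track = pd.DataFrame(
    {"t_s": [0.0, 1.0, 2.0, 4.0], "x_m": [0.0, 2.0, 6.0, 10.0]}
)

# Velocity as change in position over change in time; the first row is NaN
# because there is no previous sample to subtract.
track = track.assign(velocity_ms=track["x_m"].diff() / track["t_s"].diff())

# periods=2 compares each row with the row two steps earlier.
two_step = track["x_m"].diff(periods=2)

print(track)
```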
Really helpful tips, thanks for video
Thanks for watching!
Did you ever try using the np.vectorize function to apply transformations over a df column? That one is among my favorites.
amazing video btw, subscribed!
Yes! I've used it before and had some good results. Thanks for watching!
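For anyone curious, here is a small sketch of `np.vectorize` on a column, next to a plain `np.where` for comparison. The `grade` function and threshold are made up. Note that the NumPy documentation describes `np.vectorize` as a convenience wrapper that is essentially a loop, so any speed-up is not guaranteed:

```python
import numpy as np
import pandas as pd


def grade(time_s):
    # Hypothetical scalar rule applied to one value at a time.
    return "fast" if time_s < 11 else "slow"


df = pd.DataFrame({"time_s": [10.2, 11.5, 10.9]})

# np.vectorize lets the scalar function accept a whole column.
df["grade_vec"] = np.vectorize(grade)(df["time_s"])

# For a simple condition, np.where expresses the same thing directly.
df["grade_where"] = np.where(df["time_s"] < 11, "fast", "slow")

print(df)
```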
Thank you. Legend
Dude I've worked with pandas for 7 years and learned some new tricks, thanks a lot!
Great to hear! You've been working with it longer than I have. Please share my channel with any friends you think might also learn from it.
Thanks, really helpful
Thanks!
Great video! I also like the jazz bass behind you, I also play bass :)
Awesome! I’m more of a guitar player but I also enjoy playing bass.
Dear Rob,
I'm a total beginner in Python and Pandas. From what I understand, the warning at 3:30 is not about making a copy of sliced data, but rather about not using the .loc method and using "direct assignment" for columns (or whatever it's called). I could be wrong, but this is what I've gathered from reading the documentation and encountering a similar warning in my code.
Thanks for your valuable content. It has been a great help
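Both readings come up in practice, so here is a hedged sketch of each fix side by side. The data is invented, and the exact warning behaviour depends on your pandas version:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"year": [2019, 2021, 2022], "time_s": [10.9, 10.7, 10.2]})

# Case 1: you want an independent subset to modify. Take an explicit copy,
# so the new column clearly belongs to the subset only.
recent = df.loc[df["year"] > 2020].copy()
recent["is_fast"] = recent["time_s"] < 10.5

# Case 2: you want to change the original frame. Select rows and column
# in a single .loc call instead of chaining two indexing steps.
df["recent"] = False
df.loc[df["year"] > 2020, "recent"] = True

print(recent)
print(df)
```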
Hey Rob, great video as usual. Can you tell me why using inplace = True is a bad idea? In R I often use the compound pipe operator %<>% from the magrittr package which is effectively the same as Pandas' inplace parameter. Is there a reason I shouldn't be doing this?
Hey Jared. Great question. Check out this article.. towardsdatascience.com/why-you-should-probably-never-use-pandas-inplace-true-9f9f211849e4
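Independently of the article, one practical difference is easy to see in a short sketch (invented data): a call with `inplace=True` returns `None`, so it cannot be chained, while explicit reassignment can:

```python
import pandas as pd

# Hypothetical example data with one missing value.
df = pd.DataFrame({"time_s": [10.5, None, 11.2]})

# Explicit reassignment: each call returns a new frame, so calls chain.
cleaned = df.dropna().reset_index(drop=True)

# inplace=True mutates the object and returns None, which breaks chaining.
scratch = df.copy()
returned = scratch.dropna(inplace=True)

print(returned)
print(cleaned)
```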
I'm so guilty of number 8! Thank you for this!
I’ve made every one of these mistakes at some point so I know how you feel. Thanks for watching!
Extremely underrated channel Extremely helpful
Thanks Nikhhilil!
Wow, very useful - a true "tour de force" for better Pandas code. THX for this !
Glad it was helpful! Please consider sharing it with anyone else you think would benefit from watching.
Thanks, lots of good info.
Glad it was helpful!
Oi! There were several of those I didn't know. I wouldn't have thought I was a noob, but I guess we all have a bit of that in us. Thanks for the video!
Glad you learned something new. I find I’m always learning something new with python and data science. That’s why I love it so much.
I really think this should be written up in a medium blog article. Would be awesome to refer to.
That’s a good idea. I really want to make blogs for all my videos but I don’t have the time. Maybe someday
I was genuinely worried I was making noob mistakes in Pandas...
😂 Hey Alex! Now I'm dying to know... did you have any reason to be worried?
At 6:23 (#14) you're returning the dataframe, but you're also modifying it in place. Having a return there gives the impression that the original dataframe isn't modified, specially if you also assign it to itself later.
It ties back to #5.
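One way to avoid that ambiguity, sketched here with a hypothetical `process` function and made-up columns rather than the code from the video, is to copy inside the function so that the return value is the only way changes leave it:

```python
import pandas as pd


def process(df):
    # Work on a copy so the caller's frame is never modified.
    out = df.copy()
    out["time_s"] = out["time_ms"] / 1000
    return out


raw = pd.DataFrame({"time_ms": [10500, 11200]})
processed = process(raw)

print(list(raw.columns))
print(list(processed.columns))
```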
I didn't know about the .query neither the parenthesis for the chaining. Awesome video
What is it with the \ on a chaining example you showed?
Thanks! Glad it helped. \ lets you split one statement across multiple lines.
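A short sketch of the two styles on invented data. Both produce the same result. The parenthesised form is often preferred because a stray space after a backslash is a syntax error:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {"team": ["A", "A", "B", "B"], "time_s": [10.2, 10.8, 11.5, 10.4]}
)

# Backslash continuation: each line must end with "\" and nothing after it.
summary_backslash = df \
    .query("time_s < 11") \
    .groupby("team")["time_s"] \
    .mean()

# Wrapping the whole expression in parentheses needs no continuation marks.
summary_parens = (
    df
    .query("time_s < 11")
    .groupby("team")["time_s"]
    .mean()
)

print(summary_parens)
```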
So good. Glad I subscribed.
Glad you watched!
Awesome content, I'm an aspiring data scientist, very useful content. Like your jupyter note theme by the way, which one is it?
Thanks. I have a whole video on my jupyter setup. But it’s jupyterlab with the solarized dark theme.
Great vid thanks!
Glad you liked it!
As a beginner this video made me learn some basic concept about pandas. thanks
Great video. Subscribed
Thanks for the sub!
the last 5 were cool! thank you
Glad you found them helpful.
Good Work!
Thank you! Cheers!
So good. Thank you.
Thanks for watching!
Very helpful !
Glad you found it helpful.
Wonderful, this video is super helpful
Glad you think so!
@@robmulla videos like these actually helps a data scientist’s sword to be sharper.
Thank you for sharing it, hope to look for more advanced videos like these in future.
wow this was very helpful thanks!
Great video!!!
Glad you liked it!
This is excellent thanks
You're very welcome