A Gentle Introduction to Pandas Data Analysis (on Kaggle)

  • Published 29 Sep 2024

COMMENTS • 157

  • @soumyadrip
    @soumyadrip 2 years ago +36

    Timestamps:
    Introduction: 0:00
    Pandas data structure: Series 2:44
    Pandas data structure: DataFrame 5:04
    Reading in data: 10:01
    Inspect The Data: 12:17
    Columns and Rows: 15:25
    Subsetting Data: 19:00
    Casting dtypes: 25:45
    Creating new column: 29:00
    Adding new Row: 31:19
    Plot Examples: 33:05
    Save our output: 36:49

    • @robmulla
      @robmulla  2 years ago +4

      You are awesome @somuSan! Thanks so much for tagging these.

    • @soumyadrip
      @soumyadrip 2 years ago +6

      @@robmulla Nah, no problem 😅. You're the awesome one here, creating such a great end-to-end tutorial.

  • @MohammadMamun-o7y
    @MohammadMamun-o7y 8 months ago +1

    I have been using pandas for quite a while and still learned a lot from your video.
    Your presentation was very easy to follow.
    Thanks for your hard work.

  • @bashar9200
    @bashar9200 1 year ago +1

    Thank you!! I never get tired of your videos!!!

    • @robmulla
      @robmulla  1 year ago

      I appreciate you watching them!!

  • @Cpt_Diabetes
    @Cpt_Diabetes 9 months ago

    absolutely loved this video, one of the best walkthroughs out right now

  • @turboblitz4587
    @turboblitz4587 1 year ago +1

    Hey, I think this is a great tutorial for actual pandas beginners. I have been using pandas for quite some time now and still learned something. However, an advanced video would be super great, because I see you doing all sorts of super advanced stuff in your coding videos that blows my mind and that I don't understand! Cheers

    • @robmulla
      @robmulla  1 year ago +1

      Thanks for watching. I'm glad you think it's a good beginner's tutorial! I need to think through what I could include in an intermediate/advanced tutorial. My pandas noob video has some good examples/tips if you haven't already checked that out.

    • @turboblitz4587
      @turboblitz4587 1 year ago

      @@robmulla Hey thanks :)
      No, I don't think I watched it. Gonna check it out

    • @turboblitz4587
      @turboblitz4587 1 year ago +1

      @@robmulla I checked out the video and yes, I have learned a lot in this one! Also in the timeseries forecasting video. Now I feel like a noob haha

  • @wwpharmacist
    @wwpharmacist 1 year ago +1

    I definitely enjoyed this tutorial

  • @dennisalbarello2551
    @dennisalbarello2551 11 months ago

    Great video! Interesting way to do a pandas introduction. Thanks a lot for sharing your knowledge with us!

  • @shahbazsaeed7090
    @shahbazsaeed7090 1 year ago

    Please start an ML-from-scratch series covering linear regression, logistic regression, decision trees, random forests, and k-means clustering in Python, please sir, please..... I really understand your teaching style, since you teach hands-on.

  • @zhiyang2553
    @zhiyang2553 1 year ago

    excellent video! amazing introduction! thank you a lot!

  • @NAC79
    @NAC79 2 years ago +2

    Going on 1 full week of trying to change the font, first in Jupyter and now in Kaggle. Idk why I'm obsessed with it

    • @robmulla
      @robmulla  2 years ago

      Did you figure it out? Have you watched my tutorial about jupyter/python notebooks?

    • @NAC79
      @NAC79 2 years ago +1

      @@robmulla I have not. Adding it to the queue now. Thanks!

  • @moniquebrasilbaptista1989
    @moniquebrasilbaptista1989 1 year ago

    Amazing. Thank you!

  • @caiyu538
    @caiyu538 1 year ago +1

    Thumbs up. Thumbs up

  • @wiscgaloot
    @wiscgaloot 1 year ago +1

    For the last few weeks I've been struggling to learn how to do in Python what I do in 10 minutes in Excel. And a few months ago I spent half a day figuring out most of the steps to do the same analysis in Matlab. Dataframes are ridiculously difficult to deal with. I'm learning Python because several colleagues said that if I'm looking for a new systems engineering job, I should know it. NumPy is easy. Pandas dataframes are obscure and clunky.

    • @robmulla
      @robmulla  1 year ago +1

      Interesting to hear you actually find numpy easier to work with than dataframes. I find the opposite because dataframes have an index and column names. It might just take some time and then it will click for you, or maybe you will find something else that works better for your use case. Hope the video was helpful and good luck on your journey.
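Rob's point about labels can be sketched with a tiny example (the monthly numbers and the names `arr`, `views`, `likes` are hypothetical). The same values are addressed by position in a NumPy array and by label in a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly stats: rows are months, columns are metrics
arr = np.array([[100, 10], [250, 30]])
df = pd.DataFrame(arr, index=["Jan", "Feb"], columns=["views", "likes"])

# NumPy: you have to remember that row 1 is "Feb" and column 0 is "views"
feb_views_np = arr[1, 0]

# pandas: the index and column names carry that meaning for you
feb_views_pd = df.loc["Feb", "views"]

# Arithmetic between columns keeps the month labels on the result
likes_per_view = df["likes"] / df["views"]
print(likes_per_view)
```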

  • @marlowisws
    @marlowisws 2 years ago +1

    It would be helpful to mention that Tab triggers autocomplete, for us n00bs

    • @robmulla
      @robmulla  2 years ago

      Good point. Also Shift-Tab for docstrings!

  • @CarolinaMunoz-vy3ni
    @CarolinaMunoz-vy3ni 1 year ago +1

    Hello Rob, can you help me with this error, please? I can't figure out what I did wrong. Thanks a lot :)
    df['likeCount'] = pd.to_numeric(df['likeCount'].astype('str'))
    TypeError: 'Series' object is not callable

    • @robmulla
      @robmulla  1 year ago +1

      Strange. I don't know why it isn't working. Do you need to set the type to string? Try just running to_numeric on the column without astype('str').
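Rob's suggestion can be sketched like this, with hypothetical likeCount values. `errors="coerce"` is a real `pd.to_numeric` option that turns entries it can't parse into NaN instead of raising. In general, `TypeError: 'Series' object is not callable` means something being called with parentheses is actually a Series, often because a name was reassigned earlier in the notebook. Restarting the kernel and rerunning the cells in order is a quick way to rule that out.

```python
import pandas as pd

# Hypothetical likeCount column read in as text, with one missing value
df = pd.DataFrame({"likeCount": ["120", "45", None, "3000"]})

# Convert directly, without .astype('str') first; unparseable entries become NaN
df["likeCount"] = pd.to_numeric(df["likeCount"], errors="coerce")

print(df["likeCount"].dtype)  # float64, because NaN needs a float column
```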

    • @CarolinaMunoz-vy3ni
      @CarolinaMunoz-vy3ni 1 year ago +1

      @@robmulla Sorry again, I have one more question. What is the best way to work with decimal numbers? For example, I have a field with decimal numbers, but I want to convert it to an int.
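This follow-up can be answered with a short sketch using a hypothetical decimal column. The right option depends on two things: whether the decimals should be rounded or simply dropped, and whether the column has missing values, which the plain int64 dtype cannot hold.

```python
import pandas as pd

# Hypothetical column of decimal numbers, including one missing value
s = pd.Series([1.2, 2.7, 3.9, None])

# Option 1: round to the nearest whole number, then cast to the nullable "Int64"
# dtype (capital I), which can hold missing values. Note that .round() rounds
# exact halves to the nearest even number (2.5 -> 2.0).
rounded = s.round().astype("Int64")

# Option 2: drop the fractional part (truncates toward zero); plain int64 needs
# the missing values removed or filled first
truncated = s.dropna().astype("int64")

print(rounded.tolist())
print(truncated.tolist())
```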

  • @average-jojo-enjoyer
    @average-jojo-enjoyer 1 year ago +1

    My timestamps are still not working -_-

    • @robmulla
      @robmulla  1 year ago

      Oh man. I’m sorry. That was really frustrating when it happened to me.

  • @ilyesb2271
    @ilyesb2271 2 years ago +8

    Thanks for everything you share with us

    • @robmulla
      @robmulla  2 years ago +1

      My pleasure! I appreciate the positive feedback!

  • @vaporr5929
    @vaporr5929 1 year ago +2

    At minute 12:40, I believe you meant first "5 rows" and *not* first "5 columns"

    • @robmulla
      @robmulla  1 year ago

      Ah! Good catch. Maybe I did it on purpose to make sure you were paying attention 😏

    • @vaporr5929
      @vaporr5929 1 year ago +1

      @@robmulla I truly appreciate the learning content you provide to the masses on your channel. Looking forward to more tutorials on Pandas, liked and subscribed!

  • @brandoncyoung
    @brandoncyoung 1 year ago +3

    Just getting into DS, and Kaggle is great: the notebook is so easy to use and access to datasets is great. Thanks for sharing!

  • @speedyg2295
    @speedyg2295 1 year ago +1

    What if I have an Excel or CSV file that has 999 rows, but I want to delete rows 101 through 999? Or just read only the first 100 rows, then save those 100 rows as a new DataFrame?

    • @robmulla
      @robmulla  1 year ago

      .head(100) will give you just the first 100 rows.

    • @speedyg2295
      @speedyg2295 1 year ago

      @@robmulla Thanks for the info
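Both routes are sketched below, with an in-memory CSV standing in for the 999-row file (the id/value columns are hypothetical). `nrows` is a real `read_csv` parameter, and `pd.read_excel` accepts it as well. `.head(100)` is Rob's suggestion for data that is already loaded.

```python
import io
import pandas as pd

# Hypothetical 999-row CSV built in memory so the example is self-contained
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 1000))

# Option 1: read only the first 100 data rows (the header row is not counted)
first_100 = pd.read_csv(io.StringIO(csv_text), nrows=100)

# Option 2: read everything, then keep the first 100 rows as a new DataFrame
df = pd.read_csv(io.StringIO(csv_text))
df_100 = df.head(100).copy()  # .copy() so later edits don't touch df

print(len(df), len(first_100), len(df_100))
```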

  • @wesleyweel8007
    @wesleyweel8007 1 year ago +7

    Excellent video. After 2 decades of experience in the software industry, I cannot stress enough how important it is to have a strong foundation in the basics in order to attempt something more advanced. Wonderful job

    • @robmulla
      @robmulla  1 year ago

      Really appreciate your endorsement! I completely agree about knowing the basics first.

    • @utica2burn
      @utica2burn 1 year ago

      Interesting comment! I have been using python and pandas for years and thought to myself today I should go back and check the basics - so here I am!
      I find myself still googling every time I have to do simple stuff like change column order or make a copy of a data frame.
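The two tasks mentioned here can be sketched with hypothetical column names. Reordering columns is just selecting them in the order you want, and `.copy()` gives an independent DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Change column order by listing the columns in the order you want
df = df[["c", "a", "b"]]

# Make an independent copy; editing it leaves the original unchanged
df_copy = df.copy()
df_copy.loc[0, "a"] = 100

print(df)
print(df_copy)
```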

  • @ibowman_UCLA_BRAIN
    @ibowman_UCLA_BRAIN 1 year ago +4

    This video deserves to be the most liked Gentle Introduction to Pandas Data Analysis on YouTube.

    • @robmulla
      @robmulla  1 year ago +2

      Really appreciate that feedback. Share it with all your friends and maybe we can make it the most liked!

  • @kebincui
    @kebincui 25 days ago

    The best video about dataframe, very clear and easy to understand. Thanks Rob👍

  • @JHornsby89
    @JHornsby89 1 year ago +4

    Excellent tutorial - really clear, with excellent explanations and concise steps. I've found it useful as a code-along with a different dataset, just to get to grips with pandas. Thank you!

    • @robmulla
      @robmulla  1 year ago +1

      Awesome! So glad you enjoyed and learned from it.

  • @thechristan019
    @thechristan019 11 months ago

    Hi folks
    Does anyone know what keyboard he is using?
    Thanks in advance

  • @chacehawkins4708
    @chacehawkins4708 1 year ago +2

    First pandas or Python tutorial I was able to watch more than 10 min of in one sitting. Great job. Looking forward to following you here and on Twitch

  • @gabrielacolen7281
    @gabrielacolen7281 1 year ago +1

    Thank you so much for this video. The quality is amazing and you are such a good teacher.

  • @n.zisanyalvac6538
    @n.zisanyalvac6538 2 years ago +2

    thank you! greetings from Turkey :)

    • @robmulla
      @robmulla  2 years ago +3

      My pleasure. Glad to know there is someone watching from Turkey!

  • @BrutalNewby
    @BrutalNewby 9 months ago

    Have a like good man. Gonna watch all of your videos from 0 ;)

  • @punamjadhav7801
    @punamjadhav7801 1 year ago

    Dataframe 1 / Table 1
    MaterialID |Unit Selling Price |Unit Cost
    A | 100 | 80
    B | 200 | 140
    C | 150 | 100
    D | 250 | 230
    E | 225 | 215
    Dataframe 2 / Table 2
    Month | Quantity Sold | Material ID
    Jan | 10 | A
    Feb | 5 | E
    Mar | 25 | C
    Jan | 5 | D
    Feb | 15 | B
    Mar | 2 | A
    Which month achieved the highest total sales amount?
    Which month had the highest profit?
    Change Quantity Sold from 5 to 7 programmatically and find the revised answers:
    Which month achieved the highest total sales amount?
    Which month had the highest profit?
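One way to answer these questions in pandas is sketched below. Several choices are assumptions made for convenience:

- Column names are written without spaces.
- Sale amount is taken as quantity × unit selling price, and profit as quantity × (unit selling price - unit cost).
- "Change 5 to 7" is read as changing every Quantity Sold of 5.

```python
import pandas as pd

# Table 1 (column names written without spaces)
materials = pd.DataFrame({
    "MaterialID": ["A", "B", "C", "D", "E"],
    "UnitSellingPrice": [100, 200, 150, 250, 225],
    "UnitCost": [80, 140, 100, 230, 215],
})

# Table 2
sales = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "QuantitySold": [10, 5, 25, 5, 15, 2],
    "MaterialID": ["A", "E", "C", "D", "B", "A"],
})


def monthly_summary(sales, materials):
    """Join sales to material prices, then total sale amount and profit per month."""
    merged = sales.merge(materials, on="MaterialID", how="left")
    merged["SaleAmount"] = merged["QuantitySold"] * merged["UnitSellingPrice"]
    merged["Profit"] = merged["QuantitySold"] * (
        merged["UnitSellingPrice"] - merged["UnitCost"]
    )
    return merged.groupby("Month", sort=False)[["SaleAmount", "Profit"]].sum()


summary = monthly_summary(sales, materials)
print(summary)
print(summary["SaleAmount"].idxmax(), summary["Profit"].idxmax())  # Feb Mar

# Programmatically change every QuantitySold of 5 to 7, then recompute
revised_sales = sales.copy()
revised_sales.loc[revised_sales["QuantitySold"] == 5, "QuantitySold"] = 7
revised = monthly_summary(revised_sales, materials)
print(revised)
print(revised["SaleAmount"].idxmax(), revised["Profit"].idxmax())  # Feb Mar
```

With these numbers, Feb has the highest total sales (4125) and Mar the highest profit (1290). After the change the totals become 4575 and 1290, so the answers stay Feb and Mar.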

  • @madiva45
    @madiva45 10 months ago

    Hello from the Dominican Republic, next to Haiti. How do I change the background color of a Kaggle notebook?

  • @nikunjgorani8964
    @nikunjgorani8964 1 year ago

    I can't convert the dtype of likeCount to int64. Can someone help me?

  • @amitavroydev
    @amitavroydev 10 months ago

    Great video

  • @CarolinaMunoz-vy3ni
    @CarolinaMunoz-vy3ni 1 year ago +2

    Although I watched this video a long time ago, I wanted to go over a basic explanation. Thanks for sharing your knowledge.

    • @robmulla
      @robmulla  1 year ago +1

      Glad it was helpful!

  • @ke30_
    @ke30_ 2 years ago +2

    Your channel is a goldmine ty!

  • @boubi9329
    @boubi9329 2 years ago +1

    Hello, I saw one of your posts in the YouTube forum saying that your timestamps didn't work properly.
    I see now that they work.
    Did you do anything to make them work, like sending a report to YouTube? Did the timestamps start working after you hit 1,000 subscribers, or was the problem there before?

    • @robmulla
      @robmulla  2 years ago

      Hey! Yes, that was really frustrating. I didn't do anything special. It took a few months, but then they started working. I hope you get it figured out too.

  • @BarryPennock
    @BarryPennock 9 months ago +1

    So cool! Pandas for Python makes things so much easier.

  • @gospelmoto2833
    @gospelmoto2833 1 year ago

    Got a new sub here. Thanks for your video.

  • @pkprasads
    @pkprasads 1 year ago +1

    Thank you so much.

  • @seifmoheb1112
    @seifmoheb1112 2 months ago

    Great tutorial. Very well explained.

  • @betterliving747
    @betterliving747 10 months ago

    One of the best explanations thank you

  • @PaquiCamus
    @PaquiCamus 1 year ago +2

    I watched the whole video and I am amazed at how easy it is to do data analysis nowadays. I am from the generation that went from Lotus to Excel; because of their limitations we needed to combine statistical packages (R, SAS), graphing (Grapher), and 3D surface mapping (Surfer) plus Visual Basic. Now I have decided to learn Python, among other tools, to go back to some old data for recalibration that was impossible to deal with due to memory overload. Pandas is very powerful. Thank you very much for sharing your knowledge.

    • @robmulla
      @robmulla  1 year ago +1

      Glad the video showed you something new. Pandas is a great skill to master and can be really powerful for doing things that aren’t possible to do in excel with large datasets.

    • @PaquiCamus
      @PaquiCamus 1 year ago +1

      @@robmulla Yes, you are right. I am learning it and struggling with Jupyter Notebook and Kaggle. I like both. I still have a long way to go!

  • @terenceochuo701
    @terenceochuo701 1 year ago

    I'm getting errors while reading the CSV file

  • @mohamedsaber7097
    @mohamedsaber7097 3 months ago

    very interesting Tutorial ❤❤

  • @stylesg7818
    @stylesg7818 1 year ago +1

    Thank you

  • @samuelbarretoT-T
    @samuelbarretoT-T 6 days ago

    I loved this video !!

  • @manasisingh294
    @manasisingh294 1 month ago

    tysm for this tutorial! :D

  • @amazman977
    @amazman977 3 months ago

    Thanks. Easy to follow you.

  • @swannschilling474
    @swannschilling474 5 months ago

    Great content!!

  • @nakul469
    @nakul469 7 months ago

    29:20 - it is giving me a ValueError

  • @SathishKumar-bb4ly
    @SathishKumar-bb4ly 2 months ago

    Very Nice and Helpful

  • @ItsMePeterB
    @ItsMePeterB 6 months ago

    Thank you for the tutorial!

  • @annikaw5068
    @annikaw5068 1 year ago +1

    yass

    • @robmulla
      @robmulla  1 year ago +1

      😉

    • @annikaw5068
      @annikaw5068 1 year ago +1

      @@robmulla You just gained a new subscriber✅😊

  • @otenyop
    @otenyop 11 months ago

    Wow this was a great video!

  • @jonesPossibly
    @jonesPossibly 1 year ago +1

    Hi - why is the series of 'thing' in mydf dataframe an object dtype, but the 'count' is an int64 dtype? Why is 'thing' not a 'string' dtype?

    • @robmulla
      @robmulla  1 year ago

      I'm not sure. If "thing" is a string, it will show as "object"; if you count on it, it will produce an integer. But I'm not sure exactly what you are asking.

    • @jonesPossibly
      @jonesPossibly 1 year ago

      @@robmulla hiya - thanks for the reply - at ua-cam.com/video/_Eb0utIRdkw/v-deo.html
      you can see that the series containing strings has a dtype of object, whereas the series containing the integers has a dtype of int64. I was just wondering why it's object and not string?
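A short sketch, assuming a small `mydf` like the one in the video (the values are hypothetical). Historically pandas stored text in NumPy's generic `object` dtype, because NumPy has no variable-length string type. The dedicated `"string"` dtype exists but has to be requested with `astype("string")`. Newer pandas releases are moving toward a string dtype by default, so what `dtypes` shows depends on your version.

```python
import pandas as pd

# Hypothetical version of the tutorial's small DataFrame
mydf = pd.DataFrame({"thing": ["apple", "banana", "cherry"], "count": [3, 1, 2]})
print(mydf.dtypes)  # "thing" shows object on older pandas versions; "count" is int64

# Opt in to pandas' dedicated string dtype
mydf["thing"] = mydf["thing"].astype("string")
print(mydf.dtypes)  # "thing" is now string; "count" is still int64
```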

  • @AdobadoFantastico
    @AdobadoFantastico 2 years ago +2

    You're very good at tutorializing.

  • @abhishekrai1060
    @abhishekrai1060 2 years ago +1

    I don't use Twitch, but I'll subscribe to you here

  • @andresfrancojunor
    @andresfrancojunor 2 years ago +2

    That was really good ! Thank you !

    • @robmulla
      @robmulla  2 years ago +1

      Glad you found it helpful. Please share it with anyone else you think might also learn from it!

  • @jakubkopczynski779
    @jakubkopczynski779 1 year ago +1

    Thanks for the video, I've learnt a lot! I tried casting different dtypes on columns imported from an Excel spreadsheet and found an interesting issue: when I summed up the numbers in the same column as float64 and as int64, I got different results (no decimal points involved). Honestly, I have no idea why it works like that! I guess it has something to do with Excel formatting...
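The cause can't be confirmed from the comment, but one possible explanation can be sketched with hypothetical values. Excel's cell formatting can display a number like 2.9999999 as 3, while `astype("int64")` drops the fractional part instead of rounding. Each such value then loses almost 1 when cast. Rounding first avoids that.

```python
import pandas as pd

# Hypothetical values that Excel might display as 3, 4 and 1
s = pd.Series([2.9999999, 4.0000001, 1.0])

float_total = s.sum()                            # about 8.0
truncated_total = s.astype("int64").sum()        # 2 + 4 + 1 = 7
rounded_total = s.round().astype("int64").sum()  # 3 + 4 + 1 = 8

print(float_total, truncated_total, rounded_total)
```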

  • @metalflames777
    @metalflames777 7 months ago

    21:38
    Wow!! I never knew you could use underscores in place of commas for the larger numbers!
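Underscores as digit separators are a Python language feature (PEP 515, Python 3.6+), not something specific to pandas. They work anywhere a number literal does, including in filters. The titles and view counts below are hypothetical.

```python
import pandas as pd

one_million = 1_000_000  # exactly the same integer as 1000000
df = pd.DataFrame({"title": ["a", "b", "c"], "viewCount": [250_000, 1_500_000, 3_200_000]})

# Underscores only change how the literal is written, not its value
popular = df[df["viewCount"] > one_million]

# For display, format specs can add separators back in
with_commas = f"{one_million:,}"
with_underscores = f"{one_million:_}"
print(popular, with_commas, with_underscores)
```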

  • @aditipandey1769
    @aditipandey1769 1 year ago

    Thank you Sir

  • @100themagician
    @100themagician 1 year ago +1

    Amazing video, thank you Rob

    • @robmulla
      @robmulla  1 year ago

      Thanks for watching. Share it with a friend!

  • @NAC79
    @NAC79 2 years ago +1

    How many years have you been doing Data Science?

    • @robmulla
      @robmulla  2 years ago +1

      Thanks for asking. I've been working with data for over 10 years but doing data science specifically for about 6.

  • @antoines8843
    @antoines8843 1 year ago

    What is the shortcut you use at 27:16 to copy and paste the code while replacing the variable inside?

  • @filmssharecenter3293
    @filmssharecenter3293 6 months ago

    great

  • @chq012
    @chq012 1 year ago

    Wow, great video!
    Very much enjoyed it.
    I will definitely watch your other videos to learn many more techniques.
    Thank you.

  • @kapamagicman
    @kapamagicman 5 months ago

    Really great! I love how you go through step by step and with the end to end examples. Thank you!

  • @SarahBoyd-o8d
    @SarahBoyd-o8d 9 months ago

    Thanks! So well explained.

    • @robmulla
      @robmulla  9 months ago

      Glad you enjoyed it!

  • @uchegodswill-iv4cd
    @uchegodswill-iv4cd 1 year ago

    Yes, thanks a lot. I learnt so much. It's been interesting and I am going to watch all your videos.

  • @andrewkurian726
    @andrewkurian726 1 year ago

    thank you

  • @laurenceturpin1409
    @laurenceturpin1409 1 year ago

    Thank you for doing the video. I learnt a lot about pandas; please keep making videos like this.

  • @mohan250s
    @mohan250s 2 years ago +1

    watching your videos one by one bro, awesome work as usual

    • @robmulla
      @robmulla  2 years ago +1

      Thanks a ton, so glad you are learning from them. Make sure to comment and share so the videos get picked up by the youtube algorithm.

  • @matthewshaffer3378
    @matthewshaffer3378 3 months ago

    Very helpful! thanks for making this!

  • @DiegoSilva-dv9uf
    @DiegoSilva-dv9uf 1 year ago

    Thanks!

  • @ericbroun4657
    @ericbroun4657 1 year ago

  • @txreal2
    @txreal2 1 year ago +1

    Can I use Google Colab to follow along also?

    • @robmulla
      @robmulla  1 year ago +1

      Yes! It's actually really easy. Just click the three dots in the top right corner of the Kaggle notebook and then click "Download code". You can then open it in Google Colab. You would also need to download and link the data. Why do you prefer Colab over a Kaggle notebook? They are very similar.

  • @dailyuploads3959
    @dailyuploads3959 1 month ago

    Thanks, nice teaching method. You opened my mind and eyes

  • @kirtwedel9275
    @kirtwedel9275 10 months ago

    Fantastic video! Thanks much for the lesson.

  • @felixakwerh5189
    @felixakwerh5189 1 year ago +1

    do you have a discord channel??

    • @robmulla
      @robmulla  1 year ago +1

      Yes I do! Join! discord.gg/KnsDbstv

  • @rrio7171
    @rrio7171 1 year ago

    best I found on YT! underrated video, there should be more likes

  • @gabrielbiacchi6169
    @gabrielbiacchi6169 1 year ago

    You explain these in a very clear way! Thank u sir 🙏

  • @travistexian
    @travistexian 5 months ago

    Awesome information, plainly stated

  • @AlbertoChillon
    @AlbertoChillon 1 year ago

    Thank you very much, an amazing tutorial!!

  • @ghrangelr
    @ghrangelr 2 years ago +1

    Hello, What key do you use to open the menus?

    • @robmulla
      @robmulla  2 years ago +1

      Which menu is that? You might want to check out my video on Jupyter notebooks, where I talk about the keyboard shortcuts I often use.

  • @dlcrdz00
    @dlcrdz00 1 year ago

    First, I want to thank you for sharing your skill and time creating these videos. I thought I was doing pretty well until we got to the Columns and Rows section... haha. I typed the same "set_index" as you did, but I kept getting an error. I found out I could run "reset_index", then I ran "set_index" again and it worked.
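One likely explanation can't be confirmed from the comment. In a notebook, re-running a `set_index` cell after the column has already become the index raises a `KeyError`, because that name is no longer a column. `reset_index` moves it back. A sketch with hypothetical columns:

```python
import pandas as pd

df = pd.DataFrame({"title": ["intro", "pandas"], "viewCount": [100, 250]})
df = df.set_index("title")  # "title" moves from the columns into the index

# Re-running the same cell fails: "title" is no longer a column
try:
    df.set_index("title")
    rerun_failed = False
except KeyError as err:
    rerun_failed = True
    print("KeyError:", err)

df = df.reset_index()       # moves "title" back into an ordinary column
df = df.set_index("title")  # works again
```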

  • @obayram4615
    @obayram4615 1 year ago +1

    Very good 🙂👋👋👋👋👋👋😋

  • @byte_me_xd-hk5zt
    @byte_me_xd-hk5zt 1 year ago

    At your intro I immediately was like, you're my hero!

  • @Dongnanjie
    @Dongnanjie 8 months ago

    Love it. Thank you Rob!

  • @Abdolahy
    @Abdolahy 1 year ago

    Hi there Rob! I wanna thank you for this fantastic tutorial you made on EDA with Python Pandas library. That really was the most impressive EDA tutorial I've ever watched on YT.

  • @nasranruwaidi
    @nasranruwaidi 1 year ago

    Amazing introduction to pandas. This video alone already covered most of my reporting needs. Thank you

  • @stifferdoroskevich1809
    @stifferdoroskevich1809 2 years ago +1

    Amazing video!!!

    • @robmulla
      @robmulla  2 years ago +1

      Glad you liked it!!

  • @heitorrapela
    @heitorrapela 2 years ago +1

    Good content! 😄

    • @robmulla
      @robmulla  2 years ago +1

      Thanks so much Heitor!

  • @ArenitaHernandez
    @ArenitaHernandez 1 year ago

    Amazing! Thanks!

  • @sandraoriji8351
    @sandraoriji8351 2 years ago +1

    Awesome 👍

    • @robmulla
      @robmulla  2 years ago

      Thanks Sandra! Glad you liked it.

  • @davdeveloper
    @davdeveloper 1 year ago

    This video was so useful. Thank you so much!

    • @robmulla
      @robmulla  1 year ago

      You're so welcome! Thanks for watching.

  • @anirbanc88
    @anirbanc88 1 year ago

    this is such a great tutorial, thank you so much!

    • @robmulla
      @robmulla  1 year ago +1

      Glad it was helpful! Please share with others who you think might also learn from it.

  • @pashkinzon
    @pashkinzon 1 year ago +1

    Wonderful video - explanation is nice and clean, very intuitive narrative, thank you!

  • @samarumugam4833
    @samarumugam4833 1 year ago

    Hi Rob, why do you address "Hey YouTube"? It seems quite odd. Mind you, YouTube is not watching your videos, we guys are; our likes and views make your day worthwhile. So, a gentle request: be direct with your audience and address us with "Hi guys" or something. 🙏🙏🤞 Anyway, great videos by you, bro. Keep it up. God bless you...

  • @CarolinaMunoz-vy3ni
    @CarolinaMunoz-vy3ni 2 years ago +1

    Hello Rob, I followed your tutorial and got an error when you created a new column. Can you help me with this error? Thank you very much for the great job. /opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    • @robmulla
      @robmulla  2 years ago

      Hey Carolina. This message usually comes up when you are trying to edit a subset of a previously defined dataframe. The best fix is to add `.copy()` after subsetting and renaming a dataframe.
      So before you might have:
      df_small = df.query('thing > 10')
      Change to:
      df_small = df.query('thing > 10').copy()
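Here is a runnable version of Rob's fix, using a hypothetical `thing`/`count` DataFrame. `.copy()` makes `df_small` its own DataFrame rather than something pandas may treat as a view of `df`, so adding a column to it no longer triggers the warning. The warning's own suggestion, `.loc[row_indexer, col_indexer] = value`, is the way to change the original `df` in place.

```python
import pandas as pd

df = pd.DataFrame({"thing": [5, 12, 30], "count": [1, 2, 3]})

# .copy() gives df_small its own data, so adding a column doesn't touch df
df_small = df.query("thing > 10").copy()
df_small["double_count"] = df_small["count"] * 2

# To change the original DataFrame instead, select rows and column in one .loc call
df.loc[df["thing"] > 10, "count"] = 0
```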

    • @nikunjgorani8964
      @nikunjgorani8964 1 year ago

      @@robmulla Hey Rob I am getting this error during converting the likecount into integer IntCastingNaNError Traceback (most recent call last)
      Cell In[126], line 2
      1 df['viewCount'].astype('int')
      ----> 2 df['likeCount'] = df['likeCount'].astype('int').copy()
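The traceback says the likeCount column contains missing values (NaN), which the plain int dtype cannot represent. Two common workarounds are sketched below with hypothetical values:

- Use the nullable `"Int64"` dtype (capital I).
- Fill the gaps first. Filling with 0 is an assumption, since whether a missing like count should count as zero depends on the data.

Line 1 of that cell, `df['viewCount'].astype('int')`, doesn't assign its result back, so it leaves `df` unchanged. The `.copy()` on line 2 isn't needed either, since `astype` already returns a new Series.

```python
import pandas as pd

# Hypothetical likeCount column with a missing value (NaN forces float64)
df = pd.DataFrame({"likeCount": [120.0, None, 3000.0]})

# df["likeCount"].astype("int") would fail here: NaN has no integer equivalent

# Option 1: keep the gap, using pandas' nullable integer dtype
df["likeCount_nullable"] = df["likeCount"].astype("Int64")

# Option 2: fill gaps first (0 is an assumption), then use a plain int64
df["likeCount_filled"] = df["likeCount"].fillna(0).astype("int64")
```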

  • @RoadTo10KsubsWithoutAnyVideos

    Very well explained. Easily understood. Didn't have any issues following along. Good job, brother 😀😀

    • @robmulla
      @robmulla  1 year ago

      Glad it helped! I appreciate the feedback.