Solving real world data science tasks with Python Pandas!

  • Published May 22, 2024
  • Practice your Python Pandas data science skills with problems on StrataScratch!
    stratascratch.com/?via=keith
    In this video we use Python Pandas & Python Matplotlib to analyze and answer business questions about 12 months' worth of sales data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.
    Setup!
    Github source code & data: github.com/KeithGalli/Pandas-...
    Installing Jupyter Notebook: jupyter.readthedocs.io/en/lat...
    Installing Pandas library: pandas.pydata.org/pandas-docs...
    Check out the first video I did on Pandas:
    • Complete Python Pandas...
    Check out the videos I did on Matplotlib:
    • Intro to Data Visualiz...
    • Python Plotting Tutori...
    Detailed video description! (timeline can be found in comments)
    We start by cleaning our data. Tasks during this section include:
    - Drop NaN values from DataFrame
    - Removing rows based on a condition
    - Change the type of columns (to_numeric, to_datetime, astype)
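A minimal sketch of the three cleaning steps listed above. The sample rows are made up for illustration; the column names are assumed to match the sales data used in the video.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the sales data: one empty row and one
# repeated header row are mixed in with the real orders.
raw = io.StringIO(
    "Order ID,Product,Quantity Ordered,Price Each,Order Date\n"
    "1001,USB-C Cable,2,11.95,04/19/19 08:46\n"
    ",,,,\n"
    "Order ID,Product,Quantity Ordered,Price Each,Order Date\n"
    "1002,Wired Headphones,1,11.99,04/07/19 22:30\n"
)
df = pd.read_csv(raw)

# 1. Drop rows in which every value is NaN.
df = df.dropna(how="all")

# 2. Remove rows based on a condition (here: repeated header rows).
df = df[df["Order Date"] != "Order Date"]

# 3. Change the column types.
df["Quantity Ordered"] = pd.to_numeric(df["Quantity Ordered"])
df["Price Each"] = df["Price Each"].astype(float)
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y %H:%M")

print(df.dtypes)
```

The order matters in this sketch: the type conversions would fail if the empty and repeated-header rows were still present.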
    Once we have cleaned up our data a bit, we move on to the data exploration section. In this section we explore 5 high level business questions related to our data:
    - What was the best month for sales? How much was earned that month?
    - What city sold the most product?
    - What time should we display advertisements to maximize the likelihood of customers buying product?
    - What products are most often sold together?
    - What product sold the most? Why do you think it sold the most?
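As a rough illustration of how the first question (best month for sales) might be approached, here is a sketch on a tiny made-up dataset. The values are hypothetical and the column names are assumptions based on the sales data described above.

```python
import pandas as pd

# Hypothetical mini-dataset; column names follow the sales data.
df = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", "04/07/19 22:30", "12/24/19 18:05"],
    "Quantity Ordered": [2, 1, 1],
    "Price Each": [11.95, 11.99, 700.00],
})

# The month is the first two characters of the date string.
df["Month"] = df["Order Date"].str[0:2].astype("int32")

# Revenue per row, then total revenue per month.
df["Sales"] = df["Quantity Ordered"] * df["Price Each"]
monthly = df.groupby("Month")["Sales"].sum()

best_month = monthly.idxmax()
print(best_month, monthly.loc[best_month])
```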
    To answer these questions we walk through many different pandas & matplotlib methods. They include:
    - Concatenating multiple csvs together to create a new DataFrame (pd.concat)
    - Adding columns
    - Parsing cells as strings to make new columns (.str)
    - Using the .apply() method
    - Using groupby to perform aggregate analysis
    - Plotting bar charts and line graphs to visualize our results
    - Labeling our graphs
    If you enjoy this video, make sure to leave it a like and subscribe to not miss any future similar tutorials :).
    Check out the new "solving real world data science tasks" video I posted!
    • Solving real world dat...
    ---------------------------------------------
    Follow me on social media!
    Instagram | / keithgalli
    Twitter | / keithgalli
    ---------------------------------------------
    Video Timeline!
    0:00 - Intro
    1:22 - Downloading the Data
    2:57 - Getting started with the code (Jupyter Notebook)
    Task #1: Merging 12 csvs into a single dataframe (3:35)
    4:25 - Read single CSV file
    5:44 - List all files in a directory
    7:06 - Concatenating files
    11:00 - Reading in Updated dataframe
    Task #2: Add a Month column (12:48)
    14:12 - Parse string in Pandas cell (.str)
    Cleaning our data!
    17:31 - Drop NaN values from df
    21:25 - Remove rows based on condition
    Task #3: Add a sales column (24:58)
    25:58 - Another way to convert a column to numeric (ints & floats)
    Question #1: What was the best month for sales? (29:20)
    30:35 - Visualizing our results with bar chart in matplotlib
    Question #2: What city sold the most product? (34:17)
    35:32 - Add a city column
    36:10 - Using the .apply() method (super useful!!)
    40:35 - Why do we use the lambda x ?
    40:57 - Dropping a column
    46:45 - Answering the question (using groupby)
    47:34 - Plotting our results
    Question #3: What time should we display advertisements to maximize the likelihood of purchases? (52:13)
    53:16 - Using to_datetime() method
    56:01 - Creating hour & minute columns
    58:17 - Matplotlib line graph to plot our results
    1:00:15 - Interpreting our results
    Question #4: What products are most often sold together? (1:02:17)
    1:03:31 - Finding duplicate values in our DataFrame
    1:05:43 - Use transform() method to join values from two rows into a single row
    1:08:00 - Dropping rows with duplicate values
    1:09:39 - Counting pairs of products (itertools, collections)
    Question #5: What product sold the most? Why do you think it did? (1:14:04)
    1:15:28 - Graphing data
    1:18:41 - Overlaying a second Y-axis on existing chart
    1:23:41 - Interpreting our results
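The steps listed under Question #4 (find duplicate order IDs, join products with transform(), drop duplicates, count pairs) can be sketched as below. The orders are invented for illustration, and sorting the products inside each order is an added assumption so that the same pair is always counted under one key.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical orders: rows that share an Order ID were bought together.
df = pd.DataFrame({
    "Order ID": [1, 1, 2, 3, 3, 4],
    "Product": [
        "iPhone", "Lightning Charging Cable",
        "USB-C Charging Cable",
        "iPhone", "Lightning Charging Cable",
        "Google Phone",
    ],
})

# Keep only orders that span more than one row (copy to avoid
# modifying a view of the original DataFrame).
multi = df[df["Order ID"].duplicated(keep=False)].copy()

# Join the products of each order into one string on every row.
multi["Grouped"] = multi.groupby("Order ID")["Product"].transform(
    lambda products: ",".join(products)
)

# One row per order is enough for counting.
orders = multi[["Order ID", "Grouped"]].drop_duplicates()

pair_counts = Counter()
for grouped in orders["Grouped"]:
    products = sorted(grouped.split(","))
    pair_counts.update(combinations(products, 2))

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```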
    ---------------------
    If you are curious to learn how I make my tutorials, check out this video: • How to Make a High Qua...
    Join the Python Army to get access to perks!
    YouTube - / @keithgalli
    Patreon - / keithgalli
    *I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.

COMMENTS • 1.7K

  • @KeithGalli
    @KeithGalli  3 years ago +196

    Posted a new "Solving real world data science tasks" video! Check it out here: ua-cam.com/video/Ewgy-G9cmbg/v-deo.html

    • @Trazynn
      @Trazynn 3 years ago +4

      This is awesome. Learning Python is so much easier when there's something tangible and grounded to work towards.

    • @colorways518
      @colorways518 3 years ago

      hii keith!!! I am getting an error after this line
      CODE: for file in files:
                current_data = pd.read_csv(path + "/" + file)
      ERROR: ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2
      Please can you help me solve this error....I tried to find solution online but didn't get any.

    • @larrywang1983
      @larrywang1983 3 years ago

      @@colorways518 Just thinking out loud, aren't we able to find the below kind of info from Amazon Jungle Scout, Helium10, Sellics? We are Amazon sellers; do we also need to go through Python and data science on Amazon? There are 3rd-party SaaS plug-ins to solve these questions. Correct me if I am wrong?
      - What was the best month for sales? How much was earned that month?

    • @ismaeelaileru4612
      @ismaeelaileru4612 3 years ago +2

      For the problem on getting city with highest sales, we ran into an ordering problem while plotting the cities, I think we can also use result.index as our xtick
      That way it simply takes the values straight from the Dataframe in the right order rather than using df.unique and rearranging
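A small sketch of the ordering point made in this comment, on made-up data: the index of a groupby result is already aligned with the summed values, whereas unique() returns cities in order of first appearance. The city labels and numbers here are hypothetical.

```python
import pandas as pd

# Hypothetical sales rows; the first city seen is not first alphabetically.
df = pd.DataFrame({
    "City": ["San Francisco (CA)", "Austin (TX)",
             "San Francisco (CA)", "Boston (MA)"],
    "Sales": [100.0, 40.0, 250.0, 60.0],
})

results = df.groupby("City")["Sales"].sum()

# groupby sorts its keys, so index and values line up one to one.
cities = results.index.tolist()
totals = results.tolist()

# unique() keeps first-appearance order, which need not match.
appearance_order = df["City"].unique().tolist()

print(cities)
print(appearance_order)
```

Passing `cities` and `totals` to a bar chart keeps each label attached to its own bar.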

    • @rodrigodasilva9176
      @rodrigodasilva9176 3 years ago +1

      This red warning displays bcuz u didn't make a copy of the original dataframe, do it and this warning goes off.

  • @billyjorrosh9394
    @billyjorrosh9394 3 years ago +880

    "I don't know how to do it, but I know how to google it." This guy knows how things go in the real world haha

    • @thanhnhando3070
      @thanhnhando3070 3 years ago +54

      Googling is, indeed, one of the most important skills for coding.

    • @indexima6517
      @indexima6517 3 years ago +1

      Hahaha! We invite you to take a look at our videos which deal with the same topics :)

    • @carlurbananimals
      @carlurbananimals 3 years ago +11

      He's very fast too, like I would need to know it, coz once I go to google I'm there for 4 hours :/

    • @samirvinchurkar8226
      @samirvinchurkar8226 2 years ago +5

      I did the exact same process be it R, Matlab or Py

    • @samirvinchurkar8226
      @samirvinchurkar8226 2 years ago +2

      @@carlurbananimals that's coz your question isn't exactly right ;)

  • @justapugontheinternet
    @justapugontheinternet 1 year ago +161

    As a programmer/data analyst/systems administrator I can safely say that this is exactly how we solve problems in real life. Good job!

    • @pasha7293
      @pasha7293 1 year ago +3

      you wouldnt have watched this video if you were

    • @justapugontheinternet
      @justapugontheinternet 1 year ago +42

      @Pasha people who think they know it all are a bore. 🙄 You could always learn something new from other people, it never hurts to learn new perspectives. Good luck with that mindset. I learn everyday. 😌

    • @saugatjarif8272
      @saugatjarif8272 1 year ago +10

      @@justapugontheinternet love your mindset on🎉🎉🎉🎉

  • @terrymaverick580
    @terrymaverick580 3 years ago +296

    the best part was watching someone google the answer and seeing how they implement the solution instead of just acting like they know everything. man your tutorials are the best and down to earth

    • @Amir-tv4nn
      @Amir-tv4nn 1 year ago +1

      hahahahaaha you think this kid knows what he is doing and for your information we all google no matter what position we hold. 🤣 we built websites for a reason to always look back to when needed. Google provides faster search capability rather going to src and look through to get to. Get your mind straight about Google 🤣 This kid clearly looking around for the code he already written and you assuming google is preferred to be a bad example as a programmer 😂 tells me you expecting movies type like hackers hahahaahahaha. Come to reality

    • @dragonmateX
      @dragonmateX 1 year ago

      It honestly makes it feel more real, like, I am studying data science now and I google stuff all the time, the fact that even someone well versed in data science still googles stuff constantly is reassuring.

    • @Amir-tv4nn
      @Amir-tv4nn 1 year ago

      @@dragonmateX people who work in google google stuff 😂 get back to reality to why google is meant for🤣

    • @buak809
      @buak809 1 year ago

      @@Amir-tv4nn and? what the fuck is your problem? so far you didn't write anything valuable here

    • @Diabolic9595
      @Diabolic9595 1 year ago

      @@Amir-tv4nn Come to reality. Man, come to reality. Could you please come to reality? Btw you should come to reality

  • @olajiireolajide
    @olajiireolajide 2 years ago +8

    Love how realistic and down to earth all your videos are! Makes data analysis way more approachable. What a guy!

  • @helmialfath9897
    @helmialfath9897 4 years ago +719

    This situation is so realistic. The mistakes, the solving.. great video!

    • @Pidamoussouma
      @Pidamoussouma 4 years ago +8

      Yes liked it ..it was so realistic

    • @user-ok5xb8sf3q
      @user-ok5xb8sf3q 4 years ago +3

      is this sarcasm?

    • @ipshie
      @ipshie 4 years ago +1

      Юрій Черній pretty sure no it's not

    • @billyjorrosh9394
      @billyjorrosh9394 3 years ago +13

      not only teaches us about pandas but also gives us the confidence that "If this guy could be so successful in data science then why shouldn't I?"

    • @89DerChristian
      @89DerChristian 2 years ago

      @@user-ok5xb8sf3q no

  • @mid_paulownia
    @mid_paulownia 4 years ago +153

    This is the most practical Python tutorial video I've ever watched.

  • @sushiplatter5540
    @sushiplatter5540 2 years ago +9

    Keith, you're literally the most underrated and one of the best teachers on youtube. This exercise cleared most of my doubts about Data Science and i fell in love with it because of you. Thank you so much for this, you're the best!

  • @H99x2
    @H99x2 2 years ago +10

    Dude, this is by far one of the best real-life tutorials on YT. Subbed for more like this!

  • @user-ci1oj3xo6h
    @user-ci1oj3xo6h 4 years ago +6

    Content of this quality deserves far more recognition. Thank you!

  • @ujjawaljani6731
    @ujjawaljani6731 3 years ago +141

    He is like my friend who teaches one day before exams. 😂😅

  • @akosasuke5128
    @akosasuke5128 1 year ago +1

    I get the feeling in this video that you know more than you're letting on but you're just trying to make things as basic as possible and I love it. I hope to teach others in this same manner. God bless you

  • @mikeyu6347
    @mikeyu6347 9 months ago +1

    I was absolutely blown away by the fantastic lectures. The best teacher I've ever had!

  • @royvivat113
    @royvivat113 3 years ago +7

    This is the most informative video I've ever seen on what data science actually is! I keep looking for actual applications and I loved seeing your thought process, comments, and method of asking and answering questions.

  • @Random_dudebro
    @Random_dudebro 4 years ago +24

    I just finished your two videos demonstrating numpy and pandas, finally feeling a good grasp of python basics (y)
    Thank you for everything you do!

  • @sathirasilva4958
    @sathirasilva4958 3 years ago +55

    Great tutorial!
    55:00 When parsing a column into datetime, specifying the format manually will decrease the execution time significantly:
    all_data['Order Date'] = pd.to_datetime(all_data['Order Date'], format='%m/%d/%y %H:%M')
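A self-contained sketch of the tip above, using two made-up date strings in the same month/day/two-digit-year layout. How much time an explicit format saves will depend on the pandas version and the data, so the speedup is not measured here.

```python
import pandas as pd

# Hypothetical date strings in the layout used by the sales data.
dates = pd.Series(["04/19/19 08:46", "12/24/19 18:05"])

# With an explicit format pandas does not have to infer the layout.
parsed = pd.to_datetime(dates, format="%m/%d/%y %H:%M")

# The parsed column exposes date parts through the .dt accessor.
hours = parsed.dt.hour
minutes = parsed.dt.minute
print(parsed)
```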

    • @rotan90
      @rotan90 1 year ago

      on google colab it was like 30 sec vs 2 sec. Great tip !

  • @deeplysuperficial8132
    @deeplysuperficial8132 2 years ago

    By far one of the best tutorials I've seen in a long time. I'll be watching all your content. You explain things in a way that I'm able to perfectly keep up with.

  • @aphotos2284
    @aphotos2284 4 years ago +3

    This is one of the best videos out there. Please do more of these. It's great to learn about the mindset as well as the technique!

  • @devmrin
    @devmrin 4 years ago +22

    Hands down one of the most useful I've seen. Insights galore. Thank you!

  • @abdulqadirtinwala1296
    @abdulqadirtinwala1296 3 years ago

    Dude, literally I have never seen anyone solving real world problems on YouTube. Your way of teaching is quite impressive. Many YouTubers just showcase basic problems. But hats off to you!!!

  • @user-jw5tk2ef2f
    @user-jw5tk2ef2f 7 months ago

    This was the best python tutorial video I have ever watched! Thank you for taking the time to go into depth about the process of data science. You're awesome!!

  • @Magmatic91
    @Magmatic91 4 years ago +19

    I love how this guy is explaining, I really enjoyed learning from you.

  • @ijbarraza
    @ijbarraza 4 years ago +11

    As a new learner of python I found this to be one of the best videos on youtube for beginners. How he managed to deal with the problems and solve them on the go (not knowing it all, but knowing how to consult google for the right answer). Way to go! Loved the approach and how easy you made it look

  • @stefanlasek3256
    @stefanlasek3256 3 years ago

    Honestly, one of the best videos I have seen. From mistakes, how to look for answers and little tips & tricks.
    You have got new subscriber in me.

  • @hoiying-chan
    @hoiying-chan 3 years ago +10

    Your assignments are harder than Coursera's. I'm actually learning something. Major thanks all the way from Holland! 🙏

  • @anthonygonsalvis121
    @anthonygonsalvis121 3 years ago +91

    Love how this cool dude researches solutions on the fly and explains things as he goes even when he commits minor unforced errors. He is so relatable. His other tutorials on Pandas, Numpy, Matplotlib, etc. are equally helpful. I wish him all the success and hope that he continues to share his knowledge for decades to come.

    • @chineduezeofor2481
      @chineduezeofor2481 3 years ago

      He's such a GREAT tutor!!!

    • @indrajeetsinghyadav876
      @indrajeetsinghyadav876 2 years ago

      Agreed totally relatable and helpful videos for beginners giving them a chance to know what error can happen due to what syntax errors. Thanks for the informative guide.

  • @francescofaccia
    @francescofaccia 4 years ago +32

    Hi Keith, you're great! Thanks to you we can be introduced to a hell of a lot of useful pandas tools! Keep up the good work!

  • @imdadood5705
    @imdadood5705 3 years ago +2

    Thank you, Keith. I haven’t got enough words to thank you for this work. This is a great project for a beginner. Thanks again! 😊

  • @matty5ps444
    @matty5ps444 1 year ago +1

    just to add to what most people are saying, this is in my opinion the best way to do a tutorial. you showed me that even though im a super beginner and not long coming out of learning basic python things im able to pick up something really easily while realising that i dont have to feel bad thinking everyone else is better than me and that even experienced programmers google stuff and actually are not gods sitting on pedestals acting like they are better than us haha. great work

  • @DarshanMalu
    @DarshanMalu 4 years ago +17

    You are awesome! Thanks for patiently explaining everything, also teaching how to google what you want! Thanks man!

  • @rezap1356
    @rezap1356 4 years ago +7

    The best graph type for correlation is 'scatter graph', looks like a constellation. Great video Keith. Thanks.

  • @Jordanptheone
    @Jordanptheone 3 months ago +3

    Watching this 4 years after you published it, and you're still a legend ! Thank you !!!

    • @KeithGalli
      @KeithGalli  3 months ago +1

      Thank you for watching and the kind words!!

  • @edric7552
    @edric7552 1 year ago +9

    Hi Keith, I feel obligated to personally thank everyone that helps in pursuing my data career and of course, you included. I've used your project (and learned a LOT) and modify/add codes here and there with my own styling for my online portfolio. Moreover, you're a fantastic teacher and you deserve all the credits you should get for helping others like me. Thank you for doing this, may God return the favor and always bless you. Rock on Keith!

    • @KeithGalli
      @KeithGalli  1 year ago +2

      Thank you so much for the kind words! :)

  • @rafacardenas8783
    @rafacardenas8783 4 years ago +4

    great job Keith!, keep up with the walk-through-style tutorials, hands on is the best and even better when you have the feedback.

  • @yaswanthfinds
    @yaswanthfinds 4 years ago +18

    so nice, I was searching for this kind of tutorial, it has real-time mistakes and solutions, I hope you do this kind of video regularly

  • @Account-fi1cu
    @Account-fi1cu 3 years ago +4

    Great tutorial! thank you for sharing
    In 50:26 for cities: can always use the index values from 'results' DF:
    cities = results.index.values
    instead of a for loop

  • @geetanjalimisra4676
    @geetanjalimisra4676 8 months ago

    Keith you came through!! This is the kind of tutorial I was literally looking for to hone my data analysis/preprocessing skills. Thank you!!!

  • @kyledawes9593
    @kyledawes9593 3 years ago +39

    As a business major with very limited internship experience, I am teaching myself python and data analytics from scratch. This video is literal gold to me because this is one of the few that actually shows the entire wrangling process! Thanks for the great vid!

    • @vilw4739
      @vilw4739 2 years ago +1

      If i use only fd=pd.read_csv("./Sales_Data/Sales_April_2019.csv") i get file not found error..i should use the whole path starting from c drive..How does he not get error

    • @ashiksrinivas
      @ashiksrinivas 2 years ago +1

      @@vilw4739 He is using jupyter notebook where files are stored separately in a jupyter notebook directory and you can upload files in the directory and import them by simply running fd=pd.read_csv("./Sales_Data/Sales_April_2019.csv")
      If you're using a local python IDE like pycharm and VSCode, you need to specify the whole directory like fd=pd.read_csv("C:/Data Science/Sales_Data/Sales_April_2019.csv") to import.
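One way to investigate this kind of "file not found" error, offered as a sketch rather than a diagnosis of any particular setup: a relative path such as "./Sales_Data/..." is resolved against the current working directory of the Python process, so printing that directory shows where the file is actually being looked up. The folder and file names below are the ones quoted in this thread.

```python
from pathlib import Path

# Relative paths are resolved against the current working directory.
cwd = Path.cwd()
relative = Path("Sales_Data") / "Sales_April_2019.csv"
absolute = (cwd / relative).resolve()

print("Working directory:", cwd)
print("File will be looked up at:", absolute)
print("Exists:", absolute.exists())
```

If "Exists" prints False, either move the data folder next to the notebook or script, or pass the full path to pd.read_csv.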

    • @vilw4739
      @vilw4739 2 years ago

      @@ashiksrinivas thankyou

    • @muhsintabatabayee8592
      @muhsintabatabayee8592 1 year ago

      @@vilw4739 did you ever figure it out? getting the same error

    • @vilw4739
      @vilw4739 1 year ago

      @@muhsintabatabayee8592 they should be in the same folder.Otherwise you need to put the whole path

  • @karimkhatib8569
    @karimkhatib8569 3 years ago +3

    Really interesting to go through the entire process, including looking up solutions and solving errors!

  • @cusescholar3582
    @cusescholar3582 26 days ago

    This is the best data science class on the net (that I have seen, of course). We are solving real problems, using google, and working with datasets that require a lot of preprocessing. Perfect.

  • @vickyzhang820
    @vickyzhang820 2 years ago +1

    Sooooo fantastic!!!
    This is definitely the best Data Project video I've seen on YouTube!

  • @dawnfantasy
    @dawnfantasy 4 years ago +6

    50:47 cities = result.Sales.keys() works as expected. great tutorial, tks!

  • @jenn6997
    @jenn6997 4 years ago +8

    You are always so passionate and enthusiastic even if there're errors haha :) Love your positive attitude! Look forward to more great videos!! :)

    • @masthanjinostra2981
      @masthanjinostra2981 3 years ago

      I get tensed like in hell..

    • @geekyprogrammer4831
      @geekyprogrammer4831 3 years ago

      he purposely introduced those errors for us to have real-life problem-solving experience :)

  • @FrancisBaconthe3rd
    @FrancisBaconthe3rd 3 years ago

    Didn't watch more than a few minutes since I already know how to do most of this stuff but loved how the dude straight up tells us to google it. SO TRUE!!! I've had professors who tell me the same thing. Thumbs up.

  • @DataScienceMAHAMAT
    @DataScienceMAHAMAT 21 days ago

    This is the most practical Python tutorial video I've ever watched. Thanks for sharing!

  • @KeithGalli
    @KeithGalli  4 years ago +296

    Thanks for watching! If you enjoyed, please consider subscribing :).

    • @ANKITRAJ-fe8dh
      @ANKITRAJ-fe8dh 4 years ago +4

      Heyy,machine learning would be awesome

    • @luuminhvuong
      @luuminhvuong 4 years ago +2

      I have very big data in xlsx format. Read excel takes like forever...

    • @mberoakoko24
      @mberoakoko24 4 years ago +1

      I am on holiday and have started data science for fun to see what the buzz is all about. I have to say I love it and I would appreciate it if you'd upload more videos like this. I have learnt a TON

    • @kulpreetsingh9064
      @kulpreetsingh9064 4 years ago

      Hey man, are you gonna do more such videos anytime soon?

    • @mohammedyounis7207
      @mohammedyounis7207 4 years ago

      Thank you so much, it is very useful to me

  • @SaulOjeda
    @SaulOjeda 3 years ago +13

    this video was amazing, I can't believe I actually sat through the whole thing past my bedtime

    • @exploringwithdave5926
      @exploringwithdave5926 2 years ago

      If you are a coder, there is no such thing as "bedtime". Just, awake, and not awake.

  • @user-fq5kb9nx3c
    @user-fq5kb9nx3c 7 days ago

    I just entered the data analysis area, and amazingly this video was made 4 years ago already! Thanks for making this; I learnt from your skills and problem solving, appreciated!

  • @TanayaAmar
    @TanayaAmar 3 years ago

    Loved this video - especially the real-world approach! Please keep creating more such content! Thank you so much!!

  • @oluwadamilaretijani1777
    @oluwadamilaretijani1777 2 years ago +7

    Your courses are very great as you delve into practical content. Your course helped me to pass data analysis test in Turing. Thank you so much

    • @akosasuke5128
      @akosasuke5128 1 year ago

      Congrats oludamire, I'm guessing you're a Nigerian. I'm a Nigerian too and recently got into Exploratory Data Analysis through the udacity Nanodegree program. I'm currently on my second project which is an Investigation of WeRateDogs Twitter dataset. I think I have learnt a thing or two so far. Do you think I'm ready for Turin?..i hear it's like going to the big leagues lol.

  • @OK-Computer
    @OK-Computer 4 years ago +44

    Great video! At the beginning it is much more concise to concatenate all the csv files into one like this (it is easiest to put the IPython notebook and the csv files in the same directory first):
    files=[f for f in os.listdir("./") if f.endswith('.csv')]
    df=pd.concat(pd.read_csv(i) for i in files)
    THAT'S IT!
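A runnable variant of this one-liner idea, written as a sketch: it creates two small made-up CSV files in a temporary folder (the file names and contents are hypothetical) and then reads and stacks every CSV found there.

```python
import glob
import os
import tempfile

import pandas as pd

# Write two small, hypothetical monthly files into a temporary folder.
folder = tempfile.mkdtemp()
pd.DataFrame({"Order ID": [1, 2], "Product": ["A", "B"]}).to_csv(
    os.path.join(folder, "Sales_April_2019.csv"), index=False
)
pd.DataFrame({"Order ID": [3], "Product": ["C"]}).to_csv(
    os.path.join(folder, "Sales_May_2019.csv"), index=False
)

# Read every CSV in the folder and stack the rows into one DataFrame.
paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
all_data = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)

print(all_data)
```

pd.read_csv treats the first line of each file as that file's header, and ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.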

    • @muhammadbashirmuhammad5529
      @muhammadbashirmuhammad5529 3 years ago +1

      Thats better thanks

    • @subho1766
      @subho1766 3 years ago +1

      monthly_dataframes = [pd.read_csv(file) for file in glob.glob(filePath + "*.csv")]
      merged_dataframe = pd.concat(monthly_dataframes)

    • @bartproffitt5240
      @bartproffitt5240 3 years ago +1

      thank you so much i have been battling no such directory all morning

    • @jeisonsanchez4842
      @jeisonsanchez4842 2 years ago +2

      Also consider adding a condition to skip the first row of each subsequent file - to avoid duplicate headers.

  • @manhaabdellah2682
    @manhaabdellah2682 1 year ago

    Im new to data analysis. My instructor always tells us to search our questions on google and get help from stack overflow. I didnt understand it till now and got stuck on my second project for sales analysis. This helped me big time!!! I'm so thankful to you for telling all those shortcuts. The data time split had such a long tricky code online.

  • @pranavkrishna9137
    @pranavkrishna9137 2 years ago

    Keith! Thank you so much! I honestly mean it when I say, this is one of the best videos I've ever watched, trying to learn Data science. Thank you so much for this wonderful piece of content!!

  • @anubhkumar8824
    @anubhkumar8824 4 years ago +8

    34:34
    Pro tip: go to command mode (press Esc) and press 'b' to make cells below current cell or 'a' to make cells above

    • @KeithGalli
      @KeithGalli  4 years ago +6

      Thanks for the tips! Love when people comment helpful stuff like this :). Just started using command mode to easily switch cells from code to markdown, will have to add these two commands to the arsenal as well!

    • @FlyingMonkeis
      @FlyingMonkeis 4 years ago +1

      f and j will move focus to above or below cells and u can pair this with shift and then press ‘m’ to merge the highlighted cells. so shift+f+m will merge the current cell with the one below it. ‘dd’ will delete a cell also! (these bindings are very vim like)

    • @christopherlyons7613
      @christopherlyons7613 4 years ago

      Think that's reversed. Use 'b' to make cells above and 'a' to make cells below.

  • @muskankaushik5628
    @muskankaushik5628 3 years ago +1

    This was a great video, you covered a lot of pandas and also showed real work, which includes learning by making mistakes and looking things up. Exactly what I was looking for. Thanks a lot!!

  • @jeffmiller7010
    @jeffmiller7010 2 years ago

    I enjoyed working through this real world data analysis problem with you. I look forward to more, please do more problems like this. It helps me to work out problems in Python.

  • @Yayaloy9
    @Yayaloy9 3 years ago +9

    At 50:10 for anyone who wants to use .unique(), when you calculate the sales for each city make sure to throw in a .reset_index() in there, it will reset the indexes and your bar is going to be alright.
    cityy=all_data.groupby("City").sum().reset_index()
    then you do the rest like him, you can also throw in ascending order in there as well, just follow the rest of his instruction.
    cityy=all_data.groupby("City").sum().reset_index().sort_values("Sales",ascending=False)
    xxx=cityy["City"].unique()
    plt.bar(xxx,cityy["Sales"])
    plt.ylabel("$$$")
    plt.xlabel("Cities")
    plt.xticks(xxx, rotation='vertical', size=8)
    plt.show()

    • @smackedup7657
      @smackedup7657 9 months ago

      thanks a lot

    • @rezwanmehedad2095
      @rezwanmehedad2095 8 months ago

      unfortunately, I am getting a ValueError. Any idea how I can solve this:
      ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (10,) and arg 1 with shape (12,).
      I havent got any proper answer from google or maybe not an expert enough to understand :p.

  • @arnopisspot5115
    @arnopisspot5115 4 years ago +4

    this video was super interesting. I can certainly watch 10 more of these!

  • @dp6736
    @dp6736 1 year ago +1

    Hi Keith, Even after three years, this video is very useful. You are very good at explaining the concepts. Thank you very much

  • @bloosea123
    @bloosea123 3 years ago

    Awesome video Keith! Finding the pairs of items is a problem I had in a previous project and now it is nice to know how to solve it! It is also nice to see the entire data discovery process in the video, complete with plenty of Stack Overflow questions.
    Quick tip: plots of specific columns in a data frame can be completed in one line of code, for instance
    df["ColumnName"].plot()

  • @a.yashwanth
    @a.yashwanth 4 years ago +4

    Checking the length of dataframe helps instead of storing in csv file and verifying.

  • @kafaayari
    @kafaayari 2 years ago +16

    When passing a function to apply, you could have just passed the function name, there's no need to do apply(lambda x:get_city(x)). This is just enough and better => apply(get_city)

    • @MattHuisman
      @MattHuisman 2 years ago +2

      Came here to make sure someone said this! As long as the function you pass only takes a single argument. Otherwise lambda x: my_func(x, other_arg)
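To make the point of this thread concrete, here is a short sketch. The helper get_part and the sample addresses are hypothetical; get_city mirrors the kind of function discussed in the video but is written here from scratch.

```python
import pandas as pd


def get_city(address):
    # "917 1st St, Dallas, TX 75001" -> "Dallas"
    return address.split(",")[1].strip()


def get_part(address, position):
    # Hypothetical helper that needs a second argument.
    return address.split(",")[position].strip()


addresses = pd.Series([
    "917 1st St, Dallas, TX 75001",
    "682 Chestnut St, Boston, MA 02215",
])

# Passing the function itself is enough for one-argument functions.
cities = addresses.apply(get_city)

# The lambda wrapper gives the same result, just with extra noise.
cities_lambda = addresses.apply(lambda x: get_city(x))

# Extra arguments: either wrap the call in a lambda or use args=.
states = addresses.apply(lambda x: get_part(x, 2).split(" ")[0])
streets = addresses.apply(get_part, args=(0,))

print(cities.tolist(), states.tolist(), streets.tolist())
```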

  • @rafaelmachado7666
    @rafaelmachado7666 1 year ago +2

    Amazing video ! All the mistakes and the searching process make the beginners in data science realize that it's possible to do a lot of things since the start of the journey. Thanks

  • @kelvingitari
    @kelvingitari 1 year ago

    Best data analysis video I have watched so far! I also love how most people in the comment sections have outlined alternative ways of approaching some of the tasks.

  • @JoaoOliveira-wh1tp
    @JoaoOliveira-wh1tp 3 years ago +5

    Great video. Just a few suggestions:
    At 4:25 when using os.listdir("./"), this already returns a list, so using [file for file in os.listdir(...)] is redundant.
    At 40:50 you don't need to use the lambda function, even if you want to access a cell's content. If you simply pass a reference to the function, apply() calls it with each cell value as the argument. Example:
    def modify(a):
        return 'CHANGED ' + a + ' CHANGED'
    df['Column'].apply(modify) # modify without parentheses is the reference to the function.

    • @mahermonirify
      @mahermonirify 3 years ago

      Could you please help: why am I getting a path error when I try to use os.listdir, but not when I open a specific file to read?

  • @MicahJohns
    @MicahJohns 3 years ago +9

    23:39 That duplication was because of the header rows in each of the files. I've dealt with this a lot. You would have had to exclude those header rows from each file before concatenating them all together to resolve this.
    Great video course man, thank you for making all of that content

    • @vertik3895
      @vertik3895 2 years ago

      I just did what he did and all I am getting is the header rows, what's the solution?

    • @oscardyremyhr5948
      @oscardyremyhr5948 2 years ago +1

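      @vertik3895 Load the first df as normal and the subsequent dfs with pd.read_csv('file2.csv', skiprows=1, header=None, names=first_df.columns) before concat, where first_df is the first DataFrame. Without header=None, the first data row after the skipped line would be read as the header.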

    • @eduardosa9658
      @eduardosa9658 1 year ago

      @vertik3895 The solution is to call read_csv(..., header=None) on each iteration
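One possible way to handle the stray header rows discussed in this thread, sketched on made-up in-memory data rather than the real monthly files. By default `read_csv` uses the first line of each file as the header, so this sketch assumes the leftover rows come from header lines repeated inside a file's body. With `header=None`, each file's header line would also arrive as a data row and would need the same filter.

```python
import io

import pandas as pd

# Two tiny in-memory "files" standing in for the monthly CSVs (made-up rows).
# file_a repeats its header line in the middle of the data.
file_a = (
    "Order ID,Product,Quantity Ordered\n"
    "1,USB-C Cable,2\n"
    "Order ID,Product,Quantity Ordered\n"
    "2,Monitor,1\n"
)
file_b = "Order ID,Product,Quantity Ordered\n3,Headphones,1\n"

frames = [pd.read_csv(io.StringIO(text)) for text in (file_a, file_b)]
all_data = pd.concat(frames, ignore_index=True)

# The repeated header line was read as an ordinary row; find it and drop it.
stray = all_data["Order ID"].astype(str) == "Order ID"
clean = all_data[~stray].copy()

# The stray text forced the column to hold strings, so convert it back to numbers.
clean["Quantity Ordered"] = pd.to_numeric(clean["Quantity Ordered"])
```

Checking `len(all_data)` against `len(clean)` is a quick way to see how many rows were dropped.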

  • @katherinenavarrohansen2748
    @katherinenavarrohansen2748 3 years ago

    I'm writing from Denmark, but I'm Chilean. I followed all the steps and everything is really clear. I loved your explanations of each task and each question.

  • @GunHolsters
    @GunHolsters 2 years ago

    I really appreciate your approach to these tutorials. Allowing the problem to drive the programming solution (while learning some of it on the fly) is how I do most everything.

  • @abhishek_raj
    @abhishek_raj 3 years ago +7

    Keith: I am gonna snatch the first two digits and make it the month.
    The data: Hold my NaNs !

  • @Scratchmex
    @Scratchmex 4 years ago +10

    22:00 I think it is more reliable to parse the date column as a datetime type to avoid all these problems

    • @stevejuso
      @stevejuso 3 years ago

      pd.to_datetime did not work for me on this data. How did you use it? I get an error

    • @SiIentFire
      @SiIentFire 2 years ago

      @stevejuso Really late reply, but just in case it helps someone.
      You can tell the read_csv function to read columns as dates by passing in parse_dates=['col1', 'col2'] for any number of columns.
      You can tell it to use the European (day-first) format with dayfirst=True.
      And if you need a specific format you can use date_parser to give your own parser (pandas 2.0 and later prefer date_format for this).
      So in my case it was:
      df = pd.read_csv('filepath', parse_dates=[datecols], dayfirst=True) to parse the columns I needed as European-format dates.
      One key thing is that it converts the dates to pandas Timestamps, but they are interchangeable with Python datetimes almost all of the time. They can also be converted with .apply(lambda x: x.to_pydatetime()) if you need.
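A short sketch of the options described in the reply above, on a made-up two-row CSV. The column names and dates are assumptions for illustration. It parses a day-first date column while reading, shows the explicit `pd.to_datetime` alternative with a format string, and converts one value to a plain Python datetime.

```python
import io

import pandas as pd

# Made-up day-first (European style) dates in an in-memory CSV.
csv_text = "order_id,order_date\n1,03/04/2019 08:46\n2,25/12/2019 21:10\n"

# Parse the date column while reading, treating the first number as the day.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["order_date"], dayfirst=True)

# Alternative: read the column as text, then convert it with an explicit format.
raw = pd.read_csv(io.StringIO(csv_text))
explicit = pd.to_datetime(raw["order_date"], format="%d/%m/%Y %H:%M")

# A single pandas Timestamp can be turned into a standard-library datetime.
py_dt = df["order_date"].iloc[0].to_pydatetime()
```

The explicit format string is the stricter choice: a value that does not match raises an error instead of being guessed.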

  • @calvinwijaya9706
    @calvinwijaya9706 3 years ago

    Hi Keith, first time seeing your video, this kind of 'tutorial' format is just perfect. Thank you!

  • @rishabhdewangan6520
    @rishabhdewangan6520 3 years ago +2

    One of the easy simple and best ways to approach data analysis
    This is my first time watching you sir and Im already a sincere subscriber while(True): Do watch, learn and grow under your guidance
    You are Awesome

  • @berkayozkan2631
    @berkayozkan2631 3 years ago +6

    I love how he freaks out whenever there is a small warning lol

  • @omrieliyahulevy7985
    @omrieliyahulevy7985 4 years ago +6

    Great tutorial, I've learned a lot!
    A suggestion for your first question, the best month for sales:
    Instead of creating the extra 'month' and 'sales' columns, we can use the pandas "resample" method, which does the grouping by month for us. Just like with groupby, we close it with "sum" and get the same table!
    all_data.resample('M', on='Order Date').sum().sort_values(by='Price Each', ascending=False)

    • @Yayaloy9
      @Yayaloy9 3 years ago

      But here's the problem: Order Date is not a datetime type, so you have to convert it first.
      all_data["Order Date"]= pd.to_datetime(all_data["Order Date"], format="%m/%d/%y %H:%M")
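A hedged sketch that combines the two comments above on a few made-up rows: convert Order Date to datetime first, then total sales per month. It groups on a monthly period instead of calling resample('M', ...), a swapped-in technique chosen because recent pandas releases spell the month-end resample alias 'ME'. The Sales column here is an assumption (quantity times price).

```python
import pandas as pd

# Made-up rows using the column names mentioned in the thread.
all_data = pd.DataFrame(
    {
        "Order Date": ["04/19/19 08:46", "04/07/19 22:30", "12/30/19 00:01"],
        "Quantity Ordered": [2, 1, 1],
        "Price Each": [11.95, 99.99, 700.00],
    }
)

# Convert the text column first; monthly grouping needs a real datetime type.
all_data["Order Date"] = pd.to_datetime(all_data["Order Date"], format="%m/%d/%y %H:%M")

# Assumed definition of sales: quantity times unit price.
all_data["Sales"] = all_data["Quantity Ordered"] * all_data["Price Each"]

# Group by calendar month (as a Period) and sort so the best month comes first.
monthly = (
    all_data.groupby(all_data["Order Date"].dt.to_period("M"))["Sales"]
    .sum()
    .sort_values(ascending=False)
)
best_month = monthly.index[0]
```

Summing only the Sales column also sidesteps the question of what sum() should do with text columns such as product names.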

  • @zewduwereta302
    @zewduwereta302 2 years ago +1

    I have been enjoying your videos (a few) recently but this one is just superb! I am eager to chase more videos. Thanks for your styles and tricks! !!!

  • @Jack-xy4fy
    @Jack-xy4fy 3 years ago

    fantastic video, thank you so much!
    showing your mistakes and working out the solutions is absolute gold, that is something many other tutorials are missing.

  • @Doorshlak
    @Doorshlak 4 years ago +13

    This channel is the best thing I've encountered in a while. Thank you for helping the desperate ;-; Would do 5 likes if I could

  • @JohnnyRottenest
    @JohnnyRottenest 4 years ago +5

    50:00, use result.index as x values and x ticks.

  • @priyalarunnile7981
    @priyalarunnile7981 3 years ago

    This is awesome. Thank you so much @Keith. Would love to go through more videos in the future. Please do post.

  • @qiaochow6668
    @qiaochow6668 3 years ago

    Thanks so much for making the video! Like your style! Love how you teach and the way you solve problems! The imperfection makes the video perfect!

  • @nishantbanjade920
    @nishantbanjade920 4 years ago +11

    I like the way you say in every mistakes - :: AAAAh What did i do ::" lol :D xD

    • @Jack-xy4fy
      @Jack-xy4fy 3 years ago +1

      hahaa it made me laugh because i do the exact same thing

  • @vikram3297
    @vikram3297 4 years ago +4

    32:15 You created the months list passed to plt.bar() out of thin air. In this case the data comes out sorted by month, so there is no issue; otherwise it would have plotted sales against the wrong month. Instead I tried this; please let me know if I'm wrong about it:
    all_data.groupby('Month')['Daily Sale'].sum().plot(kind='bar')
    plt.show()

    • @naishkiteboarder
      @naishkiteboarder 4 years ago

      I think the groupby function sorts by month, so the keys will be 1 through 12, the same as the new months variable.

    • @naishkiteboarder
      @naishkiteboarder 4 years ago

      Monthss = [month for month, df in All_Data.groupby('Month')]
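A small sketch of the concern raised in this thread, using made-up rows where some months are missing. Taking the x values from the grouped result's index keeps each total attached to its own month, whereas a hand-written range(1, 13) would not even have the same length here. The Sales column name is an assumption.

```python
import pandas as pd

# Made-up data: only months 1, 4 and 12 have orders, and the rows are unsorted.
all_data = pd.DataFrame(
    {
        "Month": [12, 4, 4, 1],
        "Sales": [700.0, 23.9, 99.99, 150.0],
    }
)

# groupby sorts its keys by default, and the result's index carries them.
results = all_data.groupby("Month")["Sales"].sum()
months = list(results.index)

# The list comprehension from the comment above yields the same keys.
keys = [month for month, _ in all_data.groupby("Month")]

# Plotting from the index keeps labels and totals aligned, for example:
#   plt.bar(results.index, results.values)
#   results.plot(kind="bar")
```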

  • @Abdullahkbc
    @Abdullahkbc 1 year ago +1

    You are great Keith. You are doing it in a manner that most students can understand better.

  • @TheMaltesemania
    @TheMaltesemania 3 years ago

    I feel like I struck gold with this video. It's helping me learn a lot quicker than online tutorials. Thank you!

  • @ng4logic
    @ng4logic 4 years ago +18

    58:22 I heard that

  • @KeithGalli
    @KeithGalli 2 years ago +3

    I'm launching a data analytics bootcamp!
    goto.masterschool.com/5wn3sw
    Some highlights of the program:
    - Fully remote (with flexible working hours)
    - No tuition fees until after you land a job in tech
    - Open to applicants anywhere in the world!
    This is a 7-month long program kicking off in June. To learn more and get your application started, click the link above ⬆

  • @mikshubhatt1175
    @mikshubhatt1175 3 years ago

    This is really an example of real world data analysis. Appreciate your efforts.

  • @dakafranklin1786
    @dakafranklin1786 2 years ago

    This tutorial is wonderful, bro. I especially like that you google some of these problems, unlike other YouTubers who make it feel like they do it all from their head, which makes it harder because the audience may think they have to memorize everything.

  • @ericwxng
    @ericwxng 4 years ago +3

    disappointed there's no python kangaroos in lieu of their recent population decline

  • @eurasmo
    @eurasmo 2 years ago

    This video is AMAZING! Thanks a lot for posting it. Great examples of real-life scenarios that might happen when analysing data

  • @berkan9900
    @berkan9900 2 years ago

    Wow, I was searching about data scientist cases and this video included the best real-life data science case ever. Especially, I am impressed with your Google search skills hahaha.
    Perfect, very real, and funny. Subscribed :D

  • @chineduezeofor2481
    @chineduezeofor2481 3 years ago +1

    Thank you so much for this Keith! Beginners like me appreciate this a lot.

  • @AmiNation5430
    @AmiNation5430 1 year ago

    You really put so much effort into helping us understand the data, the process, and the ideas. Thank you for the videos.

  • @arpangoyal7337
    @arpangoyal7337 1 year ago

    LOVED the entire video and how raw it was, along with his explanation!

  • @mulezen9167
    @mulezen9167 1 year ago

    I can't stress enough how much I love this video! Good work!

  • @ernestoa.371
    @ernestoa.371 1 year ago

    Your videos are helping me a lot, don't stop man!!

  • @rachrach9871
    @rachrach9871 1 year ago

    Great tutorial! So much I’ve learned in this video! Thanks so much Keith. Looking forward to learning more useful stuff here

  • @KuyaRalph
    @KuyaRalph 3 years ago

    Nice tutorials! Just finished your pandas videos! Thanks, man!

  • @tanmaysinghi1868
    @tanmaysinghi1868 2 years ago

    Thank you so much for making this. The video acts as a great capstone project and shares a lot of learnings. What makes it best is that you yourself were learning throughout the video, using Stack Overflow etc. That made the video extremely friendly in my opinion; it really does teach you how analysis happens in the real world.

  • @MrDviratis
    @MrDviratis 3 years ago

    Really enjoyed coding along following this video. Nicely done, Keith!