Data Analyst Portfolio Project (Exploratory Data Analysis With Python Pandas)

  • Published Oct 4, 2024
  • In this video, we take a look at an Exploratory Data Analysis (EDA) portfolio project within Python Pandas. Everything is coded within Jupyter Notebook and the data is sourced from Kaggle.
    Python Libraries needed: Pandas, Seaborn
    Kaggle Data: www.kaggle.com...
    Interested in discussing a Data or AI project? Feel free to reach out via email or simply complete the contact form on my website.
    📧 Email: ryannolandata@gmail.com
    🌐 Website & Blog: ryannolandata....
    🍿 WATCH NEXT
    Python for Data Analyst and Scientists Playlist: • Python Tutorials For D...
    Python Data Cleaning: • Real World Data Cleani...
    Python Groupby: • The Complete Guide to ...
    Vid 3:
    MY OTHER SOCIALS:
    👨‍💻 LinkedIn: / ryan-p-nolan
    🐦 Twitter: / ryannolan_
    ⚙️ GitHub: github.com/Rya...
    🖥️ Discord: / discord
    📚 *Data and AI Courses: datacamp.pxf.i...
    📚 *Practice SQL & Python Interview Questions: stratascratch....
    WHO AM I?
    As a full-time data analyst/scientist at a fintech company specializing in combating fraud within underwriting and risk, I've transitioned from my background in Electrical Engineering to pursue my true passion: data. In this dynamic field, I've discovered a profound interest in leveraging data analytics to address complex challenges in the financial sector.
    This YouTube channel serves as both a platform for sharing knowledge and a personal journey of continuous learning. With a commitment to growth, I aim to expand my skill set by publishing 2 to 3 new videos each week, delving into various aspects of data analytics/science and Artificial Intelligence. Join me on this exciting journey as we explore the endless possibilities of data together.
    *This is an affiliate program. I may receive a small portion of the final sale at no extra cost to you.
  • Science & Technology

COMMENTS • 114

  • @RyanAndMattDataScience
    @RyanAndMattDataScience  1 month ago

    Thanks for checking out this video.
    Join our Data Science Discord Here: discord.com/invite/F7dxbvHUhg
    If you want to watch a full course on Python Pandas check out Datacamp: datacamp.pxf.io/XYD7Qg
    Want to solve Python data interview questions: stratascratch.com/?via=ryan
    I'm also open to freelance data projects. Hit me up at ryannolandata@gmail.com
    *Both Datacamp and Stratascratch are affiliate links.

  • @WarbossPepe
    @WarbossPepe 11 months ago +10

    You're a good man Ryan. Hope the run went well

  • @idreeskhan5129
    @idreeskhan5129 6 months ago +3

    Great work Ryan . Thank you

  • @arun_jakhmola
    @arun_jakhmola 4 months ago

    Hey Ryan, Greetings from India
    I shadowed you for 3 days and completed the project in bits but glad I finished the whole video.
    Loved the project and the way you taught it.
    (Just a suggestion - Please go by the agenda for the project, so that we can have an outline in our minds of the key things that we as data analysts need to extract from the data.)

  • @kokowin5851
    @kokowin5851 4 months ago +8

    This is an easier way to remove USA from the event name: df2["Event name"] = df2["Event name"].str.replace("(USA)", " ")

    • @sandydalhousie
      @sandydalhousie 1 month ago

      yes this is better I agree. Also, I also tried using the split method as used by Ryan but all my entries in the "Event name" get replaced with "None" somehow! I don't understand.
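A minimal sketch of the `str.replace` suggestion above, using two made-up sample rows rather than the real Kaggle data. Passing `regex=False` makes pandas treat `"(USA)"` as literal text; in pandas releases before 2.0 the default was `regex=True`, where the parentheses act as a regex group. The trailing `.str.strip()` is an addition here to drop the leftover space.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is far larger.
df2 = pd.DataFrame({"Event name": [
    "Everglades 50 Mile Ultra Run (USA)",
    "Palisades Ultra Trail Series (PUTS) - Big Elk 50k (USA)",
]})

# regex=False: match the literal text "(USA)", parentheses included.
df2["Event name"] = (
    df2["Event name"]
    .str.replace("(USA)", "", regex=False)
    .str.strip()  # remove the space left in front of the removed suffix
)

print(df2["Event name"].tolist())
```

Other parenthesised parts of a name, such as "(PUTS)", are left untouched because only the literal "(USA)" is replaced.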

  • @emastehr
    @emastehr 1 year ago +4

    Great project. Could you develop a full project? Something that includes SQL, Python, and then a visualization tool. That would be amazing

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  1 year ago

      Yes I’ll be working on one in the future. Focus atm is more models like the one I uploaded today

  • @234bellamkonda
    @234bellamkonda 1 month ago

    Awesome video, finished it in a day. Planning to do 1 project a day following videos till I get comfortable doing things on my own. Very easy to follow, thank you so much 😊

  • @rahulpal_dsml
    @rahulpal_dsml 11 months ago +2

    Not subscribing to you would be a sin after going through this beautiful and informative video! Keep going!

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  11 months ago +1

      I appreciate it! Working on a big class video next! Followed by PyTorch

    • @rahulpal_dsml
      @rahulpal_dsml 11 months ago

      @RyanAndMattDataScience Would love it. I am not sure whether you already do this, as I just came across your video today, but do try posting a community post some time before the videos; I would not want to miss them.
      Thanks for your valuable input; really impressed by a tutor's ability to convey ideas after more than a decade!

    • @rahulpal_dsml
      @rahulpal_dsml 11 months ago

      Hey Ryan, I am getting this error when combining all the filters together. Could you please guide me on how to fix this?
      MemoryError: Unable to allocate 75.9 TiB for an array with shape (7461195, 1398540) and data type float64
      I have an 8th-gen CPU (i5-8350U), 24 GB RAM, a 500 GB SSD (Crucial MX500), and am using Jupyter Notebook in an Anaconda env.

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  11 months ago

      @rahulpal_dsml Can you try running it in Kaggle or Google Colab?

    • @rahulpal_dsml
      @rahulpal_dsml 11 months ago

      @RyanAndMattDataScience Hi, yes, it did run on Google Colab, thanks a lot
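The thread does not show the code that raised the MemoryError, so its cause is unknown. As a rough sketch only, one memory-friendly pattern is to build each condition as a boolean Series on the same DataFrame and combine them with `&`, so the result stays one boolean per row. Apart from "Event name", the column names and the sample rows below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature of the race data; column names other than
# "Event name" are assumed for this sketch.
df = pd.DataFrame({
    "Year of event": [2020, 2020, 2019, 2020],
    "Event name": ["A 50km (USA)", "B 50mi (USA)", "C 50km (USA)", "D 50km (FRA)"],
    "Event distance/length": ["50km", "50mi", "50km", "50km"],
})

# Each condition is a boolean Series aligned to df's index.
is_2020 = df["Year of event"] == 2020
is_usa = df["Event name"].str.endswith("(USA)")
is_50 = df["Event distance/length"].isin(["50km", "50mi"])

# Element-wise AND keeps the mask one-dimensional (one value per row).
mask = is_2020 & is_usa & is_50
df2 = df[mask]

print(mask.shape, len(df2))
```

Naming the individual masks also makes it easy to check each filter's row count (`is_usa.sum()`, for example) before combining them.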

  • @shailendra_kunwar
    @shailendra_kunwar 5 months ago

    Awesome work Ryan 🔥🔥🔥🔥
    I have just watched it and I appreciate the effort that you put in for the video. I will be using this as my portfolio project.

  • @jkzhakom
    @jkzhakom 5 months ago

    Fantastic video, Ryan. Thanks for sharing your knowledge with us.

  • @nlnl72
    @nlnl72 6 months ago +1

    Thanks for the video! really helpful.
    Do you think you can do a Data Scientist Portfolio Project(s) series? I'm sure you'll find a lot of people interested in that (including me haha)!

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  6 months ago +1

      hey I have 2 out so far! and I just published another data analyst project last week

    • @nlnl72
      @nlnl72 6 months ago

      @RyanAndMattDataScience Okay, thanks, I'll definitely check them out!

  • @takashiiexe
    @takashiiexe 7 months ago

    Thanks Ryan! Great Project.

  • @alexrosen8762
    @alexrosen8762 1 year ago

    Really useful project for learning, especially since the data sample is included. Thanks a lot 🙏

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  1 year ago +1

      Glad it was helpful! Currently in the initial stages of next month's project

    • @alexrosen8762
      @alexrosen8762 1 year ago

      @RyanAndMattDataScience Great! Looking forward to that👌

  • @Nighthunterm
    @Nighthunterm 5 months ago

    Was just doing some Python learning to get some more knowledge and I just found your channel. I heard you say you ran your marathon around UCF. I'm a fellow alum from there as well haha. Go Knights!

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  5 months ago

      Haha charge on and 25 loops around campus. I’ll never run there again lol

  • @7a30adnanbin5
    @7a30adnanbin5 5 months ago +1

    Great Vid mahn .. really helpful

  • @RoleJohn
    @RoleJohn 1 year ago

    great great content ! i am subscribing only on the condition you upload more and more in depth analysis using Python. Keep it up

  • @akshatalanjewar3056
    @akshatalanjewar3056 7 months ago

    It's simply amazing... I like the way you teach, and the video is informative

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  7 months ago

      Thank you

    • @akshatalanjewar3056
      @akshatalanjewar3056 7 months ago

      @RyanAndMattDataScience I need an answer to one question: according to the job market, which Python libraries should I know for a data analyst profile?

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  7 months ago

      @akshatalanjewar3056 Start with pandas and scikit-learn

    • @akshatalanjewar3056
      @akshatalanjewar3056 7 months ago

      @RyanAndMattDataScience Well, I know Python libraries like pandas, NumPy, seaborn and matplotlib, plus SQL and Power BI. Is this sufficient to get a job?

  • @maxnicolasnavarro4017
    @maxnicolasnavarro4017 1 month ago

    Thank you so much for bringing back my love for this field.
    I needed this so much...

  • @shayanakhavan6002
    @shayanakhavan6002 5 months ago

    Great video, Ryan!

  • @RRangel7b
    @RRangel7b 2 months ago +1

    Hello
    First of all, thank you!
    And how about:
    df = pd.DataFrame(data)
    usa_events = df[df['Event name'].str.contains('USA')]
    print(usa_events)

  • @lujingyan6853
    @lujingyan6853 7 months ago +1

    Thank you for sharing. But when you use (df["Event name"].str.split("(").str.get(1).str.split(")").str.get(0) == "USA") to select all the USA races, it will ignore the events that contain more than one () in their name, such as Palisades Ultra Trail Series (PUTS) - Big Elk 50k (USA). It might be better to use df["Event name"].str.contains(r"\(USA\)").

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  7 months ago

      Ah didn’t realize when doing this project. Great catch and thanks for commenting
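The difference described in this thread can be reproduced on a few sample names. This is a small sketch with illustrative rows (the first and third names are made up; the second is the one quoted in the comment). The split chain looks only at the first parenthesised group, while the escaped regex matches a literal "(USA)" anywhere in the name.

```python
import pandas as pd

names = pd.Series([
    "Everglades 50 Mile Ultra Run (USA)",                       # hypothetical
    "Palisades Ultra Trail Series (PUTS) - Big Elk 50k (USA)",  # from the comment
    "Ultra Trail du Mont Blanc (FRA)",                          # hypothetical
])

# Split chain: text between the FIRST "(" and the next ")".
via_split = names.str.split("(").str.get(1).str.split(")").str.get(0) == "USA"

# Escaped regex: literal "(USA)" anywhere in the string.
via_regex = names.str.contains(r"\(USA\)")

print(via_split.tolist())
print(via_regex.tolist())
```

For the second name the split chain extracts "PUTS", so that row is dropped by the split-based filter but kept by the regex.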

  • @binarify4364
    @binarify4364 7 months ago

    Brilliant Project !

  • @ayantikaC03
    @ayantikaC03 9 months ago

    Great video Ryan!

  • @michaelshepherdmunemo4414
    @michaelshepherdmunemo4414 4 months ago

    Great work. Thank you i was following hands on. # Subscribed_and_Liked

  • @everywoman2774
    @everywoman2774 10 months ago

    subscribed! great video. Thank you for this

  • @stallonengobua8820
    @stallonengobua8820 4 months ago

    Thank you very much Ryan

  • @tarekhusam
    @tarekhusam 1 year ago

    You are amazing, keep it bro

  • @athayaazaria1825
    @athayaazaria1825 7 months ago +1

    Hi, can I get the full syntax at 49:07? I can't see the continuation. I need it for my current school assignment, and this will help me a lot😊😊😊

  • @navid7467
    @navid7467 17 days ago

    New subscriber here! Thank you for your good work. Just a quick question. To extract events held in USA, since we know we are looking for the 3 letters between the 5th last and last as USA, couldn't we use this condition: (df['Event name'].str[-4:-1]=='USA')? I used it but my dataframe returns 26524 rows which I thought might be due to difference in the version of dataset.
    I also tried (df['Event name'].str.endswith("(USA)")) and got the same number of rows.
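A quick sketch comparing the two conditions from this comment on made-up rows. Both agree whenever a name ends in "(USA)", but the slice checks only the three letters before the final character, so it does not verify the parentheses. The last sample row is a hypothetical name built to show that difference; it is not known to occur in the dataset.

```python
import pandas as pd

names = pd.Series([
    "Everglades 50 Mile Ultra Run (USA)",   # hypothetical
    "Ultra Trail du Mont Blanc (FRA)",      # hypothetical
    "Race in the USA!",                     # hypothetical edge case
])

# Slice: characters at positions -4, -3, -2 (the final character is ignored).
by_slice = names.str[-4:-1] == "USA"

# Suffix test: the whole literal "(USA)" must end the string.
by_suffix = names.str.endswith("(USA)")

print(by_slice.tolist())
print(by_suffix.tolist())
```

If both conditions return the same row count on the real data, as the comment reports, then no names of the edge-case kind are present and either form works.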

  • @thekendev
    @thekendev 6 months ago +1

    Hey Ryan,
    Just watching this and following along.
    I’ve got a question, please.
    At the 17:30 mark I noticed that the split you did seemed a bit overwhelming. As a novice in data science, I couldn't help but notice something interesting in the data. There were event names labeled inconsistently for the USA, some as "usaaaaA" and others as "usaaa". So I used a simple .str.contains() call with case sensitivity turned off to standardize it, resulting in 1.7 million rows. Wanted to hear your thoughts on this approach.
    I know it might be labeled a lazy and easy approach, but I found it catches more rows effectively. Please give me your views (I’m still learning).

    • @thekendev
      @thekendev 6 months ago

      So my .shape shows 30120 rows, not 26090

  • @mohamedzrirak5884
    @mohamedzrirak5884 9 days ago

    thank you👍

  • @dominiktokarski8054
    @dominiktokarski8054 1 year ago

    Liked, subscribed and commented for stats. Keep going :)

  • @charlieadleydog
    @charlieadleydog 4 months ago

    Hey Ryan, great video. Just wanted to ask how much RAM you suggest for these projects to be able to run quickly?

  • @dj-mt1pz
    @dj-mt1pz 7 months ago +2

    My kernel keeps dying whenever I combine all the filters of the df to create df2. Does anyone know how to resolve this issue? Otherwise I can't progress :(

    • @linda_erose
      @linda_erose 2 months ago

      same, did u figure it out?
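The thread does not say why the kernel dies. If memory is the limit, one option worth trying is to load only the columns the analysis needs and give them compact dtypes. The sketch below uses an in-memory CSV as a stand-in for the Kaggle file, and its column names are assumptions for illustration.

```python
import io
import pandas as pd

# Stand-in for the real CSV file; columns are assumed for this sketch.
csv_text = (
    "Year of event,Event name,Athlete club\n"
    "2020,A 50km (USA),Club X\n"
    "2019,B 50km (FRA),Club Y\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),                   # replace with the path to the CSV
    usecols=["Year of event", "Event name"], # skip columns that are never used
    dtype={"Year of event": "int16"},        # years fit in 16 bits
)

# deep=True also counts the memory held by the string values.
print(df.memory_usage(deep=True))
```

With `usecols`, the skipped columns are never materialised in the DataFrame, which can matter on a file with millions of rows.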

  • @AmbarGharat
    @AmbarGharat 6 months ago +2

    Hi Ryan, Instead of df['Event name'].str.split('(').str.get(1).str.split(')').str.get(0) == 'USA' can we use df['Event name'].str[-5:] == '(USA)'?

    • @shailendra_kunwar
      @shailendra_kunwar 5 months ago

      Yes this is somehow giving 1408416 rows while the method that Ryan in the video is giving 1398540 rows.
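To see where two filters disagree, as with the differing row counts reported here, the two boolean masks can be compared directly. This is a small sketch on illustrative names; on the real data the same comparison would list the rows that one method keeps and the other drops.

```python
import pandas as pd

names = pd.Series([
    "Everglades 50 Mile Ultra Run (USA)",                       # hypothetical
    "Palisades Ultra Trail Series (PUTS) - Big Elk 50k (USA)",
    "Ultra Trail du Mont Blanc (FRA)",                          # hypothetical
])

mask_split = names.str.split("(").str.get(1).str.split(")").str.get(0) == "USA"
mask_suffix = names.str[-5:] == "(USA)"

# Rows where exactly one of the two methods says "USA".
disagree = names[mask_split != mask_suffix]

print(int(mask_split.sum()), int(mask_suffix.sum()))
print(disagree.tolist())
```

In this sample the only disagreement is the name with two parenthesised groups, which the suffix test keeps and the split chain drops.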

  • @geoffreycg5650
    @geoffreycg5650 8 months ago

    Great video!

  • @aliomar9594
    @aliomar9594 1 year ago +1

    Great

  • @onurdatascience
    @onurdatascience 1 year ago

    Great project!

  • @Al-Ahdal
    @Al-Ahdal 5 months ago

    In event_len column there are many row items with km, mi, h..... how can we check all these to get the correct count, and how to extract numbers only. Should we be using REGEX for that?
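A regex is one reasonable way to do this. As a sketch on made-up values (the real column may contain other formats), `str.extract` with two named groups splits each entry into a number and a unit, after which the units can be counted. Entries that do not match the pattern come back as missing values rather than raising an error.

```python
import pandas as pd

# Hypothetical sample of distance/duration labels.
lengths = pd.Series(["50km", "50mi", "24h", "100km", "6h"])

# value: integer or decimal number; unit: the letters that follow it.
parts = lengths.str.extract(r"^(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)$")
parts["value"] = pd.to_numeric(parts["value"])

# How many entries use each unit.
unit_counts = parts["unit"].value_counts()

print(parts)
print(unit_counts)
```

Checking `parts["unit"].isna().sum()` on the real column would show how many entries the pattern failed to parse.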

  • @chalamohamed2013
    @chalamohamed2013 11 months ago

    Hello Ryan,
    Thanks for sharing your skills.
    I would like to understand why you dropped Athlete Club and Country.
    I think it would have been better to drop the rows that have an empty value; then you could change the column's type.

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  11 months ago

      There’s a lot of ways you could look at a dataset. I did this a long time ago so can’t remember exactly why I did but for what I was working on I don’t believe it mattered
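The two options discussed in this thread can be sketched side by side. The data and column names below are assumptions for illustration (the "Athlete year of birth" column in particular is hypothetical here): option A drops whole columns, as the comment describes, and option B drops only the rows with a missing value so that the column can then be cast to an integer type.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; column names are assumed for this sketch.
df = pd.DataFrame({
    "Athlete club": ["Club A", np.nan, np.nan],
    "Athlete country": ["USA", "USA", np.nan],
    "Athlete year of birth": [1980.0, np.nan, 1975.0],
})

# Option A: drop columns that the analysis does not use.
slim = df.drop(columns=["Athlete club", "Athlete country"])

# Option B: drop rows with a missing birth year, then change the dtype.
clean = df.dropna(subset=["Athlete year of birth"]).copy()
clean["Athlete year of birth"] = clean["Athlete year of birth"].astype(int)

print(list(slim.columns))
print(clean["Athlete year of birth"].tolist())
```

Which option fits depends on the questions being asked: option A keeps every row, option B keeps every column.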

  • @michaelg9359
    @michaelg9359 8 months ago

    thanks for the vid -- very good - your camera view cuts off far right side of visual, though

  • @MiguelGracia-g2d
    @MiguelGracia-g2d 2 months ago

    Hi, had a quick question!
    at 17:27 would there be any downside to me using something like df[df['Event name'].str.contains('USA')] instead?
    Thanks!
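One possible downside is over-matching: a bare `contains('USA')` is true for any name with those letters anywhere in it, and turning case sensitivity off widens the match further. The sketch below uses made-up event names to show the effect; whether such names occur in the real dataset is not stated in the thread.

```python
import pandas as pd

# Hypothetical names chosen to show over-matching.
names = pd.Series([
    "Everglades 50 Mile Ultra Run (USA)",
    "USATF 50 km Championships (CAN)",
    "Busan 50k (KOR)",
])

loose = names.str.contains("USA")                   # substring anywhere
loose_ci = names.str.contains("usa", case=False)    # also matches "Busan"
strict = names.str.contains("(USA)", regex=False)   # literal country tag only

print(loose.tolist())
print(loose_ci.tolist())
print(strict.tolist())
```

Matching the literal "(USA)" keeps the filter tied to the country tag instead of any occurrence of the letters.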

  • @katehudson7405
    @katehudson7405 7 months ago

    is it okay if I add this project to my portfolio after completing it? great video!

  • @JC_333
    @JC_333 1 year ago

    Subscribed!

  • @rishidixit7939
    @rishidixit7939 4 months ago

    Between Matplotlib and Seaborn, which one should be used, or should both be used?

  • @Naadiaajmal
    @Naadiaajmal 1 day ago

    NameError: name 'df' is not defined

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  1 day ago

      Ask in our discord server

    • @Naadiaajmal
      @Naadiaajmal 1 day ago

      @RyanAndMattDataScience Thank you. Please link it. I can't see it on your profile

    • @BreakoutCards
      @BreakoutCards 23 hours ago

      @Naadiaajmal pinned comment

    • @Naadiaajmal
      @Naadiaajmal 23 hours ago

      @BreakoutCards thanksss

  • @jonathangarcia8124
    @jonathangarcia8124 4 months ago

    Is this lesson possible in VS Code, or would I need to learn to use Jupyter Notebook?

  • @mikefranko2832
    @mikefranko2832 11 months ago

    What is the reason behind cleaning up NaN values?

  • @dennisbunarta1190
    @dennisbunarta1190 5 months ago

    I can't find 2020 year of event.. Any solution?

    • @J4vierC
      @J4vierC 3 months ago

      same problem here, i made with .contains() and i dont know why i cant return 2020 rows
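The thread does not show the code involved, so the cause is unknown. One thing that may be worth checking is the column's dtype: if the year is stored as an integer, comparing it with the string "2020" matches nothing. The sketch below assumes a numeric column named "Year of event", which is a guess for illustration.

```python
import pandas as pd

# Hypothetical sample; the column name and integer dtype are assumptions.
df = pd.DataFrame({"Year of event": [2018, 2020, 2020]})

as_text = df[df["Year of event"] == "2020"]   # string vs. integer: no matches
as_int = df[df["Year of event"] == 2020]      # integer vs. integer

# .str methods need text, so convert first if a substring test is wanted.
via_contains = df[df["Year of event"].astype(str).str.contains("2020")]

print(len(as_text), len(as_int), len(via_contains))
```

Running `df.dtypes` first shows which form of the comparison applies to the actual data.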

  • @mikefranko2832
    @mikefranko2832 11 months ago

    What is the reason behind dropping columns?

  • @iniuntukutube
    @iniuntukutube 1 year ago

    Hello Ryan, can I ask something? Are there any other tools (software, applications, websites) that can be used for working with Python? I'm so sorry for the question, please don't laugh at me, hehe. I'm a complete beginner learning to be a data analyst, and I dream of becoming a business analyst. Do you have any suggestions for me, please?

  • @SriramKoyalkar
    @SriramKoyalkar 4 months ago

    Where do I find this project source code?

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  4 months ago

      I plan on putting all the code from videos on my website, but I need to scale up a bit; I don't have the resources atm

  • @GreyHatGenX
    @GreyHatGenX 2 months ago

    comment

  • @tosinwilliams9343
    @tosinwilliams9343 8 months ago

    Thanks Ryan