Solving Real-World Data Science Interview Questions! (with Python Pandas)

  • Published Jan 13, 2025

COMMENTS • 88

  • @KeithGalli  2 years ago +10

    Thank you Brilliant for sponsoring this video! Check out brilliant.org/KeithGalli/ to get started learning STEM for free, and the first 200 people will get 20% off their annual premium subscription.
    Hope you all enjoyed this video :). I'm working on a bunch of new content right now so be on the lookout for another video or two in the next couple of weeks. If you have any questions about the topics covered in this or have a request for a future video, let me know here in the comments!!

    • @edwardj.warden5072  2 years ago +1

      Hi @KeithGalli.
      I've got two questions for you. I've watched lots of your videos, liked them, and learned a lot.
      First, do you think the data science certificate that DataCamp provides is worth earning, and would it help me find a data science job?
      Second, where online would you recommend getting a data science certificate that would help me find a data science job?
      Thank you.

  • @hardiktyagi1955  2 years ago +37

    At 37:48
    I work for Amazon's RPA team, trying to make a career in data science. Last month I was appearing for an IJP and got the same question in SQL coding round.
    Thanks for making this Keith. Keep them coming.

    • @KeithGalli  2 years ago +2

      Dang that's too funny. My hope is that this video will help people in similar situations to yours moving forward. Thanks for watching!

    • @shivamburnwal7765  2 years ago

      Hey Hardik, can you tell me why exactly you are trying to make a career in data science? Is it because RPA doesn't have a good future in the industry, or is it because you personally prefer the data science field? I'm asking because I'm also starting as a member of EXL's RPA team.

  • @Frdy12345  2 years ago +2

    Here's a chained one-expression version I came up with for coding question #6:
    df = (ms_user_dimension
          .merge(ms_acc_dimension, on='acc_id')
          .merge(ms_download_facts, on='user_id')
          .pivot_table(index='date', columns='paying_customer', values='downloads', aggfunc='sum')
          .reset_index()
          .query('no > yes'))
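    Here is a runnable sketch of the merge → pivot_table → query chain on a few made-up rows. The table and column names come from the comment. The data values are invented purely for illustration.

```python
import pandas as pd

# Toy stand-ins for the three tables named in the comment (values are made up)
ms_user_dimension = pd.DataFrame({"user_id": [1, 2, 3], "acc_id": [10, 20, 30]})
ms_acc_dimension = pd.DataFrame({"acc_id": [10, 20, 30],
                                 "paying_customer": ["yes", "no", "no"]})
ms_download_facts = pd.DataFrame({
    "date": ["2020-08-24", "2020-08-24", "2020-08-25", "2020-08-25"],
    "user_id": [1, 2, 1, 3],
    "downloads": [5, 9, 6, 2],
})

# Join users -> accounts (to get paying_customer) -> download facts,
# then pivot so each date gets a 'no' and a 'yes' column of summed downloads.
result = (
    ms_user_dimension
    .merge(ms_acc_dimension, on="acc_id")
    .merge(ms_download_facts, on="user_id")
    .pivot_table(index="date", columns="paying_customer",
                 values="downloads", aggfunc="sum")
    .reset_index()
    .query("no > yes")  # keep dates where non-paying users out-downloaded paying ones
)
print(result)
```

    If a date has downloads from only one customer type, the pivot leaves NaN in the other column, and the comparison silently drops that date. Passing fill_value=0 to pivot_table treats the missing side as zero instead.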

  • @danielefarotti1061  1 year ago +4

    I really like your approach in explaining things. I am currently transitioning from pure maths into data science, and I find these videos very helpful!

  • @masked00000  2 years ago +4

    You're literally the best tutor I have seen. I'm a data scientist myself, but the number of data science approaches I learn from you is incredible. I started with your channel and always wait for you to post a new video. Hats off. Love from Pakistan.

  • @arashomranpour5468  2 years ago +6

    good having you back

  • @yogeshuttekar8542  2 years ago +5

    Glad to see you back, mate. I have really learned more from your videos than from attending university.

  • @expat2010  1 year ago +1

    I really enjoy the real world feel of your videos. Probably now ChatGPT would be a lot faster than searching Stackoverflow or the Pandas docs for those things that one doesn't know by heart.

  • @dinkinflicka157  2 years ago +2

    Yay! Another real world problem solving video. Thanks Keith. Love your content as always.

    • @KeithGalli  2 years ago +2

      Glad to hear it, I appreciate your support!! :)

  • @jovanjanjic9029  1 year ago

    In question #3, Counting Instances in Text, you should add flags=re.I to account for capital letters: len(re.findall(r'\bbull\b', text, flags=re.I))
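    The short sketch below shows what the flag changes. The sample sentence is made up. \b anchors the match at word boundaries, so "bullish" is never counted, and re.I makes "Bull" and "BULL" count as well.

```python
import re

text = "Bull market or bear market? The bull is back, and BULL runs are bullish."

# Without re.I only the lowercase whole word "bull" matches.
count_exact = len(re.findall(r"\bbull\b", text))

# With re.I (re.IGNORECASE), "Bull" and "BULL" match too; "bullish" still does not.
count_any_case = len(re.findall(r"\bbull\b", text, flags=re.I))

print(count_exact, count_any_case)
```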

  • @bobbyg603  2 years ago +1

    Glad you're back bro!

  • @deepaksaikumar5178  2 years ago +2

    Hi Keith,
    You have been a great resource to learn Python and Data science-related skills.
    Thank you!

  • @nicholasgrandizio7596  2 years ago +3

    Thank you for all the hard work you put into teaching data science. Your videos, and others like them, give more to members of the community like me who are trying to build a career in data than university programs do. You're playing an important role in the future of data science by guiding current students along the path to becoming future industry leaders.

  • @phoenixcollege6608  2 years ago

    makes it easy to understand
    watching your vid on a friday night and these are the best years of my young life

  • @MiguelNFer  2 years ago

    Glad you're back bro ;) love these types of vids. Love from Portugal

  • @mekuzeeyo  2 years ago +1

    Thank you for coming back🤗

  • @zanerios2776  2 years ago

    really love the style and format of vid, just subbed

    • @KeithGalli  2 years ago

      Glad you liked it man! Thanks for the sub

  • @troy671  2 years ago

    Thanks for the video. It is great to see your thinking process even though you are not an expert in pandas.

  • @laurentreynaud4404  2 years ago +1

    Thank you so much for these data science courses!

  • @phsopher  2 years ago +4

    For the fifth problem, pandas has a built-in percentage-change method (pct_change). The solution could look like this, for example:
    sf_transactions['year_and_month'] = sf_transactions.created_at.dt.strftime("%Y-%m")
    # sum only the 'value' column so the datetime and text columns aren't summed too
    monthly_revenue = sf_transactions.groupby("year_and_month", as_index=False)["value"].sum()
    monthly_revenue['pct_change'] = (monthly_revenue.value.pct_change() * 100).round(2)
    monthly_revenue[['year_and_month', 'pct_change']]
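    Here is a runnable sketch of the pct_change approach on a few made-up transactions. The column names created_at and value follow the comment. Only the value column is summed per month. The "%Y-%m" strings sort chronologically, so the month order is correct.

```python
import pandas as pd

# Toy transactions (made-up values); column names follow the comment above.
sf_transactions = pd.DataFrame({
    "created_at": pd.to_datetime(["2019-01-05", "2019-01-20", "2019-02-11", "2019-03-02"]),
    "value": [100.0, 100.0, 250.0, 200.0],
})

sf_transactions["year_and_month"] = sf_transactions["created_at"].dt.strftime("%Y-%m")

# Monthly revenue: one row per "YYYY-MM", summing only 'value'.
monthly_revenue = sf_transactions.groupby("year_and_month", as_index=False)["value"].sum()

# pct_change() gives (current / previous) - 1; the first month has no predecessor, so it is NaN.
monthly_revenue["pct_change"] = (monthly_revenue["value"].pct_change() * 100).round(2)
print(monthly_revenue[["year_and_month", "pct_change"]])
```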

    • @KeithGalli  2 years ago +3

      Oh cool, I didn't know that! Thanks for sharing :). Nice solution 🤠.

  • @iamTHIEN013  1 year ago

    Hi Keith, thank you so much for these videos. Could you make more videos about Power BI or Tableau? I'd really, really appreciate it.

  • @netanelmad  2 years ago

    Thanks for the video! Would love to see your approach to more non-coding questions specifically :)

  • @niteshprajapat7918  2 years ago +1

    You are a gem ❤️ the way you explain concepts is next level 🔥🔥

  • @kennethstephani692  1 year ago

    Great video, Keith!

  • @anonviewerciv  2 years ago

    That first one and others are SQL problems converted to pandas. I suppose that's a decent way to get basic pd questions. (28:48)
    17:20 I know it's more a reference to the stock market terms, but I can't stop thinking of Fallout: New Vegas.
    1:11:00 If you have the locations that's just a simple matter of putting it on a map and seeing where it clusters the most.
    1:28:00 Context, context, context. Was that the only reduction?

    • @konstantinpluzhnikov4862  2 years ago +1

      These StrataScratch tasks can also be solved in SQL; the site provides that option.

  • @a.5214  2 years ago +2

    amazing! we want more of this stuff 👌

    • @KeithGalli  2 years ago +2

      Appreciate it! More coming soon :)

  • @omer55kurt  2 years ago +2

    Yeeeeeeeeyyy!!!! i love your enthusiastic cry of success :D 26:31

  • @ansekao4516  2 years ago

    Great video, please do more like this. I've been watching you for a long time.

  • @edwardj.warden5072  2 years ago

    Very helpful. Thank you Keith.

  • @wahaha108  2 years ago

    long time no see keith, welcome back 😀😀

  • @adeafni9544  2 years ago

    Thank you Keith, you're amazingg, keep it up!!!

  • @kumaripritika2799  2 years ago +1

    Really helpful video!

  • @n_12346  1 year ago

    Brilliant video! Very helpful.

  • @wiz8058  2 years ago

    Great work man!! you're always doing the best.🔥🔥🔥

    • @KeithGalli  2 years ago

      Thank you for the support as always!!

  • @DendrocnideMoroides  2 years ago

    yes please make more videos like this

  • @AIdevel  2 years ago

    The problem lies in your use of the round function: you're supposed to wrap the whole expression in round() and then pass 2 as the number of decimals.
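    The comment doesn't show which expression it refers to, so the numbers below are hypothetical. The sketch only illustrates the point about placement. Rounding an intermediate piece leaves the final result unrounded, while round(whole_expression, 2) rounds the final value.

```python
# Hypothetical month-over-month numbers, only to show where round() belongs.
previous, current = 3.0, 4.0

# Rounding an intermediate piece leaves the final percentage unrounded.
wrong = round(current - previous, 2) / previous * 100

# Wrapping the whole expression rounds the final result to 2 decimals.
right = round((current - previous) / previous * 100, 2)

print(wrong, right)
```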

  • @udayabhaskar1495  2 years ago

    Thank you for this video!👍

  • @pratikpawar336  2 years ago

    Great video, please make more videos like this.

  • @prof_albert  2 years ago

    That was great. Bravo and all of your videos are awesome 🌺👌💞🤩💪

  • @АндрейТоцкий-л4и  9 months ago

    It is very great. Thank You!

  • @iamfavoured9142  2 years ago

    Welcome back Keith 💃🏻💃🏻

  • @fantasyxpress7966  1 year ago

    Is DSA (data structures and algorithms) important for data scientists too, Keith?

  • @user-zm6kj7oi3d  2 years ago +1

    you are helping a high schooler out by being back

    • @KeithGalli  2 years ago +2

      More videos coming soon :)

  • @9eartheyes  2 years ago

    great video! thank you!

  • @ranjithraghunathan1267  2 years ago

    Thanks Keith

  • @mohanadashour4835  3 months ago

    I think the probability of having a sister, given 2 children, is 0.75. The sample space is:
    Sister, brother
    Brother, sister
    Sister, sister
    Brother, brother
    3 of the 4 outcomes include a sister, so it is 3/4.
    For 3 children there are 8 equally likely outcomes, and only one of them (brother, brother, brother) has no sister, so it will be 7/8.
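    The enumeration in this comment can be checked by brute force. The sketch assumes each child is independently equally likely to be a girl ("S") or a boy ("B"). Under that assumption, it counts the outcomes that contain at least one girl, which is the quantity the comment's sample space counts. The helper name is made up for this example.

```python
from fractions import Fraction
from itertools import product


def prob_at_least_one_sister(n_children):
    """Share of equally likely S/B sequences of length n that contain at least one 'S'."""
    outcomes = list(product("SB", repeat=n_children))
    with_sister = [o for o in outcomes if "S" in o]
    return Fraction(len(with_sister), len(outcomes))


print(prob_at_least_one_sister(2), prob_at_least_one_sister(3))  # 3/4 7/8
```

    This matches the closed form 1 - (1/2)^n, because "all brothers" is the only outcome without a girl.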

  • @CultureofSpeech  7 months ago

    Bravo 👏 Lit 🌠 Impressive 👌 ❤ Gratitude 🥳 for your satisfactory Work 💪🚀💯💪

  • @MikeResurrected  1 year ago

    Could you actually google for help during a DS coding interview nowadays?

  • @fcoatis  2 years ago

    Great video, Keith. I'm curious: how do you comment out a block of code?

  • @RahmanIITDelhi  2 years ago

    Hey Keith, can we access libraries while solving problems in a real-time exam?

  • @SerDunk-224  1 month ago

    Hey Keith, dunno if this changed from 2 years ago when you posted this video, but currently the 'Check Solution' button is only available for Premium Users :/
    Still, seeing how you go about solving the problems is great to get in the right mindset!

  • @vanshmalik1446  2 years ago

    Hey!
    Does anyone know of other pay-after-placement data analysis programs that accept applications from all over the globe?

  • @ranjithraghunathan1267  2 years ago

    How can I download or copy the raw dataset for each part?

  • @mehdismaeili3743  2 years ago

    excellent, thanks.

  • @AIdevel  2 years ago

    Replace yes with 1 and no with 0, then sum them.
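    The comment doesn't say which table or column it means. The sketch below therefore uses a hypothetical paying_customer column holding "yes"/"no" strings to show the replace-then-sum idea. Summing the boolean mask (col == "yes") gives the same count without an explicit mapping.

```python
import pandas as pd

# Hypothetical yes/no column; the comment does not name the table it refers to.
df = pd.DataFrame({"paying_customer": ["yes", "no", "yes", "yes"]})

# Map the strings to 1/0, then sum to count the "yes" rows.
paying_count = df["paying_customer"].map({"yes": 1, "no": 0}).sum()

# Equivalent: True counts as 1 when a boolean Series is summed.
paying_count_bool = (df["paying_customer"] == "yes").sum()

print(paying_count, paying_count_bool)
```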

  • @konstantinpluzhnikov4862  2 years ago +2

    LifeHack: if you are short of money but want to use a service, use a VPN located in a relatively poor country. The result will be interesting.

  • @finnnelson5472  2 years ago

    TY :)

  • @manphu2515  2 years ago

    Thanks so much for the video, learn a lot from you. And you are super cute 😍

  • @jovanjanjic9029  1 year ago +4

    Your solution for the Probability of Having a Sister question is not correct. We know for sure that the random girl must come from a family with 1, 2, 3, or 4 children, and those family sizes have a combined probability of 0.7. We should divide the probabilities for 1, 2, 3, and 4 children by 0.7 to get the probability that the girl comes from each of these families. She theoretically can't come from a family with 0 or 5 children. Essentially, you are counting the possibility that she comes from a 0- or 5-child family, even though it's impossible. (In practical terms, you are needlessly ignoring information you already have.) So the correct solution is: 0.25/0.7 × 0 + 0.2/0.7 × 0.5 + 0.15/0.7 × 0.75 + 0.1/0.7 × 0.875 = 0.42857, which rounds to 0.43.
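    The sketch below reproduces this comment's arithmetic exactly, using the family-size probabilities it quotes for 1 to 4 children. The per-family chance of a sister, 1 - (1/2)^(n - 1), assumes each sibling is independently a girl with probability 1/2. It matches the 0, 0.5, 0.75, 0.875 factors in the comment. This checks the commenter's calculation only, not which conditioning the interview intended.

```python
from fractions import Fraction

# Family-size probabilities for 1-4 children, as quoted in the comment.
p_family = {1: Fraction(25, 100), 2: Fraction(20, 100),
            3: Fraction(15, 100), 4: Fraction(10, 100)}


def p_sister(n):
    # A girl has a sister unless all n - 1 siblings are boys (assuming 50/50, independent).
    return 1 - Fraction(1, 2) ** (n - 1)


total = sum(p_family.values())  # renormalizing constant used in the comment
answer = sum(p / total * p_sister(n) for n, p in p_family.items())
print(total, answer, round(float(answer), 2))
```

    Working in Fraction keeps the result exact: the weighted sum simplifies to 3/7, which is the 0.42857 quoted above.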

  • @balakumar.n4891  2 years ago

    super

  • @Manishsingh-u1r9q  29 days ago

    @KeithGalli I don't know why, but you look like Elon Musk to me 🤣🤣

  • @meujie8835  2 years ago

    Hi, I'm Jiemeu and I love your channel. I hope to discuss business cooperation with you.....

  • @YunusFidan_  2 years ago

    Noice!

  • @YaIdcReportMe  7 months ago

    Probably not a good use of your time to watch this guy struggle with coding questions for over an hour

  • @doulaishamrashikhasan8425  2 years ago

    you disappeared again 😢

    • @KeithGalli  2 years ago +3

      My apologies! I have a video that I'm finalizing the editing for. It should be out in the next 3-4 days and then I'm going to try to be more consistent!!

  • @ratkillerthe  1 year ago

    I solved the Bathrooms/Bedrooms problem with:
    cols_of_interest = airbnb_search_details[['city', 'property_type', 'bathrooms', 'bedrooms']]
    property_results = cols_of_interest.groupby(['city', 'property_type']).agg(
        avg_bathrooms=('bathrooms', 'mean'),
        avg_bedrooms=('bedrooms', 'mean')).reset_index()
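    Here is a runnable version of the named-aggregation approach on a few made-up listings. The columns city, property_type, bathrooms, and bedrooms follow the comment. The price column is an invented extra that the column selection drops. Each keyword passed to .agg() becomes an output column built from an (input_column, aggregation) pair.

```python
import pandas as pd

# Toy rows standing in for the airbnb_search_details table (values are made up).
airbnb_search_details = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC", "LA"],
    "property_type": ["Apartment", "Apartment", "House", "House"],
    "bathrooms": [1.0, 2.0, 3.0, 2.0],
    "bedrooms": [1, 3, 4, 2],
    "price": [100, 150, 300, 200],  # extra column dropped by the selection below
})

cols_of_interest = airbnb_search_details[["city", "property_type", "bathrooms", "bedrooms"]]

# Named aggregation: output_name=(input_column, aggregation_function)
property_results = (
    cols_of_interest
    .groupby(["city", "property_type"])
    .agg(avg_bathrooms=("bathrooms", "mean"), avg_bedrooms=("bedrooms", "mean"))
    .reset_index()
)
print(property_results)
```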

  • @thebunnda5450  5 months ago

    Simple Solution for #5
    (sf_transactions
    .assign(year_month = lambda df_: df_.created_at.dt.strftime('%Y-%m'))
    .groupby('year_month', as_index=False)
    .agg(revenue = ('value', 'sum'))
    .assign(revenue_diff_pct = lambda df_: df_.revenue.sub(df_.revenue.shift(1)).div(df_.revenue.shift(1)).mul(100))
    .loc[:, ['year_month', 'revenue_diff_pct']]
    )
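    Here is the same shift-based chain run on a few made-up transactions. The column names follow the comment. The only addition is .round(2), which matches the two-decimal rounding used in the pct_change version earlier in the thread. (this month - last month) / last month is exactly what pct_change computes, so both approaches give the same numbers.

```python
import pandas as pd

# Made-up transactions; column names follow the comment's snippet.
sf_transactions = pd.DataFrame({
    "created_at": pd.to_datetime(["2019-01-05", "2019-01-20", "2019-02-11", "2019-03-02"]),
    "value": [150.0, 50.0, 300.0, 150.0],
})

result = (
    sf_transactions
    .assign(year_month=lambda df_: df_.created_at.dt.strftime("%Y-%m"))
    .groupby("year_month", as_index=False)
    .agg(revenue=("value", "sum"))
    # (this month - previous month) / previous month * 100, rounded to 2 decimals
    .assign(revenue_diff_pct=lambda df_: (
        df_.revenue.sub(df_.revenue.shift(1))
        .div(df_.revenue.shift(1))
        .mul(100)
        .round(2)
    ))
    .loc[:, ["year_month", "revenue_diff_pct"]]
)
print(result)
```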