SQL Case Statements For Data Science Interviews in 2021

Поділитися
Вставка
  • Опубліковано 25 лип 2024
  • Let's cover 3 types of SQL case statements that show up on data science interviews and that you'd use daily as a data scientist. I'll cover 3 different use cases for SQL case statements, explain the nuances, and what goes on behind the scenes when you execute your SQL query. All of this hopefully to make you better on your data science interviews.
    Follow along interactively by coding the solutions to the questions:
    - CASE statement for numerical values: platform.stratascratch.com/co...
    - CASE statement for text: platform.stratascratch.com/co...
    - The most popular CASE statement on SQL interviews: platform.stratascratch.com/co...
    ______________________________________________________________________
    This series is for both beginner and intermediate data scientists and analysts interested in learning how to solve common data science interview questions in SQL. These are real data science interview questions. For some background context and an intro about what this series is about: bit.ly/36kKbxG
    👉 Subscribe to my channel: bit.ly/2GsFxmA
    👉 Playlist for more data science interview questions and answers: bit.ly/3jifw81
    👉 Playlist for data science interview tips: bit.ly/2G5hNoJ
    👉 Practice more real data science interview questions: platform.stratascratch.com/co...
    ______________________________________________________________________
    Timestamps:
    Intro: (0:00)
    Categorizing based on numerical values: (1:13)
    Case statement helpful tip #1: (3:33)
    Case statement helpful tip #2: (5:30)
    Categorizing based on text: (6:30)
    SQL helpful tip #3: (8:41)
    Quick & dirty Case statement: (9:52)
    Case statement helpful tip #4: (11:46)
    Outro: (14:00)
    _____________________________________________________________________
    About The Platform:
    I'm using StrataScratch, a platform that allows you to practice real data science interview questions. There are over 1000+ interview questions that cover coding (SQL and python), statistics, probability, product sense, and business cases.
    Head over to platform.stratascratch.com/co... to solve a data science coding question.
    ______________________________________________________________________
    Contact:
    If you have any questions, comments, or feedback, please leave them here!
    Feel free to also email me at nathan@stratascratch.com
    ______________________________________________________________________
    #sql #sqlinterviews #datascienceinterviews

КОМЕНТАРІ • 84

  • @TinaHuang1
    @TinaHuang1 3 роки тому +18

    Your thumbnails are getting more awesome :D Also awesome content ofc 🔥

    • @stratascratch
      @stratascratch  3 роки тому +3

      haha, I'm trying to push the ridiculousness up a level to get that CTR up.

  • @gauravdutta5486
    @gauravdutta5486 2 роки тому +7

    These practical scenarios of case are really helpful n useful even for our own IT implementations.

  • @mosherchtman
    @mosherchtman 2 роки тому +8

    in my opinion, the most confusing part of the question is understanding the need for Host_id, a bit tricky.

  • @rahaapooya
    @rahaapooya 3 роки тому

    Very helpful, thanks a lot!

  • @malvikachandhok8571
    @malvikachandhok8571 3 роки тому

    Thank you! Your videos are super helpful :)

  • @elfridhasman4181
    @elfridhasman4181 2 роки тому

    Thank you very much for sharing sir, you help me a lot, God Bless You and Family :)

  • @niveditakumari701
    @niveditakumari701 3 роки тому

    Really Informative video

  • @miraskhabibulla1345
    @miraskhabibulla1345 3 роки тому

    you are awesome! keep it up!

  • @kathyle6275
    @kathyle6275 2 роки тому

    this video is super helpful and awesome, thank you a bunch :D

    • @stratascratch
      @stratascratch  2 роки тому +1

      Glad you found it educational and helpful!

  • @damian2955
    @damian2955 3 роки тому

    nice one, thanks

  • @rollinas1
    @rollinas1 3 роки тому +1

    The most valuable training material on SQL ever.
    Sampling data, probability, p-value - these things would be very interesting for me.

    • @stratascratch
      @stratascratch  3 роки тому +2

      Thanks for the kind words. I am going to be doing some python coding sometime in the near future which would include some data sampling (or more specifically how to do that). I'll be creating tutorials on how to work with python soon once I get through some "interviewing" material. Hope you keep watching.

    • @malvikachandhok8571
      @malvikachandhok8571 3 роки тому

      @@stratascratch Will look forward to that as well! Thanks again :)

  • @miteshbagwe9765
    @miteshbagwe9765 2 роки тому

    Excellent explanation. Thanks for posting.

  • @adelabdallah3833
    @adelabdallah3833 2 роки тому

    Can you please show examples of counting retained users by date difference between first_transaction_date and transaction date but in a dynamic way.

  • @mikemihay
    @mikemihay 3 роки тому

    Excellent content!

  • @sunny2253
    @sunny2253 Рік тому

    I gave a sql interview recently and 3 out of 4 questions were exactly based on this logic

  • @songsong2334
    @songsong2334 3 роки тому

    Thanks for the good content! Thumbs up! One question from some of ur videos, how did u come up the metrics design by looking at the SQL data?

    • @stratascratch
      @stratascratch  3 роки тому

      It's usually through looking at the table schema first. I can usually figure out how to design the metrics. I mainly only look at the schema because on interviews, you don't usually have access to the data so you have to make all your assumptions based on the column names. The platform is nice in that you can look at the data so I then will explore the dataset to make sure my assumptions are correct.

  • @sonyrao2615
    @sonyrao2615 2 роки тому +2

    this content is awesome! thank you! I had a question, in the business inspection scores problem, why did you need to write "distinct inspection_type" as a part of the sub query. I don't understand why/how you knew we need to use distinct to solve this.

    • @stratascratch
      @stratascratch  2 роки тому

      mainly because I didn't want duplicates in my output so I used distinct in the subquery. We probably could have used distinct at the very end but I just chose to do it in the subquery. A lot of this is about playing with the data but if you're on an interview you can ask the interviewer about whether or not you care about uniqueness or duplicates in the final output.

  • @followmycrafts8811
    @followmycrafts8811 3 роки тому

    Thank You

  • @zulumane
    @zulumane 3 роки тому +2

    I LOVE YOUR CONTENT! Could you maybe do a series on the most common SQL scenarios that a DS would face in a work environment?

    • @stratascratch
      @stratascratch  3 роки тому

      Thanks! Many of my interview questions series can be adapted for the work environment. In the newer videos I always try to explain how you'd solve the problem on an interview and how to do so on the job so I try to straddle both realms. But I see your point -- If I find a good set of questions that are perfect for the work environment, I'll definitely cover it in a video

    • @zulumane
      @zulumane 3 роки тому

      @@stratascratch Awesome thanks! Another suggestion might be going over some of the SQL theory, specifically comparing computing power used for certain queries or certain parts of those queries. I know you mention it in some of your videos but that could still be helpful! Thanks again for all the great resources that you put on UA-cam, I'll be using Strata Scratch very soon!

    • @stratascratch
      @stratascratch  3 роки тому

      @@zulumane Yea that's a great suggestion. For some of the videos, I can dive deeper into optimizations and performance.

  • @majafuntv4538
    @majafuntv4538 3 роки тому

    Best SQL Tutorial

    • @stratascratch
      @stratascratch  3 роки тому +2

      Thanks! I'm glad you've enjoyed them. More to come! And maybe I'll dive into some python if people are interested.

  • @3danim8r1
    @3danim8r1 2 роки тому

    Hey man Thanks for sharing.. Just a question i have not found second and third question on your website could you please suggest how i can find these questions

  • @mdabulkalamazad6775
    @mdabulkalamazad6775 3 роки тому

    Thanks a lot Sir

    • @stratascratch
      @stratascratch  3 роки тому

      Thanks so much! Glad you enjoyed this one.

  • @NikhilKumar-ot3vi
    @NikhilKumar-ot3vi 2 роки тому

    Sir please mention you are doing this in postgres SQL. there's nothing like ilike in mysql.

  • @classkori5507
    @classkori5507 3 роки тому

    Thank you

    • @stratascratch
      @stratascratch  3 роки тому

      Thanks for watching. Let me know if you have any topics you'd like me to cover. Happy to make videos to help others.

  • @meghasyam427
    @meghasyam427 3 роки тому

    Ilike is cool one. Learnt a new function today.
    Btw nate, what rdbms is it exactly...like hive or mysql or something else

    • @stratascratch
      @stratascratch  3 роки тому +1

      This is postgres. Thanks for watching

  • @observer698
    @observer698 2 роки тому

    So especially the 3rd one, what if we use Where to select the special cases???

  • @iitian2012
    @iitian2012 3 роки тому

    Your sql videos are really very helpful....It would be great if you could prepare a video on real time sql query containing multiple sub queries (basically multiple from, may be 3-4 inner queries ) covering around 100 plus line of codes......Bsically need to understand how someone approach. to code such a large and complex query and where do they start querying since that involve multiple set of inner queries....

    • @stratascratch
      @stratascratch  3 роки тому

      oh man...yea I see where you're going with this. Basically, I probably would have 3-4 inner queries and CTEs but I'd probably also have a lot of tables and temp tables to help me manage the data as well. Let me think about what I can do for 100+ lines. It's not something too easy to come up with but I'll see what I can do.

    • @iitian2012
      @iitian2012 3 роки тому

      @@stratascratch thanx a lot...we are really learning a lot 😊

    • @stratascratch
      @stratascratch  3 роки тому +2

      @@iitian2012 more videos to come!

  • @pendyalarajeshwari499
    @pendyalarajeshwari499 2 роки тому

    Hi, could u help me in this query if u are having two tables flights consists of following columns I'd,company name , departure name and jet _id and jet consists of I'd , capacity here jet_id is foreign key reference id in jet table I want sort the data with multiple condition like if total number of rows are equal by each company name then sort by sum of capacity column in jet and if count two rows are not equal sort by count of each company and then depature name, and still equal count sort by departure time

  • @abhistdixit3614
    @abhistdixit3614 3 роки тому

    Hi, Can you pls help let me know why we did not use this filter as well: fb_search_results.notes ILIKE CONCAT ('%',fb_search_results.query,'%')

  • @gauthamambethkar4483
    @gauthamambethkar4483 2 роки тому +3

    Hello Nate, thanks for this awesome video. I have one doubt. In 1st question, inside the cte why do we use group by 1,2,3 because inside the cte we don't use any aggregate function. But if I remove that the average column value doesn't come right in the final output. I'm missing something here. Could you please help?

    • @stratascratch
      @stratascratch  2 роки тому

      What I'm trying to do with the group by 1,2,3 is dedup any information since I want everything to be unique. Probably not required on this dataset though but give it a try on the platform

    • @gauthamambethkar4483
      @gauthamambethkar4483 2 роки тому +1

      @@stratascratch but without that group by the answer doesn't come right. It is needed. But not sure why

  • @pakfrd7887
    @pakfrd7887 2 роки тому

    Anyway I can pay for an hour over teams to check my answers?

  • @vargaslu85
    @vargaslu85 2 роки тому

    What are the requirements to get in as a DS, questions are seem too simple.

  • @portiseremacunix
    @portiseremacunix 3 роки тому

    Can I simply use CTE instead of the subqueries? CTE looks very clean.

    • @stratascratch
      @stratascratch  3 роки тому +2

      Yes, of course use a CTE. I prefer them as well. Sometimes when I'm working through a problem and communicating my solution, it's more natural to use them as subqueries just based on how I'm explaining the concepts.

  • @rakibraihanrimon8784
    @rakibraihanrimon8784 3 роки тому

    Tnx

  • @priyankalad7789
    @priyankalad7789 3 роки тому

    count(case when position

    • @stratascratch
      @stratascratch  3 роки тому

      Because there's no 2nd column being returned. It's just 1 column which is that aggregate. So what would you be grouping by? Let me know if that makes sense.

  • @mimibhattacharjee9668
    @mimibhattacharjee9668 3 роки тому

    Do you have videos on data model questions from facebook?

    • @stratascratch
      @stratascratch  3 роки тому

      Unfortunately not really. I don't cover any data modeling questions yet...

  • @henryyin2731
    @henryyin2731 2 роки тому +1

    I am not sure if you can still see this comment after almost a year but I am wondering for the first Airbnb question, why do you have to specify "Group by 1, 2, 3". I didn't specify and my answer is so close to the right answer, and could you explain what does 1, 2, 3 mean in this situation?

    • @stratascratch
      @stratascratch  2 роки тому

      Group by 1,2,3 will group the output by the first 3 columns - host id, num of reviews, and price. This helps to de-duplicate any rows. You kind of want to dedup because there shouldn't be any rows where there are duplicates of host_id, number of reviews, and price. These 3 combinations of columns are unique. Hope this clears it!

  • @ahassignment3961
    @ahassignment3961 3 роки тому +1

    I wonder why I get a different answer for the average price in the first question?
    This is my SQL
    select A.host_pop_rating, min(A.price) as min_price, max(A.price) as max_price, avg(A.price) as avg_price
    from
    (select *,
    case when number_of_reviews = 0 then 'New'
    when number_of_reviews

    • @stratascratch
      @stratascratch  3 роки тому

      Can you post this solution onto the user discussion board on this question? Someone can help you within 1-2 days.

    • @stratascratch
      @stratascratch  3 роки тому

      Here's the link to the question (platform.stratascratch.com/coding-question?id=9632&python=)

  • @huanchenli4137
    @huanchenli4137 3 роки тому +3

    Dont understand why here needs a group by 1,2,3 on in inner code

    • @stratascratch
      @stratascratch  3 роки тому +1

      I think it's because the case statement (column 4) was considered an aggregate so I required to group the columns in order for the code to run successfully.

    • @nadavnavon2140
      @nadavnavon2140 2 роки тому

      It could be that instead of the GROUP BY, just use a DISTINCT initially at the select statement, its worth checking it out..
      Seems like the group by acts for getting unique combinations of the first 3 columns.
      Personally I didin't know that you can set a group by when there is no aggregate. nor that a case statement is considered as an aggregate

  • @manugupta3062
    @manugupta3062 3 роки тому

    Which coding platform it is??

    • @stratascratch
      @stratascratch  3 роки тому

      This is off of platform.stratascratch.com which uses a postgres db engine and python3

  • @atifkhan9560
    @atifkhan9560 2 роки тому

    why we cannot use this query
    select
    case
    when number_of_reviews40 then 'Hot'
    end as a,
    min(price),avg(price),max(price)
    from airbnb_host_searches
    group by a
    order by a

    • @stratascratch
      @stratascratch  2 роки тому

      Try it in the platform! I don't see anything wrong with it but I haven't executed the query to see what the differences are between mine and yours.

  • @modhua4497
    @modhua4497 3 роки тому

    What is the link for this question? Thanks

    • @stratascratch
      @stratascratch  3 роки тому

      It's in the description. But here it is copy and pasted:
      - CASE statement for numerical values: bit.ly/2Wmnvad
      - CASE statement for text: bit.ly/3p0eboP
      - The most popular CASE statement on SQL interviews: bit.ly/2ITCi9f

    • @modhua4497
      @modhua4497 3 роки тому

      @@stratascratchthanks Nate. Is there “additive Case statement” in sql? If yes can you show an example? Thanks a lot! You have been very helpful to me in learning sql.
      Also can you show an example of “cross apply” vs “inner join”?

    • @stratascratch
      @stratascratch  3 роки тому

      @@modhua4497 What do you mean by additive case statement? I haven't used cross apply in SQL but here are some resources I found
      1. stackoverflow.com/questions/1139160/when-should-i-use-cross-apply-over-inner-join
      2. explainextended.com/2009/07/16/inner-join-vs-cross-apply/
      Seems like you'd use cross apply where there's no simple join query. I haven't had to use cross apply in my work but maybe it's because I use python for anything I can't solve with my knowledge in SQL. Hope that helps!

    • @modhua4497
      @modhua4497 3 роки тому

      Thanks Nate. Have a good day.

  • @priyankadhanawade8631
    @priyankadhanawade8631 3 роки тому

    please create some content on TIMESTAMP .thank you!

    • @stratascratch
      @stratascratch  3 роки тому

      Will try to find some examples! Thanks for the suggestion.

  • @fitriwidyananda9598
    @fitriwidyananda9598 2 роки тому

    Hello!
    I was trying to solve the second problem using the query below but why the result is different from yours. Can you tell me, please? thank you.
    WITH new_view AS(
    SELECT CONCAT(price, room_type, host_since, zipcode, number_of_reviews) AS host_id,
    number_of_reviews,
    price,
    CASE
    WHEN number_of_reviews = 0 THEN 'New'
    WHEN number_of_reviews BETWEEN 1 AND 5 THEN 'Rising'
    WHEN number_of_reviews BETWEEN 6 AND 15 THEN 'Trending Up'
    WHEN number_of_reviews BETWEEN 16 AND 40 THEN 'Popular'
    WHEN number_of_reviews > 40 THEN 'Hot'
    END AS host_popularity
    FROM airbnb_host_searches
    GROUP BY 1,2,3)
    SELECT host_popularity AS host_pop_rating,
    MIN(price) AS min_price,
    MAX(price) AS max_price,
    AVG(price) AS avg_price
    FROM new_view
    GROUP BY host_pop_rating;

    • @stratascratch
      @stratascratch  2 роки тому

      Hi, is it possible for you to ask this question on the platform's discussion forum? Someone on my team will answer your question within a few days!

    • @observer698
      @observer698 2 роки тому

      Your host_id creation has one more value, maybe that makes the difference.