Advanced Facebook Data Science SQL interview question [RANK()]

Поділитися
Вставка
  • Опубліковано 12 січ 2025

КОМЕНТАРІ • 79

  • @oscararmandocisnerosruvalc8503
    @oscararmandocisnerosruvalc8503 2 роки тому +1

    Bro, your channel is just gold !!!! I´ve been spending 3 hours a day watching every day because before watching your videos I was not able to solve a single question, but since I started watching your videos everything seems to be a piece of cake.

    • @stratascratch
      @stratascratch  2 роки тому

      Wow, thanks! Music to my ears. I am happy the videos I prepared, together with my team have helped you.

  • @kumarshishir92
    @kumarshishir92 3 роки тому +14

    Thanks for the walkthrough bro! Key takeaways: Understand data, formulate approach and then write code. BTW you used 2 CTEs which means 2 different joins on the same set of tables. You could have just used the one join along with sum(case when ...) statements to sum up the Dec and Jan comments. Just some food for thought. Cheers!

  • @ishannahsi
    @ishannahsi 3 роки тому +5

    For sure the best resources online for interview prep for not just solving SQL problem but also approaching it conceptually. Thanks again.

    • @stratascratch
      @stratascratch  3 роки тому +1

      Thanks for watching! Really appreciate the support.

  • @rick2591
    @rick2591 2 роки тому +4

    I took a different approach:
    with sums as (
    SELECT country,
    sum(case when created_at between cast('2019-12-1' as date) and cast('2019-12-31' as date) then number else 0 end) dec,
    sum(case when created_at between cast('2020-1-1' as date) and cast('2020-1-31' as date) then number else 0 end) jan
    FROM users a inner join counts b on a.user_id=b.user_id)
    group by country),
    ranks as (
    select country,
    dense_rank() over (partition by country order by dec) decRank,
    dense_rank() over (partition by country order by jan) JanRank
    from sums)
    select country
    from ranks
    where janRank < decRank

  • @d.xinshengguo970
    @d.xinshengguo970 3 роки тому +2

    Just wanted to say that your videos and the StrataScratch platform have been incredibly helpful in preparing for interviews. Thanks for the amazing content.

  • @parantikaghosh4396
    @parantikaghosh4396 3 роки тому +1

    you're the best SQL teacher I ever got!! :O amazing explanations :)

    • @stratascratch
      @stratascratch  3 роки тому

      Thanks for watching! Really appreciate it 😎

  • @manphu2515
    @manphu2515 2 роки тому

    Great solution, learn a lot from this. Thank you

  • @michaelb1082
    @michaelb1082 3 роки тому

    This explanation is so good. I wish I found this channel a few months earlier!

  • @sauravkumar9454
    @sauravkumar9454 3 роки тому +2

    Great Explanation Nate. Thanks. Btw I also implemented the solution using group by on "Country" and "Month" and pivoted the table on the month column then used dense_rank() over "December" and "January" to extract the country that improved their ranking. It sounds complex though.😁

  • @callravik
    @callravik 2 роки тому

    God, am I glad I found StrataScratch or what?

  • @ashokvishalsr5591
    @ashokvishalsr5591 2 роки тому

    That was awesome!💥🔥💡 Thanks a lot.

  • @shivaniagarwal1841
    @shivaniagarwal1841 3 роки тому +3

    Hi, thank you for your time and effort in creating these videos! Immensely helpful!I had a query about the date syntax - Is there a specific reason you use the operator clause for filtering instead of between?

    • @stratascratch
      @stratascratch  3 роки тому +1

      No there was no specific reason. In reality, you can use either. I often switch. between the two myself. Thanks for watching!

  • @kebincui
    @kebincui 3 роки тому

    Thanks Nate👍🏻👍🏻👍🏻

  • @joaopedroreissilva7075
    @joaopedroreissilva7075 3 роки тому

    That was AMAZING!

  • @faiazrummankhan5589
    @faiazrummankhan5589 3 роки тому +1

    Your videos helped me a lot in increasing my SQL skills ! Do you use any other language (python / R) at your work ?! Thanks !

    • @stratascratch
      @stratascratch  3 роки тому +1

      Yes, I use python a lot! Some people use R but python helps with automation

  • @priyapatil724
    @priyapatil724 3 роки тому

    I have doubt that in your output Australia is having same number of comments in both the months means there is no rise so how it should be in the output. Was is necessary to use ranking function? Instead can we compare the comments?

  • @fadwa2413
    @fadwa2413 3 роки тому

    best of the best
    please, keep going

  • @oguzhanakkurt3947
    @oguzhanakkurt3947 3 роки тому

    If you can shoot a video each week, it will be amazing for us. Thanks Nate :)

    • @stratascratch
      @stratascratch  3 роки тому +1

      That was my original plan! It's been difficult but I have more videos coming out soon.

  • @SelmanAy
    @SelmanAy 3 роки тому +1

    ​ @Nate at StrataScratch Hi Nate, I love your videos and they are very helpful, would you also prefer similar videos for python? for those people who prepares data engineer interviews ( like me :) ) would be very useful as well. Thanks for creating amazing content.

  • @yuthpatirathi2719
    @yuthpatirathi2719 2 роки тому

    Thanks nate

  • @szco9814
    @szco9814 3 роки тому

    Dude, you rock!

  • @playlist4371
    @playlist4371 3 роки тому

    Thanks for this Nate, really helpful. Quick question: is there a reason why you are summing in the window functions? I am assuming each country only has one row

    • @stratascratch
      @stratascratch  3 роки тому

      The window function just helps with the ranking. I'm not partitioning my data but just taking the ranking over the entire window (which is the entire dataset) so that I can get a ranking.

    • @emanne8067
      @emanne8067 3 роки тому

      Hi @@stratascratch: Shouldn't the dense_rank be ordered by num_of_comments rather than sum(num_of_comments)

  • @plttji2615
    @plttji2615 3 роки тому

    Hello, I've been watching this youtube channel a lot and found it really helpful. If you don't mind me asking a simple question?. Although I saw your video related which kinds of interview questions do DS role has to take, is it quite common to solve normal algorithm coding test (the one that for SWE for usual) for a DS interview? Thank you.

    • @stratascratch
      @stratascratch  3 роки тому +1

      Yea it's pretty common these days to be asked algo questions on a DS interview. It might not be very common for a data analyst or similar position but for a DS it's starting to trend. I'm hoping to do a few videos on this once I add these types of questions to my platform. Thanks for watching!

    • @plttji2615
      @plttji2615 3 роки тому +2

      @@stratascratch Thank you so much for answering my question. I got a lot of help from watching your channel as a MS DS student. Thank yoiu

  • @beyzayurt5350
    @beyzayurt5350 3 роки тому +1

    Amazing video! One question: Every Facebook Data Science interview example I've seen included SQL only, so I was wondering if FB only asks SQL coding questions? Or do they also ask Python?

    • @stratascratch
      @stratascratch  3 роки тому +1

      I think most people decide to use SQL. You can pick any scripting language to answer coding questions on FB interviews (from my experience).

    • @SelmanAy
      @SelmanAy 3 роки тому

      Hi Nate, I love your videos and they are very helpful, would you also prefer similar videos for python? for those people who prepares data engineer interviews ( like me :) ) would be very useful as well. Thanks for creating amazing content.

    • @stratascratch
      @stratascratch  3 роки тому +3

      @@SelmanAy Yes, I've been playing around with the idea of some python videos! I'm planning to add some algorithm questions that pops up on a few interviews so I may use python for that. Thanks for watching!

    • @SelmanAy
      @SelmanAy 3 роки тому

      @@stratascratch so happy to hear this! looking forward to it!

  • @shobhamourya8396
    @shobhamourya8396 3 роки тому

    Hi Nate, here's my solution without left join
    with cte1
    as
    (
    -- for users who have commented get the country details
    -- extract the month and year from created_at
    select c.user_id, u.country,
    extract(YEAR from created_at) as year,
    extract(MONTH from created_at) as month,
    number_of_comments as comments
    from fb_comments_count c,
    fb_active_users u
    where c.user_id = u.user_id
    ),
    cte2
    as
    (
    -- for each country find all the comments in dec and jan
    select country,
    sum(case when month = 12 and year = '2019' then comments else 0 end ) as dec_comments,
    sum(case when month = 1 and year = '2020' then comments else 0 end ) as jan_comments
    from cte1
    group by country
    order by country
    ),
    cte3
    as
    (
    -- rank the each country based on dec comments and jan comments
    select country,dec_comments,
    dense_rank() over(order by dec_comments desc) dec_rank,
    jan_comments,
    dense_rank() over(order by jan_comments desc) jan_rank
    from cte2
    order by country
    )
    -- highest number of comments get first rank, second-highest second rank
    -- lesser rank in Jan means risen in ranking
    --select country, jan_comments,jan_rank, dec_comments, dec_rank
    select country
    from cte3
    where jan_rank < dec_rank
    order by country

  • @rajyalakshmikadiyala6002
    @rajyalakshmikadiyala6002 2 роки тому

    Can't we use Rank function here in window function

  • @tigerbear3038
    @tigerbear3038 3 роки тому

    Hey Nate, your "created_at

    • @rick2591
      @rick2591 2 роки тому

      Not a huge deal as the optimizer will determine this and switch to an inner join.

  • @nikhilverma2605
    @nikhilverma2605 3 роки тому

    Should i learn MYSQL or Postgresql
    please tell me that

    • @stratascratch
      @stratascratch  3 роки тому +2

      Given the 2, I would learn postgres bc it's the engine is created for analytics in mind. Many analytical teams use postgres. But in practice, it doesn't really matter. Most industry jobs will use other engines like Snowflake or HIVE. What's important is that you learn any SQL engine and learn to think about how to solve analytical questions.

    • @nikhilverma2605
      @nikhilverma2605 3 роки тому

      Would you reccomend learning from udemy

    • @stratascratch
      @stratascratch  3 роки тому

      @@nikhilverma2605 yea some courses are ok. I don’t know if any Udemy courses unfortunately. Try some free courses first like Mode Analytics before paying.

    • @nikhilverma2605
      @nikhilverma2605 3 роки тому

      well what about data analytic by google at cousera

    • @stratascratch
      @stratascratch  3 роки тому

      @@nikhilverma2605 ahh yea that's right. I forgot about that one. I hear great things about it.

  • @bindidesai2191
    @bindidesai2191 2 роки тому

    An attendance log for every student in a school district attendance_events :
    date | student_id | attendance
    • A summary table with demographics for each student in the district all_
    students : student_id | school_id | grade_level | date_of_birth | hometown
    Using this data, you could answer questions like the following:
    • What percent of students attend school on their birthday?
    • Which grade level had the largest drop in attendance between yesterday
    and today?

  • @ayeoh47
    @ayeoh47 2 роки тому

    no Coalesce?

    • @stratascratch
      @stratascratch  2 роки тому

      If you wanted to, I supposed you could. But it wasn't required for this question.

  • @walter_ullon
    @walter_ullon 3 роки тому

    Basically, I'm screwed... :(

    • @stratascratch
      @stratascratch  3 роки тому +1

      Keep practicing! it gets much easier!

  • @joaojoca100
    @joaojoca100 2 роки тому

    this is a hard question....

  • @y_yy_2844
    @y_yy_2844 2 роки тому

    Obtain country and count of comments by month. Restrict it to December 2019 and January 2020 comments. Subtract December 2019 from January 2020 value. See the country that had the maximum positive difference.
    I could do this query with click and drag in about 3 seconds in Microsoft Access, but no, let's write out SQL code like it's 1976.

    • @ssoossdd
      @ssoossdd 2 роки тому

      Now try to do this with 1 billion rows , ok buddy?

    • @y_yy_2844
      @y_yy_2844 2 роки тому

      @@ssoossdd ? You just make the query in MS Access so all t he SQL is written for you. Then copy it to whatever else you're using.

  • @jaymo2024
    @jaymo2024 2 роки тому +1

    I believe the online ide is not working properly, unless you put in the expected result. Can someone please explain, why it is throwing errors when I use user_id and country columns (it is saying column name is ambiguous). I had to comment it out in order for the above code to run.
    SELECT
    --user_id,
    sum(number_of_comments) as comments_dec,
    --country,
    dense_rank() over(order by sum(number_of_comments)desc) as country_rank
    FROM fb_active_users as a
    LEFT JOIN fb_comments_count as b
    ON a.user_id = b.user_id
    WHERE created_at BETWEEN DATE('2019-12-01') AND ('2019-12-31') and country is not null
    GROUP BY country;

    • @stratascratch
      @stratascratch  2 роки тому +1

      It's throwing an error because there's column user_id in both tables. So, yes, these columns are ambiguous. What you're missing is an alias in front of the column user_id. Which alias you'll use depends on which table you want to fetch user_id from. But if you're following my solution, there's no user_id in the SELECT statement; only country, SUM(), and DENSE_RANK()

    • @jaymo2024
      @jaymo2024 2 роки тому +1

      @@stratascratch I was a super noop when I wrote that comment but I know now. Thanks 🙏🏾 for the tutorials

    • @stratascratch
      @stratascratch  2 роки тому

      @@jaymo2024 Everyone has to start somewhere. You're doing great!

    • @jaymo2024
      @jaymo2024 2 роки тому

      @@stratascratch I don’t think I should have been working on window functions as a beginner. I hope you make a video guide of the sql concepts to learn first from A to Z

  • @sandeepreddy4178
    @sandeepreddy4178 3 роки тому +6

    This is much faster and efficient:
    With cte as
    (
    select base.*, dense_rank() over(partition by year_month order by total_comments desc) as ranks from
    (Select country, format(created_at, ‘YYYY-MM) year_month, sum(comments) total_comments
    From fb_active_users u
    inner join comments_per_user c
    On c.user_id = u.user_id and
    created_at between ‘2019-12-01’ and ‘2020-01-30’
    Group by country, year_month) base
    )
    select cte.country, ranks, lead(ranks) over (partition by country order by year_month) as next_month_ranks from cte
    where next_month_ranks < ranks;

    • @stratascratch
      @stratascratch  3 роки тому +1

      Love it! Thanks for sharing

    • @harshalpatel555
      @harshalpatel555 3 роки тому

      I like your query but I think Sandeep there is flaw when u r calculating the rank based on partition of only year and month , as it is calculating the rank for Dec or Jan month by comparing with all the Dec or Jan dates of all countries. What if we put country also in that partition so it would give us the rank based on country and year_month. Please correct me if I am going wrong way

    • @harshalpatel555
      @harshalpatel555 3 роки тому

      We can discuss more on this Sandeep if you are okay.

    • @sandeepreddy4178
      @sandeepreddy4178 3 роки тому

      Could work! I honestly didn’t run this against real data, just thought using a two level ranking function could make the query smaller so put it in here quick and dirty! So whatever works best 😊

  • @rameespudussery2700
    @rameespudussery2700 3 роки тому

    Hi Nate. Thanks for your video. I tried the below code to get sum of number of comments for Dec and Jan. But I am getting a different number when compared with yours. For Eg : In USA for Dec and Jan i am getting 11 and 9 respectively.
    But for you both months are 11. Confused!!
    select COUNTRY,
    SUM(CASE WHEN MONTH(created_at)='12' AND Year(created_at)='2019' THEN number_of_comments END) as dec_2019,
    SUM(CASE WHEN MONTH(created_at)='1' AND Year(created_at)='2020' THEN number_of_comments END) as Jan_2020
    from fb_active_users a
    LEFT JOIN fb_comments_count b
    ON a.user_id=b.user_id
    GROUP BY COUNTRY;

    • @stratascratch
      @stratascratch  3 роки тому +1

      Hey, thanks for giving it a try. Can you leave a comment in the discussion forum on the platform? Someone from my team will be able to help much more quickly with technical questions!

    • @rameespudussery2700
      @rameespudussery2700 3 роки тому

      @@stratascratch Sure. Thanks for your reply