Hard SQL Interview Question From FACEBOOK | Data Science Coding Interviews (Popularity Percentage)

Поділитися
Вставка
  • Опубліковано 8 січ 2025

КОМЕНТАРІ • 83

  • @tanvimehta7960
    @tanvimehta7960 3 роки тому +27

    That's insightful. Using a CTE helped me shorten the query. Also when using UNION, you get rid of duplicates with the individual tables as well. So using distinct is not required.
    with total as (
    select user1, user2 from facebook_friends
    union
    select user2, user1 from facebook_friends)
    select distinct user1, cast(count(user2) over (partition by user1) as float)*100/(select count(distinct user1) from total) from total
    order by user1;

  • @dwaipayansaha4443
    @dwaipayansaha4443 2 роки тому +1

    This is one of the best union combined with cte problem I have come across
    My solution:-
    with t1 as (select * from facebook_friends
    union all
    select user2,user1 from facebook_friends),
    t2 as (select user1, count(user2) no_friends from t1
    group by user1
    order by user1),
    t3 as (select user1, no_friends,count(*) over() total from t2)
    select user1,(no_friends/total)*100.0 pop_per from t3

  • @gagansingh3481
    @gagansingh3481 2 роки тому +1

    Awesome dear you did a great explanation kindly make more videos like this 😀

    • @stratascratch
      @stratascratch  2 роки тому

      Thank you. I am glad you find it very helpful.

  • @avanichheda
    @avanichheda 3 роки тому +2

    Great video and content! I was able to use count(user2) OVER (Partition by user1) after doing the union to get to the friends of the users.
    Was able to confirm by submitting on the platform too :)

  • @rashvinganesh
    @rashvinganesh 4 роки тому +4

    Thank you so much Nate. Find it very easy to understand what’s the solution you’re taking. As a beginner in SQL this really helps me understand more clear.

    • @stratascratch
      @stratascratch  4 роки тому +3

      That's great. Hope you enjoy the entire series of SQL problems then. Let me know if you have any questions or feedback.

  • @caiyu538
    @caiyu538 2 роки тому

    Great solution, without your explanation, it is hard to figure it out. Thank you.

  • @meghasyam427
    @meghasyam427 3 роки тому +2

    Tricky question but in reality all the users even in the user2 column should be present in the user1 column too. This is the basis of a user table. This question is all about what if all the users are not present in user1 column, then your code is the ultimate solution.
    Hatsoff to your thinking which considers the worst case i didn't expect.

    • @stratascratch
      @stratascratch  3 роки тому +2

      I agree that if this was a user table then you'd have all the users in the user1 column. But what if this was a mapping table? That assumption might not be valid. It's always good to ask the interview to clarify any assumptions. I always assume the worse case =) Thanks for watching and for the great detailed comment.

    • @shyamsundark1556
      @shyamsundark1556 2 роки тому

      Exactly ! the only catch in this Q is the fact that they dont explicitly state the structure of the data.

  • @SudhirKumar-rl4wt
    @SudhirKumar-rl4wt 2 роки тому +1

    I am not sure if we need to use DISTINCT if we are using union already

  • @StanleySI
    @StanleySI 3 роки тому

    I found this platform is very helpful because it allows me to run code and let me see result step by step. In this way, I can better understand what I'm doing and where I got wrong.

  • @finvestomate
    @finvestomate 2 роки тому

    Thank you for the insights. I also tried doing it in similar manner, with a shorter query
    select user1, Round((Friends/max(users_id) over())*100,3) as PP
    from
    (select *,
    count(user1) as Friends,
    row_number() over(order by user1) as users_id from
    (select *
    from facebook_friends
    UNION ALL
    select user2,user1 from facebook_friends) as A
    group by user1
    order by user1) as B;

  • @joaopedroreissilva7075
    @joaopedroreissilva7075 3 роки тому

    Thank you, Nate.
    Really good approach.

  • @thepogchamp7783
    @thepogchamp7783 2 роки тому +1

    Hey Nate! Thanks for the video - they are really insightful, and helps a lot when I do the questions! Quick question - is there a reason to join the tuu value? What I did was just to put it as the denominator in the select statement, which I thought should save some time since you don't have to join. Thanks!

    • @stratascratch
      @stratascratch  2 роки тому

      I think it was just to ensure that I've captured all users since there is a user1 and user2. I would validate your approach with the platform. We capture most use cases/edge cases on the platform, so if your thinking is right, your solution should validate correctly.

  • @ismafoot11
    @ismafoot11 4 роки тому +2

    if you use "union" you don't need distinct before user1 and user2. union will do that for you

    • @stratascratch
      @stratascratch  4 роки тому +1

      Yup that's true! I just added the DISTINCT to be explicit.

  • @okygy2125
    @okygy2125 3 роки тому

    Excellent!

  • @chandraprakash0211
    @chandraprakash0211 2 роки тому

    Hey Nate ! Can we use self join and not union for this problem ?

    • @stratascratch
      @stratascratch  2 роки тому

      Yes! There are many ways to solve the problem. You can even try your self join approach on the platfrom. There's a link in the description.

  • @Hotobu
    @Hotobu 2 роки тому

    This is a "hard" question? This is SUPER easy!

  • @yashovardhan9841
    @yashovardhan9841 3 роки тому +1

    My approach -
    with temp as (
    select user1 as u, count(*) as tot_friends from facebook_friends group by 1
    union all
    select user2 as u, count(*) as tot_friends from facebook_friends group by 1
    )
    select u, (sum(tot_friends)/(select count(distinct u) from temp)::float)*100 as popularity_percentage
    from
    temp
    group by 1
    order by 1

  • @ghislaineamrani2775
    @ghislaineamrani2775 4 роки тому +2

    Hi Nate, thank you !
    This solution below seems to work also but I used WITH instead of subqueries, is it an issue ? Thank you for your feedback !
    WITH total_users as (
    select distinct user as all_users from facebook_friends
    union
    select distinct friend from facebook_friends
    ),
    consolidated_table as(
    select all_users, count(user) as number_user, (select count(*) from total_users) as total_u
    from total_users
    cross join facebook_friends
    where all_users = user
    OR all_users = friend
    group by 1
    )
    select all_users, (cast(number_user as float)/cast(total_u as float))*100 as percent from consolidated_table;

    • @stratascratch
      @stratascratch  4 роки тому +1

      It's not at all an issue if you used a CTE (i.e., using a WITH) instead of a subquery. The reason why I used a subquery is because some SQL flavors don't allow using WITH so I wanted to write the solution so that the code can be used on any SQL flavor. Thanks for watching and more thanks for trying out this question!

  • @rithikgaur3830
    @rithikgaur3830 3 роки тому

    Thank you so much for videos and stratscratch , i had general query , as in the above video unless we see tha data and if we only go with the schema mentioned we tend to believe that user 1 column has all users already , so in a real facebook or top company interviews do we actually have the data available for us or do we actually have to ask the interviewer on the limitations on data set .

    • @stratascratch
      @stratascratch  3 роки тому +1

      In my experience, you won't have the data so you need to ask the interviewer what the data set looks like. Basically what are some of the limitations and/or example values. Then you can start to develop your solution and identify edge cases.

  • @radhikashroff2643
    @radhikashroff2643 3 роки тому +1

    Hi Nate,
    To calculate number of friends each user has, can we do this ?
    SELECT user1, COUNT(user2)
    FROM facebookfriends
    GROUP BY user1
    so this would group all the friends of each user in user1 since right column are the friends right ?

    • @stratascratch
      @stratascratch  3 роки тому

      Yes, so long as all users are found in the user1 column. But our assumption is that not all users are found in user1. So the question is slightly more complicated.

    • @GouthamSudini
      @GouthamSudini 2 роки тому +1

      @@stratascratch wouldn't user1 carry all users based on the table structure? How can a user2 be a friend without being a user in 1st place?

  • @iceindia007
    @iceindia007 3 роки тому

    u can simply use something like this:
    select a.* , a.i/(sum(a.i) over()) as aasd
    from (
    select requester_id as id, count(accepter_id ) as i from request_accepted
    group by 1
    union
    select accepter_id as id , count(requester_id) as i from request_accepted
    group by 1
    ) a
    order by 3 desc

  • @danielxing1034
    @danielxing1034 2 роки тому

    Great walkthrough! But still think the number of friend of each user should not be computed using union because A friends with B does not mean B friends with A. Those who only show up in the second column should be considered having no friends.

  • @Baddiemuan4real
    @Baddiemuan4real 3 роки тому

    Hi Nate at Strata Scratch pls can you RECOMMEND a platform where I can study ADVANCE SQL. Thanks in advance.

    • @stratascratch
      @stratascratch  3 роки тому +3

      For advanced level SQL, I would recommend leetcode, stratascratch, interviewquery, sqlpad.io as great platforms that will give you some advanced level skills. If you need something even more difficult then I would suggest doing real projects to improved. Real projects tend to have much longer sql scripts that are necessary to be written.

  • @96satenik
    @96satenik 2 роки тому

    Can total number of users be defined as Max(users1) ?

  • @AmanSingh-od2ue
    @AmanSingh-od2ue 3 роки тому

    Hey Nate,
    I do have concern from the top query where the FROM clause ends at total_unique_users. Could you please explain as to why it is ending there and why not at the end where the LEFT JOIN is completed?

    • @stratascratch
      @stratascratch  3 роки тому

      It's because the total_unique_users is the 1st subquery and then I do another subquery called tuu that uses total_unique_users. I use tuu to join to user_friends. Once I join those 2 subqueries, I take the aggregate count. So the main reason why it ends at tuu is because I'm doing a subquery within a subquery. Hope that makes sense. If you want to troubleshoot, I would try to see how the data is aggregated at each level of subquery. Then write more code and see how the data is aggregated again. Keep doing this until the entire query is built. That should help you understand what's going on.

    • @AmanSingh-od2ue
      @AmanSingh-od2ue 3 роки тому

      @@stratascratch Thank you. I got your point, Nate.

  • @ohh_nina_nyc
    @ohh_nina_nyc 3 роки тому

    Hi, why do you use MAX() to get all _users number? Can we just divide by tuu.all_users?

    • @stratascratch
      @stratascratch  3 роки тому

      Actually, yes you could just use all_users since it's the count() and you'll just be given 1 number anyways. good catch.

  • @vijayjayaraman5990
    @vijayjayaraman5990 3 роки тому +1

    While I agree with the solution, why wouldn't Bob be a user1 in the original table? That doesn't sound intuitive at all.. I would have done the SELECT user1, count(distinct user2) right away because user1 was supposed to have all the users on Facebook..

    • @stratascratch
      @stratascratch  3 роки тому +1

      I didn't have the assumption that user1 should have all the users on FB. I could make that case if it was a user table (like a copy from production) but if it's a mapping table or some other type of table used for analytics, I wouldn't feel comfortable making that assumption without asking the interviewer first (or checking it if I was at work).

  • @celiahan3787
    @celiahan3787 3 роки тому

    since you are doing union to get all the users. So how do you think the user1 column did not contain all the users and we need to do the union?

    • @stratascratch
      @stratascratch  3 роки тому +2

      I thought about edge cases where Friend A might be friends with Friend B, but Friend B might not be friends with A. Many platforms are like this (FB might not be one of them but I know IG is) so dong the UNION takes care of that. You can always ask the interviewer what they think. I don't think there's a right or wrong answer here since you have to assume certain things for you to end up coding a solution.

    • @celiahan3787
      @celiahan3787 3 роки тому

      @@stratascratch Thank you so much, I got you. You are talking about the followers on IG. Yes, if in that case, it makes more sense

  • @TheRaju991
    @TheRaju991 3 роки тому

    This was uniquely tough to think through.

  • @Soulfulreader786
    @Soulfulreader786 Рік тому

    my solution
    with cte as
    (select user1,user2
    from facebook_friends
    union
    select user2,user1
    from facebook_friends),
    cte2 as
    (select distinct user1,count(user2) over(partition by user1) as total_friends
    from cte)
    select user1,total_friends/9.0*100 as perc_pop
    from cte2;

  • @sandeepvenkatasairam9922
    @sandeepvenkatasairam9922 3 роки тому

    Thanks Nate for keeping us educated ! Just a logical Question I had , Since you are saying user1 is the user and user2 is their friend list , I hope even the friend should be a user on Platform , So can we cant consider select distinct user1 from facebook_friends as total users , Correct me if I am missing something .

    • @stratascratch
      @stratascratch  3 роки тому +1

      That's a great assumption. That's something you'd want to talk to the interviewer about to see if your assumption makes sense. Usually when it's captured on a db table then the user's friend should also be on the platform. Why is it that you can't consider distinct user1 from the table?

  • @nanditha4477
    @nanditha4477 3 роки тому

    Hi Nate ..Can u please have a Sql session once. Can u please ping me if thats possible

  • @philintheblank4360
    @philintheblank4360 3 роки тому +1

    it seems there's an assumtption that person a can be friends with person b, but person b is not friends with person a... I don't think that's how this platform or friendship works. Clarifying a simplifying assumption with the interviewer would streamline a lot of this.

    • @stratascratch
      @stratascratch  3 роки тому +1

      Yea you're right. There are some assumptions that are needed to solve the question the right way. And you should talk to the interviewer to get the assumptions clarified. Even the question here has changed a bit since I created this video. Thanks for pointing this out.

  • @123admini2r
    @123admini2r 3 роки тому +1

    damn, understanding the data provided is the tough part

    • @stratascratch
      @stratascratch  3 роки тому

      To me it's the data schema that's hard because there's so many assumptions you'll need to make based on the structure of the table. And then once you have an understanding of that, the values for each column can add complexity. So this part takes a lot of time when on an interview.

  • @techiewithcamera
    @techiewithcamera Рік тому

    My Solution:
    with total_population(pop_cnt) as
    (select count(user)
    from
    (select distinct user1 as user from facebook_friends
    union
    select distinct user2 as user from facebook_friends)x),
    total_pairs as
    (select distinct user1 as frnd1, user2 as frnd2 from facebook_friends
    union
    select distinct user2 as frnd1, user1 as frnd2 from facebook_friends)
    select frnd1,
    count(distinct frnd2)/max(pop_cnt)::float*100 as popularity_percentage
    from total_pairs, total_population
    group by 1;

  • @poopah4497
    @poopah4497 3 роки тому +1

    Hey Nate , I must over-simplify it. Why I cannot use a sub query to calculate total user and window function to calculate friends per user.
    Select user1, (select count(distinct user2) from xx) as totalususer, count(distinct user2 ) over (partition by user1) from …

    • @stratascratch
      @stratascratch  3 роки тому

      You can probably use a subquery to solve it. Give it a try on the platform to see if you get the output you want

  • @saikatdasgupta2006
    @saikatdasgupta2006 3 роки тому

    Another solution
    select user1, (b.sum1/c.cnt)*100 as popularity_percent from
    (select count(distinct user1) as cnt from (select distinct user1 from facebook_friends union select distinct user2 from facebook_friends) d) c,
    (
    select user1, sum(cnt) sum1 from
    (select user1,count(*) cnt from facebook_friends group by user1
    union
    select user2,count(*) from facebook_friends group by user2) a
    group by user1) b
    order by user1