That's insightful. Using a CTE helped me shorten the query. Also when using UNION, you get rid of duplicates with the individual tables as well. So using distinct is not required. with total as ( select user1, user2 from facebook_friends union select user2, user1 from facebook_friends) select distinct user1, cast(count(user2) over (partition by user1) as float)*100/(select count(distinct user1) from total) from total order by user1;
This is one of the best union combined with cte problem I have come across My solution:- with t1 as (select * from facebook_friends union all select user2,user1 from facebook_friends), t2 as (select user1, count(user2) no_friends from t1 group by user1 order by user1), t3 as (select user1, no_friends,count(*) over() total from t2) select user1,(no_friends/total)*100.0 pop_per from t3
Great video and content! I was able to use count(user2) OVER (Partition by user1) after doing the union to get to the friends of the users. Was able to confirm by submitting on the platform too :)
Thank you so much Nate. Find it very easy to understand what’s the solution you’re taking. As a beginner in SQL this really helps me understand more clear.
Tricky question but in reality all the users even in the user2 column should be present in the user1 column too. This is the basis of a user table. This question is all about what if all the users are not present in user1 column, then your code is the ultimate solution. Hatsoff to your thinking which considers the worst case i didn't expect.
I agree that if this was a user table then you'd have all the users in the user1 column. But what if this was a mapping table? That assumption might not be valid. It's always good to ask the interview to clarify any assumptions. I always assume the worse case =) Thanks for watching and for the great detailed comment.
I found this platform is very helpful because it allows me to run code and let me see result step by step. In this way, I can better understand what I'm doing and where I got wrong.
Thank you for the insights. I also tried doing it in similar manner, with a shorter query select user1, Round((Friends/max(users_id) over())*100,3) as PP from (select *, count(user1) as Friends, row_number() over(order by user1) as users_id from (select * from facebook_friends UNION ALL select user2,user1 from facebook_friends) as A group by user1 order by user1) as B;
Hey Nate! Thanks for the video - they are really insightful, and helps a lot when I do the questions! Quick question - is there a reason to join the tuu value? What I did was just to put it as the denominator in the select statement, which I thought should save some time since you don't have to join. Thanks!
I think it was just to ensure that I've captured all users since there is a user1 and user2. I would validate your approach with the platform. We capture most use cases/edge cases on the platform, so if your thinking is right, your solution should validate correctly.
My approach - with temp as ( select user1 as u, count(*) as tot_friends from facebook_friends group by 1 union all select user2 as u, count(*) as tot_friends from facebook_friends group by 1 ) select u, (sum(tot_friends)/(select count(distinct u) from temp)::float)*100 as popularity_percentage from temp group by 1 order by 1
Hi Nate, thank you ! This solution below seems to work also but I used WITH instead of subqueries, is it an issue ? Thank you for your feedback ! WITH total_users as ( select distinct user as all_users from facebook_friends union select distinct friend from facebook_friends ), consolidated_table as( select all_users, count(user) as number_user, (select count(*) from total_users) as total_u from total_users cross join facebook_friends where all_users = user OR all_users = friend group by 1 ) select all_users, (cast(number_user as float)/cast(total_u as float))*100 as percent from consolidated_table;
It's not at all an issue if you used a CTE (i.e., using a WITH) instead of a subquery. The reason why I used a subquery is because some SQL flavors don't allow using WITH so I wanted to write the solution so that the code can be used on any SQL flavor. Thanks for watching and more thanks for trying out this question!
Thank you so much for videos and stratscratch , i had general query , as in the above video unless we see tha data and if we only go with the schema mentioned we tend to believe that user 1 column has all users already , so in a real facebook or top company interviews do we actually have the data available for us or do we actually have to ask the interviewer on the limitations on data set .
In my experience, you won't have the data so you need to ask the interviewer what the data set looks like. Basically what are some of the limitations and/or example values. Then you can start to develop your solution and identify edge cases.
Hi Nate, To calculate number of friends each user has, can we do this ? SELECT user1, COUNT(user2) FROM facebookfriends GROUP BY user1 so this would group all the friends of each user in user1 since right column are the friends right ?
Yes, so long as all users are found in the user1 column. But our assumption is that not all users are found in user1. So the question is slightly more complicated.
u can simply use something like this: select a.* , a.i/(sum(a.i) over()) as aasd from ( select requester_id as id, count(accepter_id ) as i from request_accepted group by 1 union select accepter_id as id , count(requester_id) as i from request_accepted group by 1 ) a order by 3 desc
Great walkthrough! But still think the number of friend of each user should not be computed using union because A friends with B does not mean B friends with A. Those who only show up in the second column should be considered having no friends.
For advanced level SQL, I would recommend leetcode, stratascratch, interviewquery, sqlpad.io as great platforms that will give you some advanced level skills. If you need something even more difficult then I would suggest doing real projects to improved. Real projects tend to have much longer sql scripts that are necessary to be written.
Hey Nate, I do have concern from the top query where the FROM clause ends at total_unique_users. Could you please explain as to why it is ending there and why not at the end where the LEFT JOIN is completed?
It's because the total_unique_users is the 1st subquery and then I do another subquery called tuu that uses total_unique_users. I use tuu to join to user_friends. Once I join those 2 subqueries, I take the aggregate count. So the main reason why it ends at tuu is because I'm doing a subquery within a subquery. Hope that makes sense. If you want to troubleshoot, I would try to see how the data is aggregated at each level of subquery. Then write more code and see how the data is aggregated again. Keep doing this until the entire query is built. That should help you understand what's going on.
While I agree with the solution, why wouldn't Bob be a user1 in the original table? That doesn't sound intuitive at all.. I would have done the SELECT user1, count(distinct user2) right away because user1 was supposed to have all the users on Facebook..
I didn't have the assumption that user1 should have all the users on FB. I could make that case if it was a user table (like a copy from production) but if it's a mapping table or some other type of table used for analytics, I wouldn't feel comfortable making that assumption without asking the interviewer first (or checking it if I was at work).
I thought about edge cases where Friend A might be friends with Friend B, but Friend B might not be friends with A. Many platforms are like this (FB might not be one of them but I know IG is) so dong the UNION takes care of that. You can always ask the interviewer what they think. I don't think there's a right or wrong answer here since you have to assume certain things for you to end up coding a solution.
my solution with cte as (select user1,user2 from facebook_friends union select user2,user1 from facebook_friends), cte2 as (select distinct user1,count(user2) over(partition by user1) as total_friends from cte) select user1,total_friends/9.0*100 as perc_pop from cte2;
Thanks Nate for keeping us educated ! Just a logical Question I had , Since you are saying user1 is the user and user2 is their friend list , I hope even the friend should be a user on Platform , So can we cant consider select distinct user1 from facebook_friends as total users , Correct me if I am missing something .
That's a great assumption. That's something you'd want to talk to the interviewer about to see if your assumption makes sense. Usually when it's captured on a db table then the user's friend should also be on the platform. Why is it that you can't consider distinct user1 from the table?
it seems there's an assumtption that person a can be friends with person b, but person b is not friends with person a... I don't think that's how this platform or friendship works. Clarifying a simplifying assumption with the interviewer would streamline a lot of this.
Yea you're right. There are some assumptions that are needed to solve the question the right way. And you should talk to the interviewer to get the assumptions clarified. Even the question here has changed a bit since I created this video. Thanks for pointing this out.
To me it's the data schema that's hard because there's so many assumptions you'll need to make based on the structure of the table. And then once you have an understanding of that, the values for each column can add complexity. So this part takes a lot of time when on an interview.
My Solution: with total_population(pop_cnt) as (select count(user) from (select distinct user1 as user from facebook_friends union select distinct user2 as user from facebook_friends)x), total_pairs as (select distinct user1 as frnd1, user2 as frnd2 from facebook_friends union select distinct user2 as frnd1, user1 as frnd2 from facebook_friends) select frnd1, count(distinct frnd2)/max(pop_cnt)::float*100 as popularity_percentage from total_pairs, total_population group by 1;
Hey Nate , I must over-simplify it. Why I cannot use a sub query to calculate total user and window function to calculate friends per user. Select user1, (select count(distinct user2) from xx) as totalususer, count(distinct user2 ) over (partition by user1) from …
Another solution select user1, (b.sum1/c.cnt)*100 as popularity_percent from (select count(distinct user1) as cnt from (select distinct user1 from facebook_friends union select distinct user2 from facebook_friends) d) c, ( select user1, sum(cnt) sum1 from (select user1,count(*) cnt from facebook_friends group by user1 union select user2,count(*) from facebook_friends group by user2) a group by user1) b order by user1
That's insightful. Using a CTE helped me shorten the query. Also when using UNION, you get rid of duplicates with the individual tables as well. So using distinct is not required.
with total as (
select user1, user2 from facebook_friends
union
select user2, user1 from facebook_friends)
select distinct user1, cast(count(user2) over (partition by user1) as float)*100/(select count(distinct user1) from total) from total
order by user1;
This is one of the best union combined with cte problem I have come across
My solution:-
with t1 as (select * from facebook_friends
union all
select user2,user1 from facebook_friends),
t2 as (select user1, count(user2) no_friends from t1
group by user1
order by user1),
t3 as (select user1, no_friends,count(*) over() total from t2)
select user1,(no_friends/total)*100.0 pop_per from t3
Awesome dear you did a great explanation kindly make more videos like this 😀
Thank you. I am glad you find it very helpful.
Great video and content! I was able to use count(user2) OVER (Partition by user1) after doing the union to get to the friends of the users.
Was able to confirm by submitting on the platform too :)
Thank you so much Nate. Find it very easy to understand what’s the solution you’re taking. As a beginner in SQL this really helps me understand more clear.
That's great. Hope you enjoy the entire series of SQL problems then. Let me know if you have any questions or feedback.
Great solution, without your explanation, it is hard to figure it out. Thank you.
Tricky question but in reality all the users even in the user2 column should be present in the user1 column too. This is the basis of a user table. This question is all about what if all the users are not present in user1 column, then your code is the ultimate solution.
Hatsoff to your thinking which considers the worst case i didn't expect.
I agree that if this was a user table then you'd have all the users in the user1 column. But what if this was a mapping table? That assumption might not be valid. It's always good to ask the interview to clarify any assumptions. I always assume the worse case =) Thanks for watching and for the great detailed comment.
Exactly ! the only catch in this Q is the fact that they dont explicitly state the structure of the data.
I am not sure if we need to use DISTINCT if we are using union already
I found this platform is very helpful because it allows me to run code and let me see result step by step. In this way, I can better understand what I'm doing and where I got wrong.
That's for a great testimonial =)
Thank you for the insights. I also tried doing it in similar manner, with a shorter query
select user1, Round((Friends/max(users_id) over())*100,3) as PP
from
(select *,
count(user1) as Friends,
row_number() over(order by user1) as users_id from
(select *
from facebook_friends
UNION ALL
select user2,user1 from facebook_friends) as A
group by user1
order by user1) as B;
Thank you, Nate.
Really good approach.
Hey Nate! Thanks for the video - they are really insightful, and helps a lot when I do the questions! Quick question - is there a reason to join the tuu value? What I did was just to put it as the denominator in the select statement, which I thought should save some time since you don't have to join. Thanks!
I think it was just to ensure that I've captured all users since there is a user1 and user2. I would validate your approach with the platform. We capture most use cases/edge cases on the platform, so if your thinking is right, your solution should validate correctly.
if you use "union" you don't need distinct before user1 and user2. union will do that for you
Yup that's true! I just added the DISTINCT to be explicit.
Excellent!
Hey Nate ! Can we use self join and not union for this problem ?
Yes! There are many ways to solve the problem. You can even try your self join approach on the platfrom. There's a link in the description.
This is a "hard" question? This is SUPER easy!
You are a cut above the rest.
My approach -
with temp as (
select user1 as u, count(*) as tot_friends from facebook_friends group by 1
union all
select user2 as u, count(*) as tot_friends from facebook_friends group by 1
)
select u, (sum(tot_friends)/(select count(distinct u) from temp)::float)*100 as popularity_percentage
from
temp
group by 1
order by 1
Hi Nate, thank you !
This solution below seems to work also but I used WITH instead of subqueries, is it an issue ? Thank you for your feedback !
WITH total_users as (
select distinct user as all_users from facebook_friends
union
select distinct friend from facebook_friends
),
consolidated_table as(
select all_users, count(user) as number_user, (select count(*) from total_users) as total_u
from total_users
cross join facebook_friends
where all_users = user
OR all_users = friend
group by 1
)
select all_users, (cast(number_user as float)/cast(total_u as float))*100 as percent from consolidated_table;
It's not at all an issue if you used a CTE (i.e., using a WITH) instead of a subquery. The reason why I used a subquery is because some SQL flavors don't allow using WITH so I wanted to write the solution so that the code can be used on any SQL flavor. Thanks for watching and more thanks for trying out this question!
Thank you so much for videos and stratscratch , i had general query , as in the above video unless we see tha data and if we only go with the schema mentioned we tend to believe that user 1 column has all users already , so in a real facebook or top company interviews do we actually have the data available for us or do we actually have to ask the interviewer on the limitations on data set .
In my experience, you won't have the data so you need to ask the interviewer what the data set looks like. Basically what are some of the limitations and/or example values. Then you can start to develop your solution and identify edge cases.
Hi Nate,
To calculate number of friends each user has, can we do this ?
SELECT user1, COUNT(user2)
FROM facebookfriends
GROUP BY user1
so this would group all the friends of each user in user1 since right column are the friends right ?
Yes, so long as all users are found in the user1 column. But our assumption is that not all users are found in user1. So the question is slightly more complicated.
@@stratascratch wouldn't user1 carry all users based on the table structure? How can a user2 be a friend without being a user in 1st place?
u can simply use something like this:
select a.* , a.i/(sum(a.i) over()) as aasd
from (
select requester_id as id, count(accepter_id ) as i from request_accepted
group by 1
union
select accepter_id as id , count(requester_id) as i from request_accepted
group by 1
) a
order by 3 desc
Great walkthrough! But still think the number of friend of each user should not be computed using union because A friends with B does not mean B friends with A. Those who only show up in the second column should be considered having no friends.
Hi Nate at Strata Scratch pls can you RECOMMEND a platform where I can study ADVANCE SQL. Thanks in advance.
For advanced level SQL, I would recommend leetcode, stratascratch, interviewquery, sqlpad.io as great platforms that will give you some advanced level skills. If you need something even more difficult then I would suggest doing real projects to improved. Real projects tend to have much longer sql scripts that are necessary to be written.
Can total number of users be defined as Max(users1) ?
Hey Nate,
I do have concern from the top query where the FROM clause ends at total_unique_users. Could you please explain as to why it is ending there and why not at the end where the LEFT JOIN is completed?
It's because the total_unique_users is the 1st subquery and then I do another subquery called tuu that uses total_unique_users. I use tuu to join to user_friends. Once I join those 2 subqueries, I take the aggregate count. So the main reason why it ends at tuu is because I'm doing a subquery within a subquery. Hope that makes sense. If you want to troubleshoot, I would try to see how the data is aggregated at each level of subquery. Then write more code and see how the data is aggregated again. Keep doing this until the entire query is built. That should help you understand what's going on.
@@stratascratch Thank you. I got your point, Nate.
Hi, why do you use MAX() to get all _users number? Can we just divide by tuu.all_users?
Actually, yes you could just use all_users since it's the count() and you'll just be given 1 number anyways. good catch.
While I agree with the solution, why wouldn't Bob be a user1 in the original table? That doesn't sound intuitive at all.. I would have done the SELECT user1, count(distinct user2) right away because user1 was supposed to have all the users on Facebook..
I didn't have the assumption that user1 should have all the users on FB. I could make that case if it was a user table (like a copy from production) but if it's a mapping table or some other type of table used for analytics, I wouldn't feel comfortable making that assumption without asking the interviewer first (or checking it if I was at work).
since you are doing union to get all the users. So how do you think the user1 column did not contain all the users and we need to do the union?
I thought about edge cases where Friend A might be friends with Friend B, but Friend B might not be friends with A. Many platforms are like this (FB might not be one of them but I know IG is) so dong the UNION takes care of that. You can always ask the interviewer what they think. I don't think there's a right or wrong answer here since you have to assume certain things for you to end up coding a solution.
@@stratascratch Thank you so much, I got you. You are talking about the followers on IG. Yes, if in that case, it makes more sense
This was uniquely tough to think through.
my solution
with cte as
(select user1,user2
from facebook_friends
union
select user2,user1
from facebook_friends),
cte2 as
(select distinct user1,count(user2) over(partition by user1) as total_friends
from cte)
select user1,total_friends/9.0*100 as perc_pop
from cte2;
Thanks Nate for keeping us educated ! Just a logical Question I had , Since you are saying user1 is the user and user2 is their friend list , I hope even the friend should be a user on Platform , So can we cant consider select distinct user1 from facebook_friends as total users , Correct me if I am missing something .
That's a great assumption. That's something you'd want to talk to the interviewer about to see if your assumption makes sense. Usually when it's captured on a db table then the user's friend should also be on the platform. Why is it that you can't consider distinct user1 from the table?
Hi Nate ..Can u please have a Sql session once. Can u please ping me if thats possible
it seems there's an assumtption that person a can be friends with person b, but person b is not friends with person a... I don't think that's how this platform or friendship works. Clarifying a simplifying assumption with the interviewer would streamline a lot of this.
Yea you're right. There are some assumptions that are needed to solve the question the right way. And you should talk to the interviewer to get the assumptions clarified. Even the question here has changed a bit since I created this video. Thanks for pointing this out.
damn, understanding the data provided is the tough part
To me it's the data schema that's hard because there's so many assumptions you'll need to make based on the structure of the table. And then once you have an understanding of that, the values for each column can add complexity. So this part takes a lot of time when on an interview.
My Solution:
with total_population(pop_cnt) as
(select count(user)
from
(select distinct user1 as user from facebook_friends
union
select distinct user2 as user from facebook_friends)x),
total_pairs as
(select distinct user1 as frnd1, user2 as frnd2 from facebook_friends
union
select distinct user2 as frnd1, user1 as frnd2 from facebook_friends)
select frnd1,
count(distinct frnd2)/max(pop_cnt)::float*100 as popularity_percentage
from total_pairs, total_population
group by 1;
Hey Nate , I must over-simplify it. Why I cannot use a sub query to calculate total user and window function to calculate friends per user.
Select user1, (select count(distinct user2) from xx) as totalususer, count(distinct user2 ) over (partition by user1) from …
You can probably use a subquery to solve it. Give it a try on the platform to see if you get the output you want
Another solution
select user1, (b.sum1/c.cnt)*100 as popularity_percent from
(select count(distinct user1) as cnt from (select distinct user1 from facebook_friends union select distinct user2 from facebook_friends) d) c,
(
select user1, sum(cnt) sum1 from
(select user1,count(*) cnt from facebook_friends group by user1
union
select user2,count(*) from facebook_friends group by user2) a
group by user1) b
order by user1