Bro, your channel is just gold !!!! I´ve been spending 3 hours a day watching every day because before watching your videos I was not able to solve a single question, but since I started watching your videos everything seems to be a piece of cake.
Thanks for the walkthrough bro! Key takeaways: Understand data, formulate approach and then write code. BTW you used 2 CTEs which means 2 different joins on the same set of tables. You could have just used the one join along with sum(case when ...) statements to sum up the Dec and Jan comments. Just some food for thought. Cheers!
I took a different approach: with sums as ( SELECT country, sum(case when created_at between cast('2019-12-1' as date) and cast('2019-12-31' as date) then number else 0 end) dec, sum(case when created_at between cast('2020-1-1' as date) and cast('2020-1-31' as date) then number else 0 end) jan FROM users a inner join counts b on a.user_id=b.user_id) group by country), ranks as ( select country, dense_rank() over (partition by country order by dec) decRank, dense_rank() over (partition by country order by jan) JanRank from sums) select country from ranks where janRank < decRank
Just wanted to say that your videos and the StrataScratch platform have been incredibly helpful in preparing for interviews. Thanks for the amazing content.
Great Explanation Nate. Thanks. Btw I also implemented the solution using group by on "Country" and "Month" and pivoted the table on the month column then used dense_rank() over "December" and "January" to extract the country that improved their ranking. It sounds complex though.😁
Hi, thank you for your time and effort in creating these videos! Immensely helpful!I had a query about the date syntax - Is there a specific reason you use the operator clause for filtering instead of between?
I have doubt that in your output Australia is having same number of comments in both the months means there is no rise so how it should be in the output. Was is necessary to use ranking function? Instead can we compare the comments?
@Nate at StrataScratch Hi Nate, I love your videos and they are very helpful, would you also prefer similar videos for python? for those people who prepares data engineer interviews ( like me :) ) would be very useful as well. Thanks for creating amazing content.
Thanks for this Nate, really helpful. Quick question: is there a reason why you are summing in the window functions? I am assuming each country only has one row
The window function just helps with the ranking. I'm not partitioning my data but just taking the ranking over the entire window (which is the entire dataset) so that I can get a ranking.
Hello, I've been watching this youtube channel a lot and found it really helpful. If you don't mind me asking a simple question?. Although I saw your video related which kinds of interview questions do DS role has to take, is it quite common to solve normal algorithm coding test (the one that for SWE for usual) for a DS interview? Thank you.
Yea it's pretty common these days to be asked algo questions on a DS interview. It might not be very common for a data analyst or similar position but for a DS it's starting to trend. I'm hoping to do a few videos on this once I add these types of questions to my platform. Thanks for watching!
Amazing video! One question: Every Facebook Data Science interview example I've seen included SQL only, so I was wondering if FB only asks SQL coding questions? Or do they also ask Python?
Hi Nate, I love your videos and they are very helpful, would you also prefer similar videos for python? for those people who prepares data engineer interviews ( like me :) ) would be very useful as well. Thanks for creating amazing content.
@@SelmanAy Yes, I've been playing around with the idea of some python videos! I'm planning to add some algorithm questions that pops up on a few interviews so I may use python for that. Thanks for watching!
Hi Nate, here's my solution without left join with cte1 as ( -- for users who have commented get the country details -- extract the month and year from created_at select c.user_id, u.country, extract(YEAR from created_at) as year, extract(MONTH from created_at) as month, number_of_comments as comments from fb_comments_count c, fb_active_users u where c.user_id = u.user_id ), cte2 as ( -- for each country find all the comments in dec and jan select country, sum(case when month = 12 and year = '2019' then comments else 0 end ) as dec_comments, sum(case when month = 1 and year = '2020' then comments else 0 end ) as jan_comments from cte1 group by country order by country ), cte3 as ( -- rank the each country based on dec comments and jan comments select country,dec_comments, dense_rank() over(order by dec_comments desc) dec_rank, jan_comments, dense_rank() over(order by jan_comments desc) jan_rank from cte2 order by country ) -- highest number of comments get first rank, second-highest second rank -- lesser rank in Jan means risen in ranking --select country, jan_comments,jan_rank, dec_comments, dec_rank select country from cte3 where jan_rank < dec_rank order by country
Given the 2, I would learn postgres bc it's the engine is created for analytics in mind. Many analytical teams use postgres. But in practice, it doesn't really matter. Most industry jobs will use other engines like Snowflake or HIVE. What's important is that you learn any SQL engine and learn to think about how to solve analytical questions.
@@nikhilverma2605 yea some courses are ok. I don’t know if any Udemy courses unfortunately. Try some free courses first like Mode Analytics before paying.
An attendance log for every student in a school district attendance_events : date | student_id | attendance • A summary table with demographics for each student in the district all_ students : student_id | school_id | grade_level | date_of_birth | hometown Using this data, you could answer questions like the following: • What percent of students attend school on their birthday? • Which grade level had the largest drop in attendance between yesterday and today?
Obtain country and count of comments by month. Restrict it to December 2019 and January 2020 comments. Subtract December 2019 from January 2020 value. See the country that had the maximum positive difference. I could do this query with click and drag in about 3 seconds in Microsoft Access, but no, let's write out SQL code like it's 1976.
I believe the online ide is not working properly, unless you put in the expected result. Can someone please explain, why it is throwing errors when I use user_id and country columns (it is saying column name is ambiguous). I had to comment it out in order for the above code to run. SELECT --user_id, sum(number_of_comments) as comments_dec, --country, dense_rank() over(order by sum(number_of_comments)desc) as country_rank FROM fb_active_users as a LEFT JOIN fb_comments_count as b ON a.user_id = b.user_id WHERE created_at BETWEEN DATE('2019-12-01') AND ('2019-12-31') and country is not null GROUP BY country;
It's throwing an error because there's column user_id in both tables. So, yes, these columns are ambiguous. What you're missing is an alias in front of the column user_id. Which alias you'll use depends on which table you want to fetch user_id from. But if you're following my solution, there's no user_id in the SELECT statement; only country, SUM(), and DENSE_RANK()
@@stratascratch I don’t think I should have been working on window functions as a beginner. I hope you make a video guide of the sql concepts to learn first from A to Z
This is much faster and efficient: With cte as ( select base.*, dense_rank() over(partition by year_month order by total_comments desc) as ranks from (Select country, format(created_at, ‘YYYY-MM) year_month, sum(comments) total_comments From fb_active_users u inner join comments_per_user c On c.user_id = u.user_id and created_at between ‘2019-12-01’ and ‘2020-01-30’ Group by country, year_month) base ) select cte.country, ranks, lead(ranks) over (partition by country order by year_month) as next_month_ranks from cte where next_month_ranks < ranks;
I like your query but I think Sandeep there is flaw when u r calculating the rank based on partition of only year and month , as it is calculating the rank for Dec or Jan month by comparing with all the Dec or Jan dates of all countries. What if we put country also in that partition so it would give us the rank based on country and year_month. Please correct me if I am going wrong way
Could work! I honestly didn’t run this against real data, just thought using a two level ranking function could make the query smaller so put it in here quick and dirty! So whatever works best 😊
Hi Nate. Thanks for your video. I tried the below code to get sum of number of comments for Dec and Jan. But I am getting a different number when compared with yours. For Eg : In USA for Dec and Jan i am getting 11 and 9 respectively. But for you both months are 11. Confused!! select COUNTRY, SUM(CASE WHEN MONTH(created_at)='12' AND Year(created_at)='2019' THEN number_of_comments END) as dec_2019, SUM(CASE WHEN MONTH(created_at)='1' AND Year(created_at)='2020' THEN number_of_comments END) as Jan_2020 from fb_active_users a LEFT JOIN fb_comments_count b ON a.user_id=b.user_id GROUP BY COUNTRY;
Hey, thanks for giving it a try. Can you leave a comment in the discussion forum on the platform? Someone from my team will be able to help much more quickly with technical questions!
Bro, your channel is just gold !!!! I´ve been spending 3 hours a day watching every day because before watching your videos I was not able to solve a single question, but since I started watching your videos everything seems to be a piece of cake.
Wow, thanks! Music to my ears. I am happy the videos I prepared, together with my team have helped you.
Thanks for the walkthrough bro! Key takeaways: Understand data, formulate approach and then write code. BTW you used 2 CTEs which means 2 different joins on the same set of tables. You could have just used the one join along with sum(case when ...) statements to sum up the Dec and Jan comments. Just some food for thought. Cheers!
For sure the best resources online for interview prep for not just solving SQL problem but also approaching it conceptually. Thanks again.
Thanks for watching! Really appreciate the support.
I took a different approach:
with sums as (
SELECT country,
sum(case when created_at between cast('2019-12-1' as date) and cast('2019-12-31' as date) then number else 0 end) dec,
sum(case when created_at between cast('2020-1-1' as date) and cast('2020-1-31' as date) then number else 0 end) jan
FROM users a inner join counts b on a.user_id=b.user_id)
group by country),
ranks as (
select country,
dense_rank() over (partition by country order by dec) decRank,
dense_rank() over (partition by country order by jan) JanRank
from sums)
select country
from ranks
where janRank < decRank
Just wanted to say that your videos and the StrataScratch platform have been incredibly helpful in preparing for interviews. Thanks for the amazing content.
you're the best SQL teacher I ever got!! :O amazing explanations :)
Thanks for watching! Really appreciate it 😎
Great solution, learn a lot from this. Thank you
This explanation is so good. I wish I found this channel a few months earlier!
Thanks for watching, buddy!
Great Explanation Nate. Thanks. Btw I also implemented the solution using group by on "Country" and "Month" and pivoted the table on the month column then used dense_rank() over "December" and "January" to extract the country that improved their ranking. It sounds complex though.😁
I thought the same
God, am I glad I found StrataScratch or what?
That was awesome!💥🔥💡 Thanks a lot.
Definitely welcome.
Hi, thank you for your time and effort in creating these videos! Immensely helpful!I had a query about the date syntax - Is there a specific reason you use the operator clause for filtering instead of between?
No there was no specific reason. In reality, you can use either. I often switch. between the two myself. Thanks for watching!
Thanks Nate👍🏻👍🏻👍🏻
That was AMAZING!
Your videos helped me a lot in increasing my SQL skills ! Do you use any other language (python / R) at your work ?! Thanks !
Yes, I use python a lot! Some people use R but python helps with automation
I have doubt that in your output Australia is having same number of comments in both the months means there is no rise so how it should be in the output. Was is necessary to use ranking function? Instead can we compare the comments?
best of the best
please, keep going
If you can shoot a video each week, it will be amazing for us. Thanks Nate :)
That was my original plan! It's been difficult but I have more videos coming out soon.
@Nate at StrataScratch Hi Nate, I love your videos and they are very helpful, would you also prefer similar videos for python? for those people who prepares data engineer interviews ( like me :) ) would be very useful as well. Thanks for creating amazing content.
Thanks nate
Dude, you rock!
Thanks, dude! Really appreciate it!
Thanks for this Nate, really helpful. Quick question: is there a reason why you are summing in the window functions? I am assuming each country only has one row
The window function just helps with the ranking. I'm not partitioning my data but just taking the ranking over the entire window (which is the entire dataset) so that I can get a ranking.
Hi @@stratascratch: Shouldn't the dense_rank be ordered by num_of_comments rather than sum(num_of_comments)
Hello, I've been watching this youtube channel a lot and found it really helpful. If you don't mind me asking a simple question?. Although I saw your video related which kinds of interview questions do DS role has to take, is it quite common to solve normal algorithm coding test (the one that for SWE for usual) for a DS interview? Thank you.
Yea it's pretty common these days to be asked algo questions on a DS interview. It might not be very common for a data analyst or similar position but for a DS it's starting to trend. I'm hoping to do a few videos on this once I add these types of questions to my platform. Thanks for watching!
@@stratascratch Thank you so much for answering my question. I got a lot of help from watching your channel as a MS DS student. Thank yoiu
Amazing video! One question: Every Facebook Data Science interview example I've seen included SQL only, so I was wondering if FB only asks SQL coding questions? Or do they also ask Python?
I think most people decide to use SQL. You can pick any scripting language to answer coding questions on FB interviews (from my experience).
Hi Nate, I love your videos and they are very helpful, would you also prefer similar videos for python? for those people who prepares data engineer interviews ( like me :) ) would be very useful as well. Thanks for creating amazing content.
@@SelmanAy Yes, I've been playing around with the idea of some python videos! I'm planning to add some algorithm questions that pops up on a few interviews so I may use python for that. Thanks for watching!
@@stratascratch so happy to hear this! looking forward to it!
Hi Nate, here's my solution without left join
with cte1
as
(
-- for users who have commented get the country details
-- extract the month and year from created_at
select c.user_id, u.country,
extract(YEAR from created_at) as year,
extract(MONTH from created_at) as month,
number_of_comments as comments
from fb_comments_count c,
fb_active_users u
where c.user_id = u.user_id
),
cte2
as
(
-- for each country find all the comments in dec and jan
select country,
sum(case when month = 12 and year = '2019' then comments else 0 end ) as dec_comments,
sum(case when month = 1 and year = '2020' then comments else 0 end ) as jan_comments
from cte1
group by country
order by country
),
cte3
as
(
-- rank the each country based on dec comments and jan comments
select country,dec_comments,
dense_rank() over(order by dec_comments desc) dec_rank,
jan_comments,
dense_rank() over(order by jan_comments desc) jan_rank
from cte2
order by country
)
-- highest number of comments get first rank, second-highest second rank
-- lesser rank in Jan means risen in ranking
--select country, jan_comments,jan_rank, dec_comments, dec_rank
select country
from cte3
where jan_rank < dec_rank
order by country
Can't we use Rank function here in window function
Hey Nate, your "created_at
Not a huge deal as the optimizer will determine this and switch to an inner join.
Should i learn MYSQL or Postgresql
please tell me that
Given the 2, I would learn postgres bc it's the engine is created for analytics in mind. Many analytical teams use postgres. But in practice, it doesn't really matter. Most industry jobs will use other engines like Snowflake or HIVE. What's important is that you learn any SQL engine and learn to think about how to solve analytical questions.
Would you reccomend learning from udemy
@@nikhilverma2605 yea some courses are ok. I don’t know if any Udemy courses unfortunately. Try some free courses first like Mode Analytics before paying.
well what about data analytic by google at cousera
@@nikhilverma2605 ahh yea that's right. I forgot about that one. I hear great things about it.
An attendance log for every student in a school district attendance_events :
date | student_id | attendance
• A summary table with demographics for each student in the district all_
students : student_id | school_id | grade_level | date_of_birth | hometown
Using this data, you could answer questions like the following:
• What percent of students attend school on their birthday?
• Which grade level had the largest drop in attendance between yesterday
and today?
can you pls solve this?
no Coalesce?
If you wanted to, I supposed you could. But it wasn't required for this question.
Basically, I'm screwed... :(
Keep practicing! it gets much easier!
this is a hard question....
Obtain country and count of comments by month. Restrict it to December 2019 and January 2020 comments. Subtract December 2019 from January 2020 value. See the country that had the maximum positive difference.
I could do this query with click and drag in about 3 seconds in Microsoft Access, but no, let's write out SQL code like it's 1976.
Now try to do this with 1 billion rows , ok buddy?
@@ssoossdd ? You just make the query in MS Access so all t he SQL is written for you. Then copy it to whatever else you're using.
I believe the online ide is not working properly, unless you put in the expected result. Can someone please explain, why it is throwing errors when I use user_id and country columns (it is saying column name is ambiguous). I had to comment it out in order for the above code to run.
SELECT
--user_id,
sum(number_of_comments) as comments_dec,
--country,
dense_rank() over(order by sum(number_of_comments)desc) as country_rank
FROM fb_active_users as a
LEFT JOIN fb_comments_count as b
ON a.user_id = b.user_id
WHERE created_at BETWEEN DATE('2019-12-01') AND ('2019-12-31') and country is not null
GROUP BY country;
It's throwing an error because there's column user_id in both tables. So, yes, these columns are ambiguous. What you're missing is an alias in front of the column user_id. Which alias you'll use depends on which table you want to fetch user_id from. But if you're following my solution, there's no user_id in the SELECT statement; only country, SUM(), and DENSE_RANK()
@@stratascratch I was a super noop when I wrote that comment but I know now. Thanks 🙏🏾 for the tutorials
@@jaymo2024 Everyone has to start somewhere. You're doing great!
@@stratascratch I don’t think I should have been working on window functions as a beginner. I hope you make a video guide of the sql concepts to learn first from A to Z
This is much faster and efficient:
With cte as
(
select base.*, dense_rank() over(partition by year_month order by total_comments desc) as ranks from
(Select country, format(created_at, ‘YYYY-MM) year_month, sum(comments) total_comments
From fb_active_users u
inner join comments_per_user c
On c.user_id = u.user_id and
created_at between ‘2019-12-01’ and ‘2020-01-30’
Group by country, year_month) base
)
select cte.country, ranks, lead(ranks) over (partition by country order by year_month) as next_month_ranks from cte
where next_month_ranks < ranks;
Love it! Thanks for sharing
I like your query but I think Sandeep there is flaw when u r calculating the rank based on partition of only year and month , as it is calculating the rank for Dec or Jan month by comparing with all the Dec or Jan dates of all countries. What if we put country also in that partition so it would give us the rank based on country and year_month. Please correct me if I am going wrong way
We can discuss more on this Sandeep if you are okay.
Could work! I honestly didn’t run this against real data, just thought using a two level ranking function could make the query smaller so put it in here quick and dirty! So whatever works best 😊
Hi Nate. Thanks for your video. I tried the below code to get sum of number of comments for Dec and Jan. But I am getting a different number when compared with yours. For Eg : In USA for Dec and Jan i am getting 11 and 9 respectively.
But for you both months are 11. Confused!!
select COUNTRY,
SUM(CASE WHEN MONTH(created_at)='12' AND Year(created_at)='2019' THEN number_of_comments END) as dec_2019,
SUM(CASE WHEN MONTH(created_at)='1' AND Year(created_at)='2020' THEN number_of_comments END) as Jan_2020
from fb_active_users a
LEFT JOIN fb_comments_count b
ON a.user_id=b.user_id
GROUP BY COUNTRY;
Hey, thanks for giving it a try. Can you leave a comment in the discussion forum on the platform? Someone from my team will be able to help much more quickly with technical questions!
@@stratascratch Sure. Thanks for your reply