Thanks for the kind words. I am going to be doing some python coding sometime in the near future which would include some data sampling (or more specifically how to do that). I'll be creating tutorials on how to work with python soon once I get through some "interviewing" material. Hope you keep watching.
Thanks! Many of my interview questions series can be adapted for the work environment. In the newer videos I always try to explain how you'd solve the problem on an interview and how to do so on the job so I try to straddle both realms. But I see your point -- If I find a good set of questions that are perfect for the work environment, I'll definitely cover it in a video
@@stratascratch Awesome thanks! Another suggestion might be going over some of the SQL theory, specifically comparing computing power used for certain queries or certain parts of those queries. I know you mention it in some of your videos but that could still be helpful! Thanks again for all the great resources that you put on UA-cam, I'll be using Strata Scratch very soon!
this content is awesome! thank you! I had a question, in the business inspection scores problem, why did you need to write "distinct inspection_type" as a part of the sub query. I don't understand why/how you knew we need to use distinct to solve this.
mainly because I didn't want duplicates in my output so I used distinct in the subquery. We probably could have used distinct at the very end but I just chose to do it in the subquery. A lot of this is about playing with the data but if you're on an interview you can ask the interviewer about whether or not you care about uniqueness or duplicates in the final output.
Hey man Thanks for sharing.. Just a question i have not found second and third question on your website could you please suggest how i can find these questions
Hello Nate, thanks for this awesome video. I have one doubt. In 1st question, inside the cte why do we use group by 1,2,3 because inside the cte we don't use any aggregate function. But if I remove that the average column value doesn't come right in the final output. I'm missing something here. Could you please help?
What I'm trying to do with the group by 1,2,3 is dedup any information since I want everything to be unique. Probably not required on this dataset though but give it a try on the platform
It's usually through looking at the table schema first. I can usually figure out how to design the metrics. I mainly only look at the schema because on interviews, you don't usually have access to the data so you have to make all your assumptions based on the column names. The platform is nice in that you can look at the data so I then will explore the dataset to make sure my assumptions are correct.
Your sql videos are really very helpful....It would be great if you could prepare a video on real time sql query containing multiple sub queries (basically multiple from, may be 3-4 inner queries ) covering around 100 plus line of codes......Bsically need to understand how someone approach. to code such a large and complex query and where do they start querying since that involve multiple set of inner queries....
oh man...yea I see where you're going with this. Basically, I probably would have 3-4 inner queries and CTEs but I'd probably also have a lot of tables and temp tables to help me manage the data as well. Let me think about what I can do for 100+ lines. It's not something too easy to come up with but I'll see what I can do.
Hi, could u help me in this query if u are having two tables flights consists of following columns I'd,company name , departure name and jet _id and jet consists of I'd , capacity here jet_id is foreign key reference id in jet table I want sort the data with multiple condition like if total number of rows are equal by each company name then sort by sum of capacity column in jet and if count two rows are not equal sort by count of each company and then depature name, and still equal count sort by departure time
I am not sure if you can still see this comment after almost a year but I am wondering for the first Airbnb question, why do you have to specify "Group by 1, 2, 3". I didn't specify and my answer is so close to the right answer, and could you explain what does 1, 2, 3 mean in this situation?
Group by 1,2,3 will group the output by the first 3 columns - host id, num of reviews, and price. This helps to de-duplicate any rows. You kind of want to dedup because there shouldn't be any rows where there are duplicates of host_id, number of reviews, and price. These 3 combinations of columns are unique. Hope this clears it!
Because there's no 2nd column being returned. It's just 1 column which is that aggregate. So what would you be grouping by? Let me know if that makes sense.
I wonder why I get a different answer for the average price in the first question? This is my SQL select A.host_pop_rating, min(A.price) as min_price, max(A.price) as max_price, avg(A.price) as avg_price from (select *, case when number_of_reviews = 0 then 'New' when number_of_reviews
why we cannot use this query select case when number_of_reviews40 then 'Hot' end as a, min(price),avg(price),max(price) from airbnb_host_searches group by a order by a
I think it's because the case statement (column 4) was considered an aggregate so I required to group the columns in order for the code to run successfully.
It could be that instead of the GROUP BY, just use a DISTINCT initially at the select statement, its worth checking it out.. Seems like the group by acts for getting unique combinations of the first 3 columns. Personally I didin't know that you can set a group by when there is no aggregate. nor that a case statement is considered as an aggregate
Yes, of course use a CTE. I prefer them as well. Sometimes when I'm working through a problem and communicating my solution, it's more natural to use them as subqueries just based on how I'm explaining the concepts.
It's in the description. But here it is copy and pasted: - CASE statement for numerical values: bit.ly/2Wmnvad - CASE statement for text: bit.ly/3p0eboP - The most popular CASE statement on SQL interviews: bit.ly/2ITCi9f
@@stratascratchthanks Nate. Is there “additive Case statement” in sql? If yes can you show an example? Thanks a lot! You have been very helpful to me in learning sql. Also can you show an example of “cross apply” vs “inner join”?
@@modhua4497 What do you mean by additive case statement? I haven't used cross apply in SQL but here are some resources I found 1. stackoverflow.com/questions/1139160/when-should-i-use-cross-apply-over-inner-join 2. explainextended.com/2009/07/16/inner-join-vs-cross-apply/ Seems like you'd use cross apply where there's no simple join query. I haven't had to use cross apply in my work but maybe it's because I use python for anything I can't solve with my knowledge in SQL. Hope that helps!
Hello! I was trying to solve the second problem using the query below but why the result is different from yours. Can you tell me, please? thank you. WITH new_view AS( SELECT CONCAT(price, room_type, host_since, zipcode, number_of_reviews) AS host_id, number_of_reviews, price, CASE WHEN number_of_reviews = 0 THEN 'New' WHEN number_of_reviews BETWEEN 1 AND 5 THEN 'Rising' WHEN number_of_reviews BETWEEN 6 AND 15 THEN 'Trending Up' WHEN number_of_reviews BETWEEN 16 AND 40 THEN 'Popular' WHEN number_of_reviews > 40 THEN 'Hot' END AS host_popularity FROM airbnb_host_searches GROUP BY 1,2,3) SELECT host_popularity AS host_pop_rating, MIN(price) AS min_price, MAX(price) AS max_price, AVG(price) AS avg_price FROM new_view GROUP BY host_pop_rating;
Your thumbnails are getting more awesome :D Also awesome content ofc 🔥
haha, I'm trying to push the ridiculousness up a level to get that CTR up.
These practical scenarios of case are really helpful n useful even for our own IT implementations.
The most valuable training material on SQL ever.
Sampling data, probability, p-value - these things would be very interesting for me.
Thanks for the kind words. I am going to be doing some python coding sometime in the near future which would include some data sampling (or more specifically how to do that). I'll be creating tutorials on how to work with python soon once I get through some "interviewing" material. Hope you keep watching.
@@stratascratch Will look forward to that as well! Thanks again :)
I gave a sql interview recently and 3 out of 4 questions were exactly based on this logic
in my opinion, the most confusing part of the question is understanding the need for Host_id, a bit tricky.
I LOVE YOUR CONTENT! Could you maybe do a series on the most common SQL scenarios that a DS would face in a work environment?
Thanks! Many of my interview questions series can be adapted for the work environment. In the newer videos I always try to explain how you'd solve the problem on an interview and how to do so on the job so I try to straddle both realms. But I see your point -- If I find a good set of questions that are perfect for the work environment, I'll definitely cover it in a video
@@stratascratch Awesome thanks! Another suggestion might be going over some of the SQL theory, specifically comparing computing power used for certain queries or certain parts of those queries. I know you mention it in some of your videos but that could still be helpful! Thanks again for all the great resources that you put on UA-cam, I'll be using Strata Scratch very soon!
@@zulumane Yea that's a great suggestion. For some of the videos, I can dive deeper into optimizations and performance.
Really Informative video
Excellent explanation. Thanks for posting.
Thanks for watching! Appreciate it
this video is super helpful and awesome, thank you a bunch :D
Glad you found it educational and helpful!
this content is awesome! thank you! I had a question, in the business inspection scores problem, why did you need to write "distinct inspection_type" as a part of the sub query. I don't understand why/how you knew we need to use distinct to solve this.
mainly because I didn't want duplicates in my output so I used distinct in the subquery. We probably could have used distinct at the very end but I just chose to do it in the subquery. A lot of this is about playing with the data but if you're on an interview you can ask the interviewer about whether or not you care about uniqueness or duplicates in the final output.
Hey man Thanks for sharing.. Just a question i have not found second and third question on your website could you please suggest how i can find these questions
Sir please mention you are doing this in postgres SQL. there's nothing like ilike in mysql.
Can you please show examples of counting retained users by date difference between first_transaction_date and transaction date but in a dynamic way.
Best SQL Tutorial
Thanks! I'm glad you've enjoyed them. More to come! And maybe I'll dive into some python if people are interested.
Thank you! Your videos are super helpful :)
Thank you for watching!
you are awesome! keep it up!
Hello Nate, thanks for this awesome video. I have one doubt. In 1st question, inside the cte why do we use group by 1,2,3 because inside the cte we don't use any aggregate function. But if I remove that the average column value doesn't come right in the final output. I'm missing something here. Could you please help?
What I'm trying to do with the group by 1,2,3 is dedup any information since I want everything to be unique. Probably not required on this dataset though but give it a try on the platform
@@stratascratch but without that group by the answer doesn't come right. It is needed. But not sure why
Very helpful, thanks a lot!
So especially the 3rd one, what if we use Where to select the special cases???
Thank you very much for sharing sir, you help me a lot, God Bless You and Family :)
Thanks for the good content! Thumbs up! One question from some of ur videos, how did u come up the metrics design by looking at the SQL data?
It's usually through looking at the table schema first. I can usually figure out how to design the metrics. I mainly only look at the schema because on interviews, you don't usually have access to the data so you have to make all your assumptions based on the column names. The platform is nice in that you can look at the data so I then will explore the dataset to make sure my assumptions are correct.
Excellent content!
Thanks for watching!
Your sql videos are really very helpful....It would be great if you could prepare a video on real time sql query containing multiple sub queries (basically multiple from, may be 3-4 inner queries ) covering around 100 plus line of codes......Bsically need to understand how someone approach. to code such a large and complex query and where do they start querying since that involve multiple set of inner queries....
oh man...yea I see where you're going with this. Basically, I probably would have 3-4 inner queries and CTEs but I'd probably also have a lot of tables and temp tables to help me manage the data as well. Let me think about what I can do for 100+ lines. It's not something too easy to come up with but I'll see what I can do.
@@stratascratch thanx a lot...we are really learning a lot 😊
@@iitian2012 more videos to come!
Ilike is cool one. Learnt a new function today.
Btw nate, what rdbms is it exactly...like hive or mysql or something else
This is postgres. Thanks for watching
Anyway I can pay for an hour over teams to check my answers?
Hi, could u help me in this query if u are having two tables flights consists of following columns I'd,company name , departure name and jet _id and jet consists of I'd , capacity here jet_id is foreign key reference id in jet table I want sort the data with multiple condition like if total number of rows are equal by each company name then sort by sum of capacity column in jet and if count two rows are not equal sort by count of each company and then depature name, and still equal count sort by departure time
nice one, thanks
What are the requirements to get in as a DS, questions are seem too simple.
Hi, Can you pls help let me know why we did not use this filter as well: fb_search_results.notes ILIKE CONCAT ('%',fb_search_results.query,'%')
I am not sure if you can still see this comment after almost a year but I am wondering for the first Airbnb question, why do you have to specify "Group by 1, 2, 3". I didn't specify and my answer is so close to the right answer, and could you explain what does 1, 2, 3 mean in this situation?
Group by 1,2,3 will group the output by the first 3 columns - host id, num of reviews, and price. This helps to de-duplicate any rows. You kind of want to dedup because there shouldn't be any rows where there are duplicates of host_id, number of reviews, and price. These 3 combinations of columns are unique. Hope this clears it!
count(case when position
Because there's no 2nd column being returned. It's just 1 column which is that aggregate. So what would you be grouping by? Let me know if that makes sense.
I wonder why I get a different answer for the average price in the first question?
This is my SQL
select A.host_pop_rating, min(A.price) as min_price, max(A.price) as max_price, avg(A.price) as avg_price
from
(select *,
case when number_of_reviews = 0 then 'New'
when number_of_reviews
Can you post this solution onto the user discussion board on this question? Someone can help you within 1-2 days.
Here's the link to the question (platform.stratascratch.com/coding-question?id=9632&python=)
Do you have videos on data model questions from facebook?
Unfortunately not really. I don't cover any data modeling questions yet...
why we cannot use this query
select
case
when number_of_reviews40 then 'Hot'
end as a,
min(price),avg(price),max(price)
from airbnb_host_searches
group by a
order by a
Try it in the platform! I don't see anything wrong with it but I haven't executed the query to see what the differences are between mine and yours.
Dont understand why here needs a group by 1,2,3 on in inner code
I think it's because the case statement (column 4) was considered an aggregate so I required to group the columns in order for the code to run successfully.
It could be that instead of the GROUP BY, just use a DISTINCT initially at the select statement, its worth checking it out..
Seems like the group by acts for getting unique combinations of the first 3 columns.
Personally I didin't know that you can set a group by when there is no aggregate. nor that a case statement is considered as an aggregate
Can I simply use CTE instead of the subqueries? CTE looks very clean.
Yes, of course use a CTE. I prefer them as well. Sometimes when I'm working through a problem and communicating my solution, it's more natural to use them as subqueries just based on how I'm explaining the concepts.
Thanks a lot Sir
Thanks so much! Glad you enjoyed this one.
Which coding platform it is??
This is off of platform.stratascratch.com which uses a postgres db engine and python3
What is the link for this question? Thanks
It's in the description. But here it is copy and pasted:
- CASE statement for numerical values: bit.ly/2Wmnvad
- CASE statement for text: bit.ly/3p0eboP
- The most popular CASE statement on SQL interviews: bit.ly/2ITCi9f
@@stratascratchthanks Nate. Is there “additive Case statement” in sql? If yes can you show an example? Thanks a lot! You have been very helpful to me in learning sql.
Also can you show an example of “cross apply” vs “inner join”?
@@modhua4497 What do you mean by additive case statement? I haven't used cross apply in SQL but here are some resources I found
1. stackoverflow.com/questions/1139160/when-should-i-use-cross-apply-over-inner-join
2. explainextended.com/2009/07/16/inner-join-vs-cross-apply/
Seems like you'd use cross apply where there's no simple join query. I haven't had to use cross apply in my work but maybe it's because I use python for anything I can't solve with my knowledge in SQL. Hope that helps!
Thanks Nate. Have a good day.
Thank you
Thanks for watching. Let me know if you have any topics you'd like me to cover. Happy to make videos to help others.
Hello!
I was trying to solve the second problem using the query below but why the result is different from yours. Can you tell me, please? thank you.
WITH new_view AS(
SELECT CONCAT(price, room_type, host_since, zipcode, number_of_reviews) AS host_id,
number_of_reviews,
price,
CASE
WHEN number_of_reviews = 0 THEN 'New'
WHEN number_of_reviews BETWEEN 1 AND 5 THEN 'Rising'
WHEN number_of_reviews BETWEEN 6 AND 15 THEN 'Trending Up'
WHEN number_of_reviews BETWEEN 16 AND 40 THEN 'Popular'
WHEN number_of_reviews > 40 THEN 'Hot'
END AS host_popularity
FROM airbnb_host_searches
GROUP BY 1,2,3)
SELECT host_popularity AS host_pop_rating,
MIN(price) AS min_price,
MAX(price) AS max_price,
AVG(price) AS avg_price
FROM new_view
GROUP BY host_pop_rating;
Hi, is it possible for you to ask this question on the platform's discussion forum? Someone on my team will answer your question within a few days!
Your host_id creation has one more value, maybe that makes the difference.
please create some content on TIMESTAMP .thank you!
Will try to find some examples! Thanks for the suggestion.
Tnx
Thanks for watching!
Thank You