7:14 people make this mistake because of the phrasing of the question. "customer" should be changed to "customer(s)" and "first name" should be changed to "first name(s)". this way, the question suggests the possibility that the solution can contain more than one person. 8:09 what is "data at scale issues"?
I agree with you. It's important to talk to the interviewer to identify potential issues like this. You won't always get a perfectly phrased question in real life, especially on the job, so the purpose of this question is really to think of all possible scenarios and build a solution that solves for it. Doing it without someone prompting you to look for these scenarios is even better. Data at scale issues are data issues that pop-up only when you're dealing with a massive amount of data (ie, the amount of data that Facebook or Google might encounter). Because of this you might have to handle the data differently.
Hi , just one thing if we could aggregate the total order value at customer level between the dates then we can accurately get the results. As in solution, it might be case that one customer has various order value and after aggregating we get total order value for each customer and then decide which is the top ones.
Trying out the StrataScratch question - I'm returning two records, Mia and Farida, both with 400 for the total and their respective order dates, but it's not accepting it. The Expected Output button is showing 430 for the order totals, but I'm not seeing that order total in the data. Am I missing something? Edit: Ah, multiple orders on the same date, got it. Was thrown off since you had 400 in your video there.
Hi, yes, there's actually a change in the solution because I realized that the 400 corresponded to 400 for 1 item that was purchased on a specific day, not all purchases on a specific day. When you count all purchases made by 1 customer on a specific day, the max is 430. So the code below will get you there. The concept is the same though. In the video, I really wanted to highlight the fact that LIMIT 1s are common mistakes people make. To properly find the customers with the highest number, you have to consider that many people meet that max, and write your solution accordingly. Thanks for trying the code out on the platform. Sorry for the confusion. SELECT first_name, order_date, sum(total_order_cost) FROM (SELECT first_name, order_cost * order_quantity AS total_order_cost, order_date FROM orders o LEFT JOIN customers c ON o.cust_id = c.id WHERE order_date BETWEEN '2019-02-1' AND '2019-05-1' ) a GROUP BY first_name, order_date HAVING sum(total_order_cost) = (SELECT max(total_order_cost) FROM (SELECT sum(order_cost * order_quantity) AS total_order_cost FROM orders WHERE order_date BETWEEN '2019-02-1' AND '2019-05-1' GROUP BY cust_id, order_date) b )
Hello, Thank you for the great video. I have a question! Since we typed in the subquery limit 1, why the overall output still returning 2 rows? cheerz!
The limit one was just to get the max total order cost. Once I got that, I used it to get all the customers that matched the max. And that's why you see 2 rows in the output. Hope that helps and thanks for watching!
My public one is not other than the one that I linked to in the description. I have one at work and private ones for StrataScratch. But I'll be posting more projects throughout the year and putting them on youtube.
7:14 people make this mistake because of the phrasing of the question. "customer" should be changed to "customer(s)" and "first name" should be changed to "first name(s)". this way, the question suggests the possibility that the solution can contain more than one person.
8:09 what is "data at scale issues"?
I agree with you. It's important to talk to the interviewer to identify potential issues like this. You won't always get a perfectly phrased question in real life, especially on the job, so the purpose of this question is really to think of all possible scenarios and build a solution that solves for it. Doing it without someone prompting you to look for these scenarios is even better.
Data at scale issues are data issues that pop-up only when you're dealing with a massive amount of data (ie, the amount of data that Facebook or Google might encounter). Because of this you might have to handle the data differently.
What do you think about using the RANK function to handle the case of ties?
Or even simpler: a max() on the computed total without sorting and limit.
This could reduce the I/O a lot depending the table size.
max() as mentioned would be great to optimize the code. And then match the max value with others in the table.
thank you, Nate! Luv the t-shirt!
haha I'm a big fan of Boiler Room!
Great video.
Thanks for watching! Let me know if you have any feedback or topic suggestions.
Hi , just one thing if we could aggregate the total order value at customer level between the dates then we can accurately get the results. As in solution, it might be case that one customer has various order value and after aggregating we get total order value for each customer and then decide which is the top ones.
Awesome video as always!
Thanks for watching! Let me know if you have any feedback or topic suggestions.
Trying out the StrataScratch question - I'm returning two records, Mia and Farida, both with 400 for the total and their respective order dates, but it's not accepting it. The Expected Output button is showing 430 for the order totals, but I'm not seeing that order total in the data. Am I missing something?
Edit: Ah, multiple orders on the same date, got it. Was thrown off since you had 400 in your video there.
Hi, yes, there's actually a change in the solution because I realized that the 400 corresponded to 400 for 1 item that was purchased on a specific day, not all purchases on a specific day. When you count all purchases made by 1 customer on a specific day, the max is 430. So the code below will get you there.
The concept is the same though. In the video, I really wanted to highlight the fact that LIMIT 1s are common mistakes people make. To properly find the customers with the highest number, you have to consider that many people meet that max, and write your solution accordingly.
Thanks for trying the code out on the platform. Sorry for the confusion.
SELECT first_name,
order_date,
sum(total_order_cost)
FROM
(SELECT first_name,
order_cost * order_quantity AS total_order_cost,
order_date
FROM orders o
LEFT JOIN customers c ON o.cust_id = c.id
WHERE order_date BETWEEN '2019-02-1' AND '2019-05-1' ) a
GROUP BY first_name,
order_date
HAVING sum(total_order_cost) =
(SELECT max(total_order_cost)
FROM
(SELECT sum(order_cost * order_quantity) AS total_order_cost
FROM orders
WHERE order_date BETWEEN '2019-02-1' AND '2019-05-1'
GROUP BY cust_id,
order_date) b )
@@stratascratch the question is vague, because it could be interpreted as the total cost for the week or the month too.
great video as always, Thanks
Thanks for watching as always
Hello,
Thank you for the great video.
I have a question!
Since we typed in the subquery limit 1, why the overall output still returning 2 rows?
cheerz!
The limit one was just to get the max total order cost. Once I got that, I used it to get all the customers that matched the max. And that's why you see 2 rows in the output. Hope that helps and thanks for watching!
Hey, we can also do this using DENSE_RANK() ?
Your github is not updated since past one year Nate?
My public one is not other than the one that I linked to in the description. I have one at work and private ones for StrataScratch. But I'll be posting more projects throughout the year and putting them on youtube.
select top 1 with ties (ms sql server) ,I was trying to do it exactly the way I do in ms sql server then found out it doesnt work out in postgreSQL
Your explanation very and easy to understand..but your site name is very difficult to remember
haha I know right? If I had to do it all over again, I would have picked an easier name =)
Excellent
Thanks for watching. Let me know if you have any feedback or ideas for videos!
Thank You
Thanks for watching!