Top Data Science Interview Question And Answer Mistakes 2021 [Asked By Amazon]

Поділитися
Вставка
  • Опубліковано 9 січ 2025

КОМЕНТАРІ • 29

  • @dominic2446
    @dominic2446 3 роки тому +2

    7:14 people make this mistake because of the phrasing of the question. "customer" should be changed to "customer(s)" and "first name" should be changed to "first name(s)". this way, the question suggests the possibility that the solution can contain more than one person.
    8:09 what is "data at scale issues"?

    • @stratascratch
      @stratascratch  3 роки тому

      I agree with you. It's important to talk to the interviewer to identify potential issues like this. You won't always get a perfectly phrased question in real life, especially on the job, so the purpose of this question is really to think of all possible scenarios and build a solution that solves for it. Doing it without someone prompting you to look for these scenarios is even better.
      Data at scale issues are data issues that pop-up only when you're dealing with a massive amount of data (ie, the amount of data that Facebook or Google might encounter). Because of this you might have to handle the data differently.

  • @culpgrant21
    @culpgrant21 4 роки тому +6

    What do you think about using the RANK function to handle the case of ties?

    • @tripy75
      @tripy75 4 роки тому +2

      Or even simpler: a max() on the computed total without sorting and limit.
      This could reduce the I/O a lot depending the table size.

    • @stratascratch
      @stratascratch  4 роки тому +3

      max() as mentioned would be great to optimize the code. And then match the max value with others in the table.

  • @BirSozben
    @BirSozben 3 роки тому

    thank you, Nate! Luv the t-shirt!

  • @anuragsingh4766
    @anuragsingh4766 4 роки тому +2

    Great video.

    • @stratascratch
      @stratascratch  4 роки тому

      Thanks for watching! Let me know if you have any feedback or topic suggestions.

  • @NeelanshuniSingh
    @NeelanshuniSingh 7 місяців тому

    Hi , just one thing if we could aggregate the total order value at customer level between the dates then we can accurately get the results. As in solution, it might be case that one customer has various order value and after aggregating we get total order value for each customer and then decide which is the top ones.

  • @MyRevoltec
    @MyRevoltec 4 роки тому

    Awesome video as always!

    • @stratascratch
      @stratascratch  4 роки тому

      Thanks for watching! Let me know if you have any feedback or topic suggestions.

  • @IanWitherow
    @IanWitherow 4 роки тому +1

    Trying out the StrataScratch question - I'm returning two records, Mia and Farida, both with 400 for the total and their respective order dates, but it's not accepting it. The Expected Output button is showing 430 for the order totals, but I'm not seeing that order total in the data. Am I missing something?
    Edit: Ah, multiple orders on the same date, got it. Was thrown off since you had 400 in your video there.

    • @stratascratch
      @stratascratch  4 роки тому

      Hi, yes, there's actually a change in the solution because I realized that the 400 corresponded to 400 for 1 item that was purchased on a specific day, not all purchases on a specific day. When you count all purchases made by 1 customer on a specific day, the max is 430. So the code below will get you there.
      The concept is the same though. In the video, I really wanted to highlight the fact that LIMIT 1s are common mistakes people make. To properly find the customers with the highest number, you have to consider that many people meet that max, and write your solution accordingly.
      Thanks for trying the code out on the platform. Sorry for the confusion.
      SELECT first_name,
      order_date,
      sum(total_order_cost)
      FROM
      (SELECT first_name,
      order_cost * order_quantity AS total_order_cost,
      order_date
      FROM orders o
      LEFT JOIN customers c ON o.cust_id = c.id
      WHERE order_date BETWEEN '2019-02-1' AND '2019-05-1' ) a
      GROUP BY first_name,
      order_date
      HAVING sum(total_order_cost) =
      (SELECT max(total_order_cost)
      FROM
      (SELECT sum(order_cost * order_quantity) AS total_order_cost
      FROM orders
      WHERE order_date BETWEEN '2019-02-1' AND '2019-05-1'
      GROUP BY cust_id,
      order_date) b )

    • @dominic2446
      @dominic2446 3 роки тому

      @@stratascratch the question is vague, because it could be interpreted as the total cost for the week or the month too.

  • @mdabulkalamazad6775
    @mdabulkalamazad6775 4 роки тому

    great video as always, Thanks

  • @hta-bi249
    @hta-bi249 3 роки тому

    Hello,
    Thank you for the great video.
    I have a question!
    Since we typed in the subquery limit 1, why the overall output still returning 2 rows?
    cheerz!

    • @stratascratch
      @stratascratch  3 роки тому +1

      The limit one was just to get the max total order cost. Once I got that, I used it to get all the customers that matched the max. And that's why you see 2 rows in the output. Hope that helps and thanks for watching!

  • @niveditakumari701
    @niveditakumari701 3 роки тому

    Hey, we can also do this using DENSE_RANK() ?

  • @vikrant.9039
    @vikrant.9039 4 роки тому

    Your github is not updated since past one year Nate?

    • @stratascratch
      @stratascratch  4 роки тому

      My public one is not other than the one that I linked to in the description. I have one at work and private ones for StrataScratch. But I'll be posting more projects throughout the year and putting them on youtube.

  • @huseyindansman7415
    @huseyindansman7415 3 роки тому

    select top 1 with ties (ms sql server) ,I was trying to do it exactly the way I do in ms sql server then found out it doesnt work out in postgreSQL

  • @abhisheksinghchouhan8506
    @abhisheksinghchouhan8506 3 роки тому

    Your explanation very and easy to understand..but your site name is very difficult to remember

    • @stratascratch
      @stratascratch  3 роки тому

      haha I know right? If I had to do it all over again, I would have picked an easier name =)

  • @classkori5507
    @classkori5507 4 роки тому

    Excellent

    • @stratascratch
      @stratascratch  4 роки тому

      Thanks for watching. Let me know if you have any feedback or ideas for videos!

  • @rakibraihanrimon8784
    @rakibraihanrimon8784 4 роки тому

    Thank You