Performance Pains With NOT IN And NULLable Columns In SQL Server

Поділитися
Вставка
  • Опубліковано 27 жов 2024

КОМЕНТАРІ • 9

  • @BrandonChampion
    @BrandonChampion 3 дні тому +1

    "except an aggregate" 👍

  • @douglascoats7081
    @douglascoats7081 3 дні тому +1

    youre so close to 5k subscribers. neat

  • @DDEGStakeholderManagement
    @DDEGStakeholderManagement 2 дні тому +1

    Guess I'll briefly explain why NOT IN (list containing NULL) doesn't work. Paraphrased from a video by Erland Sommarskog.
    First, realise that SQL Server logic is tri-valued. There's True, False, and NULL. For the last one, we can think of this as meaning 'Unknown'. It's shorthand for "Something belongs here, but SQL Server doesn't know what it is". As such, "1=1" is True, "1=0" is False, and "1=NULL" is NULL. It's neither True nor False, because the NULL could be equal to 1, or it could be 0, or anything else.
    With that primer out of the way, what happens when we evaluate "SELECT 1 WHERE 1 IN (1, NULL)"? There's an easy trick to simplify the meaning here. This query is equivalent to "SELECT 1 WHERE 1=1 OR 1=NULL". "1=1" is True, and "1=NULL" is NULL. However, "True OR x" evaluates to True regardless of the value of x. Therefore, the overall query simplifies to "SELECT 1 WHERE True", which of course returns 1.
    Now how about "SELECT 1 WHERE 1 NOT IN (0, NULL)"? Well, we use the same simplification trick, but this time it's slightly different. Instead, we get "SELECT 1 WHERE 10 AND 1NULL". Now, "10" is True, and "1NULL" is NULL. And here lies the crux of the problem: "True AND x" evaluates to x. In our case, x is NULL. Our overall query simplifies to "SELECT 1 WHERE NULL". As such, SQL Server returns nothing, as an arbitrary choice. It could have been chosen to return everything, but a blank result is more visually identifiable as an error.

    • @ErikDarlingData
      @ErikDarlingData  День тому

      That’s a great explanation! Thanks for adding it in here.

  •  2 дні тому

    Do you have one on why not exists fails and produces super slow anti-joins in some cases compared to left joining and eliminating nulls?
    I don't remember if it is a null issue also, but most things work better with separate literal null filtering on all the things that have them.
    Except is neat also, if you are loading something from multiple sources under different conditions and don't wont to insert rows that are already there.

    • @ErikDarlingData
      @ErikDarlingData  День тому

      I have a few things about why the opposite is true.
      Here’s one of them: erikdarling.com/why-not-exists-makes-more-sense-than-left-joins/
      I also have examples of when you can run into performance problems with not exists: erikdarling.com/performance-issues-with-not-exists-queries/
      In general though, I disagree with literal NULL filtering being better in most cases. Test carefully, as always.