SQL Tutorial - How to compare multiple columns in SQL

Поділитися
Вставка
  • Опубліковано 16 жов 2024

КОМЕНТАРІ • 17

  • @johnbreslin4521
    @johnbreslin4521 Рік тому +1

    Very useful and a different way of comparing columns. I will have to try out this method. Thanks very much for an easy to follow video as always.

  • @briandennehy6380
    @briandennehy6380 2 роки тому +4

    Really interesting, never seen these functions before, thanks as always 👍

  • @Sttuey
    @Sttuey 2 роки тому +2

    Good discussion of the various methods! Have done very similar for tasks such as loading regular product files with frequent changed data such as stock levels, pricing, fees etc. For some cases with only a handful of columns to compare I've actually used *not exists* in combination with *intersect*, which is both performant and cruicially dependable for nullable columns; for some use-cases with many columns to compare i've gone the Hashbytes route but with SHA2 which was the most performant in testing and requires relatively little storage (security is not a concern in this situation). I use hashed checksum via an indexed computed column in the target tables and calculate it on the fly in dynamically created views against the staging tables that receive data via bulkload or polybase. Performance is excellent even with a product feed such as Amazon's with many millions of rows to be compared for changes.

    • @BeardedDevData
      @BeardedDevData  2 роки тому +1

      Excellent points, I also use not exists usually for snapshot Fact tables where we want to insert any balance changes. There are many options to choose from and when you look at the amount of columns and data types it's always best to test out different solutions 👍.

  • @KarinS-tk3qt
    @KarinS-tk3qt Рік тому +2

    Excellent video. I have a question , your video shows data compression and then finding the difference if exists.
    If I want to update the data in table 2 according to the change in table 1 or alternatively add rows that were not in table 2 - how can I do that?

    • @BeardedDevData
      @BeardedDevData  Рік тому

      Personally I then break this down in to separate insert and updates statements within a transaction, I mainly do this for auditing purposes and to control the size of transactions. Another option will be to use MERGE but I would advise testing properly, because it can cause performance issues and some adverse effects.

  • @trinhk
    @trinhk 7 місяців тому +1

    Great video, thank you. Would you consider implementing the various checksum methods for data migration of millions or hundreds of millions records?

    • @BeardedDevData
      @BeardedDevData  7 місяців тому

      It would depend on your acceptable range of collisions, certainly checksum is out the window, you'd have to use hashbytes with an algorithm that generates at least a 64-bit value. I've just had a look online for some numbers, found some details that say at 609 million rows you have a 1 in 100 chance of a collision, that is different inputs returning the same value, this may be acceptable as the alternative is column by column comparison to get 100% results. If this is a one off activity you could also use hashing then handle any collisions separately.

  • @Beesay1978
    @Beesay1978 Рік тому +1

    In my SSMS select checksum('1')
    select checksum('-1 ') returns 133 and 597 . It is ssms 2019. Thank you for your educational video.

  • @shima1960ify
    @shima1960ify 2 роки тому +1

    Excelente

  • @allmazd
    @allmazd Рік тому +1

    In normal use why not to use just a traditional method and not showing off with these tricky functions?

    • @BeardedDevData
      @BeardedDevData  Рік тому +2

      Hi, can you explain what you mean by normal method? If you are referring to comparing column by column then that would be considered more traditional, however as I explained in the video if you have a lot of columns that will be very tedious and even more tricky. I'm not trying to show off but help people understand how things can be done simpler so that they are easier to manage 😀.

    • @allmazd
      @allmazd Рік тому +1

      @@BeardedDevData I meant not you showing off I mean a particular user who would do some comparison. All in all it is always good to have many tools to accomplish same operations. And you normal use I mean comparison each columns with each over, it terms of code it seems doesn't make it longer or more tedious.

    • @BeardedDevData
      @BeardedDevData  Рік тому

      I think you are referring to the times when we need to compare a small amount of columns, when you need to compare, for example 10+ columns, it can be become very tedious to write out a massive WHERE clause. Remember, we also probably want to do this comparison multiple times too, we will want to detect what doesn't exist in the target table, what's changed and what exists in target but not in the source table, instead I prefer to keep things simple, there also other advantages, we can optimise the process better by just comparing one column, it's also easier to visually see change when looking through the data, we can store the history of the hash value and see change, this is a massive advantage when it comes to debugging our SCD processes. I just don't see why you would want to write out a lot more code, have a process that performs slower and can cause a lot of headaches when there is a much simpler approach.