Understand Slowly Changing Dimensions

Поділитися
Вставка
  • Опубліковано 13 жов 2024

КОМЕНТАРІ • 58

  • @sheldonwayne1390
    @sheldonwayne1390 2 роки тому +3

    I'm from China and it's frustrating that you can't find any video doing a good job explaining knowledges about DWH, glad I found this on UA-cam. Cheers!

  • @cluebcke
    @cluebcke 3 роки тому +6

    This is really awesome, thank you so much. Just to add, one use case for Type 5 (which I had no idea was called Type 5 before watching this) is that if you have a lot of data in a record that doesn't change much, and only a few fields that change a lot, partitioning them into different dimensions can substantially reduce table bloat (as you're not creating many copies of unchanged data) and improve query performance. There are tradeoffs of course and you really have to evaluate (and test empirically) for your use case.

  • @italomacelloneable
    @italomacelloneable 3 роки тому +5

    Excellent content and very well explained. Thank you

  • @sandrafield9813
    @sandrafield9813 2 роки тому +1

    Thanks for this description! Its better than all the other ones I found. Its more clear with extra details without being convoluted and messy.

  • @maisamabbas6872
    @maisamabbas6872 7 місяців тому

    This is the best explaination for slowly changing dimensions that I've come across on UA-cam. You definitely deserve my like and subscribe. Thanks for such an amazing video. I hope you make a lot more videos on Data Engineering topics.

  • @houstonfirefox
    @houstonfirefox Рік тому

    Great explanation Bryan w/real-world examples and analogies. New sub!

  • @MuhammadMustafa-rd6vg
    @MuhammadMustafa-rd6vg 2 роки тому

    this is a good tutorial , watched dimensional modelling & SCD tutorial, you were pretty clear , explained in detail without making it boring !

  • @thoroth_
    @thoroth_ 3 місяці тому

    This is the best explanation of this concept that I’ve seen, thank you so much

  • @shivam6565
    @shivam6565 2 роки тому

    Thank you Bryan for this wonderful tutorial. Concept lucidly explained and I like your teaching style. Liked and subscribed.

  • @Pasdpawn
    @Pasdpawn Рік тому

    man I spent a fortune to get an IT degree and here i am , learning it from a youtube tutorial which is far better and easier to understand than my database lectures. thanks a lot sir for your effort.

  • @guillermopalaciosgaray
    @guillermopalaciosgaray 2 роки тому

    Thanks for your explanation! This is the best video I've found :)

  • @sauravmisra6061
    @sauravmisra6061 2 роки тому

    Thank you sir for explaining the code EDW concepts in both the videos - Dimensional modeling and SCD, so well and precisely. I also followed your Databricks and Spark and these are the best I have found so far on these topics. Thank you , sir and really appreciate your effort.

    • @BryanCafferky
      @BryanCafferky  2 роки тому

      Glad it is helpful. Thanks for your kind words.

  • @batmanbuddha
    @batmanbuddha 3 роки тому +1

    Awesome Explanation, even sharing to my friend too,

    • @BryanCafferky
      @BryanCafferky  3 роки тому +1

      Thanks. Yeah. The more that can be helped the better.

  • @I677000
    @I677000 3 роки тому +1

    Awesome, :) hop to get this question asked on my interview :) now I know how to answer it:)

  • @JimRohn-u8c
    @JimRohn-u8c 2 роки тому +2

    Thank you so much for this! I’d love to see more videos on this.
    Btw, how does this work for Snowflake Schemas? Are those schemas only used in OLTP Databases?

    • @BryanCafferky
      @BryanCafferky  2 роки тому +1

      Snowflake schemas are to be avoided if at all possible. I think the product Snowflake is causing confusion about this due to its name,

    • @SwapperTheFirst
      @SwapperTheFirst 2 роки тому +1

      @@BryanCafferky I guess these to be avoided only if (1) compute and storage are expensive (2) compute and storage are slow (3) you don't have MPP or vectorized calculations. In current environment, compute and storage cost much less than data engineer and data architect time, so data modeling approaches changed.
      What was true for Teradata or Netezza in 1999 is not true for Snowflake or Big Query in 2022.
      On the topic of SCD, you can find the video of Airflow original creator, where he asserts, that all these approaches with SCD are just thing of the past, and now we should create daily or weekly full snapshots of dimensional tables. I don't fully understand his approach (for example, Kimball Type 2 for me sounds very intuitive), but I don't fully grasp how to solve the SCD reporting problem, if you have (say) daily snapshots of all your dimensional tables.

    • @BryanCafferky
      @BryanCafferky  2 роки тому +1

      @@SwapperTheFirst SCD concepts I explained are based on Kimball. It's more about the functionality required rather than the technology used. Airflow has nothing to do with this subject so not show how the Airflow creator's comments apply. You should choose the best approach for your needs. Thanks

  • @zerovine8025
    @zerovine8025 Рік тому

    Thanks for the easy to understand explanation, man. (btw nice mustache)

  • @rickstratton1714
    @rickstratton1714 3 роки тому +1

    Good video, clear explanations and visual examples. Have to wonder what kind of cocktail parties you attend! :)

    • @BryanCafferky
      @BryanCafferky  3 роки тому

      Hmmm... Come to think of it, I was not invited back. :-) Thanks

  • @Jel.Awesh.M
    @Jel.Awesh.M 2 роки тому

    Thank you for sharing! A very good resource for SCDs!

  • @bestherk4522
    @bestherk4522 3 роки тому +1

    Good video, Clear Explanations. Thank you.

  • @marcosoliveira8731
    @marcosoliveira8731 4 роки тому

    Topics like this still so relevant.
    I hope you have more to share about DW Architecture.

  • @fatallny
    @fatallny Рік тому +1

    Awesome video, now everything is clear

  • @javedabdool1187
    @javedabdool1187 Рік тому +1

    great video boss. I have 2 questions though maybe a lil stupid :P if eveer i missed something:
    should we put the source key in the fact table?if not then why bring the source key then?
    do fact table need a fact id?

    • @BryanCafferky
      @BryanCafferky  Рік тому +1

      Thanks. The source dimension key like ProductKey should be stored in the dim table and the dim surrogate key (created by the BI ETL) is the Pk to teh Dim table and FK to the Fact table. You don't need a fact table key per se but SQL Tables perform much better with a primary key(pk) so an Identity column suffices. See my video on dim modeling ua-cam.com/video/lWPiSZf7-uQ/v-deo.html

    • @javedabdool1187
      @javedabdool1187 Рік тому

      @@BryanCafferky cools thnx!!

  • @helovesdata8483
    @helovesdata8483 2 роки тому

    you've helped me to understand data warehouses more

  • @a29miller
    @a29miller 2 роки тому

    Well done. I really appreciate the detail

  • @aaronvr_
    @aaronvr_ 4 роки тому +2

    Great video, very well explained! Is there an efficient way to actually identify/locate/spot such changes in large data sets? Slowly changing dimensions help when dealing with pre-identified changes, but as to how one could locate them in the first place..I'm stuck!

    • @BryanCafferky
      @BryanCafferky  4 роки тому +2

      Hi. SQL Server has a feature called change data capture. docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-ver15

    • @BryanCafferky
      @BryanCafferky  4 роки тому +2

      Or you can use a merge statement that compares the data. Ideally, the source tables have a modified date/time that can be used to get only changes since the last ETL load. This is easier than using Change Data Capture, i.e. you just keep track of the last date/time you did a load and only pull data that comes after that.

  • @dataofbitcoin6685
    @dataofbitcoin6685 2 роки тому

    Amazing video, again! The best in biz.

  • @AC-hh2cb
    @AC-hh2cb 3 роки тому +1

    Great explanation, will help me at work :)

  • @marcosoliveira8731
    @marcosoliveira8731 4 роки тому +1

    THANK YOU! Real good explanation!
    btw: Nice R mug :)

  • @canyurt1
    @canyurt1 3 роки тому +1

    Hey Bryan - I think Type 4 resembles Data Vault initiatives. Would you not agree?

    • @BryanCafferky
      @BryanCafferky  3 роки тому

      I have not studied Data Vault modeling in depth but from the wiki on the subject, I would be careful equating concepts of the two methods as Data Vault modeling does not cleanse and optimize for querying per this link en.wikipedia.org/wiki/Data_vault_modeling

  • @usman_oz
    @usman_oz 3 роки тому

    Thanks for the explanation. I just wanted to check if the "Dim customer" table at around 13:20 playtime shows the correct Start & End dates. Shouldn't the "Start Date" be +1 day of the previous record's End Date ?

  • @anirudhvyas6069
    @anirudhvyas6069 11 місяців тому

    nicely presented thank you!

  • @gokukanishka
    @gokukanishka 2 роки тому +1

    Hey Bryan , can you please make videos on fast changing dimensions and how to handle them

    • @houstonfirefox
      @houstonfirefox Рік тому

      Apparently Gender went from a static dimension to a fast changing dimension 🤣

  • @elinadiary9357
    @elinadiary9357 4 місяці тому

    Thank you sir

  • @JimRohn-u8c
    @JimRohn-u8c 2 роки тому +2

    Also why would a “NULL” be better than “9999-12-31” etc. ?

    • @BryanCafferky
      @BryanCafferky  2 роки тому +2

      a NULL means unknown whereas '9999-12-31' is a value. If you filter where date > 2022-01-01, the later would return TRUE but usually a NULL compare would not return TRUE. Forcing vlaues is an option but technically inaccurate with repercussions. Bear in mind the 9999-12-31 would appear in groupings on reports, etc. A text value of 'NULL' is just a glitch of the extract from a database and not a true NULL.

  • @LAPOVETS
    @LAPOVETS 2 роки тому

    thank you!

  • @nameunknown007
    @nameunknown007 3 роки тому +2

    Writes a book about Microsoft and sells it on Amazon like a legend 👍

  • @marcjkeppler3590
    @marcjkeppler3590 Місяць тому

    How does this video have fewer than 1k likes? 😅