Data Warehouse vs Data Lake | Explained (non-technical)

  • Published Oct 13, 2024

COMMENTS • 23

  • @KahanDataSolutions  2 years ago +3

    Get my Modern Data Essentials training (for free) & start building more reliable data architectures
    www.ModernDataCommunity.com

  • @Day925 1 year ago +3

    Your channel is now my new bible when it comes to Data Engineering

  • @beepboop1237 2 years ago +1

    Great video! Perfect for anyone looking to understand some of the key first steps in setting up a solid data architecture.

  • @BlakeC341 9 months ago

    Great. Straightforward and simple.

  • @kkindahouse7153 2 years ago +2

    Very easy to understand!
    Could you explain more about facts/dimensions and slowly changing dimensions, please?

    • @KahanDataSolutions  2 years ago +2

      Thanks! This is a complex topic in itself, but here are the short answers to your questions:
      Facts/Dimensions - These terms come from what's called "Dimensional Modeling", which is a strategy for creating tables in a data warehouse. They are still just database tables, but they are described with these terms to indicate their function in the overall strategy.
      Fact Tables - Typically represent an activity (ex. sales or comments) and include quantitative data (ex. price, quantity, etc.) along with foreign keys to dimensions.
      Dimensions - Provide qualitative, descriptive context around fact tables (ex. color, type, name, etc.). The goal is to join facts to dimensions to create various views of the underlying business activity (the fact) for reporting. When designed properly, this type of relationship makes slicing up data really straightforward.
      Slowly Changing Dimensions - Dimensions with attributes that may change over time (ex. the location of an employee). There are various strategies for handling the change, ex. overwriting the old value (Type 1) vs adding a new row and attaching time frames to each version (Type 2).
      Again, this topic could be an entire video in itself, but ultimately it revolves around a strategy for organizing a data warehouse. I also suggest looking into Kimball Data Modeling to learn more. Hope that helps!
      Here's a wiki link - en.wikipedia.org/wiki/Dimensional_modeling
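To make the facts/dimensions split and the "add a new row" strategy a bit more concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a warehouse. The tables (dim_product, fact_sales, dim_employee), the sample rows, and the helper scd2_update are hypothetical illustrations, not the schema from the video or from any particular tool. The row-versioning approach is what Kimball calls a Type 2 slowly changing dimension.

```python
import sqlite3

# Hypothetical star schema plus a Type 2 dimension; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (          -- dimension: qualitative descriptions
    product_key INTEGER PRIMARY KEY,
    name TEXT, color TEXT, type TEXT
);
CREATE TABLE fact_sales (           -- fact: one row per sale (the activity)
    sale_id INTEGER PRIMARY KEY,
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    price REAL
);
INSERT INTO dim_product VALUES
    (1, 'Tee', 'red', 'shirt'),
    (2, 'Hoodie', 'blue', 'sweater'),
    (3, 'Polo', 'red', 'shirt');
INSERT INTO fact_sales VALUES
    (1, 1, 2, 10.0), (2, 2, 1, 40.0), (3, 3, 3, 15.0), (4, 1, 1, 10.0);

-- Slowly changing dimension: each change adds a new row (Type 2).
CREATE TABLE dim_employee (
    employee_key INTEGER PRIMARY KEY, -- surrogate key, one per version
    employee_id INTEGER,              -- business key, stable across versions
    location TEXT,
    valid_from TEXT,
    valid_to TEXT                     -- NULL marks the current version
);
""")

# Slice the quantitative facts by a qualitative dimension attribute.
revenue_by_color = conn.execute("""
    SELECT d.color, SUM(f.quantity * f.price) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.color
    ORDER BY d.color
""").fetchall()


def scd2_update(conn, employee_id, new_location, change_date):
    """Hypothetical helper: close the current version and insert a new one."""
    current = conn.execute(
        "SELECT location FROM dim_employee "
        "WHERE employee_id = ? AND valid_to IS NULL",
        (employee_id,),
    ).fetchone()
    if current is not None and current[0] == new_location:
        return  # attribute unchanged: keep the current version as-is
    if current is not None:
        # End the old version's time frame instead of overwriting it.
        conn.execute(
            "UPDATE dim_employee SET valid_to = ? "
            "WHERE employee_id = ? AND valid_to IS NULL",
            (change_date, employee_id),
        )
    conn.execute(
        "INSERT INTO dim_employee (employee_id, location, valid_from, valid_to) "
        "VALUES (?, ?, ?, NULL)",
        (employee_id, new_location, change_date),
    )


# An employee moves: the old row is kept, with an end date attached.
scd2_update(conn, 7, "Denver", "2022-01-01")
scd2_update(conn, 7, "Austin", "2023-06-01")
history = conn.execute(
    "SELECT employee_id, location, valid_from, valid_to "
    "FROM dim_employee ORDER BY employee_key"
).fetchall()
print(revenue_by_color)
print(history)
```

Keeping both rows means older facts can still be reported against the location that was true at the time, which is the trade-off versus simply overwriting the value (Type 1).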

    • @kkindahouse7153 2 years ago

      @KahanDataSolutions I love how you explain things in your own way; it's always clear to me.
      Hope you can share more data architecture topics in a simple way :)

  • @davidk4682 2 years ago +4

    You rock bro. Real clean and concise.

  • @Reinales 2 years ago +1

    I loved those animation parts, nice video! 😎😎

  • @InnovativeBeautifulWorld 1 month ago

    Thanks a lot.

  • @josephojo1313 2 years ago +3

    Thank you so much. love your content!

  • @nomenetasaili8598 1 year ago +1

    When, or in what situation, would you need a data lake? Wouldn't transforming the various data directly into the data warehouse be more efficient?

  • @pooshpoosh9232 10 months ago

    So if I use a data lake and a data warehouse, does this mean I'm necessarily using ELT? Since I'm getting the data, loading it into the lake, then structuring it better in the warehouse.

  • @AlexKashie 1 year ago

    I'd appreciate a more technical video on the differences, please.

  • @nlopedebarrios 1 year ago

    How do you know if you need a data lake? Suppose all your data sources are cloud databases, except maybe 2 or 3 files uploaded to an S3 bucket periodically. I don't see why data from the operational DBs would be sent to cloud storage instead of doing a traditional ETL into the data warehouse.

  • @ashishlimaye2408 1 year ago

    Great videos!!

  • @SchylarBrock-fb4tt 1 year ago

    Would your data lake and data warehouse be the same tool (example: Snowflake)?

    • @ivani3237 1 year ago

      A data lake is essentially file storage; a DWH is relational database storage.
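As a rough illustration of that distinction, the sketch below uses a local directory of raw JSON files as a stand-in for a data lake and an in-memory SQLite database as a stand-in for a warehouse. The directory layout, the orders table, and the sample records are hypothetical; real setups would typically use object storage (such as S3) and a cloud warehouse instead of these local substitutes.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical "lake": a plain directory of files kept in the source's own shape.
lake = Path(tempfile.mkdtemp()) / "raw" / "orders"
lake.mkdir(parents=True)

# 1. Land raw records as files, untouched; no schema is enforced here.
raw_events = [
    {"order_id": 1, "amount": "19.99", "extra": {"coupon": "FALL"}},
    {"order_id": 2, "amount": "5.00"},
]
for event in raw_events:
    (lake / f"order_{event['order_id']}.json").write_text(json.dumps(event))

# 2. Hypothetical "warehouse": a relational table with a fixed, typed schema.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)

# 3. Read the raw files and load only the structured, typed fields.
#    The unstructured "extra" field stays in the lake files only.
for path in sorted(lake.glob("*.json")):
    record = json.loads(path.read_text())
    warehouse.execute(
        "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
        (record["order_id"], float(record["amount"])),
    )

total = warehouse.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(round(total, 2))
```

The lake keeps every field exactly as it arrived, while the warehouse only holds what was modeled into its schema, which is why the two are often used together rather than as substitutes.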