Should You Use Databricks Delta Live Tables?

  • Published 16 Jul 2024
  • Delta Live Tables (DLT) is Databricks' new declarative framework for building reliable, maintainable, and testable data processing pipelines. It is a big deal and loaded with functionality. If you are overwhelmed developing and maintaining Databricks ETL pipelines, you need to consider Delta Live Tables. In this short video, I discuss the critical things to consider to determine if DLT is right for your organization.
    Support Me on Patreon Community and Watch this Video without Ads!
    www.patreon.com/bePatron?u=63...
    Link to slides:
    github.com/bcafferky/shared/b...
  • Science & Technology

COMMENTS • 19

  • @colinsorensen7382
    @colinsorensen7382 5 months ago +1

    Omg--finally someone who can talk about the WHY and the tradeoffs in using xyz feature in Databricks instead of just soapboxing about how cool it is and/or how to implement. So helpful, thank you!!

  • @joshcaro1226
    @joshcaro1226 10 months ago +4

    Please, do make a video on migrating workflows to Delta Live!! My colleagues and I are very interested! If it helps, I bought your book :)

    • @BryanCafferky
      @BryanCafferky  10 months ago +1

      It does help. Good to know I'm not the only one trying to migrate to DLT. Thanks!

  • @andrewrodriguez8720
    @andrewrodriguez8720 4 months ago

    Great video, thank you for what you do!

  • @JustBigdata
    @JustBigdata 9 months ago +1

    Quick question. If the source file schema keeps changing and source files are received once a month, would using DLT be efficient?

    • @BryanCafferky
      @BryanCafferky  9 months ago

      Both DLT and non-DLT Delta support schema evolution. See learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/dlt
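
      For illustration, here is a minimal sketch of what the linked Auto Loader doc covers: a DLT table that ingests arriving files and lets the table schema evolve as new columns appear. The paths and names are made up, and this only runs inside a Databricks DLT pipeline (the `dlt` module and `spark` session are provided by that runtime).

      ```python
      import dlt  # available only inside a Databricks DLT pipeline


      @dlt.table(comment="Bronze ingest; schema inferred and evolved by Auto Loader")
      def bronze_monthly_files():
          return (
              spark.readStream.format("cloudFiles")  # "cloudFiles" = Auto Loader
              .option("cloudFiles.format", "json")
              # addNewColumns: when a new column shows up in the monthly file,
              # the stream stops, DLT restarts it, and the table schema is updated.
              .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
              .load("/landing/monthly/")
          )
      ```

      Since the files only arrive monthly, a triggered (non-continuous) DLT pipeline run keeps this cheap.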

  • @rahulsood81
    @rahulsood81 7 months ago +1

    Hey @Bryan, can you enlighten 💡 us: what's the alternative to using Delta Live Tables (DLT)?
    Is it KSQL or Snowflake streams?

    • @BryanCafferky
      @BryanCafferky  7 months ago

      DLT is a good option. There are other options, depending on your needs: Structured Streaming with standard Databricks workflows can be done, or perhaps Eventstreams using MS Fabric. I'm not convinced true real-time streaming is really required in some cases, so I would first confirm that something less expensive and simpler, like frequent refreshes, is not an option.
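
      As a rough sketch of the first alternative Bryan mentions: plain Structured Streaming scheduled as a standard Databricks job, with no DLT. The paths are illustrative, and it assumes a `spark` session with Delta Lake support.

      ```python
      # Plain Structured Streaming job, deployable as a standard Databricks workflow.
      # trigger(availableNow=True) processes whatever has arrived and then stops,
      # which makes "frequent refreshes" cheap compared to an always-on stream.
      (
          spark.readStream.format("delta")
          .load("/bronze/events")                        # source Delta table
          .selectExpr("user_id", "event_type", "amount")
          .writeStream.format("delta")
          .option("checkpointLocation", "/chk/silver_events")  # tracks progress
          .trigger(availableNow=True)                    # batch-like incremental run
          .start("/silver/events")
      )
      ```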

  • @fb-gu2er
    @fb-gu2er 1 month ago +1

    If you need a lot of customization in your ETL, DLT is not for you. If you wanna treat your ETL more like traditional apps, DLT is not for you. For example, do you wanna configure a custom logger, or call APIs and other services within your pipeline? Not for you either. DLT can be great when you have a very simple pipeline, where you transform using fairly simple stages. Anything non-trivial: not for you, in most cases.

  • @artus198
    @artus198 8 months ago

    I am proficient in SQL and come from an on-premises data warehouse background... I am trying to get a grip on this whole Databricks framework, but I often get stuck on how it works! It is frustrating.

    • @BryanCafferky
      @BryanCafferky  8 months ago +1

      Databricks is really a data platform, so a lot more than a framework. Yeah, there is a lot and it is confusing. Watch this intro video which is part of a series to get a summary of what Databricks is. ua-cam.com/video/YvTzvZh3yTE/v-deo.html

  • @JMo268
    @JMo268 7 months ago +1

    Someone told me that DLTs can only have one storage location per storage account. So if you're using ADLS and have bronze/silver/gold containers you won't be able to store DLTs in each of those containers. Can anyone confirm if that is true?

    • @BryanCafferky
      @BryanCafferky  6 months ago +1

      Not aware of such a limitation. Read these docs for the right answer: docs.databricks.com/en/delta-live-tables/settings.html#cloud-storage-configuration
      And docs.databricks.com/en/delta-live-tables/sql-ref.html#create-a-delta-live-tables-materialized-view-or-streaming-table
      When you create a DLT table, there is a location option which seems to allow pointing to wherever you like.
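
      For reference, the option Bryan is pointing at: in the Python API it is the `path` argument to `@dlt.table`. The ADLS URI and table names below are made up, and this only runs inside a DLT pipeline.

      ```python
      import dlt  # available only inside a Databricks DLT pipeline


      # Each table can be given its own storage location, so bronze/silver/gold
      # tables could, in principle, land in different containers.
      @dlt.table(
          name="silver_orders",
          path="abfss://silver@mystorageacct.dfs.core.windows.net/orders",
      )
      def silver_orders():
          return dlt.read("bronze_orders").where("order_status IS NOT NULL")
      ```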

    • @wcharliee
      @wcharliee 6 months ago

      Think it depends on what you mean by "storage location". DLT can support multiple storage locations per DLT pipeline. However, if you enable Unity Catalog together with DLT, then the whole DLT pipeline needs to have the same "target catalog", which in turn needs to be defined on a single location / storage path. You can of course split into different DLT pipelines depending on the target catalog, but doing so might obstruct the orchestration benefit that DLT gives.

  • @ngneerin
    @ngneerin 10 months ago +2

    It's expensive

    • @BryanCafferky
      @BryanCafferky  10 months ago +5

      So are data engineers. Usually people cost more than compute.

    • @awadelrahman
      @awadelrahman 20 days ago

      @@BryanCafferky 😅😅

  • @ngneerin
    @ngneerin 10 months ago

    Can write a job instead