Should You Use Databricks Delta Live Tables?
- Published 16 Jul 2024
- Delta Live Tables (DLT) is Databricks' new declarative framework for building reliable, maintainable, and testable data processing pipelines. It is a big deal and loaded with functionality. If you are overwhelmed developing and maintaining Databricks ETL pipelines, you need to consider Delta Live Tables. In this short video, I discuss the critical things to consider when determining whether DLT is right for your organization.
Support me on Patreon and watch this video without ads!
www.patreon.com/bePatron?u=63...
Link to slides:
github.com/bcafferky/shared/b... - Science & Technology
Omg--finally someone who can talk about the WHY and the tradeoffs in using xyz feature in Databricks instead of just soapboxing about how cool it is and/or how to implement. So helpful, thank you!!
Please, do make a video on migrating workflows to Delta Live!! My colleagues and I are very interested! If it helps, I bought your book :)
It does help. Good to know I'm not the only one trying to migrate to DLT. Thanks!
Great video, thank you for what you do!
You're welcome!
Quick question. If the source file schema keeps changing and source files are received once a month, would using DLT be efficient?
Both DLT and non-DLT Delta tables support schema evolution. See learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/dlt
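For context, a minimal sketch of what a DLT ingestion table with Auto Loader schema evolution might look like. The table name and source path are hypothetical, and this definition only runs inside a Databricks Delta Live Tables pipeline, where the `dlt` module and `spark` session are provided by the runtime:

```python
import dlt  # provided by the DLT runtime; not installable locally

@dlt.table(comment="Raw monthly files ingested with schema evolution")
def raw_monthly_files():
    return (
        spark.readStream.format("cloudFiles")  # Auto Loader
        .option("cloudFiles.format", "json")
        # New columns appearing in incoming files are added to the table
        # schema instead of failing the stream.
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
        .load("/mnt/landing/monthly/")  # hypothetical landing path
    )
```

With `addNewColumns`, a monthly file that introduces new fields evolves the target schema automatically, which matches the once-a-month, changing-schema scenario in the question.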
Hey @Bryan, can you enlighten 💡 us on the alternatives to using Delta Live Tables (DLT)? Is it KSQL or Snowflake streams?
DLT is a good option, but there are others depending on your needs. Structured Streaming with standard Databricks workflows can work, or perhaps Eventstreams in MS Fabric. I'm also not convinced true real-time streaming is really required in many cases, so I would first confirm that something less expensive and simpler, like frequent refreshes, isn't an option.
If you need a lot of customization in your ETL, DLT is not for you. If you want to treat your ETL as a more traditional application, DLT is not for you. For example, do you want to configure a custom logger, or call APIs and other services within your pipeline? Not for you either. DLT can be great when you have a very simple pipeline and transform data in fairly simple stages. Anything non-trivial is not a good fit in most cases.
I am proficient in SQL and come from an on-premises data warehouse background... I am trying to get a grip on this whole Databricks framework, but I get stuck a lot of the time as to how it works! It is frustrating.
Databricks is really a data platform, so a lot more than a framework. Yeah, there is a lot and it is confusing. Watch this intro video which is part of a series to get a summary of what Databricks is. ua-cam.com/video/YvTzvZh3yTE/v-deo.html
Someone told me that DLT can only have one storage location per storage account. So if you're using ADLS and have bronze/silver/gold containers, you won't be able to store DLT tables in each of those containers. Can anyone confirm whether that is true?
I'm not aware of such a limitation. These docs should have the authoritative answer: docs.databricks.com/en/delta-live-tables/settings.html#cloud-storage-configuration
And docs.databricks.com/en/delta-live-tables/sql-ref.html#create-a-delta-live-tables-materialized-view-or-streaming-table
When you create a DLT table, there is a location option which seems to allow pointing to wherever you like.
I think it depends on what you mean by "storage location". DLT can support multiple storage locations per DLT pipeline. However, if you enable Unity Catalog together with DLT, then the whole DLT pipeline needs to have the same target catalog, which in turn needs to be defined on a single location / storage path. You can of course split into different DLT pipelines depending on the target catalog, but doing so might reduce the orchestration benefit that DLT gives.
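To make the per-table location point concrete: in a Hive-metastore-backed DLT pipeline (not Unity Catalog), each table definition can take its own `path` argument. The storage account, container names, and table names below are hypothetical, and the code only runs inside a DLT pipeline where `dlt` and `spark` exist:

```python
import dlt  # provided by the DLT runtime; not installable locally

# Hypothetical sketch: bronze and silver tables written to different
# ADLS containers within the same pipeline via the `path` argument.

@dlt.table(path="abfss://bronze@mystorageacct.dfs.core.windows.net/orders")
def bronze_orders():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/orders/")  # hypothetical landing path
    )

@dlt.table(path="abfss://silver@mystorageacct.dfs.core.windows.net/orders")
def silver_orders():
    # Read from the bronze table defined above and apply a simple filter.
    return dlt.read_stream("bronze_orders").where("order_id IS NOT NULL")
```

This is what the "location option" in the comment above refers to; under Unity Catalog, storage is instead governed by the catalog/schema configuration rather than per-table paths.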
It's expensive
So are data engineers. Usually people cost more than compute.
@@BryanCafferky😅😅
You can write a job instead.