Delta Live Tables: Building Reliable ETL Pipelines with Azure Databricks

  • Published 16 Jul 2024
  • In this session, we will see how to use Delta Live Tables to build fast, reliable, scalable, and declarative ETL pipelines on the Azure Databricks platform.
    Speaker: Mohit Batra SQLbits.com/speakers/Mohit_Batra
    SQLbits.com/Sessions/Delta_Liv...
    Tags: Azure,Spark,Data Lake,Big data analytics,Developing,Data Bricks,Big Data & Data Engineering,Data Loading Patterns & Techniques

COMMENTS • 42

  • @germanareta7267
    @germanareta7267 1 year ago +3

    Great video, thanks.

  • @menezesnatalia
    @menezesnatalia 8 months ago +2

    Nice tutorial. Thanks for sharing. 👍

  • @samanthamccarthy9765
    @samanthamccarthy9765 5 months ago +1

    Awesome, thanks so much. This is really useful for me as a Data Architect. Much is expected from us with all the varying technology.

  • @amadoumaliki
    @amadoumaliki 4 months ago +2

    As usual, Mohit is wonderful!

  • @starmscloud
    @starmscloud 1 year ago +1

    Learned a lot from this. Thank you for this video!

    • @SQLBits
      @SQLBits  1 year ago

      Glad it was helpful!

  • @Rangapetluri
    @Rangapetluri 1 month ago

    Wonderful session. Sensible questions asked. Cool

  • @walter_ullon
    @walter_ullon 2 months ago

    Great stuff, thank you!

  • @artus198
    @artus198 8 months ago +4

    I sometimes feel the good old ETL tools like SSIS and Informatica were easier to deal with! 😄
    (I am a seasoned on-premises SQL developer, slowly transitioning into the Azure world.)

  • @ananyanayak7509
    @ananyanayak7509 1 year ago +2

    Well explained with so much clarity. Thanks 😊

    • @SQLBits
      @SQLBits  1 year ago

      Our pleasure 😊

    • @ADFTrainer
      @ADFTrainer 8 months ago +1

      @@SQLBits Can you provide the code? Thanks in advance.

  • @MichaelEFerry
    @MichaelEFerry 9 months ago +2

    Great presentation.

    • @SQLBits
      @SQLBits  9 months ago

      Thanks for watching :)

  • @srinubathina7191
    @srinubathina7191 11 months ago +1

    Wow, super stuff. Thank you, sir.

    • @SQLBits
      @SQLBits  11 months ago

      Glad you liked it!

  • @Databricks
    @Databricks 9 months ago +2

    Nice video🤩

  • @pankajjagdale2005
    @pankajjagdale2005 8 months ago +2

    Crystal-clear explanation, thank you so much. Can you provide that notebook?

  • @priyankpant2262
    @priyankpant2262 3 months ago +1

    Great video! Can you share the GitHub location of the files used?

  • @trgalan6685
    @trgalan6685 1 year ago +1

    Great presentation. No example code. What's zero times zero?

  • @olegkazanskyi9752
    @olegkazanskyi9752 11 months ago

    Is there a video on how data is pulled from the original source, like a remote SQL/NoSQL server or some API?
    I wonder how the data gets to the data lake.
    I assume this first extraction should be the bronze layer.

  • @anantababa
    @anantababa 4 months ago +1

    Awesome training. Can you please share the data file? I want to try it.

  • @user-cg6yw8ei6j
    @user-cg6yw8ei6j 6 months ago

    How can we leverage it for complex rule-based transformations?

  • @ashwenkumar
    @ashwenkumar 5 months ago

    Do Delta Live Tables in all the layers have a filesystem linked to them, as in Hive or Databricks?

  • @prashanthmally5765
    @prashanthmally5765 2 months ago

    Thanks SQLBits. Question: can we create a "View" on the Gold layer instead of having a "Live Table"?

  • @user-cg6yw8ei6j
    @user-cg6yw8ei6j 6 months ago

    Is there any way to load new files sequentially if a bunch of files arrives at once?

  • @lostfrequency89
    @lostfrequency89 4 months ago

    Can we create dependency between two notebooks?

  • @guddu11000
    @guddu11000 4 months ago

    Should have shown us how to troubleshoot or debug.

  • @MohitSharma-vt8li
    @MohitSharma-vt8li 1 year ago +2

    Can you please provide us the notebook as a DBC file or ipynb?
    By the way, great session.
    Thanks

    • @SQLBits
      @SQLBits  1 year ago

      Hi Mohit, you can find all resources shared by the speaker here: events.sqlbits.com/2023/agenda
      Just find the session you're looking for; if the speaker has supplied notes etc., you will see them there once you click on it!

    • @MohitSharma-vt8li
      @MohitSharma-vt8li 1 year ago

      @@SQLBits Thanks so much!

  • @ADFTrainer
    @ADFTrainer 8 months ago +1

    Please provide code links.

  • @thinkbeyond18
    @thinkbeyond18 11 months ago

    I have a general doubt about Auto Loader. Does Auto Loader need to run in a job, or in a notebook triggered manually? Or, once the code is written, is there no need to touch anything, so that when a file arrives it runs automatically and processes the files?

    • @Databricks
      @Databricks 9 months ago +1

      Trigger the notebook that contains your DLT + Auto Loader code with Databricks Workflows. You can trigger it on a schedule, on file arrival, or run the job continuously. It doesn't matter how you trigger the job: Auto Loader will only process each file once.
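
The reply above can be sketched as a minimal DLT pipeline definition in Python. This is an illustrative fragment that only runs inside a Databricks Delta Live Tables pipeline (the `dlt` module and `spark` session are provided by that runtime); the source path, schema location, and table name are assumptions:

```python
import dlt

# Bronze table: Auto Loader ("cloudFiles") ingests files incrementally.
# Regardless of how the pipeline is triggered (schedule, file arrival,
# or continuous mode), Auto Loader checkpoints which files it has
# already seen and processes each file exactly once.
@dlt.table(comment="Raw orders ingested with Auto Loader")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/mnt/landing/_schemas/orders")
        .load("/mnt/landing/orders/")
    )
```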

  • @tratkotratkov126
    @tratkotratkov126 3 months ago

    Hm… Where in these pipelines have you specified the nature of the created/maintained entity (bronze, silver, or gold), other than in the name of the object itself? Also, where exactly are these LIVE tables stored? From your demonstration they all appear to live in the same schema/database, while in real life the bronze, silver, and gold entities have designated catalogs and schemas.
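
On the question above: the medallion layer is a naming/design convention rather than something DLT enforces, and the storage target is set in the pipeline's settings. With Unity Catalog, a pipeline publishes its tables to a designated catalog and schema, so each layer can be routed to its own schema. A hypothetical pipeline-settings fragment (all names are placeholders):

```json
{
  "name": "orders-dlt-pipeline",
  "catalog": "dev",
  "target": "silver",
  "libraries": [
    { "notebook": { "path": "/Repos/etl/orders_dlt" } }
  ]
}
```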

  • @TheDataArchitect
    @TheDataArchitect 7 months ago

    I don't get the usage of VIEWS between Bronze and Silver tables.

    • @TheDataArchitect
      @TheDataArchitect 7 months ago

      Anyone?

    • @SQLBits
      @SQLBits  6 months ago

      Hi Shzyincu, you can get in touch with the speaker who presented this session via LinkedIn and Twitter if you have any questions!

    • @richardslaughter4245
      @richardslaughter4245 3 months ago +1

      My understanding (as an "also figuring out Databricks" newb):
      * View: because the difference between bronze and silver in this instance is very small (no granularity changes, no joins, no heavy calculations, just one validation constraint), it doesn't really make sense to make another copy of the table when a view would be just as performant in this case.
      * "Live" view: I think this is required because the pipeline needs it to be a live view to properly calculate pipeline dependencies.
      Hopefully that understanding is correct, or others will correct me :)
      My follow-up question: the validation constraint seems functionally identical to just applying a filter on the view. Is that correct? If so, is the reason to use the validation constraint rather than a filter mostly to keep code consistent between live tables and live views?
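
On the filter-vs-constraint question: a DLT expectation with the drop policy does remove failing rows like a `WHERE` clause, but it also records pass/fail metrics in the pipeline's event log and UI, which a plain filter does not. A minimal sketch, runnable only inside a DLT pipeline, assuming a bronze table named `orders_bronze` with an `amount` column:

```python
import dlt

# Silver layer as a LIVE view: no second physical copy of the data.
# The expectation drops rows failing the constraint, like a filter,
# but additionally surfaces data-quality counts in the DLT event log.
@dlt.view(comment="Validated orders; invalid rows dropped and counted")
@dlt.expect_or_drop("valid_amount", "amount >= 0")
def orders_silver():
    return dlt.read_stream("orders_bronze")
```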

    • @anilkumarm2943
      @anilkumarm2943 5 days ago

      You don't materialize new tables every time; we sometimes materialize as a view instead, for minor transformations like changing a field's type, etc.

  • @Ptelearn4free
    @Ptelearn4free 2 months ago

    Databricks has a pathetic UI...