How Databricks Leverages Auto Loader to Ingest Millions of Files an Hour

  • Published Aug 29, 2021
  • Continuously and incrementally ingesting data as it arrives in cloud storage has become a common workflow in our customers’ ETL pipelines. However, managing this workflow is rife with challenges, such as scalable and efficient file discovery, schema inference and evolution, and fault tolerance with exactly-once guarantees. Auto Loader is a new Structured Streaming source in Databricks, built as our all-in-one solution to these challenges.
    In this talk, we’ll discuss how Auto Loader:
    Can discover files efficiently through file notifications or incremental file listing
    Can scale to handle metadata for billions of files and still provide exactly-once processing guarantees
    Can infer the schema of data and detect schema drift over time
    Can evolve the schema of the data being processed
    Is used within Databricks to ingest millions of files that are being uploaded every hour efficiently
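The capabilities listed above map onto a handful of `cloudFiles` options. The following is a minimal PySpark sketch, assuming a Databricks runtime (the `cloudFiles` source is not part of open-source Spark). The helper names, the paths, the JSON format, and the Delta sink are illustrative assumptions and are not taken from the talk.

```python
from typing import Dict


def auto_loader_options(file_format: str,
                        schema_location: str,
                        use_notifications: bool = False) -> Dict[str, str]:
    """Build Auto Loader reader options (hypothetical helper for illustration)."""
    return {
        # Format of the incoming files, e.g. "json", "csv", "parquet".
        "cloudFiles.format": file_format,
        # Where the inferred schema and its later versions are stored.
        "cloudFiles.schemaLocation": schema_location,
        # "true" uses file notifications; "false" uses directory listing.
        "cloudFiles.useNotifications": str(use_notifications).lower(),
        # Stop the stream when new columns appear, recording them in the
        # stored schema so that a restart picks them up.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


def start_ingest(spark, source_path: str, target_path: str,
                 checkpoint_path: str, options: Dict[str, str]):
    """Start the stream; `spark` is an existing SparkSession on Databricks."""
    stream = (spark.readStream
              .format("cloudFiles")
              .options(**options)
              .load(source_path))
    return (stream.writeStream
            .format("delta")
            # The checkpoint tracks progress so files are not reprocessed.
            .option("checkpointLocation", checkpoint_path)
            # Let the Delta sink accept the newly added columns.
            .option("mergeSchema", "true")
            .start(target_path))
```

On a cluster, this might be called as `start_ingest(spark, "/mnt/raw/events", "/mnt/bronze/events", "/mnt/chk/events", auto_loader_options("json", "/mnt/schemas/events"))`, where every path is a placeholder.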

COMMENTS • 9

  • @CHERANGA123 3 years ago +5

    Hi,
    Thanks for this!
    Could you please provide the link to the notebook? Also, if you could kindly share which security permissions need to be set up on the service principal, that would be great!

  • @jennylam1935712995 2 years ago +2

    Hi, I appreciate this video. I see that after using addNewColumns, the stream failed with UnknownFieldException and you manually restarted it with an updated schema. How can I create a loop that keeps restarting the stream until no more UnknownFieldException errors occur?
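One possible shape for such a restart loop is sketched below. It assumes the failure surfaces in Python as an exception whose message contains the text `UnknownFieldException`, and that restarting the same query re-reads the updated schema from the schema location. `run_with_schema_restarts` and `start_and_await` are hypothetical names; `start_and_await` stands for any function that starts the stream and blocks until it stops, for example by calling `awaitTermination()` on the query.

```python
from typing import Callable


def run_with_schema_restarts(start_and_await: Callable[[], None],
                             max_restarts: int = 5) -> int:
    """Re-run the stream after schema-change failures.

    Returns the number of restarts that were needed. Any other error,
    or exceeding max_restarts, is re-raised to the caller.
    """
    restarts = 0
    while True:
        try:
            start_and_await()
            return restarts
        except Exception as exc:
            # Assumption: the schema-change failure is identifiable by name
            # in the exception message.
            is_schema_change = "UnknownFieldException" in str(exc)
            if not is_schema_change or restarts >= max_restarts:
                raise
            restarts += 1
```

Capping the number of restarts keeps a persistent failure from looping forever. On Databricks, configuring the job itself to retry on failure may achieve the same effect without custom code.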

  • @ndbweurt34485 1 year ago +1

    Hi, I have a question: I didn't understand why schema evolution is an issue in Structured Streaming, since we can set mergeSchema to true.

  • @shanhuahuang3063 1 year ago

    Could you please provide the link to the notebook? Also, if you could kindly share which security permissions need to be set up on the service principal, that would be great!

  • @shanhuahuang3063 1 year ago

    What happens when a column's length changes? Will the column be cut off? Thanks.

  • @prar_shah 1 year ago

    Can you please zoom in? I can't see the notebook properly. Great video.

  • @aks541 1 year ago

    Nice presentation. If sharing the link to the notebook is not feasible, please share the code with comments. Thanks in advance!

  • @shubhamaggarwal3676 1 year ago

    Hi,
    Can you please provide the notebook link? The video is not clear enough to follow the code easily.

  • @nithints-sp1uc 1 year ago

    Without a working notebook, and with the text unreadable, this is not useful even if the content is good.