Azure Databricks is Easier Than You Think

  • Published Oct 29, 2024

COMMENTS • 27

  • @mugilkarthikeyan7131
    @mugilkarthikeyan7131 2 years ago +4

    I've never seen anyone explain Azure Databricks as well as you.

  • @vaibhavrana4953
    @vaibhavrana4953 3 years ago +2

    You have explained Spark & Azure Databricks very well. Thank you.

  • @pashersil
    @pashersil 2 years ago +1

    Wow, you mentioned the SSIS and ETL problems I totally relate to. You have earned cred with me.

  • @harishjulapalli448
    @harishjulapalli448 4 years ago +1

    Great Intro to Databricks and Spark. Thank You.

  • @goselvam
    @goselvam A year ago

    Thanks for the great video. Just wanted to let you know that the slide at 38:59 has the incorrect expansion for DAG, which is shown as Directed Acrylic Graph instead of Directed Acyclic Graph.

  • @rmravilla
    @rmravilla 2 years ago

    Thanks for the presentation. It is very useful if one wants to learn Spark & Azure Databricks.

  • @gopinathrajee
    @gopinathrajee 2 years ago

    @ 29:50, when you say Azure, do you mean the Azure PaaS? And by ExpressRoute, do you mean Microsoft Peering? How does a VNet get created on the PaaS, though? If it is a VNet, does it not fall under the Corp Network?

    • @Atmosera-
      @Atmosera- 2 years ago

      Azure PaaS services can connect to a VNet using private endpoints. ExpressRoute enables on-premises connectivity back into the VNet.
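
      For readers who want to see the private-endpoint side of this in code, below is a minimal sketch using the azure-mgmt-network Python SDK. It is an illustration, not the presenter's setup: the resource group, subnet ID, workspace ID, and region are hypothetical placeholders, and "databricks_ui_api" is the Private Link sub-resource Azure Databricks exposes.

      ```python
      # Minimal sketch: create a private endpoint that links an Azure
      # Databricks workspace into an existing VNet subnet. All names,
      # IDs, and the region are hypothetical placeholders.
      from azure.identity import DefaultAzureCredential
      from azure.mgmt.network import NetworkManagementClient
      from azure.mgmt.network.models import (
          PrivateEndpoint,
          PrivateLinkServiceConnection,
          Subnet,
      )

      subscription_id = "<subscription-id>"  # placeholder
      client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

      poller = client.private_endpoints.begin_create_or_update(
          resource_group_name="my-rg",
          private_endpoint_name="pe-databricks",
          parameters=PrivateEndpoint(
              location="eastus",
              # Subnet in the VNet that ExpressRoute already reaches.
              subnet=Subnet(id=(
                  "/subscriptions/<subscription-id>/resourceGroups/my-rg"
                  "/providers/Microsoft.Network/virtualNetworks/my-vnet"
                  "/subnets/private-endpoints"
              )),
              private_link_service_connections=[
                  PrivateLinkServiceConnection(
                      name="databricks-conn",
                      # Resource ID of the Databricks workspace (hypothetical).
                      private_link_service_id=(
                          "/subscriptions/<subscription-id>/resourceGroups/my-rg"
                          "/providers/Microsoft.Databricks/workspaces/my-ws"
                      ),
                      # Sub-resource for the Databricks UI/API endpoint.
                      group_ids=["databricks_ui_api"],
                  )
              ],
          ),
      )
      print(poller.result().provisioning_state)
      ```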

  • @mangeshxjoshi
    @mangeshxjoshi 4 years ago +2

    Good video and explanation. One question: if an on-premises Informatica ETL tool needs to migrate to a cloud platform, is there an equivalent cloud tool that can replace Informatica, or can we use Informatica cloud integration on a cloud platform as a PaaS service? How will traditional ETL tools like Informatica be replaced with cloud infrastructure? I believe Databricks is for processing power, but I believe we cannot do ETL transformation in Databricks. Please suggest.

    • @PhilipHoyos
      @PhilipHoyos 4 years ago

      You can use Databricks for ETL.

    • @anmoltrehan6060
      @anmoltrehan6060 2 years ago

      Azure Data Factory and Databricks will do the trick.

    • @shiladityachakraborty9826
      @shiladityachakraborty9826 2 years ago

      We can use IICS and Databricks. Infa BDM will cater to the relevant ETL rules, and Databricks will be there to visualise the data. Having said that, it's always better to use the native functionality of any tool, so if there is no tech-upskilling issue, it's better to handle ETL through PySpark in Databricks itself. This is cost-effective as well.
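
      To make the "ETL through PySpark in Databricks" suggestion concrete, here is a minimal sketch of the extract-transform-load pattern. The ADLS paths and column names are hypothetical assumptions, not code from the video.

      ```python
      # Minimal PySpark ETL sketch for Databricks. Paths and columns are
      # hypothetical; on Databricks the `spark` session is provided for you.
      from pyspark.sql import functions as F

      # Extract: read raw CSV files from ADLS Gen2.
      raw = (
          spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("abfss://raw@myaccount.dfs.core.windows.net/orders/")
      )

      # Transform: fix types, drop bad rows, derive a column.
      clean = (
          raw
          .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
          .filter(F.col("amount") > 0)
          .withColumn("amount_usd", F.col("amount") * F.col("fx_rate"))
      )

      # Load: write the result out as a Delta table.
      (
          clean.write
          .format("delta")
          .mode("overwrite")
          .save("abfss://curated@myaccount.dfs.core.windows.net/orders_clean/")
      )
      ```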

  • @Pravinamadoori
    @Pravinamadoori 3 years ago

    The instructor has given a clear demo. Does he have any courses on Udemy?

  • @chan7354
    @chan7354 3 years ago

    The explanation is very good and helped us understand the topic.

  • @josecarlossilva3670
    @josecarlossilva3670 2 years ago +1

    Great content!! It really helped me a lot. Congrats!

  • @svapneel1486
    @svapneel1486 4 years ago +1

    Great video. Made it very easy to understand.

  • @Pravinamadoori
    @Pravinamadoori 3 years ago

    Can someone suggest a good book on automating or testing ETL on AWS S3 using Databricks?

  • @Cur8or88
    @Cur8or88 3 years ago +1

    Directed Acrylic Graphs are more durable: 39:08

  • @TheSQLPro
    @TheSQLPro 4 years ago +1

    Excellent video!

  • @denwo1982
    @denwo1982 3 years ago

    Hi, I'm coming from a SQL database background and at the moment I am not seeing the benefit of using Azure Databricks. There is nothing stopping me from using ADF to pick up a file from ADLS Gen2, put it into a staging table, and then creating a stored proc to do the transformation and insert that into a destination table. Or am I missing something here?

    • @Atmosera-
      @Atmosera- 3 years ago

      Databricks is a way of doing ETL on Azure.
      ADF can do a lot, but it's much more limited in its scope. Doing complex transformations in Databricks tends to be easier.
      But if you're using ADF, there's nothing wrong with that.

    • @denwo1982
      @denwo1982 3 years ago

      @@Atmosera- Thanks for your reply. Do you have any material regarding deltas? For example, checking a CSV file on ADLS Gen2 for new rows entered within the last hour based on the modified date? Or would it be a case of loading the file into a SQL staging table, comparing it to the destination table, finding the new rows, and then putting that data into another CSV file in an ADLS Gen2 folder? Which tool would be more efficient and cost-effective?

    • @Atmosera-
      @Atmosera- 3 years ago

      @@denwo1982 I would not use deltas; that is a costly comparison to do. The best thing to do is partition your data in ADLS into separate files that are timestamped, and load new records that way rather than using a single file.
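
      As an illustration of that timestamped-partition approach, here is a minimal PySpark sketch. The folder layout and paths are hypothetical assumptions; the idea is simply to read only the newest partition rather than diff a single growing file.

      ```python
      # Minimal sketch: load only the newest hourly partition from ADLS
      # Gen2 instead of comparing against a single growing file. Paths
      # and layout are hypothetical; `spark` is provided on Databricks.
      from datetime import datetime, timedelta, timezone

      # Files land in hourly folders, e.g. .../orders/2024/10/29/13/*.csv
      base = "abfss://raw@myaccount.dfs.core.windows.net/orders"

      # Read only the partition for the previous full hour.
      last_hour = datetime.now(timezone.utc) - timedelta(hours=1)
      new_rows = (
          spark.read
          .option("header", "true")
          .csv(f"{base}/{last_hour:%Y/%m/%d/%H}/*.csv")
      )

      # Append just those new records to the destination Delta table.
      (
          new_rows.write
          .format("delta")
          .mode("append")
          .save("abfss://curated@myaccount.dfs.core.windows.net/orders/")
      )
      ```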

  • @Shradha_tech
    @Shradha_tech 2 years ago

    Thank you so much for this video 😀

  • @amjds1341
    @amjds1341 3 years ago

    Great