I've never seen anyone explain Azure Databricks as well as you.
you have explained Spark & Azure Databricks very well. Thank you
Wow you mentioned the SSIS problems and ETL problems I totally relate to .. you have earned cred with me.
Great Intro to Databricks and Spark. Thank You.
Thanks for the great video. Just wanted to let you know that the slide at 38:59 has the incorrect expansion for DAG, which is shown as Directed Acrylic Graph instead of Directed Acyclic Graph.
Thanks for the presentation. It is very useful if one wants to learn Spark & Azure Databricks
@ 29:50, when you say Azure, do you mean the Azure PaaS? And by ExpressRoute, do you mean Microsoft Peering? How does a VNet get created on the PaaS, though? If it is a VNet, does it not fall under the corp network?
Azure PaaS services can connect to a VNet using private endpoints, and ExpressRoute enables on-premises connectivity back into the VNet.
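As a concrete sketch, assuming the azure-identity and azure-mgmt-network Python packages, this is roughly how a private endpoint gets created; every name below (subscription, resource group, VNet, storage account) is a placeholder, not anything from the video:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"  # placeholder
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# Create a private endpoint in an existing VNet subnet that points at a
# PaaS resource (a storage account here); group_ids picks the sub-resource.
poller = client.private_endpoints.begin_create_or_update(
    resource_group_name="my-rg",         # placeholder
    private_endpoint_name="storage-pe",  # placeholder
    parameters={
        "location": "eastus",
        "subnet": {
            "id": "/subscriptions/<subscription-id>/resourceGroups/my-rg"
                  "/providers/Microsoft.Network/virtualNetworks/my-vnet"
                  "/subnets/endpoints"
        },
        "private_link_service_connections": [{
            "name": "storage-connection",
            "private_link_service_id": "/subscriptions/<subscription-id>"
                "/resourceGroups/my-rg/providers/Microsoft.Storage"
                "/storageAccounts/mystorageacct",
            "group_ids": ["blob"],  # the blob endpoint of the account
        }],
    },
)
print(poller.result().provisioning_state)
```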
Good video and explanation. One question: if an on-premises Informatica ETL tool needs to migrate to a cloud platform, is there any equivalent cloud tool which can replace Informatica, or can we use Informatica cloud integration on the cloud platform as a PaaS service? How will these traditional ETL tools like Informatica be replaced with cloud infrastructure? I believe Databricks is for processing power, but I believe we cannot do ETL transformations in Databricks. Please suggest.
You can use Databricks for ETL.
Azure Data Factory and Databricks will do the trick
We can use IICS and Databricks: Informatica BDM will cater to the relevant ETL rules, and Databricks will be there to visualise the data. Now, having said that, it's always better to use the native functionality of any tool, so if there is no tech upskilling issue it's better to handle ETL through PySpark in Databricks itself. This is cost-effective as well.
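For illustration, here is a minimal PySpark sketch of such an ETL step in a Databricks notebook. The ADLS Gen2 paths and column names are made up, and `spark` is the session Databricks provides:

```python
from pyspark.sql import functions as F

# Extract: read raw CSVs from ADLS Gen2 (the path is a placeholder).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("abfss://raw@mystorageacct.dfs.core.windows.net/sales/"))

# Transform: the kind of rules an Informatica mapping would express.
clean = (raw
         .dropDuplicates(["order_id"])                       # dedupe on key
         .withColumn("order_date", F.to_date("order_date"))  # normalize types
         .filter(F.col("amount") > 0))                       # drop bad rows

# Load: write the curated data as Delta for downstream use.
(clean.write
 .format("delta")
 .mode("overwrite")
 .save("abfss://curated@mystorageacct.dfs.core.windows.net/sales/"))
```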
The instructor has given a clear demo. Does he have any courses on Udemy?
The explanation is very good and helped us understand the topic.
Great content!! It really helped me a lot. Congrats!
Great video. It made everything very easy to understand.
Great to hear!
Can someone suggest a good book on automating or testing ETL on AWS S3 using Databricks?
Directed Acrylic Graphs are more durable: 39:08
Excellent video!
Hi, I'm coming from a SQL Database background, and at the moment I am not seeing the benefit of using Azure Databricks. There is nothing stopping me from using ADF to pick up a file from ADLS Gen2, put it into a staging table, then creating a stored proc to do the transformation and inserting that into a destination table. Or am I missing something here?
Databricks is a way of doing ETL on Azure.
ADF can do a lot, but it's much more limited in its scope. Doing complex transformations in Databricks tends to be easier.
But if you're using ADF, there's nothing wrong with that.
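To make the "complex transformations" point concrete, here is a hedged sketch of a keep-the-latest-row-per-key step, which takes a few lines of PySpark in Databricks but is clumsy to express in ADF. The table and column names are hypothetical:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Keep only the latest row per customer, ordered by update time.
w = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())

latest = (spark.table("staging.customers")          # hypothetical table
          .withColumn("rn", F.row_number().over(w))
          .filter("rn = 1")
          .drop("rn"))

latest.write.mode("overwrite").saveAsTable("curated.customers")
```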
@@Atmosera- thanks for your reply. Do you have any material regarding deltas? For example, checking a CSV file on ADLS Gen2 for new rows entered within the last hour based on the modified date. Or would it be a case of loading the file into a SQL staging table, comparing it to the destination table, finding the new rows, and then putting that data into another CSV file in an ADLS Gen2 folder? Which tool would be more efficient and cost-effective?
@@denwo1982 I would not use deltas; that is a costly comparison to do. The best thing to do is partition your data in ADLS into separate timestamped files and load new records that way, rather than using a single file.
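A rough sketch of that partitioning approach in PySpark; the paths and the `ingest_ts` column are assumptions for illustration, not anything from this thread:

```python
from pyspark.sql import functions as F

# Read raw events (placeholder path and schema).
events = spark.read.json("abfss://raw@mystorageacct.dfs.core.windows.net/events/")

# Write one folder per date/hour instead of appending to a single file.
(events
 .withColumn("ingest_date", F.to_date("ingest_ts"))
 .withColumn("ingest_hour", F.hour("ingest_ts"))
 .write
 .partitionBy("ingest_date", "ingest_hour")
 .mode("append")
 .parquet("abfss://curated@mystorageacct.dfs.core.windows.net/events/"))

# An hourly load then reads just the newest partition -- no row-by-row diff.
new_rows = spark.read.parquet(
    "abfss://curated@mystorageacct.dfs.core.windows.net/events/"
    "ingest_date=2024-01-15/ingest_hour=10/"
)
```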
Thank you so much for this video 😀
Great