This is really helpful for someone starting new, thank you!
Thank you so much for the presentation! It was very informative, it gave me a great picture of those tools!
This was definitely helpful for my DP-900 exam
It was an excellent session regarding all these tools. It helps you a lot to understand when to use what.
Lisa is amazing!
What a cool presentation.
Nice info and fun to watch 😊
Excellent! Thanks, Lisa Hoving.
Thank you for the summary, this is quite helpful
Lovely Session Thanks👌
Thanks a lot @lisa. I got a whole lot of clarity. Was always confused about which service to use and why.
Wow! Amazing presentation on Azure Data Factory, Azure Databricks, Azure Synapse Analytics. Love it :)
Amazing work 🎉❤
Thank you for the video. Excellent analysis and presentation!!! Can you please do a comparison video for Microsoft Fabric vs Azure Databricks?
Wonderful, thank you 🦋
Great presentation!
Thank you kindly!
very nicely explained. great job!
Thank you for the work Lisa !
What would be your recommendation if she was using Delta Lake? Does Synapse integrate well with Delta Lake for data flows and data processing?
Delta Lake is totally an option!
@LisaHoving Yes, I know Delta Lake is an option, but I am asking whether Synapse integrates well with Delta Lake, compared to using Databricks for Delta Lake.
@CMJTe In my personal opinion, both integrate well. However, historically speaking, Azure Databricks does come out with newer Delta Lake versions faster than Synapse, or Azure Data Factory for that matter. So, if it is the newest features you are after, go with Azure Databricks. If this is not a priority, both are good.
Azure Data Factory is similar to SSIS and doesn't have a data store to persist the data, but Azure Databricks and Azure Synapse have a database engine to support the storage of data.
Azure Data Factory is only an ETL/ELT tool. The other two offer both ETL/ELT and a database.
In this case, Azure Data Factory shouldn't be compared to a database.
It's a separate tool, that's true, but since many people do their ETL with Data Factory, they do wonder: should I use Azure Synapse or Azure Databricks for my ETL, or should I continue using Azure Data Factory? Note that those who don't know how to code can leverage the UI at a little extra cost, and those who do know how to code can save a little too.
Migrating to Databricks can offer you a bit more flexibility, but you would have to migrate all the pipelines to code. Alternatively, you could use both tools, and make your new flows in Databricks. Notebooks and packaged code in Databricks can easily be kicked off by ADF, making it a cool orchestrator!
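For anyone wondering what "kicked off by ADF" looks like in practice, here is a rough sketch of a pipeline definition with a single Databricks Notebook activity, built as a Python dict and printed as JSON. The pipeline name, linked service name, notebook path, and the `run_date` parameter are all made up for illustration; the structure follows the general shape of an ADF pipeline definition, but check it against your own factory before relying on it.

```python
import json


def databricks_notebook_activity(name, linked_service, notebook_path, parameters=None):
    """Build the JSON-style definition of an ADF Databricks Notebook activity."""
    return {
        "name": name,
        "type": "DatabricksNotebook",
        # Reference to the (hypothetical) linked service that points at the workspace.
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            # Values passed to the notebook as widget parameters.
            "baseParameters": parameters or {},
        },
    }


def pipeline_definition(name, activities):
    """Wrap a list of activities in a minimal pipeline definition."""
    return {"name": name, "properties": {"activities": list(activities)}}


# All names below are placeholders, not taken from the video.
activity = databricks_notebook_activity(
    name="RunDailySalesNotebook",
    linked_service="ls_databricks_example",
    notebook_path="/Shared/etl/daily_sales",
    parameters={"run_date": "2024-01-01"},
)
pipeline = pipeline_definition("pl_daily_sales_example", [activity])

print(json.dumps(pipeline, indent=2))
```

The idea is that ADF keeps doing the scheduling and the copy work, while the transformation logic lives in the notebook.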
Agreed that ADF seems like an odd comparison here but the Databricks vs Synapse comparison was really helpful
Why then is the speaker saying that there is no data storage (24:36) in any of the three?
Agree
Eye opening
Why can't ADF be used for Power BI if the target data model is SQL Server?!
If SQL Server is the target, you can indeed just connect Power BI to SQL Server and do your aggregations/data loading with ADF, no problem! My point was more about connecting ADF to Power BI directly. In Synapse and Databricks you can create tables and use these definitions directly in Power BI by connecting these tools. ADF has no such thing.
Cheers Lisa! Thank you
Aren't Azure Synapse pipelines based on ADF? If so, how come it's cheaper to run data flows on Synapse?
Very clear. And hilarious when she misspoke "SQLBits" and blamed her ADHD 🤣
Great presentation thank you.
Our pleasure!
I am a bit confused: why can't we store data in Databricks? Doesn't Databricks have the Lakehouse for that?
I guess it's because it's just Azure data lake storage under the hood? So technically the data isn't actually stored in Databricks
Lakehouse is just the architectural approach. To my knowledge, every cloud-based analytical solution is built on top of some kind of cloud data storage (ADLS, Blob Storage, AWS S3, etc.), and that is only a data storage layer.
As per my understanding, Databricks and Synapse store data in Azure Blob Storage and give you a database/data-warehouse-like model on top of that, so that analytics and similar work become easier. Some projects even build their data integration and pipelines in ADF to trigger Databricks jobs/notebooks, while Synapse does the analytics and serves BI tools over the Delta Lake written by Databricks.
HAHAHAHA 20:12 man she’s so hilarious for keeping it real. ADHD here too
Still just on-prem projects with SSIS here, LOL.
Hi, what is the minimum salary we can expect for an Azure Data Factory developer with 5 years of ADF experience, plus 5 years of other experience?
What about the Java you highlighted earlier? Or did I miss it 😂
Yes, you kinda missed it. She mentions going with Databricks if your speciality is Java, as the Java language is supported.
ua-cam.com/video/_QtA_492l4k/v-deo.htmlsi=NAXqM24LibEQz4tI&t=1171
What can Databricks do that Synapse cannot do better?
Here's a few off the top of my head:
1. Databricks clusters are more flexible. You can choose the cheaper Compute Optimized VMs for append-only incremental processing, or Storage Optimized VMs to enable caching on the local SSDs, among other VM types. In Synapse, you can only use Memory Optimized and GPU Optimized VMs;
2. Databricks clusters allow you to use Spot VMs for the workers, which are significantly cheaper as well. Synapse does not support Spot VMs;
3. Databricks allows for better cluster sharing, as the same cluster can have multiple Spark sessions active at once. Synapse reserves slots for each Spark session, and those slots will sit idle when the developer is not running any code -- they can't be used by other developers while they are reserved;
4. The notebook file format in Databricks lends itself better to git diffs in Pull Requests, as they are regular code files (e.g., Python code) with some comments for special cells. Synapse notebooks, on the other hand, are saved as JSON files which are much harder to review in a git diff interface;
5. Databricks has exclusive features such as Auto Loader and identity columns, which are really helpful for data engineering and framework development;
6. Databricks is the flagship product of the company founded by the creators of Apache Spark, and as such it will always have an edge in supporting new Spark versions and features. Meanwhile, Synapse is a PaaS offering from Microsoft, and Microsoft is now clearly focusing a lot more on their SaaS offering: Microsoft Fabric. If I had to build a data platform on Azure today, I would use Databricks as my transformation engine. Hope this helps! :)
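To make points 1 and 2 a bit more concrete, here is a minimal sketch of a cluster spec in the shape the Databricks Clusters API accepts, with a Compute Optimized node type and Spot workers. The cluster name, the VM size, and the runtime version string are placeholders I picked for illustration; which values are actually available depends on your workspace and region.

```python
import json


def spot_cluster_spec(name, node_type_id, spark_version, min_workers, max_workers):
    """Build a cluster spec that keeps the driver on-demand and runs workers on Spot VMs."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "azure_attributes": {
            # The first node (the driver) stays on-demand; the rest may be Spot.
            "first_on_demand": 1,
            # Fall back to on-demand VMs if Spot capacity cannot be acquired.
            "availability": "SPOT_WITH_FALLBACK_AZURE",
            # -1 means: never evict on price, pay up to the on-demand price.
            "spot_bid_max_price": -1,
        },
        # Shut the cluster down when idle so it does not keep billing.
        "autotermination_minutes": 30,
    }


# Placeholder values: adjust to what your workspace offers.
spec = spot_cluster_spec(
    name="incremental-append-example",
    node_type_id="Standard_F4s_v2",      # assumed Compute Optimized size
    spark_version="13.3.x-scala2.12",    # assumed runtime version string
    min_workers=1,
    max_workers=4,
)

print(json.dumps(spec, indent=2))
```

Keeping the driver on-demand is a common compromise: losing a Spot worker only costs some recomputation, while losing the driver kills the whole job.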
I hope you finally found a job)
Your choice is Snowflake, actually. :)
WHAT DOES MORE MATURE EVEN MEAN???!!!!
Hi Tinashelyemaone5435, you can get in touch with the speakers directly through LinkedIn and X! They are normally more than happy to help.
The amount of work the developer community has put into it. Think of it as beta vs stable. Databricks is way more stable and has already been through many iterations of catching bugs and implementing fixes. Synapse Analytics is comparatively newer and is going through that iterative process now, so in time its reliability will catch up to that of Databricks.
I would say maturity is the level of knowledge and skills an organization has to support these tools. You are not going to give a graphing calculator to a 6-year-old child. You are not going to give Databricks to a company that has everything in spreadsheets.
You Said absolutely Nothing!!!
Quite the contrary