Thank you for sharing this comparison. For me, too, there is no case for Fabric any time soon. I like the idea of integrating everything in one place in a common user experience, but:
The most severe problem is the unknown performance impact on other users on the same compute resources. Let's say the organization is running a P3 as its production environment, and it is doing well serving the business users with dataset queries, dataset refreshes, paginated report subscriptions etc. Now the organization starts allowing citizen data analysts or data engineers to use Fabric. I tried an F2 capacity and it lags with even a single developer! With F4 you can start building small datasets more or less fluently. With F16 it's nice. Now imagine a whole team using the P3 environment. How long will it take until business users complain about bad performance? The minimum requirement for me would be to divide a capacity into compartments and assign resource limits or priorities to different features.
The second problem is the unknown cost impact and how Fabric compares against an Azure solution. Unless prices and fair comparisons of realistic use cases are publicly available, there is no way to invest in Fabric, except if you are a very large organization that already has binding custom contracts with Microsoft.
Then, as you mentioned, there is only Git integration for classic Power BI artifacts in Fabric. After we waited so long to get this in Power BI, Microsoft starts Fabric again with this void.
But even for citizen data analysts who are willing to ignore version control, it's not a nice low-code tool. Pipelines have some improvements compared with ADF, like automatic mapping to Parquet-compatible file names, but e.g. if you use the wizard to define how to load a data warehouse, there is no way to go back into that wizard and change something later. You have to edit JSON code, which is even more cumbersome than starting with SQL code right away.
Then there are missing APIs and cmdlets. Unlike A-capacities, which you can scale and turn on and off programmatically, with F-capacities this is all manual clicking in the web UI. Can't wait to get this API!
Many more features are missing or unclear, like managed identities and Key Vault integration.
The preferred use case I see currently is giving individual F-subscriptions to data analysts that run on demand and are not shared with others. For me personally, it hides too much of its architecture under the hood without removing the specifics and limitations of the underlying technologies. For example, when defining a custom table in the data warehouse section, you can provide a SQL query. This fails if it contains an ORDER BY clause. It's good to know anyway that SQL views don't apply ORDER BY clauses. But it would be better to know transparently what you are actually dealing with: Is it a view? Is it a query? I definitely still prefer to know the building blocks of my architecture exactly and have access to and control over them.
Nice comparison. Seems like Fabric has promise but a ways to go to offer feature parity. Databricks seems like a good bet, at least for now. Since they can work off of the same ADLS data source, you can pick and choose for specific workloads as Fabric matures. But if you want to be multi-cloud, Databricks seems like the best option.
Good work, bro! Greetings from Argentina :D
Thanks, my friend!
Wish OneLake was offered as a standalone product. It would be great as a centralized data store while allowing you to build your own data stack using whatever tooling you prefer.
My problem with Fabric is that it is an MS-exclusive stack, to be honest. Its semantic layer, from what is advertised at least, can only connect to Power BI, not other BI tools, while other vendors' semantic layers connect to most of them. Do you know if the Fabric semantic layer works with other BI tools like Tableau, Looker etc?
I think Fabric now has Git integration. But they still don't have something like the Unity Catalog (for data/models in Databricks, I think), and they don't have the easy built-in monitoring and alerts with drift metrics (like Wasserstein distance, other distance metrics, null/0 value percent drift, etc) AFAIK, and I'm not sure when those are coming (maybe another 9 months to a year or more?). I don't think Fabric has a feature store yet either; you have to use Azure ML for that. With Photon I think Databricks might be faster than Fabric, but might also be more costly.
Our organization has integrated Power BI as our primary BI tool for the frontend. We're transitioning to utilizing Databricks on Azure as our data platform, opting against Fabric. What are your insights regarding Databricks? Do you think Fabric is sufficiently mature for large-scale enterprises?
Can you please mention a few points that made you choose Databricks over Fabric?
I am actually still wondering... but missing version control is very very sad
In the workspace settings you can configure Git integration with Azure DevOps, which provides version control. Maybe the video predates that feature!
@casimircompaore2076 I think it is still in preview, but thx a lot!
Great video! Would like to see more Databricks vs Fabric content. I think version control is now available in Fabric. So, what are your thoughts now?
Thank you Syed!
To my knowledge, version control is still limited to only a few types of items: Reports, Paginated Reports and Datasets. So versioning notebooks is unfortunately still out of the picture.
Microsoft has an up-to-date overview here: learn.microsoft.com/en-us/fabric/cicd/git-integration/intro-to-git-integration.
@johannesjolkkonen Now it's available for Fabric Notebooks too
Wow!