Thank you Bryan for sharing! I have always wondered how to feed Databricks tables to PowerBI, glad I get to see this video before digging into the JDBC approach
Hey, as easy as it is, I do believe that there is superior method: sourcing data from storage, so in case you keep data in Azure Databricks, you want to make tables external, meaning that underlaying data will reside on e.g. blob storage. Why? 1) Cost - Databricks are expensive and for data load you use here interactive cluster, which is not cheap at all. You cannot use any job cluster to decrease the cost. On top of that, it you happen to refresh, cluster will start up automatically, but it will not switch off by all means. It will wait up to the time it is terminated by autotermination (due to inactivity). Make sure to have separate cluster for it and set autotermination to minimal possible value, which is 10 minutes. You do not want to keep it up for couple of hours for data refresh each day. 2) time for data refresh - it starts the cluster, so before you even start loading data to PBI cluster has to up and running, so you have ~5mins lag, before it is on, before you do anything. In many cases data will be already loaded to PBI from storage. For storage there is also a question, how to pick very last version of delta table, in case you use delta format. PBI does not have native (best to my knowledge) connector to delta, it has for parquets, which in short means, that it will load ALL legacy versions of databricks table available thanks to time travel feature. In other words your data will have duplicates. I sorted it out thanks to the function below: github.com/delta-io/connectors/blob/master/powerbi/fn_ReadDeltaTable.pq @Bryan - great work showing how it works, don't get me wrong. I just believe that loading from storage has more sense in the long run.
Hello, thanks for the video. Using this connector, do we need to have a gateway to refresh the Power BI dataset automatically or it's not needed? Best,
Not sure but if Databricks Serverless is an option, I would think you'd get better response. Fabric over Delta should also give good performance if Fabric is something your company wants to try. Fabric is not for all use cases but particularly good for Power BI.
Thank you for sharing! Is it possible to access files in dbfs like this, as delta lake gold table is stored in the dfbs only ? or should I store gold table into a database ?
Thanks Bryan, love your content. What would you say is the difference / advantage of using Delta Sharing rather than Partner Connect? In the specific case of Databricks to PowerBI? Thanks!
Thanks Alfonso. Sorry. Missed this earlier. Delta Sharing seems focused on data collaboration and is fairly generic. The Power BI connect is just a easy way to use Databricks as a source for Power BI reports.
Thank you Bryan for the valuable information. Suppose, my primary key is emp_id which is encrypted in Databricks. How can I decrypt the information in Power BI? Is there any possible way?
Thank you Bryan for sharing! I have always wondered how to feed Databricks tables to PowerBI, glad I get to see this video before digging into the JDBC approach
YW. Glad it was helpful!
You sir, are a scholar and a gentleman, and a super helper.
Brilliant, you saved me hours of work, Thanks for sharing this knowledge
Glad it helped!
Thank you for your video.How to refresh the Power BI Visualization charts when database tables values updated in data bricks side.? Thank you
YW. You need to reload your Power BI Tabular model.
Hey, as easy as it is, I do believe that there is superior method: sourcing data from storage, so in case you keep data in Azure Databricks, you want to make tables external, meaning that underlaying data will reside on e.g. blob storage. Why?
1) Cost - Databricks are expensive and for data load you use here interactive cluster, which is not cheap at all. You cannot use any job cluster to decrease the cost. On top of that, it you happen to refresh, cluster will start up automatically, but it will not switch off by all means. It will wait up to the time it is terminated by autotermination (due to inactivity). Make sure to have separate cluster for it and set autotermination to minimal possible value, which is 10 minutes. You do not want to keep it up for couple of hours for data refresh each day.
2) time for data refresh - it starts the cluster, so before you even start loading data to PBI cluster has to up and running, so you have ~5mins lag, before it is on, before you do anything. In many cases data will be already loaded to PBI from storage.
For storage there is also a question, how to pick very last version of delta table, in case you use delta format. PBI does not have native (best to my knowledge) connector to delta, it has for parquets, which in short means, that it will load ALL legacy versions of databricks table available thanks to time travel feature. In other words your data will have duplicates. I sorted it out thanks to the function below:
github.com/delta-io/connectors/blob/master/powerbi/fn_ReadDeltaTable.pq
@Bryan - great work showing how it works, don't get me wrong. I just believe that loading from storage has more sense in the long run.
Hello, thanks for the video.
Using this connector, do we need to have a gateway to refresh the Power BI dataset automatically or it's not needed?
Best,
Quick question, connecting power bi from databricks is better or from serverless is better?
Not sure but if Databricks Serverless is an option, I would think you'd get better response. Fabric over Delta should also give good performance if Fabric is something your company wants to try. Fabric is not for all use cases but particularly good for Power BI.
Thank you for sharing! Is it possible to access files in dbfs like this, as delta lake gold table is stored in the dfbs only ? or should I store gold table into a database ?
Thanks Bryan, love your content. What would you say is the difference / advantage of using Delta Sharing rather than Partner Connect? In the specific case of Databricks to PowerBI? Thanks!
Thanks Alfonso. Sorry. Missed this earlier. Delta Sharing seems focused on data collaboration and is fairly generic. The Power BI connect is just a easy way to use Databricks as a source for Power BI reports.
Thank you Bryan for the valuable information. Suppose, my primary key is emp_id which is encrypted in Databricks. How can I decrypt the information in Power BI? Is there any possible way?
Before I answer, why would you want to exposed PII data like that in Power BI?
Hello Sir, Please help me, I want to import data from apache druid database but not found any options to connectivity. Please suggest me.
Take a look at this blog about half way down the page it talks about the Druid APIs. druid.apache.org/docs/latest/querying/sql.html