Hello Dustin, thank you for posting this video. It was very helpful! Pardon my ignorance, but I have a question about initializing the Databricks bundle. When you first initialize the bundle through the CLI, does it create the required files in the Databricks workspace folder? And do we then push the files from the workspace to our Git feature branch, clone it locally, make the configuration changes, and push back to Git for deployment?
Typically I do the bundle init and other bundle work locally, then commit and push to version control. There are some ways to do this from the workspace now, but it's likely to get much easier in the future, and I hope to share that once it's publicly available.
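A sketch of that local flow using the Databricks CLI (the branch name and deploy target are illustrative, not from the thread):

```shell
# Scaffold a new bundle locally from a template
databricks bundle init

# Track it in Git and push a feature branch
git checkout -b feature/my-bundle
git add .
git commit -m "Initialize Databricks asset bundle"
git push -u origin feature/my-bundle

# Validate and deploy to the dev target from your machine
databricks bundle validate
databricks bundle deploy -t dev
```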
How does this work within a team with multiple projects? How do I handle multiple projects in GitHub Actions? Am I creating a bundle folder per project, or do I have a mono-folder with everything Databricks in it?
You can have different subfolders in your repo, each with its own bundle YAML, or you can have one bundle at the root level and include separate resource YAML files. It should only deploy the assets that have changed, so I tend to suggest one bundle if everything can be deployed at the same time.
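A minimal sketch of the single-root-bundle layout, assuming a databricks.yml at the repo root that pulls in per-project resource files (the bundle name and paths are illustrative):

```yaml
# databricks.yml at the repo root (illustrative names)
bundle:
  name: my_project

# Pull in resource definitions (jobs, pipelines) from subfolders
include:
  - resources/*.yml

targets:
  dev:
    mode: development
    default: true
  prod:
    mode: production
```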
Great video! What is the best approach to switch between dev and prod inside the code? For example, how can I parametrize

df_test.write.format('delta').mode('overwrite').saveAsTable("dev_catalog.schema.table")

so that it automatically becomes

df_test.write.format('delta').mode('overwrite').saveAsTable("prod_catalog.schema.table")
Hi Dustin, I want to take a DataFrame of streaming logs that I'm reading from an Event Hub and send them to Log Analytics, but I'm not receiving any data in the Log Analytics workspace or Azure Monitor. What might be the problem? Do I need to create a custom table beforehand? DCR or MMA? I don't know why I'm not getting any data or what I'm doing wrong.
Is this still an issue? If so, is it related to using the spark-monitoring library? I have a quick mention of how to troubleshoot that toward the end of this new video: ua-cam.com/video/CVzGWWSGWGg/v-deo.html
Is it possible to add approvers to Asset Bundle-based code promotion? Say one does not want the same dev to promote to prod, since prod could be maintained by other teams; or if the dev has to do the code promotion, it should go through an approval process. Also, is it possible to add code scanning using something like SonarQube?
All of that is done with the CI/CD tools that automate the deploy, not within Databricks Asset Bundles itself. So take a look at how to do that with GitHub Actions, Azure DevOps pipelines, or whatever you use to deploy.
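For instance, GitHub Actions "environments" support required reviewers, which gives you exactly the approval gate described above. A hedged sketch (workflow name, secrets, and target are illustrative; the reviewer list is configured in the repo's environment settings, not in this YAML):

```yaml
# .github/workflows/deploy.yml (illustrative)
name: deploy-bundle
on:
  push:
    branches: [main]

jobs:
  deploy-prod:
    runs-on: ubuntu-latest
    # The "prod" environment can require manual approval in repo settings
    environment: prod
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle deploy -t prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```

Code scanning (e.g. a SonarQube step) would be another job in the same pipeline, run before the deploy job.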
Hey Dustin,
Really appreciate the video on DABs. If possible, can you please make a video on using DABs for CI/CD with Azure DevOps?
Thanks!
Done, just published a video on Azure DevOps Pipelines with DABs.
ua-cam.com/video/ZuQzIbRoFC4/v-deo.html
Thank you for the session!
Thanks Dustin.
Replying to the dev/prod catalog question above: read the environment from an environment variable in your code,

environment = os.environ["ENV"]

and attach ENV at the cluster level in the DAB:

spark_env_vars:
  ENV: ${var.ENV}
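Putting the pieces together, a minimal sketch of the parametrized write, assuming ENV is injected as "dev" or "prod" via spark_env_vars (the fallback default and the catalog/schema names are illustrative, not from the thread):

```python
import os

# Assumes the bundle sets ENV via spark_env_vars ("dev" or "prod");
# falls back to "dev" for local runs (an assumption for this sketch).
env = os.environ.get("ENV", "dev")
catalog = f"{env}_catalog"
table = f"{catalog}.schema.table"
print(table)

# With a Spark session available, the write then becomes:
# df_test.write.format("delta").mode("overwrite").saveAsTable(table)
```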