Exciting stuff! Will definitely be trying to implement this in my future work!
Thanks a lot Dustin... Really appreciate it :)
Thanks for the video. It helped me a lot in my YT channel.
Great video, learned a lot!
I do have a question: would it make sense to define a base environment for serverless notebooks and jobs, and reference that default environment throughout the bundle? Ideally it would be in one spot, so upgrading the package versions would be simple and easy to test. This way developers could be sure that any package they get used to is available across the whole bundle.
The idea makes sense, but the way environments interact with workflows is still different depending on which task type you use. Plus, you can't use them with standard clusters at this point. So it depends on how much variety you have in your jobs, which is why I don't really include that in my repo yet.
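For anyone who wants to experiment anyway, here is a rough sketch of the pattern (not something from my repo — job and package names are placeholders, and it assumes the serverless environments spec with client and dependencies). A YAML anchor lets you keep the spec in one spot, but only within a single file:
resources:
  jobs:
    job_a:
      name: job_a
      environments:
        - environment_key: base
          spec: &base_env            # shared spec, upgrade package versions here only
            client: "1"
            dependencies:
              - pandas==2.2.2
              - requests==2.32.3
      tasks:
        - task_key: main
          environment_key: base
          python_wheel_task:
            package_name: my_package
            entry_point: main
    job_b:
      name: job_b
      environments:
        - environment_key: base
          spec: *base_env            # reuse the anchored spec instead of copying it
      tasks:
        - task_key: main
          environment_key: base
          python_wheel_task:
            package_name: my_package
            entry_point: other_entry_point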
Loving bundles so far. The only issue I've had so far is that the Databricks VS Code extension seems to be modifying my bundle yml file behind the scenes. For example, when I attach to a cluster in the extension, it will override my job cluster to use that attached cluster when I deploy to the dev target in development mode.
Which version of the extension are you on, 1.3.0?
@@DustinVannoy Yup, I did have it on a pre-release, which I thought was the issue, but I switched back to 1.3.0 and the "feature" persisted.
Thanks Dustin for the video.
Is there a way to specify that a subset of resources (workflows, DLT pipelines) should run in a specific environment?
For example, I would like to deploy only the unit test job in DEV and not in the PROD environment.
You would need to define the job in the targets section of only the targets you want it in. If it needs to go to more than one environment, use a YAML anchor to avoid code duplication. I would normally just let a testing job get deployed to prod without a schedule, but others can't allow that or prefer not to do it that way.
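Rough sketch of what I mean, with placeholder hosts and paths (the anchor only works because all targets live in the same YAML file):
targets:
  dev:
    mode: development
    workspace:
      host: https://adb-1111111111111111.1.azuredatabricks.net
    resources:
      jobs:
        unit_test_job: &unit_test_job    # anchor so another target can reuse the definition
          name: unit_test_job
          tasks:
            - task_key: run_tests
              notebook_task:
                notebook_path: ../tests/run_unit_tests.ipynb
  test:
    workspace:
      host: https://adb-2222222222222222.2.azuredatabricks.net
    resources:
      jobs:
        unit_test_job: *unit_test_job    # same job, no copy/paste
  prod:
    mode: production
    workspace:
      host: https://adb-3333333333333333.3.azuredatabricks.net
    # no unit_test_job here, so it never gets deployed to prod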
Is there a way to define policies as a resource and deploy them? I have some 15 to 20 policies, and my jobs can use any of them. If there is a way to manage these policies so that policy changes get applied, it would be very convenient.
Great video Dustin! Especially on the advanced configuration of the databricks.yaml.
I'd like to hear your opinion on the /src in the root of the folder. If your team/organisation is used to working with a mono repo, it would be great to have all common packages in the root; however, if you're more of a polyrepo kind of team/organisation, building and hosting the packages remotely (e.g. Nexus or something) could be a better approach in my opinion. Or am I missing something?
How would you deal with a job where task 1 and task 2 have source code with conflicting dependencies?
Is there a way for Python wheel tasks to keep the functionality we had without serverless, where you could use:
libraries:
  - whl: ../dist/*.whl
so that the wheel gets deployed automatically when using serverless? When I try to include environments for serverless, I can no longer specify libraries for the wheel task (and therefore it is not deployed automatically), and I also need to hardcode the path to the wheel in the workspace.
I could not find an example for that so far.
All the best,
Thomas
Are you trying to install the wheel in a notebook task, so you are required to install it with %pip install?
If you include the artifacts section it should build and upload the wheel regardless of usage in a task. You can predict the path within the .bundle deploy if you aren't setting mode: development, but I've been uploading it to a specific workspace or volume location.
As environments for serverless evolve I may come back with more examples of how those should be used.
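Here is the kind of thing I mean, just as a sketch (the build command, paths, and package name are placeholders; newer CLI versions may rewrite the local wheel path in the environment spec for you, older ones need the uploaded workspace or volume path instead):
artifacts:
  my_wheel:
    type: whl
    build: python -m build --wheel     # or poetry build, etc.
    path: .

resources:
  jobs:
    wheel_job:
      name: wheel_job
      environments:
        - environment_key: default
          spec:
            client: "1"
            dependencies:
              # if your CLI version does not translate this local path,
              # point it at the workspace or volume location you upload to
              - ../dist/*.whl
      tasks:
        - task_key: main
          environment_key: default
          python_wheel_task:
            package_name: my_package
            entry_point: main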
Great content!! I am trying to deploy the same job into different environments (DEV/QA/PRD). I want to override parameters passed to the job from a variable group defined in the Azure DevOps portal. Can you please suggest how to proceed with this?
The part that references the variable group PrdVariables shows how to set different variables and values depending on the target environment:
- stage: toProduction
  variables:
    - group: PrdVariables
  condition: |
    eq(variables['Build.SourceBranch'], 'refs/heads/main')
In the part where you deploy the bundle, you can pass in variable values. See the docs for how that can be set. docs.databricks.com/en/dev-tools/bundles/settings.html#set-a-variables-value
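As a sketch, the deploy step can pass the variable group's values straight into the bundle (the variable and secret names here are placeholders):
- script: |
    databricks bundle deploy -t prod --var="catalog_name=$(catalog_name)"
  displayName: Deploy bundle to prod
  env:
    DATABRICKS_HOST: $(DATABRICKS_HOST)
    DATABRICKS_CLIENT_ID: $(DATABRICKS_CLIENT_ID)
    DATABRICKS_CLIENT_SECRET: $(DATABRICKS_CLIENT_SECRET)
Setting an environment variable named BUNDLE_VAR_catalog_name should work as an alternative to --var.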
Thanks a lot, @DustinVannoy, for this great presentation! I have a question: which is the better approach for structuring a project: one bundle yml config file for all my sub-projects, or each sub-project having its own databricks bundle yml file? Thanks again :)
Great video! Is there a way to override variables defined in the databricks.yml in each job yml definition so that the variable has a different value for that job only?
If the value is the same for a job across all targets, you wouldn't use a variable. To override job values, you would set those in the targets section, which I always include in databricks.yml.
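Something like this, as a sketch (job and parameter names are placeholders):
resources:
  jobs:
    ingest_job:
      name: ingest_job
      parameters:
        - name: output_path
          default: /tmp/dev_output      # used unless a target overrides it

targets:
  prod:
    mode: production
    resources:
      jobs:
        ingest_job:
          parameters:
            - name: output_path
              default: abfss://prod@mystorage.dfs.core.windows.net/output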
Any way to see a plan like you would with terraform?
Not really; using databricks bundle validate is the best way to see things. There are some options to view debug output, but I haven't found something that works quite like Terraform plan. When you run destroy it does show what will be destroyed before you confirm.
How do you change the catalog name specific to an environment?
I would use a bundle variable and set it in the target overrides, then reference it anywhere you need it.
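Rough sketch of that pattern (catalog names are placeholders):
variables:
  catalog_name:
    description: Catalog to read from and write to
    default: dev_catalog

targets:
  dev:
    mode: development        # picks up the default value above
  prod:
    mode: production
    variables:
      catalog_name: prod_catalog

resources:
  jobs:
    load_job:
      name: load_job
      parameters:
        - name: catalog
          default: ${var.catalog_name}   # reference the variable anywhere you need it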
Can we integrate Azure Pipelines + DAB for a CI/CD implementation?
Are you referring to Azure DevOps CI pipelines? You can do that and I am considering a video on that since it has been requested a few times.
@@DustinVannoy yes, thank you!
@@DustinVannoy Please, can you do that? hahaha
Video showing Azure DevOps Pipeline is published!
ua-cam.com/video/ZuQzIbRoFC4/v-deo.html
How do I remove the [dev my_user_name] prefix? Please suggest.
Change from mode: development to mode: production (or just remove that line). This will remove the prefix and change the default destination. However, for the dev target I recommend you keep the prefix if multiple developers will be working in the same workspace. The production target is best deployed as a service principal from a CI/CD pipeline (like an Azure DevOps pipeline) to avoid different people deploying the same bundle and having conflicts over resource owner and code version.
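A sketch of the two targets side by side (hosts, the root path, and the service principal ID are placeholders):
targets:
  dev:
    default: true
    mode: development        # adds the [dev your_user_name] prefix and pauses schedules
    workspace:
      host: https://adb-1111111111111111.1.azuredatabricks.net
  prod:
    mode: production         # no prefix, schedules stay as defined
    workspace:
      host: https://adb-3333333333333333.3.azuredatabricks.net
      root_path: /Shared/.bundle/prod/${bundle.name}
    run_as:
      service_principal_name: 00000000-0000-0000-0000-000000000000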
@@DustinVannoy Thank you Vannoy!! Worked fine now !!
Once the code is deployed it gets uploaded to the Shared folder. Can't we store it somewhere else, like an artifact feed or a storage account? There is a chance that someone may delete that bundle from the Shared folder. It has always been like this with Databricks deployments, both before and after asset bundles.
You can set permissions on the workspace folder and I recommend also having it all checked into version control such as GitHub in case you ever need to recover an older version.
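If it helps, bundles also support a top-level permissions block, something like this sketch (group names and the ID are placeholders); as far as I know it applies to the deployed resources, so check the docs for whether your CLI version also applies it to the target folder:
permissions:
  - level: CAN_MANAGE
    group_name: data-platform-admins
  - level: CAN_VIEW
    group_name: data-engineers
  - level: CAN_RUN
    service_principal_name: 00000000-0000-0000-0000-000000000000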