This is helpful, Dustin, thank you.
I have a question for you. This is kind of related to your problem, as I'm trying to figure out testing as well as the branching + deployment strategy (e.g. CI vs. abbreviated trunk-based dev + PR + CI for automated testing), but I have perhaps 3 different types of "work" that are related but different, possibly in the same repo. So setting up CI/CD pipelines and testing is downstream of the repo structure, or rather of whether there are multiple repos.
So that said, I'm trying to decide on repo structure, and I'm a bit torn on whether there should be a single repository or separate repositories.
I'm working as an embedded end-to-end ML/AI/Data engineer on a product team, and we have 3 primary "functions of work", as I like to call them:
1. Automated ETL Pipelines (e.g. Databricks workflows)
2. LLM Tools, Utilities, etc -- we use or plan to use a lot of LLMs
3. POCs for businesses using GenAI/ML/etc. that may NEVER need an updated data set, so no DBX Workflow is needed
I don't think these should all be in the same repo. The main question is whether #3, which likely also does not have unit tests or integration tests, should live in the same repo as #1 (DBX Workflows), perhaps in a separate top-level directory called "Sandbox", or whether it should be its own GitHub repository. And if/when any one of these POCs gets the go-ahead for regular data ingestion and/or "PROD usage", perhaps it is then ported over to "Workflows" (either the directory, if it lives in the same repo, or the Workflows repo, if it's separate).
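To make the single-repo option concrete, I'm picturing something roughly like this (directory names are just placeholders):

```
repo-root/
├── workflows/    # 1. automated ETL pipelines (DBX Workflows) -- tested, full CI/CD
├── llm_utils/    # 2. shared LLM tools/utilities -- unit tested, reusable
└── sandbox/      # 3. POCs -- no tests, excluded from CI, promoted into workflows/ if productionized
```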
I feel this video applies very well to #1, and perhaps to #2, and that I should keep #3 to itself.
I have another question about CI/CD as it pertains to doing both a Medallion Architecture and DEV/TEST/PROD environments. There seems to be a ton of redundancy if I am using a CI pipeline to test my code in DEV on a pull request in order to merge it to MAIN in the DEV environment. At that point the code is "tested", but environment-wise it still lives in DEV and has not yet been pushed up through UAT or PROD.
Excellent material
Really good stuff! Thank you so much for these posts! It is very inspiring and I have some work to do to reach this level.
We rely heavily on SQL code for transformations, using temporary views and CTEs. Is that a bad strategy in the sense that it makes things really hard to test?
So, for example, instead of having a CASE statement you would use a UDF that is more easily testable?
How do you test big SQL transformations?
The downside of using SQL directly is that it makes testing harder, but there are a few different options. Certain ways of doing ETL with SQL, like dbt or DLT, have testing capabilities built in. Are you using SQL files that are then orchestrated via Databricks Workflows? If so, you could follow similar ideas to what I show in the integration testing.
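As a very rough sketch of that idea (the file name, view name, and columns below are made up, and it assumes the SQL file holds a single SELECT over an `orders` view):

```python
# Hypothetical integration-style test for a SQL transformation file.
# Assumes transform_orders.sql is a single SELECT that reads from a view named `orders`.
from pathlib import Path

from pyspark.sql import SparkSession


def test_orders_transformation(spark: SparkSession):
    # Arrange: build a tiny, known input and expose it under the name the SQL expects
    input_df = spark.createDataFrame(
        [(1, "2024-01-01", 10.0), (2, "2024-01-02", -5.0)],
        ["order_id", "order_date", "amount"],
    )
    input_df.createOrReplaceTempView("orders")

    # Act: run the same SQL the Workflow task would run
    sql_text = Path("transform_orders.sql").read_text()
    result = spark.sql(sql_text)

    # Assert: check business rules on the output (e.g. negative amounts were filtered out)
    assert result.filter("amount < 0").count() == 0
```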
Using UDFs can make some of your unit testing easier but can have performance implications, so it isn't always a best practice. I'm hoping to share more about SQL-based ETL options this year, but please let me know more about your design so it can influence some of the examples I build on this topic.
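As a small made-up illustration of that trade-off, where the rule lives in a plain Python function that is easy to unit test but runs as a Python UDF at execution time:

```python
# Made-up example: the rule that would otherwise be a CASE expression
# becomes a plain Python function that can be unit tested without Spark.
from pyspark.sql import functions as F
from pyspark.sql.types import StringType


def risk_bucket(amount):
    """Pure business rule -- easy to test in isolation."""
    if amount is None:
        return "unknown"
    return "high" if amount >= 1000 else "low"


# Wrapping it as a UDF lets DataFrame/SQL code call it, but Python UDFs are
# generally slower than built-in expressions -- that's the performance trade-off.
risk_bucket_udf = F.udf(risk_bucket, StringType())


def test_risk_bucket():
    assert risk_bucket(1500.0) == "high"
    assert risk_bucket(10.0) == "low"
    assert risk_bucket(None) == "unknown"
```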