Build Python Packages in a Databricks Asset Bundle

  • Published 19 Aug 2024
  • In this tutorial I describe, step by step, how to create a Python package from scratch, then configure its build and deploy it in a Databricks Asset Bundle.
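For reference, a minimal sketch of what the `databricks.yml` for such a bundle could look like. This is an assumption based on the general Databricks Asset Bundle schema, not the exact file from the video; the bundle name, build command, and workspace host are placeholders.

```yaml
# databricks.yml -- hypothetical minimal bundle building a Python wheel
bundle:
  name: my_python_package

artifacts:
  default:
    type: whl
    build: python -m build --wheel  # assumes the `build` package; poetry build also works
    path: .

targets:
  dev:
    mode: development
    workspace:
      host: https://<your-workspace-url>  # placeholder, replace with your workspace
```

With a file like this in place, `databricks bundle deploy -t dev` uploads the built wheel along with the bundle's resources.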

COMMENTS • 9

  • @abhishekprakash4793
    @abhishekprakash4793 1 month ago +1

    Thanks! I hope you bring more such content involving PySpark and Databricks.

    • @pytalista
      @pytalista 1 month ago

      ua-cam.com/video/lYYIFRaY8Tk/v-deo.htmlsi=RiYQqRffGBTVkYZR

  • @harshitgupta3706
    @harshitgupta3706 2 months ago +1

    Truly amazing! This is what I was looking for.
    Please make a video on DAB deployment through an Azure DevOps CI/CD pipeline using a YAML file.
    Looking forward to it!
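As a starting point while waiting for such a video, here is a hedged sketch of what an `azure-pipelines.yml` for a DAB deploy could look like. The pool image, Python version, and variable names (`DATABRICKS_HOST`, `DATABRICKS_TOKEN`) are assumptions; adapt them to your project.

```yaml
# azure-pipelines.yml -- hypothetical CI/CD sketch for a Databricks Asset Bundle
trigger:
  - main

pool:
  vmImage: ubuntu-latest

steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: "3.10"

  - script: |
      # Install the Databricks CLI, then validate and deploy the bundle
      curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      databricks bundle validate
      databricks bundle deploy -t dev
    displayName: Deploy Databricks Asset Bundle
    env:
      DATABRICKS_HOST: $(DATABRICKS_HOST)    # assumed pipeline variable
      DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)  # assumed secret variable
```

In practice you would store the token as a secret pipeline variable (or use a service principal) rather than hard-coding credentials.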

  • @SergioPolimante
    @SergioPolimante 1 month ago +1

    Very good, straight to the point.
    You could make a video on running pytest on Databricks. Do you think that is a valid approach, since many tests depend on reading data from the catalog?

    • @pytalista
      @pytalista 1 month ago

      Hi, I think it is a valid approach. Testing in data engineering is a bit different. I would separate the tests into two categories. Unit tests cover the functions used in your code; use pytest for those. Data tests check the data itself; for those you can use Delta Live Tables expectations or libraries like Great Expectations. This is because in data engineering, data is stateful.
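The unit-test category could look like the sketch below: a pure function that might ship inside the package's wheel, plus a pytest-style test. The function name and logic are illustrative assumptions, not something from the video.

```python
# Hypothetical pure function you might ship in your package's wheel.
def normalize_amount(amount_cents: int) -> float:
    """Convert an integer amount in cents to dollars, rounded to 2 places."""
    return round(amount_cents / 100, 2)


# pytest collects any function named test_*; run this file with `pytest`.
def test_normalize_amount():
    assert normalize_amount(1999) == 19.99
    assert normalize_amount(0) == 0.0
```

Because the function has no Spark or catalog dependency, this test runs anywhere, locally or in CI, without a cluster.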

    • @SergioPolimante
      @SergioPolimante 1 month ago +1

      @pytalista Good point. So basically, you're suggesting that queries are tested by the data quality checks you apply to the result, rather than by testing the query itself with mock data and expected DataFrames?

    • @pytalista
      @pytalista 1 month ago +1

      @SergioPolimante on data quality you can test things like. Not nulls, unique, data ranges, valid keys etc. that is the part of your data is statefull. About testing the logic of your query with mock data is also a valid approach maybe it is a lower return on time investment than quality expectations checks. Then there is the unit test of your functions you may have in your pipeline in this case you would do classic unit tests.