dbt and Python-Better Together

  • Published Aug 25, 2024
  • Drew Banin is the co-founder of dbt Labs and one of the maintainers of dbt Core, the open source standard in data modeling and transformation. In this talk, he will demonstrate an approach to unifying SQL and Python workloads under a single dbt execution graph, illustrating the powerful, flexible nature of dbt running on Databricks.
    Connect with us:
    Website: databricks.com
    Facebook: / databricksinc
    Twitter: / databricks
    LinkedIn: / data. .
    Instagram: / databricksinc

COMMENTS • 7

  • @guilhermegonzalez5501
    @guilhermegonzalez5501 9 months ago

    Amazing presentation!

  • @kebincui
    @kebincui 11 months ago

    Awesome presentation 👍

  • @vai5tac336
    @vai5tac336 1 year ago +1

    amazing!

  • @paulfunigga
    @paulfunigga 1 year ago +5

    I still don't understand what dbt is.

    • @MrTeslaX-vi2qn
      @MrTeslaX-vi2qn 1 year ago +1

      Same here.. not sure what it is or what its purpose is?

    • @samjebaraj6895
      @samjebaraj6895 11 months ago +1

      @@MrTeslaX-vi2qn Looks like it's a completely SQL-based ETL framework for building data marts/warehouses.

    • @ravishmahajan9314
      @ravishmahajan9314 7 months ago +3

      So basically it's the T in ETL.
      You can transform data using SQL in dbt. But your question may be, SO WHAT? I can transform using SQL Server as well. 😂
      The answer is that dbt is your transformation friend. It has no database of its own; it uses its host database. For example, if you are transforming a BigQuery warehouse, the transformed data stays in that warehouse. What dbt does is STRUCTURE THE TRANSFORMATION PROCESS.
      It means you can version control your transformations with git and create dependency models between different views & tables, chaining them into a kind of data pipeline. You will be able to visualize how a table has been transformed and what steps were taken to get there. It can also be used to set up an automatic pipeline, so you don't need to worry when more data lands in the warehouse: dbt will incrementally refresh and apply the same transformation steps. It automatically builds an acyclic graph of your models.
      Think of how you use multiple CTEs (common table expressions) in SQL to solve a very complex query: each CTE transforms, groups, and selects from multiple tables, and they are then joined to get the final output.
      The problem is, if you are writing a query with 50 CTEs, you will eventually get confused. dbt helps by automatically creating a graph that shows how all the pieces work together to produce the output.
      Also, with one click, you can generate documentation.
      You can set up separate environments (development > test > production) in dbt to build complex SQL pipelines.
      Hope it clears things up.
      PS: I am also learning, so I may be wrong somewhere in my understanding.
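The "acyclic graph" idea from the comment above can be sketched in a few lines of plain Python. This is a toy illustration, not dbt's actual code: each model declares which upstream models it depends on (the role played by `ref()` calls inside dbt SQL files), and a topological sort yields a valid execution order. The model names here are made up for the example.

```python
# Toy sketch of a dbt-style dependency graph, using only the standard library.
# dbt itself parses ref() calls out of model SQL to build this mapping.
from graphlib import TopologicalSorter

# model name -> set of upstream models it depends on (hypothetical names)
deps = {
    "stg_orders": set(),                              # staging view over a raw table
    "stg_customers": set(),
    "int_order_totals": {"stg_orders"},               # intermediate aggregation
    "dim_customers": {"stg_customers", "int_order_totals"},
}

# static_order() raises CycleError if the graph is not acyclic,
# which is exactly the guarantee a dbt DAG gives you.
run_order = list(TopologicalSorter(deps).static_order())
print(run_order)  # every model appears after all of its dependencies
```

Running each model in `run_order` guarantees its inputs already exist, which is how dbt can rebuild a whole warehouse, or just the downstream slice of one changed model, from a single command.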