How to use Common Table Expressions vs Subqueries vs Views vs Temp Tables like a data engineer!

Поділитися
Вставка
  • Опубліковано 5 вер 2024
  • The data for today's video is available here: github.com/EcZ...
    Common Table Expression and subquery is a very common debate! The temporary table crowd is pretty loud too! Don't forget the materialized view crowd !
    Check out dataexpert.io for my DataExpert community!

КОМЕНТАРІ • 67

  • @anandahs6078
    @anandahs6078 6 місяців тому +28

    fist time learning that materialized can be run concurrently with unique index. thanks for great content.

    • @EcZachly_
      @EcZachly_  6 місяців тому +4

      Thanks! Anything else you’d want to learn from me?

    • @pavelk.1558
      @pavelk.1558 6 місяців тому +1

      Yep, it's really helpful function.

    • @ssjdonkey4489
      @ssjdonkey4489 5 місяців тому +1

      Great video! How do you go about learning complex data engineer concepts? What's your learning algorithm? Thanks

    • @chrishabgood8900
      @chrishabgood8900 5 місяців тому

      If you want them to be current then yep

  • @stanislawcronberg3271
    @stanislawcronberg3271 5 місяців тому +10

    When a DE at my work showed me how to rewrite some of my queries with indexed temp tables to join all the model features into a final table I was so happy

  • @Kyriaeus
    @Kyriaeus 4 місяці тому +5

    The CROSS APPLY or LATERAL JOIN clause is also nice if you’re doing multiple steps of transformations on the same relation and don’t want to break out a separate CTE for each layer.

  • @karlnorberg7768
    @karlnorberg7768 5 місяців тому +4

    Huge thanks for this informative video! Picking up some handy stuff even after +10 years of data engineering or whatever we call it now 🙂

  • @peekknuf
    @peekknuf 6 місяців тому +8

    Man you're on fire, producing content at the pace I can't consume it :D
    Nice job!

  • @apibarra
    @apibarra 6 місяців тому +2

    Love the video! I am a huge proponent of CTEs. Nothing worst when you open a table’s source code and see on massive block of join to multiple tables and sub queries all with their own unique filtering logic at the bottom of the joins or nested in the sub query. I feel like I can understand the logic and read the code much quicker with the CTEs at the top.

  • @chandanjha3205
    @chandanjha3205 4 місяці тому +2

    Important thing to note here is that at 6:30, if you are creating a permanent view in your DB, you need to do it only once as its a DDL statement adding a new object to your schema. It does not have to be repeated everytime else your code will start throwing error.
    Modern DBs may let you create temporary views whose scope is only during the execution and is not global. You may choose to use that.

  • @ingenieroriquelmecagardomo4067
    @ingenieroriquelmecagardomo4067 6 місяців тому +4

    Awesome video! Would love to see you cover Change Data Capture - with all your breadth and experience, i'm sure it would result in an amazing video.

    • @EcZachly_
      @EcZachly_  6 місяців тому

      Thanks for the suggestion!

  • @user-ud2zd7nx4c
    @user-ud2zd7nx4c 6 місяців тому +3

    Zach…sir…this video is effin awesome!!! Like the speed of explanation and the examples!!! 👊

  • @SQLBucketHatGirl
    @SQLBucketHatGirl 5 місяців тому +1

    If only this was posted earlier id probably pass my technical interview on materialized transformation. 😢 thank you for the great explanation and use cases.

  • @AntMak-p3c
    @AntMak-p3c 2 місяці тому +1

    Great video! What could have been briefly included, in my opinion, is the comparison between materialized views vs (normal) tables, as they seem pretty similar. When to use which one.
    Keep rocking the great work👍

    • @Ps5Statie
      @Ps5Statie 20 днів тому

      Mv is updated simply by refresh and you can keep the log in mv_stats. Tables can be updated theough complex etl

  • @subhasishsarkar5106
    @subhasishsarkar5106 6 місяців тому +4

    Great work, thank you so much Zach! Very informative and concise. Keep these coming!

    • @EcZachly_
      @EcZachly_  6 місяців тому

      I appreciate your support! Any topics in particular you'd want me to cover?

    • @subhasishsarkar5106
      @subhasishsarkar5106 6 місяців тому +1

      @@EcZachly_ I'd be interested to know your thoughts on the different architecture patterns like the lakehouse, vault, mesh, etc. Looking forward to more from you! :D

  • @jhonnafg
    @jhonnafg 6 місяців тому +1

    this is so helpful Zach, we get table apis and do a bunch of dataset out of it to build up our sass applications. This sheds light since my current role as a DE in this company is really heavy on SQL but not that part of the etl, they kind of separated that and is more on the backend de. Im trying to bridge that since this is also my first role as de

  • @sagarjoshi7097
    @sagarjoshi7097 6 місяців тому +3

    Hey Zach, thank for such informative videos.
    Next can you please explain something about dbt usecases. Points like how it tests sql, orchestration, pipline. Other alternativer if any.

  • @DookyButter
    @DookyButter Місяць тому

    I’m a BI developer who occasionally has to dabble in some heavy SQL scripting and your content is hitting some nails on the head that I’ve always been curious about. Humorously, I arrived at the same idea that CTEs are (most often) the best use case scenario. We have another developer that uses subqueries heavily and the performance is really bad.
    Great content. You def earned a sub!

  • @MikeHaynes-xd8mw
    @MikeHaynes-xd8mw 14 днів тому

    Please make more videos like this. Can you just do a series on query optimization

  • @dima13693
    @dima13693 2 місяці тому +2

    Delivery tip: if something is awesome, you don't have to say so. It will be apparent. Having said that, seeing the 1 to 1 translation between different concepts WAS awesome.

  • @diaconescutiberiu7535
    @diaconescutiberiu7535 6 місяців тому +1

    Great content, as ever! I'd like to see a video on Data Vault (with hashes) or at least if you can recommend some resources for it... a practical example would be best. Keep up the good work

  • @dima13693
    @dima13693 2 місяці тому +2

    9:29 can you explain why this is slow in more detail? I assumed that everything would run once and get reused in-memory. What exactly re-ran and why?

  • @md.jannatultasnim2661
    @md.jannatultasnim2661 6 місяців тому +2

    i used CTEs , views , materialized views, temp tables but never found any detail in depth explanations how they works internally and performance comparison. Now everything is clear to me Thank you for the contents you shared. Another thing, as you described here 15:14 does this happens only in the relational DBMS ? or in both relational and data warehouses like snowflake BigQuery ?

    • @EcZachly_
      @EcZachly_  6 місяців тому +1

      Great question! I haven’t tested that exact case. My guess is it’s still blocking based on how refreshes work but that might be something that varies with data warehouses

  • @madhanv160
    @madhanv160 5 місяців тому +1

    This is new to me. Great comparison.

  • @pasqualesalomone8902
    @pasqualesalomone8902 6 місяців тому +3

    I was reviewing some SQL code today, and I came across an example of nested SQL query aka an infinite series of subqueries, and I was asking myself: WHY?

  • @tsveti_p
    @tsveti_p 5 місяців тому +1

    Great explanation Zach! This is one of the best comparisons I've heard between all these - with all their pros and cons! 🌟🌟🌟
    On this topic, I would be really thankful for an advice about performance between temporary tables and using staging tables, which are truncated and reloaded on every execution of the ETL.
    My use case is a complex transformation in which we have multiple stages of preparing the dataset moving it from one temporary table to another after applying some logic, joining a table or deleting some rows. It all happens within one script/session, so we don't need to keep the intermediary results, only the final dataset. We use Redshift and the performance is definitely not great.
    Do you think we can get any performance gains if we turn these temp tables into actual staging tables and we truncate and reload them? Also, do you know if defining DISTRIBUTION/SORT key on a temporary table in Redshift is as efficient as dist/sort key on an actual table?

    • @sf-spark129
      @sf-spark129 4 місяці тому

      Personally I use staging tables over temp tables for debugging purposes. When your end/user-facing table is returning incorrect or duplicate data, you can always trace back to an upstream staging table layer by layer to identify the root cause of the issue. Also, it's a lot easier to fine-tune the performance with staging table because you can test and validate different optimization techniques on them.

  • @muhammadraza3290
    @muhammadraza3290 6 місяців тому +1

    Hey Zach, been following you almost religiously now, cheers for the clear conceptual content. What I am not clear on is why the computation for the CTE 9:55 teammates and enemies took that long. Shouldn't the creation of a CTE and the creation of a materialized table take the same amount of time? Does it have anything to do with the complex JOIN condition?

    • @EcZachly_
      @EcZachly_  6 місяців тому

      No because the CTE also had an aggregation step afterward

    • @muhammadraza3290
      @muhammadraza3290 6 місяців тому

      @@EcZachly_ so temp table + aggregation was far lower time than CTE with aggregation. would it always be the case though?

    • @EcZachly_
      @EcZachly_  6 місяців тому

      @@muhammadraza3290 Temp Tables can be useful for improving performance though. It's very difficult to say something will always be the case

  • @byte_easel
    @byte_easel 5 місяців тому +1

    This is really great. Can you make videos about SQL aggregate functions, triggers, and transactions? especially aggregate functions and transactions if you can

    • @EcZachly_
      @EcZachly_  5 місяців тому

      Thanks for the suggestion!

  • @hanpanBR
    @hanpanBR 3 місяці тому +1

    The spark "registerAsTempTable" goes into what type of those?
    It works like a temporary table, saving the data just for that session?
    I guess I've been using it wrong =p

  • @strawhatSEO
    @strawhatSEO 6 місяців тому +1

    Great video thank you Zach! Where do stored procedures come in with this conversation? Are sprocs materialized or non-materialized? Or are sprocs used for a purpose besides storing transformations?

    • @EcZachly_
      @EcZachly_  6 місяців тому

      They are mostly non-materialized

  • @shiroyasha7178
    @shiroyasha7178 2 місяці тому +1

    So good. Thanks!

  • @sanooosai
    @sanooosai 26 днів тому

    thank you sir, you rock !!!

  • @mikefenn00
    @mikefenn00 5 місяців тому +4

    Subscribed on TikTok when I came across your videos a couple of months ago. Now UA-cam has recommended you as well. Seems like a good sign for you!

  • @arulkumarankumaraswamipill2516
    @arulkumarankumaraswamipill2516 6 місяців тому

    Timeseries databases like TimescaleDB has continuous aggregates on materialiazed views , which is incremental.

  • @tangy_ra
    @tangy_ra 6 місяців тому +1

    I have a question Zach. In snowflake, you cannot use join in materialized view. So, does materialized views vary from software to software? Also, there are many restrictions on using materialized view on snowflake.

  • @alialfaily7829
    @alialfaily7829 6 місяців тому +1

    thanks for great content.

    • @EcZachly_
      @EcZachly_  6 місяців тому

      Glad you enjoy it!

  • @FranLegon
    @FranLegon 3 місяці тому +1

    I don't really get the benefits of using a materialized view instead of a regular table other that it's a little easier to update (with regular table you would need tp update using keys and check for changed rows, probably using hash)

    • @Ps5Statie
      @Ps5Statie 20 днів тому

      If the scripy is not complex, i prefer mv

  • @kevinwang5225
    @kevinwang5225 Місяць тому

    Can u create a video on airflow best practices with sql / python?

  • @dn9416
    @dn9416 6 місяців тому +1

    Intresting

  • @NeumsFor9
    @NeumsFor9 6 місяців тому +1

    Materialized views are oftentimes like the fine print or the auction speak at the end of a commercial..... SO many limitations (depending on the vendor or edition or featureset).
    It's like dating the perfect 10...... as long as you pay the 5k dinner bill and have the person home by 8pm because that person has a data engineering conference to speak at in the morning.... 😂😂😂😂

  • @JPY_Data
    @JPY_Data 6 місяців тому +1

    hi Zach, great video. What would be the difference between creating a materialized view and creating a table of the results? Would it be the same thing performance wise?

    • @EcZachly_
      @EcZachly_  6 місяців тому

      If the materialized view has the same indices as the table, they’ll be the same.
      The main difference is you can automatically refresh the materialized view. But you can incrementally build a table

  • @fernandoangelrodriguezgonz1393
    @fernandoangelrodriguezgonz1393 6 місяців тому +2

    Good

  • @user-bh5ws7hx1t
    @user-bh5ws7hx1t 6 місяців тому +1

    we have a really long sql statement that uses a sequence of CTE. Each CTE is then used in the next CTE through a join condition. Do you think we would benefit from going to temp tables instead?

    • @EcZachly_
      @EcZachly_  6 місяців тому

      Probably. Try it out!

  • @johndt1013
    @johndt1013 6 місяців тому +1

    Hi Zach, thanks for this vid, it was such a great breakdown! I had 1 question if you have time:

    At 17:40 you mention how temp tables have incremental gains, I'm struggling to understand how that is, seeing as how similar to materialized views it appears.
    Is this because if you're adding new data 'A' to existing data 'B', temp tables can just focus on adding 'A', whereas materialized views require you to handle 'B' as well due to the refresh?
    This might be me not understanding the concept of incremental gains, thanks!

    • @EcZachly_
      @EcZachly_  6 місяців тому +1

      Your understanding is right!

  • @NeumsFor9
    @NeumsFor9 6 місяців тому +1

    So many collateral contexts to any of these...