Looking for help with your team's data strategy? → www.kahandatasolutions.com Looking to improve your data engineering skillset?→ bit.ly/more-kds --- After spending more time working with modern data stacks, I wanted to add an updated note to this video. Some of these tips are focused on writing SQL in the context of traditional row-based databases such as SQL Server, MySQL, etc. as opposed to columnar platforms such as Snowflake. The main update is in regards to point #1: Filtering a data set -- This rule is not _necessarily_ always still the case with some of the modern databases, such as Snowflake as the query plans have become really efficient and a "select *" is not always a no-no. The query processors can handle an insane amount of data without much of a performance hit (if written responsibly). Research around topic this is covered really well in this article - discourse.getdbt.com/t/ctes-are-passthroughs-some-research/155. I still suggest the filtered approach when working with any of the row-based databases mentioned above, and the other tips still hold true regardless of data stack. I mainly just wanted to add this caveat to point #1 as my viewpoint has evolved a bit as I've learned more!
Something that would add value here for me would be showing examples. I'm a little newer to SQL, so actually looking at examples would be helpful! This video may have been aimed at more advanced users, so that makes sense, but this is just my input! Thanks for taking the time to make this and listen to our feedback!
Great suggestion and thanks for watching! I will make sure to add examples in future videos. You may find this other video helpful as it does have some example SQL - ua-cam.com/video/asCQj1ybnds/v-deo.html
Sounds like great tips but as a beginner / intermediate it would be great to see some actual before / after code examples. It's easy to get lost without visuals.
In my experience, properly used CTEs/Temp Tables can massively help a query and I recommend them. But the key here is "properly" using them. When used strategically, they help you stay more organized and can boost performance. When used blindly and without understanding what's happening you could end up creating an unnecessarily complex execution plan.
I am newer to SQL, but for #1, wouldn't the query optimizer first pare down your data set by any WHERE clause found, regardless of whether that clause appears at the top to filter rows into a temp table, or down a ways in a longer query? Or -- is the optimizer really that useless?
This is likely accurate when working w/ CTEs (common table expressions) vs temp tables in the more modern cloud databases (Snowflake, Redshift, etc). CTEs will function as a pass-through and the optimizer would likely function in the way you describe. This is a common strategy that you see in "best practices" on tools such as dbt that rely heavily on CTEs. But again, this is mainly based on modern analytical cloud databases vs traditional row-based ones (sql server, mysql, postgres, etc.). Here is a great write-up on this topic - discourse.getdbt.com/t/ctes-are-passthroughs-some-research/155 Temp tables on the other hand are built as separate table objects in your database and may not necessarily function as a pass-through in the same way. Truthfully I haven't personally tested that statement but at a first thought I don't think a WHERE clause outside of the temp table would be able to impact the result set that becomes the temp table object. But not to say it isn't possible - worth testing out.Try running some queries and take a look at the optimizer to see what results you get when you try different approaches. Thanks for your comment!
@@KahanDataSolutions We have a many subqueries for updating a day capacity. Having 7 days we ran it for each day and for each object (we have more than 1000) so we were doing the same complex subquery over 7 thousand times. I use a CTE and group all the results of that query by the day and the objectID and now I just need to select from that CTE where the day = myDay AND object= myObjectID :) that pretty much did it.
Definitely! You could pull it out and put it in a CTE (common-table-expression) or a temporary table. Then join the CTE/Temp Table in the various places where it was being used as a sub-select. But now you have the actual query logic in just one place, not multiple.
Looking for help with your team's data strategy? → www.kahandatasolutions.com
Looking to improve your data engineering skillset?→ bit.ly/more-kds
---
After spending more time working with modern data stacks, I wanted to add an updated note to this video. Some of these tips are focused on writing SQL in the context of traditional row-based databases such as SQL Server, MySQL, etc. as opposed to columnar platforms such as Snowflake.
The main update is in regards to point #1: Filtering a data set --
This rule is not _necessarily_ always still the case with some of the modern databases, such as Snowflake as the query plans have become really efficient and a "select *" is not always a no-no. The query processors can handle an insane amount of data without much of a performance hit (if written responsibly).
Research around topic this is covered really well in this article - discourse.getdbt.com/t/ctes-are-passthroughs-some-research/155.
I still suggest the filtered approach when working with any of the row-based databases mentioned above, and the other tips still hold true regardless of data stack. I mainly just wanted to add this caveat to point #1 as my viewpoint has evolved a bit as I've learned more!
#1 Use filter Data Set
#2 Use temp tables/ CTEs
#3 Don't repeat yourself (Dry)
#4 Use indexes
#5 Be consistent
Something that would add value here for me would be showing examples. I'm a little newer to SQL, so actually looking at examples would be helpful! This video may have been aimed at more advanced users, so that makes sense, but this is just my input!
Thanks for taking the time to make this and listen to our feedback!
Great suggestion and thanks for watching! I will make sure to add examples in future videos.
You may find this other video helpful as it does have some example SQL - ua-cam.com/video/asCQj1ybnds/v-deo.html
Just finding your channel. Excellent content. It may exist already, but this should be a whole series with examples of do's, dont's and why's
This is the best indexing explanation I have seen anywhere on the internet. Period!
Thanks!
Than you for the video.
I would like to add one thing: use partitions. Better than indexes for sets of identical values.
Sounds like great tips but as a beginner / intermediate it would be great to see some actual before / after code examples. It's easy to get lost without visuals.
I agree, check out this other video that adds some examples.
ua-cam.com/video/asCQj1ybnds/v-deo.html
Hope that helps!
This is just a theory! Wanted to see some examples :(
Check out this video - it has specific examples: ua-cam.com/video/asCQj1ybnds/v-deo.html
He's reading from Google, he don't even know what sql is
I dont understand the point 2# use temp tables.
Other advice i read says avoid temp tables like the plague.
So what gives?
In my experience, properly used CTEs/Temp Tables can massively help a query and I recommend them. But the key here is "properly" using them.
When used strategically, they help you stay more organized and can boost performance.
When used blindly and without understanding what's happening you could end up creating an unnecessarily complex execution plan.
I am newer to SQL, but for #1, wouldn't the query optimizer first pare down your data set by any WHERE clause found, regardless of whether that clause appears at the top to filter rows into a temp table, or down a ways in a longer query? Or -- is the optimizer really that useless?
This is likely accurate when working w/ CTEs (common table expressions) vs temp tables in the more modern cloud databases (Snowflake, Redshift, etc). CTEs will function as a pass-through and the optimizer would likely function in the way you describe. This is a common strategy that you see in "best practices" on tools such as dbt that rely heavily on CTEs. But again, this is mainly based on modern analytical cloud databases vs traditional row-based ones (sql server, mysql, postgres, etc.).
Here is a great write-up on this topic - discourse.getdbt.com/t/ctes-are-passthroughs-some-research/155
Temp tables on the other hand are built as separate table objects in your database and may not necessarily function as a pass-through in the same way. Truthfully I haven't personally tested that statement but at a first thought I don't think a WHERE clause outside of the temp table would be able to impact the result set that becomes the temp table object. But not to say it isn't possible - worth testing out.Try running some queries and take a look at the optimizer to see what results you get when you try different approaches.
Thanks for your comment!
Many thanks for this great content
Appendix or index in a book?
Could you please explain how can I make a loop in Presto sql?
Thanks :) went from more than 2 minutes query down to 750 ms average :D
Nice! What changes did you make that sped it up the most?
@@KahanDataSolutions We have a many subqueries for updating a day capacity. Having 7 days we ran it for each day and for each object (we have more than 1000) so we were doing the same complex subquery over 7 thousand times. I use a CTE and group all the results of that query by the day and the objectID and now I just need to select from that CTE where the day = myDay AND object= myObjectID :) that pretty much did it.
Thanks for this! I also like to put indexes on my temp tables if they get a little bigger (combining tips 2 and 4) And lots of comments!
I agree on both of those as well!
#1, So valid, it's not even funny! I work on a data warehouse and let me tell you, this is rule number one for a reason!
Thanks, very well said.
Thanks for watching
#3
Let's say I have a query with 2 exact same subqueries, is it possible to write a subquery once and use it again somehow?
Definitely! You could pull it out and put it in a CTE (common-table-expression) or a temporary table. Then join the CTE/Temp Table in the various places where it was being used as a sub-select. But now you have the actual query logic in just one place, not multiple.
@@KahanDataSolutions Thanks, but unfortunately I use Jet SQL so no WITH statement thing for me, haha
Informative.
Thanks!
So u read from someone else blog and made a video outta it
Useful advice
Thank you
Excellent.
Thanks!
Nice, thx
Thanks for watching
no examples
This video tells you to do things but it doesn't tell you how to do the things.
excellent information but boring as hell. Add some music or images man
you should have given samples man
This would work as a podcast instead of a video. No visual not examples
If you used examples when it would be better
Check out this video (it has some examples) - ua-cam.com/video/asCQj1ybnds/v-deo.html
Thanks for watching!