SQL Tutorial - Difference between CTEs and Derived Tables
Вставка
- Опубліковано 2 чер 2022
- In this SQL Tutorial I talk through the differences between CTEs and Derived Tables in SQL Server. CTEs actually came along to address some of the limitations of Derived Tables, when nesting Derived Tables it can be difficult to understand what the query is doing, CTEs address this by allowing you to read the SQL code from top to bottom, as Derived Tables are also declared in the FROM clause of a query, it prevents reuse, declaring CTEs outside of the final SELECT allows for reuse.
Both CTEs and Derived Tables are Table Expressions that are inlined when the query executes. CTEs offer better readability and usability over Derived Tables, however Derived Tables offer better visibility, you don't need to change anything to see intermediate query results, you just run the inner queries whereas with a CTE you need to change the query to see an intermediate step.
Remember that CTEs and Derived Tables can also be used together.
#sqltutorials #sqlserver #dataengineering #data
Code Samples:
-- create table
IF OBJECT_ID(N'dbo.CustomerSalesRaw', N'U') IS NOT NULL
DROP TABLE dbo.CustomerSalesRaw;
CREATE TABLE dbo.CustomerSalesRaw
(
Customer VARCHAR(50),
[Location] VARCHAR(50),
CustomerType VARCHAR(50),
StartDate DATE,
[Date] DATE,
Amount DECIMAL(6, 2)
);
INSERT INTO dbo.CustomerSalesRaw (Customer, [Location], CustomerType, StartDate, [Date], Amount)
VALUES
('Polly McFadden', 'United Kingdom', 'Online', '20220101', '20220601', 1000.00),
('Polly McFadden', 'United Kingdom', 'Online', '20220101', '20220601', 1000.00),
('Polly McFadden', 'United Kingdom', 'Online', '20220101', '20220601', 1000.00),
('Polly McFadden', 'United Kingdom', 'Online', '20220101', '20220516', 300.00),
('Daisy Smith', 'United Kingdom', 'Store', '20220301', '20220602', 300.00),
('Daisy Smith', 'United Kingdom', 'Store', '20220301', '20220602', 300.00),
('Paul Kirkland', 'United Kingdom', 'Online', '20220301', '20220502', 680.00),
('Paul Kirkland', 'United Kingdom', 'Online', '20220301', '20220601', 200.00),
('Betsy Jones', 'United Kingdom', 'Online', '20220301', '20220511', 720.00),
('Betsy Jones', 'United Kingdom', 'Online', '20220301', '20220511', 720.00),
('Mark Porter', 'United Kingdom', 'Store', '20220301', '20220522', 32.00),
('Mark Porter', 'United Kingdom', 'Store', '20220301', '20220522', 32.00);
-- nested derived table
SELECT
Aggr.Customer,
Detail.CustomerType,
Detail.StartDate,
Aggr.Mnth,
Aggr.Total
FROM
(
SELECT
Customer,
MONTH([Date]) AS Mnth,
SUM(Amount) AS Total
FROM
(
SELECT DISTINCT
Customer,
[Location],
CustomerType,
StartDate,
[Date],
Amount
FROM dbo.CustomerSalesRaw
) AS Clean
GROUP BY
Customer,
MONTH([Date])
) AS Aggr
INNER JOIN
(
SELECT DISTINCT
Customer,
CustomerType,
StartDate
FROM dbo.CustomerSalesRaw
) AS Detail
ON Aggr.Customer = Detail.Customer;
-- nested CTE
WITH Clean AS
(
SELECT DISTINCT
Customer,
[Location],
CustomerType,
StartDate,
[Date],
Amount
FROM dbo.CustomerSalesRaw
), Aggr AS
(
SELECT
Customer,
MONTH([Date]) AS Mnth,
SUM(Amount) AS Total
FROM Clean
GROUP BY
Customer,
MONTH([Date])
)
SELECT
A.Customer,
B.CustomerType,
B.StartDate,
A.Mnth,
A.Total
FROM Aggr AS A
INNER JOIN
(
SELECT DISTINCT
Customer,
CustomerType,
StartDate
FROM dbo.CustomerSalesRaw
) AS B
ON A.Customer = B.Customer;
-- create clean table
SELECT DISTINCT
Customer,
[Location],
CustomerType,
StartDate,
[Date],
Amount
INTO dbo.CustomerSalesClean
FROM dbo.CustomerSalesRaw;
-- reusing derived table
SELECT
[Current].Customer,
[Current].Total - Previous.Total AS [Difference]
FROM
(
SELECT
Customer,
MONTH([Date]) AS Mnth,
SUM(Amount) AS Total
FROM CustomerSalesClean
GROUP BY
Customer,
MONTH([Date])
) AS Previous
LEFT OUTER JOIN
(
SELECT
Customer,
MONTH([Date]) AS Mnth,
SUM(Amount) AS Total
FROM CustomerSalesClean
GROUP BY
Customer,
MONTH([Date])
) AS [Current]
ON Previous.Customer = [Current].Customer
AND Previous.Mnth - 1 = [Current].Mnth;
-- reusing CTE
WITH Aggr AS
(
SELECT
Customer,
MONTH([Date]) AS Mnth,
SUM(Amount) AS Total
FROM CustomerSalesClean
GROUP BY
Customer,
MONTH([Date])
)
SELECT
[Current].Customer,
[Current].Total - Previous.Total
FROM Aggr AS Previous
LEFT OUTER JOIN Aggr AS [Current]
ON Previous.Customer = [Current].Customer
AND Previous.Mnth - 1 = [Current].Mnth; - Наука та технологія
Do you use CTEs, Derived Tables or both?
I prefer to add my own indexes, so I do not use ctes, I use temp tables where I can add an index on fields that don't have them already. There's a few tricks you can do to make getting the temp table structures very easy, so I normally do that.
You'd probably find that will be slower than typical use cases for CTEs or Derived Tables though. If you have a table with a supporting index in place, creating a CTE or Derived Table on that table will be faster than copying the data to a temporary structure then creating an index, will be wasted effort. There are definitely a lot of use cases for your approach though, if you're repeatedly doing that process though it might be worth looking at indexed views.
CTEs FTW! Space is cheap but time is not. I rather spend time on legibility so anyone reading my code could walk away with less confusion
It's a very good point, other engineers and analysts being able to easily interpret your code is invaluable.
@@ameobiamicrobiological2366 This doesn't make any sense. A CTE literally is a temp table and faster.
This channel needs more views! Always learn something new cheers Dev
Thanks for the support Brian.
New to SQL and your channel. I appreciate your content! Going to make an effort to use more CTEs and derived tables in my queries.
Thanks so much, also great to hear somebody learning SQL, let me know if there's areas you would like to see a video on.
I appreciate your content. And I really appreciate that you engage with your viewers' comments instead of ignoring comments - which most content creators do. It takes time to do that. The discussions are valuable, and sometimes ideas for further videos can arise.
Do you use an add-in SQL formatting tool, such as Redgate's SQL Prompt? Something in this demo hints at that.
Do you know of a similar tool that is free? Thanks.
I have used it in the past but not something I use day to day. I believe there is a free version but have to copy and paste online. An alternative you could have a look at is ApexSQL Refactor, I believe it's free.
Very helpful. Thankyou
No problem.
I would like to understand recursive CTEs, do you have a video discussing that specific topic? BTW, I really do appreciate your videos, they are quite helpful.
Hi, I haven't got a video on recursive CTEs at the moment but I will put it on my list to do.
@@BeardedDevData I really appreciate your time and effort, as well as how you explain things regarding SQL topics. I have learned a great deal from your videos.
@@OldPick-Unix-dude-pb9jg
Yes ☝. Step-by-step, which is the only right way. Some content creators skip steps, not coming back down to the level of us puny humans.
Appreciate your work. I have a problem "org.postgresql.util.PSQL.Exception. ERROR Out of memory. Consumed 5368709120 bytes. DETAIL Failed on request in VecHash Table" with my psql query which contains few millions values(result of another table) in IN clause. Can anyone there to take me out from this.
A Derived Tables is also a subquery, is it not?
A derived table is a type of table of table expression, alongside CTEs, Views and Table Valued Functions, a good use case is where you want to perform multiple operations on a table or build a table to operate on, there are also certain rules such as cannot use ORDER BY without TOP that apply to Table Expressions. Subqueries return a value within SELECT or WHERE clauses, ORDER BY is allowed. I have seen people refer to derived tables as subqueries but this would be incorrect as they serve different purposes but it's only terminology, use cases are what's important.
I mean I'm not sure this fully explains it, some of it is just flat out pro cons rather pulling table to explain it. Should just have a slide side by side comparsion and test times. As a developer I always use CTEs for many reasons. Literally never derived, ever. Otherwise I am purely creating VOLATILE table (Teradata). But 90% of the time I use CTEs even when Alteryx -> Tableau
I'm curious as to why you have boycotted derived tables and why you would consider a derived table volatile data? The reason I didn't discuss performance is because I would expect it to be equivalent in almost all cases, as I said CTEs have the benefits of readability but I wouldn't typically expect a CTE to perform better than a derived table.