T-SQL Tutorial - Aggregate Window Functions Performance
Вставка
- Опубліковано 25 лип 2024
- Learn how to measure and optimise aggregate window functions in this T-SQL Tutorial. In the video I show how the version of SQL Server impacts aggregate window functions and how we can make queries that use window functions faster.
If you are not familiar with NTILE, feel free to check out my video on that here: • SQL Tutorial - How to ...
You can also watch all my videos on Window Functions within my playlist: • Window Functions
If you wish to follow along with the video, you can use the SQL Scripts below:
-- check if table exists
IF OBJECT_ID(N'dbo.Orders', N'U') IS NOT NULL
DROP TABLE dbo.Orders;
-- create table
CREATE TABLE dbo.Orders
(
OrderId BIGINT IDENTITY(1, 1)
CONSTRAINT PK_Orders_OrderID PRIMARY KEY (OrderId),
CustomerId BIGINT,
Amount DECIMAL(6, 2)
);
-- set variables
DECLARE
@Customers BIGINT = 1000,
@NumRows BIGINT = 1000000
;
WITH
L0 AS (SELECT 1 AS N UNION ALL SELECT 1),
L1 AS (SELECT 1 AS N FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS (SELECT 1 AS N FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS (SELECT 1 AS N FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS (SELECT 1 AS N FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS (SELECT 1 AS N FROM L4 AS A CROSS JOIN L4 AS B)
INSERT INTO dbo.Orders (CustomerId, Amount)
SELECT
NTILE(@Customers) OVER(ORDER BY (SELECT NULL)) AS CustomerId,
CAST(ABS(CHECKSUM(NEWID())) / 1000000.00 AS DECIMAL(6, 2)) AS Amount
FROM
(
SELECT TOP(@NumRows)
N
FROM L5
) AS D;
-- aggregate window function
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SELECT
CustomerId,
Amount,
CAST((Amount * 100) / SUM(Amount) OVER() AS DECIMAL(8, 6)) AS PctofTotal,
CAST((Amount * 100) / SUM(Amount) OVER(PARTITION BY CustomerId) AS DECIMAL(8, 6)) AS PctofCustTotal
FROM dbo.Orders;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
-- using derived tables
SELECT
A.CustomerId,
A.Amount,
CAST((A.Amount * 100) / Total AS DECIMAL(8, 6)) AS PctofTotal,
CAST((A.Amount * 100) / CustTotal AS DECIMAL(8, 6)) AS PctofCustTotal
FROM dbo.Orders AS A
INNER JOIN
(
SELECT
CustomerId,
SUM(Amount) AS CustTotal
FROM dbo.Orders
GROUP BY CustomerId
) AS B
ON A.CustomerId = B.CustomerId
CROSS JOIN
(
SELECT
SUM(Amount) AS Total
FROM dbo.Orders
) AS C
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF; - Наука та технологія
As usual, I find your video quite helpful and useful. Thanks again.
Thanks so much.
Been using SQL since 95, and didn't use the OVER() stuff specifically because of performance; I tend to favor derived tables.
But I never tested again when 2019 shipped.
Thanks for teaching this old dog :)
Fantastic, window functions performance has always been with lag and lead or at least it has for sometime but its great to see aggregation performance improved.
@@BeardedDevData
You present things well.
Maybe my real problem was familiarity. I wrote a lot of SQL before that stuff was introduced and it seems to be the way I think...
I'm sold. Windowing sure does make things like moving average easier to calculate...
Are you playing with the 2022 preview? Does it have window and nth_row?
I mainly work in Azure now, the new features are available in Azure SQL Database first which is quite cool.
@@BeardedDevData Does that mean that a video about Azure SQL is in your plans? Perhaps new features, for example.
Nice explanation 👌 👍
rand() is executed once per batch rather than for each row.
Perfect explanation.
Meaning, I think, that the same value generated by rand() would be inserted into the amount field for all rows created by the cross joins. Right?
@@houstonvanhoy7767 Well, you don't have to wonder, you can test it.
I do the best that I can to avoid functions where possible and I'm not familiar with this particular one, my only guess is that you would take a performance hit. But please let me know how far off I am? 😄👍
Why do you avoid functions? In this case Microsoft has extended functionality over the last few years so performance is actually much better when using Window Functions than the alternative but it depends what version of SQL Server you are using. I would never recommend a blanket approach of avoidance, that can certainly cause sub-optimal queries, e.g. table-valued functions are fantastic, keep an eye out for a video on those soon.
Ok, this is where I am at.
SELECT
[Customer Name],
ROUND(SUM([Sales]),2) as customer_total
FROM Orders O
GROUP BY [Customer Name];
CROSS JOIN
(
SELECT [Customer Name],
SUM(Sales) OVER(PARTITION BY [Customer Name]) as customer_total,
SUM(Sales) OVER() as total_sales,
ROUND((SUM(Sales) OVER(PARTITION BY [Customer Name])/SUM(Sales) OVER() ) *100,3) as percent_of_total
FROM Orders
ORDER BY percent_of_total desc
) AS W
ON O.[Customer Name] = W.[Customer Name];
Not quite, as you only want to know customers, total by customer, total and percentage of total you can really simplify the query:
SELECT
[Customer Name],
[CustomerTotal],
Total,
([CustomerTotal] / [Tota]l) * 100 AS [Percentage]
FROM
(
SELECT
[Customer Name],
SUM(Sales) AS CustomerTotal
FROM dbo.Orders
GROUP BY [Customer Name]
) AS C
CROSS JOIN
(
SELECT
SUM(Sales) AS Total
FROM dbo.Orders
) AS D
@@BeardedDevData Close but no cigar? I am getting this error:
Msg 8120, Level 16, State 1, Line 2
Column 'dbo.Orders.Customer Name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
No problem, was replying late last night and missed off the GROUP BY, edited the original query now
@@BeardedDevData Thanks again for helping me with this. when I run the above query there is a error message:
Msg 8120, Level 16, State 1, Line 4
Column 'D.Total' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
That was me trying to point you in the right direction, I didn't want to just write the solution for you but have you think about it. I have updated it again to the final query although I haven't fully checked the syntax.