T-SQL Tutorial - Aggregate Window Functions Performance

Поділитися
Вставка
  • Опубліковано 25 лип 2024
  • Learn how to measure and optimise aggregate window functions in this T-SQL Tutorial. In the video I show how the version of SQL Server impacts aggregate window functions and how we can make queries that use window functions faster.
    If you are not familiar with NTILE, feel free to check out my video on that here: • SQL Tutorial - How to ...
    You can also watch all my videos on Window Functions within my playlist: • Window Functions
    If you wish to follow along with the video, you can use the SQL Scripts below:
    -- check if table exists
    IF OBJECT_ID(N'dbo.Orders', N'U') IS NOT NULL
    DROP TABLE dbo.Orders;
    -- create table
    CREATE TABLE dbo.Orders
    (
    OrderId BIGINT IDENTITY(1, 1)
    CONSTRAINT PK_Orders_OrderID PRIMARY KEY (OrderId),
    CustomerId BIGINT,
    Amount DECIMAL(6, 2)
    );
    -- set variables
    DECLARE
    @Customers BIGINT = 1000,
    @NumRows BIGINT = 1000000
    ;
    WITH
    L0 AS (SELECT 1 AS N UNION ALL SELECT 1),
    L1 AS (SELECT 1 AS N FROM L0 AS A CROSS JOIN L0 AS B),
    L2 AS (SELECT 1 AS N FROM L1 AS A CROSS JOIN L1 AS B),
    L3 AS (SELECT 1 AS N FROM L2 AS A CROSS JOIN L2 AS B),
    L4 AS (SELECT 1 AS N FROM L3 AS A CROSS JOIN L3 AS B),
    L5 AS (SELECT 1 AS N FROM L4 AS A CROSS JOIN L4 AS B)
    INSERT INTO dbo.Orders (CustomerId, Amount)
    SELECT
    NTILE(@Customers) OVER(ORDER BY (SELECT NULL)) AS CustomerId,
    CAST(ABS(CHECKSUM(NEWID())) / 1000000.00 AS DECIMAL(6, 2)) AS Amount
    FROM
    (
    SELECT TOP(@NumRows)
    N
    FROM L5
    ) AS D;
    -- aggregate window function
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;
    SELECT
    CustomerId,
    Amount,
    CAST((Amount * 100) / SUM(Amount) OVER() AS DECIMAL(8, 6)) AS PctofTotal,
    CAST((Amount * 100) / SUM(Amount) OVER(PARTITION BY CustomerId) AS DECIMAL(8, 6)) AS PctofCustTotal
    FROM dbo.Orders;
    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;
    -- using derived tables
    SELECT
    A.CustomerId,
    A.Amount,
    CAST((A.Amount * 100) / Total AS DECIMAL(8, 6)) AS PctofTotal,
    CAST((A.Amount * 100) / CustTotal AS DECIMAL(8, 6)) AS PctofCustTotal
    FROM dbo.Orders AS A
    INNER JOIN
    (
    SELECT
    CustomerId,
    SUM(Amount) AS CustTotal
    FROM dbo.Orders
    GROUP BY CustomerId
    ) AS B
    ON A.CustomerId = B.CustomerId
    CROSS JOIN
    (
    SELECT
    SUM(Amount) AS Total
    FROM dbo.Orders
    ) AS C
    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
  • Наука та технологія

КОМЕНТАРІ • 20

  • @OldPick-Unix-dude-pb9jg
    @OldPick-Unix-dude-pb9jg 10 місяців тому +2

    As usual, I find your video quite helpful and useful. Thanks again.

  • @JohnBoen
    @JohnBoen 2 роки тому +3

    Been using SQL since 95, and didn't use the OVER() stuff specifically because of performance; I tend to favor derived tables.
    But I never tested again when 2019 shipped.
    Thanks for teaching this old dog :)

    • @BeardedDevData
      @BeardedDevData  2 роки тому +1

      Fantastic, window functions performance has always been with lag and lead or at least it has for sometime but its great to see aggregation performance improved.

    • @JohnBoen
      @JohnBoen 2 роки тому +1

      @@BeardedDevData
      You present things well.
      Maybe my real problem was familiarity. I wrote a lot of SQL before that stuff was introduced and it seems to be the way I think...
      I'm sold. Windowing sure does make things like moving average easier to calculate...
      Are you playing with the 2022 preview? Does it have window and nth_row?

    • @BeardedDevData
      @BeardedDevData  2 роки тому +1

      I mainly work in Azure now, the new features are available in Azure SQL Database first which is quite cool.

    • @houstonvanhoy7767
      @houstonvanhoy7767 Рік тому

      @@BeardedDevData Does that mean that a video about Azure SQL is in your plans? Perhaps new features, for example.

  • @sravankumar1767
    @sravankumar1767 2 роки тому +1

    Nice explanation 👌 👍

  • @Sttuey
    @Sttuey 2 роки тому +1

    rand() is executed once per batch rather than for each row.

    • @BeardedDevData
      @BeardedDevData  2 роки тому

      Perfect explanation.

    • @houstonvanhoy7767
      @houstonvanhoy7767 Рік тому

      Meaning, I think, that the same value generated by rand() would be inserted into the amount field for all rows created by the cross joins. Right?

    • @Sttuey
      @Sttuey Рік тому

      @@houstonvanhoy7767 Well, you don't have to wonder, you can test it.

  • @ameobiamicrobiological2366
    @ameobiamicrobiological2366 2 роки тому +1

    I do the best that I can to avoid functions where possible and I'm not familiar with this particular one, my only guess is that you would take a performance hit. But please let me know how far off I am? 😄👍

    • @BeardedDevData
      @BeardedDevData  2 роки тому

      Why do you avoid functions? In this case Microsoft has extended functionality over the last few years so performance is actually much better when using Window Functions than the alternative but it depends what version of SQL Server you are using. I would never recommend a blanket approach of avoidance, that can certainly cause sub-optimal queries, e.g. table-valued functions are fantastic, keep an eye out for a video on those soon.

  • @CaribouDataScience
    @CaribouDataScience Рік тому +1

    Ok, this is where I am at.
    SELECT
    [Customer Name],
    ROUND(SUM([Sales]),2) as customer_total
    FROM Orders O
    GROUP BY [Customer Name];
    CROSS JOIN
    (
    SELECT [Customer Name],
    SUM(Sales) OVER(PARTITION BY [Customer Name]) as customer_total,
    SUM(Sales) OVER() as total_sales,
    ROUND((SUM(Sales) OVER(PARTITION BY [Customer Name])/SUM(Sales) OVER() ) *100,3) as percent_of_total
    FROM Orders
    ORDER BY percent_of_total desc
    ) AS W
    ON O.[Customer Name] = W.[Customer Name];

    • @BeardedDevData
      @BeardedDevData  Рік тому

      Not quite, as you only want to know customers, total by customer, total and percentage of total you can really simplify the query:
      SELECT
      [Customer Name],
      [CustomerTotal],
      Total,
      ([CustomerTotal] / [Tota]l) * 100 AS [Percentage]
      FROM
      (
      SELECT
      [Customer Name],
      SUM(Sales) AS CustomerTotal
      FROM dbo.Orders
      GROUP BY [Customer Name]
      ) AS C
      CROSS JOIN
      (
      SELECT
      SUM(Sales) AS Total
      FROM dbo.Orders
      ) AS D

    • @CaribouDataScience
      @CaribouDataScience Рік тому +1

      @@BeardedDevData Close but no cigar? I am getting this error:
      Msg 8120, Level 16, State 1, Line 2
      Column 'dbo.Orders.Customer Name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

    • @BeardedDevData
      @BeardedDevData  Рік тому

      No problem, was replying late last night and missed off the GROUP BY, edited the original query now

    • @CaribouDataScience
      @CaribouDataScience Рік тому +1

      @@BeardedDevData Thanks again for helping me with this. when I run the above query there is a error message:
      Msg 8120, Level 16, State 1, Line 4
      Column 'D.Total' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

    • @BeardedDevData
      @BeardedDevData  Рік тому +1

      That was me trying to point you in the right direction, I didn't want to just write the solution for you but have you think about it. I have updated it again to the final query although I haven't fully checked the syntax.