Scalar Functions vs Table-Valued Functions

Today I’m going to give you another tip, but now focusing on Scalar Functions vs Table-Valued Functions.

Many developers love to use Scalar Functions inside of SQL Server because it lets them easily reuse code, which is a common goal in object-oriented programming.
Why would you ever write the same piece of logic more than once after all?

But the problem with Scalar Functions is that the query optimizer performs row-by-row processing instead of set-based logic.
Scalar Functions are known as “silent performance killer“, because even if we use some of the techniques in our benchmark toolkit, like looking at SET Statistics IO, and execution plans, we wouldn’t be able to see the row-by-row processing.

Let’s take a look at a sample query using Scalar Function:

CREATE FUNCTION GetmaxProductQty_Scalar (@ProductId INT)
returns INT
      DECLARE @maxQty INT 

      SELECT @maxQty = Max(sod.orderqty)
      FROM   Sales.SalesOrderDetail sod
      WHERE  sod.productid = @ProductId 

      RETURN ( @maxQty )

Here’s a query that uses the new function. It does nothing more than getting the max quantity sold for each product in the Production.Product table.

SELECT productid,
       dbo.GetMaxProductqty_scalar(productid) m
FROM   Production.Product
ORDER  BY productid

If we look at the SET Statistics IO, you might think this is great query just because we only have 15 logical reads. But does that mean that we don’t have anything else to improve here?


Keep in mind that this is just the Product’s table, and our function is referencing SalesOrderDetail. That is the reason why this is a “silent performance killer”.
If we check the execution plan, there is only an Index Scan from Product’s table.


However, using SQL Server Profiler, we are able to see the row-by-row processing:


Now, let’s create the Table-Valued Function, doing exactly the same thing on SalesOrderDetail.
You will notice that there is a slight difference compared to the previous function GetMaxProductQty_Scalar.

CREATE FUNCTION GetMaxProductQty_inline (@ProductId INT)
returns TABLE
      (SELECT Max(sod.orderqty) AS maxqty
       FROM   sales.SalesOrderdetail sod
       WHERE  sod.productid = @ProductId)

You can check the following examples using the new function:

SELECT      productid,
FROM        production.product
CROSS apply dbo.GetMaxProductqty_Inline(productid) MaxQTy
ORDER BY    productid

SELECT   productid,
         ( SELECT maxqty
FROM dbo.GetMaxProductqty_Inline(productid)) MaxQty
FROM     production.product
ORDER BY productid

If we now look at the SET Statistics IO, we can see the information from SalesOrderDetail:


The other thing to notice is that our execution plan has changed, and now includes everything. We can see the index scan on SalesOrderDetail, so we still may have some opportunities for improvement.
The main point here is that our silent performance killer is gone, and we don’t see the row-by-row processing anymore. So, if you have the opportunity to push that over to a Table Function, it will allow you to use set-based logic, which gives you much faster execution times.

Specify ORDER BY if you need ordered data

Some of you guys may not know that, so I’d like to share it.

If you need ordered data and your query does not have an ORDER BY clause, the data might return in any order. Yeah. It’s not guaranteed that it will return based in the order of the “primary key”.
So, you can’t  assume that SQL Server has some “default” order. You might believe there is, because usually when you SELECT a specific table it will use the cluster index(PK) with identity IDs.

But, it’s not always the case. Check the following example:

The following query will run just fine, and “as expected”.

Now, let’s create the following index.

CREATE INDEX Test_RandomInt ON dbo.Test(RandomInt);

The same query executed before is now broken, just because SQL Server has decided to change the index in the execution plan .

I’m not saying that you should start sorting every single query (it’s very expensive performance wise). But, if you really need it, you must provide the ORDER BY clause.

It’s preferable to sort on the app side from a resource perspective, because generally it’s easier to add more servers or add more CPU and Memory to the application layer.  It’s much more difficult, and expensive, to do so for the SQL servers.