Scalar Functions vs Table-Valued Functions

Today I’m going to give you another tip, but now focusing on Scalar Functions vs Table-Valued Functions.

Many developers love to use Scalar Functions inside of SQL Server because it lets them easily reuse code, which is a common goal in object-oriented programming.
Why would you ever write the same piece of logic more than once after all?

But the problem with Scalar Functions is that the query optimizer performs row-by-row processing instead of set-based logic.
Scalar Functions are known as “silent performance killer“, because even if we use some of the techniques in our benchmark toolkit, like looking at SET Statistics IO, and execution plans, we wouldn’t be able to see the row-by-row processing.

Let’s take a look at a sample query using Scalar Function:

CREATE FUNCTION GetmaxProductQty_Scalar (@ProductId INT)
returns INT
      DECLARE @maxQty INT 

      SELECT @maxQty = Max(sod.orderqty)
      FROM   Sales.SalesOrderDetail sod
      WHERE  sod.productid = @ProductId 

      RETURN ( @maxQty )

Here’s a query that uses the new function. It does nothing more than getting the max quantity sold for each product in the Production.Product table.

SELECT productid,
       dbo.GetMaxProductqty_scalar(productid) m
FROM   Production.Product
ORDER  BY productid

If we look at the SET Statistics IO, you might think this is great query just because we only have 15 logical reads. But does that mean that we don’t have anything else to improve here?


Keep in mind that this is just the Product’s table, and our function is referencing SalesOrderDetail. That is the reason why this is a “silent performance killer”.
If we check the execution plan, there is only an Index Scan from Product’s table.


However, using SQL Server Profiler, we are able to see the row-by-row processing:


Now, let’s create the Table-Valued Function, doing exactly the same thing on SalesOrderDetail.
You will notice that there is a slight difference compared to the previous function GetMaxProductQty_Scalar.

CREATE FUNCTION GetMaxProductQty_inline (@ProductId INT)
returns TABLE
      (SELECT Max(sod.orderqty) AS maxqty
       FROM   sales.SalesOrderdetail sod
       WHERE  sod.productid = @ProductId)

You can check the following examples using the new function:

SELECT      productid,
FROM        production.product
CROSS apply dbo.GetMaxProductqty_Inline(productid) MaxQTy
ORDER BY    productid

SELECT   productid,
         ( SELECT maxqty
FROM dbo.GetMaxProductqty_Inline(productid)) MaxQty
FROM     production.product
ORDER BY productid

If we now look at the SET Statistics IO, we can see the information from SalesOrderDetail:


The other thing to notice is that our execution plan has changed, and now includes everything. We can see the index scan on SalesOrderDetail, so we still may have some opportunities for improvement.
The main point here is that our silent performance killer is gone, and we don’t see the row-by-row processing anymore. So, if you have the opportunity to push that over to a Table Function, it will allow you to use set-based logic, which gives you much faster execution times.

SQL Server Concepts and Best Practices [Part 1]

Hi Folks,

It’s been a while that I don’t post anything here, so it’s time to start a series of SQL Server concepts and Best Practices. Most of them, I’ve already shared in my repository on GitHub.
So, take a peek when you get a chance.

MAX vs Top 1 – which one is better?

When there is a clustered index on the table and the column to be queried, both the MAX() operator and the query SELECT TOP 1 will have almost identical performance.

When there is no clustered index (Heap) on the table and the column to be queried, the MAX() operator offers the better performance. So, consider using MAX rather than SELECT TOP 1.

CTE vs #temp table


* Are unindexable (but can use existing indexes on referenced objects)
* Cannot have constraints
* Are essentially disposable VIEWs
* Persist only until the next query is run
* Can be recursive
* Do not have dedicated stats (rely on stats on the underlying objects)

Temp Tables…

* Are real materialized tables that exist in tempdb
* Can be indexed
* Can have constraints
* Persist for the life of the current CONNECTION
* Can be referenced by other queries or subprocedures
* Have dedicated stats generated by the engine

As far as when to use each, they have very different use cases. If you will have a very large result set, or need to refer to it more than once, put it in a #temp table. If it needs to be recursive, is disposable, or is just to simplify something logically, a CTE is preferred. Also, a CTE should never be used for performance. You will almost never speed things up by using a CTE, because, again, it’s just a disposable view. You can do some neat things with them but speeding up a query isn’t really one of them.

When to use char/varchar/nchar/nvarchar?

  • nchar and nvarchar can store Unicode characters.
  • char and varchar cannot store Unicode characters.
  • char and nchar are fixed-length which will reserve storage space for number of characters you specify even if you don’t use up all that space.
  • varchar and nvarchar are variable-length which will only use up spaces for the characters you store. It will not reserve storage like char or nchar. nchar and nvarchar will take up twice as much storage space, so it may be wise to use them only if you need Unicode support.

It is worth highlighting the importance to keep your code (variables) using the same data type definition as the column definition. Mismatch data types will probably result in Non-Sargable queries.

Using the same column name in different tables but with different data types

Probably anybody will assume that columns with the same name in different tables have the same data type. As a result, they won’t verify data types. Different types is an accident waiting to happen.

Check that before creating a new column with the same name:

SELECT'.' AS ObjectName, as ColumnName,
CASE WHEN IN ('char','varchar') THEN'('+CASE WHEN s.max_length<0 then 'MAX' ELSE CONVERT(varchar(10),s.max_length) END+')'
WHEN IN ('nvarchar','nchar') THEN'('+CASE WHEN s.max_length<0 then 'MAX' ELSE CONVERT(varchar(10),s.max_length/2) END+')'
WHEN IN ('numeric') THEN'('+CONVERT(varchar(10),s.precision)+','+CONVERT(varchar(10),s.scale)+')'
,CASE WHEN s.is_nullable=1 THEN 'NULL' ELSE 'NOT NULL' END AS Nullable     
FROM sys.columns s
INNER JOIN sys.types t ON s.system_type_id=t.user_type_id and t.is_user_defined=0
INNER JOIN sys.objects o ON s.object_id=o.object_id
INNER JOIN sys.schemas sh on o.schema_id=sh.schema_id
WHERE IN (select table_name from information_schema.tables)
--AND = 'NewColumn'
ORDER BY'.',s.column_id

Float and real datatypes

Why we should not use float and real datatypes to store money in SQL Server? The main problem with them is that they are approximate numerics, so they don’t store exact values.

Take a look at this:

image2017-6-29 10-35-6

The float and real data types are known as approximate data types. The behavior of float and real follows the IEEE 754 specification on approximate numeric data types. Approximate numeric data types do not store the exact values specified for many numbers; they store an extremely close approximation of the value. For many applications, the tiny difference between the specified value and the stored approximation is not noticeable. At times, though, the difference becomes noticeable. Because of the approximate nature of the float and real data types, do not use these data types when exact numeric behavior is required, such as in financial applications, in operations involving rounding, or in equality checks. Instead, use the integer, decimal, money, or smallmoney data types.

Decimal is always a better choice even compared to money datatype, where we might have issues in cases of division.

image2017-6-29 10-35-47

I hope I could help you somehow.
See you soon!


Determining a setting for Cost threshold for parallelism

The cost threshold for parallelism option specifies the threshold at which SQL Server creates and runs parallel plans for queries. SQL Server creates and runs a parallel plan for a query only when the estimated cost to run a serial plan for the same query is higher than the value set in cost threshold for parallelism.


The default is set to 5, which is a really low setting. Although there is no right and wrong number, I’d recommend starting with this setting at 50 or so and tuning up or down as appropriate.

The following script returns a list of parallel query plans and their subtree costs which will help you decide an appropriate number.

SQL Server ignores the cost threshold for parallelism value under the following conditions:

  • Your computer has only one processor.
  • Only a single CPU is available to SQL Server because of the affinity mask configuration option.
  • The max degree of parallelism option is set to 1.

core AS (
eqp.query_plan AS [QueryPlan],
ecp.plan_handle [PlanHandle],
q.[Text] AS [Statement],
n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS OptimizationLevel ,
ISNULL(CAST(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float),0) AS SubTreeCost ,
ecp.usecounts [UseCounts],
ecp.size_in_bytes [SizeInBytes]
sys.dm_exec_cached_plans AS ecp
CROSS APPLY sys.dm_exec_query_plan(ecp.plan_handle) AS eqp
CROSS APPLY sys.dm_exec_sql_text(ecp.plan_handle) AS q
CROSS APPLY query_plan.nodes ('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn ( n )

SubTreeCost * UseCounts [GrossCost],
--GrossCost DESC
SubTreeCost DESC

Fore more details: Cost Threshold for Parallelism Option

Finding Implicit Column Conversions in the Plan Cache

I’m just sharing with you guys a script to find all implicit data type conversions associated with mismatched data types in a specific database.
It’s pretty handy when you are either developing or coding reviewing.



SET @dbname = QUOTENAME(DB_NAME()); 

   (DEFAULT '')
   stmt.value('(@StatementText)[1]', 'varchar(max)'),
   t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)'),
   t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)'),
   t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)'),
   ic.DATA_TYPE AS ConvertFrom,
   t.value('(@DataType)[1]', 'varchar(128)') AS ConvertTo,
   t.value('(@Length)[1]', 'int') AS ConvertToLength,
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
CROSS APPLY stmt.nodes('.//Convert[@Implicit="1"]') AS n(t)
   ON QUOTENAME(ic.TABLE_SCHEMA) = t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)')
   AND QUOTENAME(ic.TABLE_NAME) = t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)')
   AND ic.COLUMN_NAME = t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)')
WHERE t.exist('ScalarOperator/Identifier/ColumnReference[@Database=sql:variable("@dbname")][@Schema!="[sys]"]') = 1

More details here:

Sargable queries – Using indexes effectively

It’s really important to have effective indexes in your database, but also equally important to ensure that yours queries are written properly to use these indexes effectively.
So, one thing that we always need to keep in mind is to avoid using Nonsargable search conditions.

The word sargable is a contraction of “Search ARGument ABLE”, where the optimizer has the ability to seek a row in the index.

Sargable Conditions
The following predicates CAN be matched to an index seek operation:

=,>,>=,<,<=, and BETWEEN, and some LIKE conditions such as LIKE 'Search%'

Nonsargable Conditions
An index cannot be used to seek in the following predicates:

<>,!=,!>,!<, NOT EXISTS, NOT IN, and N0T LIKE IN, OR, and some
LIKE conditions such as LIKE '%Search'

So, basically you should use this:

WHERE Name LIKE 'Ice%'
WHERE ProductID = 771
WHERE UnitPrice < 3.975
rather than this:
WHERE ID IN (51825, 51826, 51827, 51828)
WHERE Name LIKE '%Ice%'
WHERE LEFT(Name,3) ='Ice'
WHERE ABS(ProductID) = 771
WHERE ProductID <> 771
WHERE ProductID NOT IN (771,772)
WHERE UnitPrice + 1 < 3.975

Also, we shouldn’t use functions in the left side (the field side). Functions on WHERE clause columns also hurt query performance for the same reasons as mentioned before.

So, we should use this

WHERE Startdate >= '20160501' and Startdate < '20160601'

rather than

WHERE MONTH(Startdate)=5 AND year(Startdate)=2016

You might want to see more details about it in the links below.

Forcing a parallel query execution plan

Sometimes you might need to convince the query optimizer to generate a parallel execution plan to make your query faster. There is an undocumented trace flag (8649) that will help you with that.

Take a look the example below.

SELECT top (1) FirstName
FROM [Person].[Person] AS P
INNER JOIN [Person].[PersonPhone] AS H
    ON P.BusinessEntityID = H.BusinessEntityID

Now, the same query with QUERYTRACEON 8649.

SELECT top (1) FirstName
FROM [Person].[Person] AS P
INNER JOIN [Person].[PersonPhone] AS H
 ON P.BusinessEntityID = H.BusinessEntityID

I cannot recommend you use it directly in production unless advised by Microsoft, but you might like to use it on a test system as an extreme last resort, perhaps to generate a plan guide or USE PLAN hint for use in production (after careful review).”

If you need more details about that, check this link.

Why queries run so slow and my SSMS queries are so fast?

Many times I face questions from development teams “Why queries run so slow and my SSMS queries are so fast?”.

The most common reason of this behavior is about the option ARITHABORT. defaults are different than SSMS settings:

Your application connects with ARITHABORT OFF, but when you run the query in SSMS, ARITHABORT is ON and thus you will not reuse the cache entry that the application uses, but SQL Server will compile the procedure anew, sniffing your current parameter values, and you may get a different plan than from the application. So there you have a likely answer to the initial question of this article. There are a few more possibilities that we will look into in the next chapter, but the by far most common reason for slow in the application, fast in SSMS in SQL 2005 and later is parameter sniffing and the different defaults for ARITHABORT.


This and many more issues can be fixed if you read “Slow in the Application, Fast in SSMS? Understanding Performance Mysteries” by Erland Sommarskog.

You can check the ARITHABORT option, running the query below:

IF ( (64 & @@OPTIONS) = 64 ) SET @ARITHABORT = 'ON';

Good reading!