Specify ORDER BY if you need ordered data

Some of you guys may not know this, so I’d like to share it.

If you need ordered data and your query does not have an ORDER BY clause, the rows can come back in any order. Yeah. It’s not guaranteed that they will be returned in the order of the primary key.
So, you can’t assume that SQL Server has some “default” order. You might believe there is one, because usually when you SELECT from a table the engine will use the clustered index (PK) with identity IDs.

But, it’s not always the case. Check the following example:

The following query will run just fine, and “as expected”.
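(The original post showed this only as a screenshot; here is a minimal sketch that reproduces the setup. The table name dbo.Test and its columns are assumptions inferred from the index created below.)

-- Hypothetical setup; names inferred from the index created next.
CREATE TABLE dbo.Test
(
    [Id]        INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED
  , [RandomInt] INT NOT NULL
);
GO
INSERT dbo.Test ([RandomInt])
VALUES (42), (7), (99);
GO
-- No ORDER BY: the rows just happen to come back in clustered-key (Id) order.
SELECT [Id], [RandomInt] FROM dbo.Test;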

[Image: the rows come back in Id (clustered key) order]
Now, let’s create the following index.

CREATE INDEX Test_RandomInt ON dbo.Test(RandomInt);

The same query executed before is now “broken”: the rows come back in a different order, just because SQL Server decided to use a different index in the execution plan.

[Image: the same query now returns the rows in RandomInt order]
I’m not saying that you should start sorting every single query (sorting is expensive performance-wise). But if you really need ordered results, you must provide the ORDER BY clause.
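Making the order explicit is as simple as:

-- Explicit order: guaranteed regardless of which index the plan uses.
SELECT [Id], [RandomInt]
FROM dbo.Test
ORDER BY [Id];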

From a resource perspective, it’s preferable to sort on the app side, because it’s generally easier to add more servers, CPU, and memory to the application layer. It’s much more difficult, and expensive, to do so for the SQL servers.

Recommended MAXDOP settings for SQL Server

The Microsoft SQL Server max degree of parallelism (MAXDOP) configuration option controls the number of processors that are used for the execution of a query in a parallel plan. This option determines the computing and thread resources that are used for the query plan operators that perform the work in parallel. Depending on whether SQL Server is set up on a symmetric multiprocessing (SMP) computer, a non-uniform memory access (NUMA) computer, or hyperthreading-enabled processors, you have to configure the max degree of parallelism option appropriately.

When setting up a new SQL Server, I usually use the following code to determine a good starting point for the MAXDOP setting:
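The exact script from the original post isn’t reproduced here; the sketch below follows Microsoft’s published MAXDOP guidance (cap MAXDOP at 8, or at the number of logical processors per NUMA node, whichever is lower):

-- Sketch based on Microsoft's documented MAXDOP guidance, not the original script.
DECLARE @LogicalCPUs INT, @NumaNodes INT, @CPUsPerNode INT, @MaxDop INT;

SELECT @LogicalCPUs = cpu_count
FROM sys.dm_os_sys_info;

SELECT @NumaNodes = COUNT(DISTINCT memory_node_id)
FROM sys.dm_os_memory_nodes
WHERE memory_node_id <> 64;   -- node 64 is reserved for the DAC

SET @CPUsPerNode = @LogicalCPUs / @NumaNodes;
SET @MaxDop = CASE WHEN @CPUsPerNode <= 8 THEN @CPUsPerNode ELSE 8 END;

SELECT @MaxDop AS [RecommendedMaxDop];

-- To apply it:
-- EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
-- EXEC sp_configure 'max degree of parallelism', @MaxDop; RECONFIGURE;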

I recommend setting MAXDOP to a setting that makes sense for your server load. Only you can determine the best setting for your environment. This script is just a good way to start.

Error updating temporal tables

Scenario

At this point, you have probably heard about the System-Versioned Temporal Tables feature released in SQL Server 2016. So, let’s imagine that your team has decided to implement it on a table called “Orders”.

The application uses different access patterns to modify the data in this table. Some statements come directly from the application; others run in long batches inside explicit transactions that make several changes to multiple tables. The update on the “Orders” table is not the first statement in those long batches! So, suddenly you start seeing the following error message:

Data modification failed on system-versioned table “Orders” because transaction time was earlier than period start time for affected records.

You can simulate the error message by running the following script:

CREATE TABLE dbo.Orders
(
[OrderId] INT NOT NULL PRIMARY KEY CLUSTERED
, [OrderValue] DECIMAL(19,4)
, [ValidFrom] DATETIME2 (2) GENERATED ALWAYS AS ROW START
, [ValidTo] DATETIME2 (2) GENERATED ALWAYS AS ROW END
, PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrdersHistory));
GO
INSERT dbo.Orders ([OrderId], [OrderValue])
VALUES (1, 9.99), (2, 9.99);
GO
SELECT * FROM dbo.Orders;
GO

--Query 1: run this first. The transaction begins, then waits before updating.
BEGIN TRAN
WAITFOR DELAY '00:00:15';
UPDATE dbo.Orders
SET [OrderValue] = [OrderValue] + 1;
COMMIT TRAN

--Query 2: run this in another session while Query 1 is still waiting.
--It commits immediately and moves the rows' period start time past
--Query 1's transaction begin time, so Query 1's update then fails.
BEGIN TRAN
UPDATE dbo.Orders
SET [OrderValue] = [OrderValue] + 1;
COMMIT TRAN

 

But, what’s the problem here?

The system-generated start and end times are keyed to the server’s system time at the moment the BEGIN TRANSACTION is executed.

The times recorded in the system datetime2 columns are based on the begin time of the transaction itself. For example, all rows inserted within a single transaction will have the same UTC time recorded in the column corresponding to the start of the SYSTEM_TIME period.

Is that a bug?

No, this is a standard behavior of System Versioned Temporal Tables.
More info: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-2017#how-does-temporal-work

Workarounds

  • Build in exception handling that can retry the transaction (see the sketch after this list).
  • Use triggers instead of temporal tables.
  • Wait for Microsoft to build/release the application-time period feature (which is part of the SQL:2011 standard). More info: https://en.wikipedia.org/wiki/SQL:2011
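A minimal retry sketch for the first workaround. It assumes error 13535 is the number behind the message above (verify the number on your version) and simply retries the statement a few times:

DECLARE @Retries INT = 0;

WHILE @Retries < 3
BEGIN
    BEGIN TRY
        BEGIN TRAN;
        UPDATE dbo.Orders
        SET [OrderValue] = [OrderValue] + 1;
        COMMIT TRAN;
        BREAK;  -- success, stop retrying
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRAN;
        IF ERROR_NUMBER() <> 13535 THROW;  -- assumed error number; rethrow anything else
        SET @Retries += 1;
    END CATCH
END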

SQL Server Concepts and Best Practices [Part 1]

Hi Folks,

It’s been a while since I last posted anything here, so it’s time to start a series on SQL Server concepts and best practices. Most of them I’ve already shared in my repository on GitHub.
So, take a peek when you get a chance: https://github.com/danilocbraga/SQLServer

MAX vs TOP 1 – which one is better?

When there is a clustered index on the table and the column being queried, the MAX() operator and SELECT TOP 1 ... ORDER BY have almost identical performance.

When there is no clustered index on the table and the column being queried (a heap), the MAX() operator offers better performance. So, consider using MAX rather than SELECT TOP 1.
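For illustration, the two forms might look like this (reusing the hypothetical dbo.Test table from the ORDER BY example):

-- Both return the highest RandomInt; on a heap, MAX() generally wins.
SELECT MAX([RandomInt]) AS MaxValue
FROM dbo.Test;

SELECT TOP (1) [RandomInt]
FROM dbo.Test
ORDER BY [RandomInt] DESC;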

CTE vs #temp table

CTEs…

* Are unindexable (but can use existing indexes on referenced objects)
* Cannot have constraints
* Are essentially disposable VIEWs
* Persist only for the duration of the single statement that references them
* Can be recursive
* Do not have dedicated stats (rely on stats on the underlying objects)

Temp Tables…

* Are real materialized tables that exist in tempdb
* Can be indexed
* Can have constraints
* Persist for the life of the current CONNECTION
* Can be referenced by other queries or subprocedures
* Have dedicated stats generated by the engine

As far as when to use each, they have very different use cases. If you will have a very large result set, or need to refer to it more than once, put it in a #temp table. If it needs to be recursive, is disposable, or is just to simplify something logically, a CTE is preferred. Also, a CTE should never be used for performance. You will almost never speed things up by using a CTE, because, again, it’s just a disposable view. You can do some neat things with them but speeding up a query isn’t really one of them.
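A quick sketch to make the contrast concrete (table and column names are illustrative):

-- CTE: single-statement scope; handy for recursion or readability.
WITH Numbers AS
(
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM Numbers WHERE n < 10
)
SELECT n FROM Numbers;

-- #temp table: materialized in tempdb, indexable, reusable across statements.
CREATE TABLE #BigResult ([Id] INT PRIMARY KEY, [Amount] DECIMAL(19,4));
INSERT #BigResult ([Id], [Amount])
VALUES (1, 9.99), (2, 19.99);          -- imagine a large, expensive result set here
SELECT * FROM #BigResult;              -- first use
SELECT SUM([Amount]) FROM #BigResult;  -- reused without recomputing
DROP TABLE #BigResult;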

When to use char/varchar/nchar/nvarchar?

  • nchar and nvarchar can store Unicode characters.
  • char and varchar cannot store Unicode characters.
  • char and nchar are fixed-length and will reserve storage space for the number of characters you specify, even if you don’t use all of that space.
  • varchar and nvarchar are variable-length and will only use space for the characters you actually store; they do not reserve storage like char and nchar. nchar and nvarchar take up twice as much storage space per character, so it may be wise to use them only if you need Unicode support.

It is worth highlighting the importance of keeping your code (variables) on the same data type as the column definition. Mismatched data types will probably result in non-sargable queries, as the sketch below shows.
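For example (the dbo.Customers table and its AccountNumber column are assumptions for illustration):

-- Assume dbo.Customers has AccountNumber VARCHAR(20) with an index on it.

-- Non-sargable: the varchar column is implicitly converted to nvarchar
-- to match the variable, which typically prevents an index seek.
DECLARE @Account NVARCHAR(20) = N'ACC-001';
SELECT * FROM dbo.Customers WHERE [AccountNumber] = @Account;

-- Sargable: the variable matches the column's declared type.
DECLARE @Account2 VARCHAR(20) = 'ACC-001';
SELECT * FROM dbo.Customers WHERE [AccountNumber] = @Account2;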

Using the same column name in different tables but with different data types

Most people will assume that columns with the same name in different tables have the same data type, so they won’t verify the data types. Different data types for the same column name are an accident waiting to happen.

Check for that before creating a new column with the same name:

SELECT sh.name + '.' + o.name AS ObjectName,
       c.name AS ColumnName,
       CASE WHEN t.name IN ('char','varchar')
                THEN t.name + '(' + CASE WHEN c.max_length < 0 THEN 'MAX' ELSE CONVERT(varchar(10), c.max_length) END + ')'
            WHEN t.name IN ('nchar','nvarchar')
                THEN t.name + '(' + CASE WHEN c.max_length < 0 THEN 'MAX' ELSE CONVERT(varchar(10), c.max_length / 2) END + ')'
            WHEN t.name = 'numeric'
                THEN t.name + '(' + CONVERT(varchar(10), c.precision) + ',' + CONVERT(varchar(10), c.scale) + ')'
            ELSE t.name
       END AS DataType,
       CASE WHEN c.is_nullable = 1 THEN 'NULL' ELSE 'NOT NULL' END AS Nullable
FROM sys.columns c
INNER JOIN sys.types t ON c.system_type_id = t.user_type_id AND t.is_user_defined = 0
INNER JOIN sys.objects o ON c.object_id = o.object_id
INNER JOIN sys.schemas sh ON o.schema_id = sh.schema_id
WHERE o.name IN (SELECT table_name FROM information_schema.tables)
--AND c.name = 'NewColumn'
ORDER BY sh.name + '.' + o.name, c.column_id;

Float and real datatypes

Why shouldn’t we use the float and real data types to store money in SQL Server? The main problem with them is that they are approximate numerics, so they don’t store exact values.

Take a look at this:

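The original screenshot is not available; a minimal sketch showing the same kind of surprise:

-- 0.1 has no exact binary representation, so the sum drifts:
DECLARE @f FLOAT = 0.1;

SELECT @f + @f + @f AS FloatSum;   -- not exactly 0.3 (may display as 0.30000000000000004)

IF @f + @f + @f = 0.3E0
    SELECT 'equal' AS Comparison
ELSE
    SELECT 'not equal' AS Comparison   -- this branch runs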

The float and real data types are known as approximate data types. The behavior of float and real follows the IEEE 754 specification on approximate numeric data types. Approximate numeric data types do not store the exact values specified for many numbers; they store an extremely close approximation of the value. For many applications, the tiny difference between the specified value and the stored approximation is not noticeable. At times, though, the difference becomes noticeable. Because of the approximate nature of the float and real data types, do not use these data types when exact numeric behavior is required, such as in financial applications, in operations involving rounding, or in equality checks. Instead, use the integer, decimal, money, or smallmoney data types. https://technet.microsoft.com/en-us/library/ms187912(v=sql.105).aspx

Decimal is always a better choice, even compared to the money data type, where we might have rounding issues in cases of division.

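The screenshot is also gone here; this classic division example (values are illustrative) shows how money loses precision because intermediate results are rounded to four decimal places:

DECLARE @mon1 MONEY = 100, @mon2 MONEY = 339, @mon3 MONEY = 10000;
DECLARE @dec1 DECIMAL(19,4) = 100, @dec2 DECIMAL(19,4) = 339, @dec3 DECIMAL(19,4) = 10000;

SELECT (@mon1 / @mon2) * @mon3 AS MoneyResult,    -- 2949.0000 (intermediate rounded to 4 decimals)
       (@dec1 / @dec2) * @dec3 AS DecimalResult;  -- 2949.8525... (division keeps more scale)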

I hope this helps you somehow.
See you soon!

 

Determining a setting for Cost threshold for parallelism

The cost threshold for parallelism option specifies the threshold at which SQL Server creates and runs parallel plans for queries. SQL Server creates and runs a parallel plan for a query only when the estimated cost to run a serial plan for the same query is higher than the value set in cost threshold for parallelism.

The default is 5, which is a really low setting. Although there is no right or wrong number, I’d recommend starting at 50 or so and tuning up or down as appropriate.
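For example, to raise it to 50:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;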

The following script returns a list of parallel query plans and their subtree costs, which will help you decide on an appropriate number. (The script appears after the list of exceptions below.)

SQL Server ignores the cost threshold for parallelism value under the following conditions:

  • Your computer has only one processor.
  • Only a single CPU is available to SQL Server because of the affinity mask configuration option.
  • The max degree of parallelism option is set to 1.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
core AS
(
    SELECT
        eqp.query_plan AS [QueryPlan],
        ecp.plan_handle AS [PlanHandle],
        q.[text] AS [Statement],
        n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS [OptimizationLevel],
        ISNULL(CAST(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS FLOAT), 0) AS [SubTreeCost],
        ecp.usecounts AS [UseCounts],
        ecp.size_in_bytes AS [SizeInBytes]
    FROM sys.dm_exec_cached_plans AS ecp
    CROSS APPLY sys.dm_exec_query_plan(ecp.plan_handle) AS eqp
    CROSS APPLY sys.dm_exec_sql_text(ecp.plan_handle) AS q
    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn ( n )
)
SELECT TOP 100
    QueryPlan,
    PlanHandle,
    [Statement],
    OptimizationLevel,
    SubTreeCost,
    UseCounts,
    SubTreeCost * UseCounts AS [GrossCost],
    SizeInBytes
FROM core
ORDER BY
    --GrossCost DESC
    SubTreeCost DESC;

For more details: Cost Threshold for Parallelism Option

Finding Implicit Column Conversions in the Plan Cache

I’m just sharing with you guys a script to find all implicit data type conversions associated with mismatched data types in a specific database.
It’s pretty handy when you are developing or doing code review.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

DECLARE @dbname SYSNAME
SET @dbname = QUOTENAME(DB_NAME()); 

WITH XMLNAMESPACES
   (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
   stmt.value('(@StatementText)[1]', 'varchar(max)') AS StatementText,
   t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)') AS SchemaName,
   t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)') AS TableName,
   t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)') AS ColumnName,
   ic.DATA_TYPE AS ConvertFrom,
   ic.CHARACTER_MAXIMUM_LENGTH AS ConvertFromLength,
   t.value('(@DataType)[1]', 'varchar(128)') AS ConvertTo,
   t.value('(@Length)[1]', 'int') AS ConvertToLength,
   query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
CROSS APPLY stmt.nodes('.//Convert[@Implicit="1"]') AS n(t)
JOIN INFORMATION_SCHEMA.COLUMNS AS ic
   ON QUOTENAME(ic.TABLE_SCHEMA) = t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)')
   AND QUOTENAME(ic.TABLE_NAME) = t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)')
   AND ic.COLUMN_NAME = t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)')
WHERE t.exist('ScalarOperator/Identifier/ColumnReference[@Database=sql:variable("@dbname")][@Schema!="[sys]"]') = 1

More details here: http://sqlblog.com/blogs/jonathan_kehayias/archive/2010/01/08/finding-implicit-column-conversions-in-the-plan-cache.aspx

Clearing a Single Plan from Cache

Most of you guys know that you can clear all cached plans using DBCC FREEPROCCACHE.
Since SQL Server 2008 you can also clear a single plan from the cache. The query below helps you find the plan:

SELECT st.[text], qs.query_hash, 'DBCC FREEPROCCACHE(', qs.plan_handle, ')'
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.[text] LIKE 'SELECT * FROM Person.Address%' -- your query
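Once you have the handle, pass it to DBCC FREEPROCCACHE. The handle value below is just a placeholder for the one returned by the query above:

-- Removes a single plan from the cache (handle shown is illustrative only).
DBCC FREEPROCCACHE (0x060006001ECA270EC0215D05000000000000000000000000);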

There is a full example in Books Online, so you might want to read the following link:
http://msdn.microsoft.com/en-us/library/ms174283.aspx

Use DBCC FREEPROCCACHE to clear the plan cache carefully. Freeing the plan cache causes, for example, a stored procedure to be recompiled instead of reused from the cache. This can cause a sudden, temporary decrease in query performance. For each cleared cachestore in the plan cache, the SQL Server error log will contain the following informational message: "SQL Server has encountered %d occurrence(s) of cachestore flush for the ‘%s’ cachestore (part of plan cache) due to ‘DBCC FREEPROCCACHE’ or ‘DBCC FREESYSTEMCACHE’ operations." This message is logged every five minutes as long as the cache is flushed within that time interval.

You can also clear all the plans for one particular database from the cache using this command:

DBCC FLUSHPROCINDB(<db_id>);

In the next post I will talk about how to force a specific execution plan.
I know that it’s an old subject, but I still see many people asking about it in the SQL Server community.