A commonly executed stored procedure had an average duration > 1200 ms as shown by SQL Server Profiler. That stored proc had a moderately complex SQL statement with 5 JOIN's that was being executed with an EXEC(@sql) statement (see below). By studying the Query Analyzer graphical execution plan, I realized that the estimates the query optimzer was making for the number of rows returned by the Orders table, the driving table of the query, was way off. As you can see by the results below, the estimate was 212,003 rows whereas the actual rows returned when the query ran was only 262. This estimate caused an unneeded high cost index scan and hash join of the OrderDetails table and other tables joined to the Orders table. I factored out the SELECT with only the Orders table. You might be surprised that the optimizer could not do a better job even on the much simplified query. When the optimizer formulates the execution plan, it doesn't have a good estimate of what the date range generated by the GETDATE and DATEADD functions will be. When the sp_executesql formulation is used, the Orders table row count estimate is close enough. The optimizer has actual values for the datetime range. It turns out that if you use the @dateStart and @dateEnd parameters in the query, but don't use sp_executesql, the estimates will still be bad and so will the query plan. UPDATE STATISTICS was tried with FULL_SCAN but didn't change any of the estimates. The trick to solving the performance puzzle is to know what information the optimizer has at the time it formulates the execution plan.
I used my findings to reformulate the complex SELECT statement in the stored proc. The result was that the stored proc execution duration dropped to 16 ms. This was a great improvement - happy client. I had considered a query hint as a last resort. Since the Orders table result set was actually a small number of rows, I could have used a join hint to force an INNER LOOP JOIN to the OrderDetails table instead of the index scan and hash JOIN. And, I would have also had to add the query hint OPTION(MAXDOP 1) to force the use of 1 CPU (another story). But, I am extremely adverse to using query hints. Substituting date strings into the complex query with joins would improve the estimate, but the optimizer would have to redo the plan for each set of dates. This is because simple parameter substitution can only take place when a query doesn't have joins. When sp_executesql is used, the query plan in the cache will be reused because the optimizer knows what the parameters are. But, be aware that using sp_executesql with parameters instead of EXEC is not the right solution 100% of the time. It can produce worse performance in situations where a variety of execution plans are needed depending on actual parameter values.
Yes, carefully chosen indexes do solve the majority of performance problems. But, in this case a minor T-SQL rewrite was needed. In practice only a few queries usually need to be rewritten (no matter how ugly some may appear to be). The trick is to first figure out which queries are chewing up reads and cpu using SQL Server Profiler. Then try to reduce disk I/O and CPU using indexing techniques. If that doesn't work, then approaches such as the rewrite which I have described need to be evaluated. Hey, if SQL Server performance tuning was too simple, I wouldn't have a consulting job and SQL Server Index Tuning Wizard might actually work!
----------------------------------------------------
Query optimizer estimate of row count: 212,003 Actual count: 262
SELECT ORD.id FROM dbo.Orders ORD 
WHERE ORD.insertDT BETWEEN DATEADD(d, -3, GETDATE()) AND GETDATE()
----------------------------------------------------
declare @dateStart datetime, @dateEnd datetime,  @sql as nvarchar(1000)
select @dateStart = DATEADD(d, -3, GETDATE()) , @dateEnd = GETDATE()
set @sql = N'SELECT ORD.id FROM dbo.Orders ORD WHERE ORD.insertDT BETWEEN @dateStart AND @dateEnd'
--Query optimizer estimate of row count: 364 Actual count: 262
SELECT ORD.id FROM dbo.bo_Orders ORD 
WHERE ORD.insertDT BETWEEN '2004-11-06 05:26:45.077' AND '2004-11-09 05:26:45.077'
--Query optimizer estimate of row count: 63,601 Actual count: 262
SELECT ORD.id FROM dbo.bo_Orders ORD 
WHERE ORD.insertDT BETWEEN @dateStart AND @dateEnd
--Query optimizer estimate of row count: 106 Actual count: 262
EXEC sp_executesql @sql,N'@dateStart datetime, @dateEnd datetime',@dateStart=@dateStart, @dateEnd=@dateEnd
----------------------------------------------------
Here is the original high cost query that was in the stored proc:
SELECT ITEM.id, ITEM.number, ITEM.name
FROM Orders ORD
JOIN OrderDetails DET on DET.OrdersID = ORD.id
JOIN Items ITEM on ITEM.id = DET.ItemsID
JOIN Items_CategoryItems CATITEM on CATITEM.ItemsID = ITEM.id
JOIN Items_Categories CAT on CAT.id = CATITEM.Items_CategoriesID
LEFT JOIN (SELECT ItemsID, retailPrice FROM Items_Custom WHERE retailersID = 9199) as ITEMC on ITEM.id = ITEMC.ItemsID
WHERE ORD.insertDT BETWEEN DATEADD(d, -3, GETDATE()) AND GETDATE()
GROUP BY ITEM.id, ITEM.number, ITEM.name, ITEMC.retailPrice, ITEM.retailPrice
ORDER BY COUNT(ITEM.id) DESC
----------------------------------------------------
It still takes brainpower, tedious work and experimentation to make SQL Server run like the fast racecar that it is. If you have questions about the details of SQL Server performance tuning, please give me a phone call to get accurate answers. I think you will be surprised at how fast SQL Server can run after being tuned by an expert performance consultant. Your customers and management will notice the big difference!
Chris Dickey
858-274-6909
www.TuneSQLServer.com
www.TuneSQL.com