harriyott.com

Friday, January 13, 2006

SQL Server performance tips

A colleague of mine has been looking at SQL Server performance, and come up with a great set of tips (mostly gleaned from this website):

Does your SQL statement have a WHERE clause?

I know this sounds obvious, but don't retrieve more data than you need. However, less obvious is that even if your SELECT statement retrieves the same quantity of data without a WHERE clause, it may run faster with one.

Is SELECT DISTINCT being used properly?

Again, pretty obvious, but using SELECT DISTINCT where no duplicate records are being returned is an unnecessary performance hit. If you are getting duplicate records, first double check your table joins as this is often the cause and only use the DISTINCT clause if you really need it.

Are you using UNION instead of UNION ALL?

A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it is much quicker.

Are your stored procedures prefixed with 'sp_'?

Any stored procedures prefixed with 'sp_' are first searched for in the Master database rather than the one it is created in. This will cause a delay in the stored procedure being executed.

Are all stored procedures referred to as dbo.sprocname?

When calling a stored procedure you should include the owner name in the call, i.e. use EXEC dbo.spMyStoredProc instead of EXEC spMyStoredProc.

Prefixing the stored procedure with the owner when executing it will stop SQL Server from placing a COMPILE lock on the procedure while it determines if all objects referenced in the code have the same owners as the objects in the current cached procedure plan.

Are you using temporary tables when you don't need to?

Although there is sometimes a benefit of using temporary tables, generally they are best eliminated from your stored procedure. Don't assume that retrieving data multiple times is always less efficient than getting the data once and storing it in temporary table as often it isn't. Consider using a sub-query or derived table instead of a temporary table (see examples below). If you are using a temporary table in lots of JOINS in you stored procedure and it contains loads of data, it might be beneficial to add an index to your temporary table as this may also improve performance.

An example of a derived table instead of a temporary table

SELECT COLUMN1, COLUMN2, COUNTOFCOL3
FROM A_TABLE A
INNER JOIN (SELECT COUNT(COLUMN3) AS COUNTOFCOL3, COLUMN2
FROM B_TABLE B
INNER JOIN C_TABLE C ON B.ID = C.ID) ON A.ID = B.ID

Are you using Cursors when you don't need to?

Cursors of any kind slow down SQL Server's performance. While in some cases they are unavoidable, often there are ways to remove them from your code.

Consider using any of these options instead of using a cursor as they are all faster:
  • Derived tables
  • Sub-queries
  • CASE statements
  • Multiple queries
  • Temporary tables
Are your Transactions being kept as short as possible?

If you are use SQL transactions, try to keep them as short as possible. This will help db performance by reducing the number of locks. Remove anything that doesn't specifically need to be within the transaction like setting variables, select statements etc.

Is SET NO COUNT ON being used?

By default, every time a stored procedure is executed, a message is sent from the server to the client indicating the number of rows that were affected by the stored procedure. You can reduce network traffic between the server and the client if you don't need this feature by adding SET NO COUNT ON at the beginning of your stored procedure.

Are you using IN or NOT IN when you should be using EXISTS or NOT EXISTS?

If you are using IN or NOT IN in a WHERE clause that contains a sub-query you should re-write it to use either EXISTS, NOT EXISTS or perform a LEFT OUTER JOIN. This is because particularly the NOT IN statement offers really poor performance. The example below probably better explains what I mean:

e.g. This SQL statement:

SELECT A_TABLE.COLUMN1
FROM A_TABLE
WHERE A_TABLE.COLUMN2 NOT IN (SELECT A_TABLE2.COLUMN2
FROM A_TABLE2)

Could be re-written like this:
SELECT A_TABLE.COLUMN1
FROM A_TABLE
WHERE NOT EXISTS (SELECT A_TABLE2.COLUMN2
FROM A_TABLE2
WHERE
A_TABLE.COLUMN2 = A_TABLE2.COLUMN2)

Do you have a function that acts directly on a column used in a WHERE clause?

If you apply a function to a column used in the WHERE clause of your SQL statement, it is unlikely that the SQL statement will be able to make use of any indexes applied to that column.

e.g.
SELECT A_TABLE.LASTNAME
FROM A_TABLE
WHERE SUBSTRING (FIRSTNAME,1,1) = 'm'

Could be re-written:
SELECT A_TABLE.LASTNAME
FROM A_TABLE
WHERE FIRSTNAME LIKE = 'm%'
Where you have a choice of using the IN or BETWEEN clauses

Use the BETWEEN clause as it is much more efficient

e.g. This SQL statement:

SELECT A_TABLE.NAME
FROM A_TABLE
WHERE A_TABLE.NUMBER IN (100, 101, 102, 103)

Should be re-written like this:
SELECT A_TABLE.NAME
FROM A_TABLE
WHERE A_TABLE.NUMBER BETWEEN 100 AND 103

Are you doing excessive string concatenation in your stored procedure?

Where possible, avoid doing loads of string concatenation as it is not a fast process in SQL Server.

Have you checked the order of WHERE clauses when using AND?

If you have a WHERE clause that includes expressions connected by two or more AND operators, SQL Server will evaluate them from left to right in the order they are written (assuming that no parenthesis have been used to change the order of execution). You may want to consider one of the following when using AND:
  • Locate the least likely true AND expression first. This way, if the AND expression is false, the clause will end immediately, saving time.
  • If both parts of an AND expression are equally likely being false, put the least complex AND expression first. This way, if it is false, less work will have to be done to evaluate the expression.
Have you checked that you are using the most efficient operators?

Often you don't have much of a choice of which operator you use in your SQL statement. However, sometimes there is an alternative way to re-write your SQL statement to use a more efficient operator. Below is a list of operators in their order of performance (with the most efficient first).

  • =
  • >, >=, <, <=
  • LIKE
  • <>



33 Comments:

Anonymous James Crowley said...

Nice! *bookmarked* :-)

January 14, 2006 7:00 PM  
Anonymous Simon said...

Glad you liked it - I knew someone would find it useful.

January 14, 2006 7:32 PM  
Blogger Björn said...

Very good article!
Regarding the difference between IN and EXISTS you can check out one of my blog posts: SQL: IN and EXISTS

July 06, 2006 3:33 PM  
Blogger Simon said...

Thanks Bjorn, interesting post.

July 08, 2006 11:30 AM  
Anonymous Steve said...

This is a really nice list. Concise and clearly written. I'll be referring to it regularly! Thanks!

July 11, 2006 11:54 AM  
Blogger Simon said...

Glad you like it Steve.

July 11, 2006 7:28 PM  
Anonymous Anonymous said...

I'd be inclined to add some points about making sure that searched columns are indexed, and ensure that the query plans shown in query analyser do not contain thick lines. If they do you can talk about forcing join types or introducing temp tables (or in-place sub queries) to improve performance.

July 12, 2006 10:46 AM  
Blogger Simon said...

I didn't know about the thick lines - I'll check that out. Thanks!

July 13, 2006 9:02 PM  
Anonymous rid00z said...

Dont forget to include Sub selects. They are performance killers.

July 13, 2006 11:36 PM  
Anonymous Anonymous said...

I'm not convinced about any impact of the order of where clauses. I'm sure the optimisor decides the best table access path - joins - (and thus which indexes and where clause filters may be processed at the same time) regardless of the where clause order. I've tested query plans after changing the order and it doesn't seem to impact it....

July 18, 2006 5:56 PM  
Anonymous Aniel said...

Hi can you take a look at this and let me know if there is a reason why im getting duplicate records;

SELECT
tEmployee.EmpCode,
tPerson.FirstName,
tPerson.LastName,
tPersonSensitive.BirthDate,
PerCon.Email as EmailAddress,
tEmployee.HireDate as DateHired,
vdsEmpWorkStatus.StatusEmpCode as Status,
tPersonSecondEd.Location as SecEdLocation,
tPersonSecondEd.Institute as SecEdSchool,
tPersonSecondEd.SEdLevID as SecEdLevel,
tPersonSecondEd.Completed as SecEdCompleted,
tEmployee.ActualLeaveDate as RetireDate,
tPerson.SK_MobComment as MobilityPref,
tMobilityText.Code as Mobility,
tSK_EmpSuccesData.EarliestMoveDate,
tSK_EmpSuccesData.InTop250,
tSK_EmpSuccesData.LatestMoveDate,
tSK_EmpSuccesData.RatingDate as PotentialRatingDate,
vRetentionRisk.Code as RiskOfLoss,
PerCon.WorkPhone,
tEmpIndivPref.CareerPreference as CareerAspirations,
vEmpSlotPosSuperAll.Title as HRISTitle,
POrgUnitBasic.UnitCode_2 as BusinessUnit,
vEmpSlotPosSuperAll.SupervisorEmpCode as MgrEmpCode,
vEmpSlotPosSuperAll.SupervisorFullName as MgrFullName,
vEmpSlotPosSuperAll.SupervisorTitle as MgrTitle,
POrgUnitBasic.UnitCode_3 as Division,
vPosition.SK_Region,
vPosition.SK_District,
vPosition.SK_EnglishTitle,
vPosition.SK_Department as Department,
vPosition.Description as PositionDesc,
tEmployee.ETWeb9_EmpID as EmpID,
vEmpSlotPosSuperAll.MiddleName as MiddleInitial,
vPosition.SK_Location as CityLocation,
vEmpSlotPosSuperAll.DateAppointed,
tSK_EmpSuccesData.RatingOfPotentialID as Potential,
tSK_EmpCurManagLev.CurrMgmtlevelID as CurrMgmtLevel,
tSK_EmpSuccesData.HiPotential as EmpHiPotential

FROM
tEmployee INNER JOIN tPerson
ON tEmployee.PID = tPerson.PID INNER JOIN
tPersonSensitive
ON tEmployee.PID = tPersonSensitive.PID INNER JOIN
tPersonContact PerCon
ON tEmployee.PID = PerCon.PID INNER JOIN
vdsEmpWorkStatus
ON tEmployee.PID = vdsEmpWorkStatus.PID INNER JOIN
tPersonSecondEd
ON tEmployee.PID = tPersonSecondEd.PID INNER JOIN
vEmpSlotPosSuperAll
ON tEmployee.PID = vEmpSlotPosSuperAll.PID INNER JOIN
tSK_EmpCurManagLev
ON tEmployee.PID = tSK_EmpCurManagLev.PID INNER JOIN
tSK_EmpSuccesData
ON tEmployee.PID = tSK_EmpSuccesData.PID INNER JOIN
tEmpIndivPref
ON tEmployee.PID = tEmpIndivPref.PID INNER JOIN
vdsSlotEmpPosition
ON tEmployee.PID = vdsSlotEmpPosition.PID LEFT JOIN
vdsPosOrgUnitBasic POrgUnitBasic
ON vdsSlotEmpPosition.PosID = POrgUnitBasic.PosID LEFT JOIN
vPosition
ON vdsSlotEmpPosition.Title = vPosition.Title LEFT JOIN
tMobilityText
ON tPerson.SK_MobileID = tMobilityText.MobileID LEFT JOIN
vRetentionRisk
ON vRetentionRisk.RetRiskID = tSK_EmpSuccesData.RiskOfLossID

WHERE
(tEmployee.EmpCode LIKE 'QB%')

October 04, 2006 10:26 AM  
Anonymous Simon said...

Hi Aniel,

I can't really tell without looking at the data, but I'm guessing that one of the tables has two entries for a foreign key, such as an employee having more than one entry in (for example) tPersonSecondEd.

October 04, 2006 11:15 AM  
Anonymous Aniel said...

Hi Simon,

Ok well im working with data that may be duplicated in my database, but if the client database doesnt then im assuming they wont see duplicate records. The problem is that i get almost 500,000 records returned even though there is only 40,000 in my DB. I think my query is duplicating the record for every join in using, any clues as to why this would be?

October 04, 2006 12:05 PM  
Anonymous Simon said...

Well, the first step would be to remove the duplicates from your database. Then try using a cut down version of your query, maybe just one join. If that works ok, then add another join, and keep going until you find the problem. It may be the left joins, but I'm not that good at spotting things from "just looking" - I need to have a proper rummage around with the data and try out some things.

October 04, 2006 12:40 PM  
Anonymous Aniel said...

Thanks for the feedback, im applying one join at a time and seem to be working so far, 4 tables to go!! Its returning the right amount each time, computers u gotta love em ;)

October 04, 2006 1:09 PM  
Anonymous Anonymous said...

Hi Simon,
Excellent Article.... Keep it up. Could you please tell me why is it not advisable to use a function in a WHERE clause? Also could you provide me with link(s) that explains well the Sql Server Stored Proc execution plan as well as the steps Sql server 2000 engine performs to creates, caches and reuses an execution plan (a detailed explanation).
Thanks: Subhadip

October 04, 2006 2:29 PM  
Anonymous Anonymous said...

Hi,
Nice article, please keep updating if you can.

I had one query, if you can help - If a Table has an index on a field which would be better/optimizer:
1. Using 2 statements
- Select * from Table where indexed_field = A
- Select * from Table where indexed_field = A

OR
2. Select * from Table where indexed field = A or indexed_field = B

(Assuming about 500,000 records in table with varied indexed_field values).

Thank you!

October 04, 2006 6:34 PM  
Anonymous Anonymous said...

In 1. above second statement should be = B

October 04, 2006 6:35 PM  
Anonymous Simon said...

Subhadip, the reason that functions in a WHERE clause may not do well is if an index is applied to the column being used in the function. The function may not take the index into account.

As for the links to the execution plan details, I'm sorry, but I don't have any.

Anonymous, it depends on what you're intending to do with the results, and whether you're going to use the data together. If As and Bs populate different datasets then separating them may be better, otherwise keeping them in the same statement will probably be more convenient for processing.

October 05, 2006 8:39 PM  
Anonymous Anonymous said...

Nice. Antother tip: Using UNION instead of OR in WHERE-clause can also be positive for the performance.

November 01, 2006 2:52 PM  
Anonymous Anonymous said...

Wonderful Article... Thanks a lot

November 11, 2006 6:14 AM  
Anonymous Nandu said...

Wonderful Article... Thanks a lot

November 11, 2006 6:14 AM  
Anonymous LS said...

I suppose the Query will use the index even if we use functions in the WHERE clause. The catch here is ... that while building INDEX you will have to build it using the function.

Hence in the example below:

SELECT A_TABLE.LASTNAME
FROM A_TABLE
WHERE SUBSTRING (FIRSTNAME,1,1) = 'm'

we could build an INDEX on firstname with SUBSTRING (FIRSTNAME,1,1) function and rest assured it will be used in the above query.

I chanced upon this funda a few years back when I was working on ORACLE8i. I'am sure SQL SERVER also does the same.

Simon your expert comments solicited....

December 14, 2006 2:25 PM  
Anonymous Anonymous said...

For the second last optimization given (Have you checked the order of WHERE clauses when using AND?), we can extend the rule to WHERE and OR clauses; the cluase most likely to be true should be placed first so that other OR clauses do not need to be evalutated.

February 23, 2007 7:14 PM  
Anonymous Anonymous said...

Hi Simon,

how to write a query to give an addition of a decimal column of successive rows. E.g.
1

4

8

10

15

Output should

1

5

12

18

25

August 29, 2007 9:26 AM  
Anonymous Anonymous said...

Hi Simon,
Write a query to delete rows from table A not present in table B based on relationship between then the two tables on Column C.

August 29, 2007 9:27 AM  
Anonymous commonjunks said...

DELETE FROM [A]
WHERE NOT EXISTS (SELECT * FROM [B] WHERE [B].C=[A].C)


>> Hi Simon,
>> Write a query to delete rows from table A not present in table B based on relationship between then the two tables on Column C.

December 11, 2007 7:52 PM  
Anonymous commonjunks said...

Just an Idea
Suppose you have table [A], Do
--Suppose you have Table A where you want to perform your desired operation at column "Counter"
--Skip follow line if you have Identity Column
ALTER TABLE [A] ADD ID INT IDENTITY(1,1)

DECLARE @CCounter INT
DECLARE @LCounter INT
SET @LCounter=0
DECLARE @NREC INT
DECLARE @CREC INT
SELECT @NREC=COUNT(*) FROM [A]
SET @CREC=1
WHILE @CREC<=@NREC
BEGIN
SELECT @CCounter = ISNULL(Counter,0) FROM [A] --ISNULL Incase if our value is NULL
IF @CREC>1
SELECT @CCounter+ @LCounter
ELSE
SELECT @CCounter
SET @LCounter = @CCounter
SET @CREC=@CREC+1
END

ALTER TABLE [A] DROP COLUMN ID

Hope this help.


>> Hi Simon,
>>
>> how to write a query to give an addition of a decimal column of successive rows. E.g.
>> 1
>>
>> 4
>>
>> 8
>>
>> 10
>>
>> 15
>>
>> Output should
>>
>> 1
>>
>> 5
>>
>> 12
>>
>> 18
>>
>> 25

December 11, 2007 8:01 PM  
Anonymous commonjunks said...

opps
change
SELECT @CCounter = ISNULL(Counter,0) FROM [A] --ISNULL Incase if our value is NULL
to
SELECT @CCounter = ISNULL(Counter,0) FROM [A] WHERE ID=@CREC

December 11, 2007 8:03 PM  
Anonymous Anonymous said...

Really good article !! It solved my performance issue with UNION.

January 10, 2008 1:39 PM  
Blogger XanderZe said...

This post has been removed by the author.

April 05, 2008 7:41 PM  
Blogger XanderZe said...

Good article.

I've created a factsheet for SQL server developers that fits on one A4 (printed on both sides). You can laminate it and keep it on your desk.

http://www.dotnet4all.com/snippets/2008/04/factsheet-for-sql-server-developers.html

April 05, 2008 7:43 PM  
Blogger Simon said...

Hey Xander,

That's a brilliant idea, and very nicely executed. I love the colour coding for the data type sizes - inspired. Thanks for linking.

Simon

April 05, 2008 8:04 PM  

Post a Comment

Links to this post:

Create a Link

<< Home