Using Indexed Computed Columns to Improve Performance
I started with a copy of a table from AdventureWorks.
The SELECT statements and the results are:
The first set of tests is in the "Base Index" column. When I compared the
column to a scalar value it did an index seek (1 and 2). Query #3 specified
a case-insensitive search using the COLLATE clause and that resulted in an very
slow index scan. Query #4 converted them both to upper case and then compared
them. This also resulted in a query scan but not nearly as bad. Still
this had a cost ten times higher than the base query. One of the things I
discuss in my performance tuning presentations is that any time you wrap a function
around an indexed column it probably won't use the index efficiently. This
seems to confirm that.
Next I created a computed column with the UPPER function and then built an index on it.
I also tested this by creating a case-insensitve computed column and testing that.
I also wanted to test this on datetime functions to see if I could easily query just on the date portion of the column. The queries I tested are:
Notice that in the script that created the NewContact table I added some minutes
to each ModifiedDate. I also created an index on ModifiedDate. The first
query is the preferred way of selecting one days worth of data. It does an
index seek and is the fastest way to return data. The second query is a little
faster but doesn't return any data. The comparison of a date-only value to
a date with time value with always fail. The third query is the way I see
this type of query written most frequenty. The column is converted to VARCHAR
and then compared to the value. In the fourth query the column is converted
back to datetime.
Next I added a computed column and an index on that column.
The biggest improvement came in Query #4. Since the function around ModifiedDate matched the computed column it used the computed column and its index. It didn't use it in Query #6 where the outer CONVERT was replaced with a CAST function. When test Query #4 but added a bunch of white space into the WHERE clause it still used the index. That tells it isn't doing a simple hash of the text in the WHERE clause. When I changed from VARCHAR(10) to any other length it no longer used the new index. Query #7 also didn't use the new index. The only difference in that query was the outer CONVERT function didn't have a format number.
If you have existing application code that wraps functions around an indexed column and then doesn't use the index you may be able to use this approach to improve those queries. If the functions are consistent you should be able to add computed columns and realize immediate benefits.
USE [AdventureWorks]
GO
IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'NewContact'
AND TABLE_SCHEMA = 'Person')
DROP TABLE [Person].[NewContact]
GO
CREATE TABLE [Person].[NewContact] (
[ContactID] [int] PRIMARY KEY NOT NULL,
[Title] [nvarchar](8) COLLATE SQL_Latin1_General_CP1_CS_AS NULL,
[FirstName] nvarchar(50) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL,
[LastName] nvarchar(50) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL,
[Suffix] [nvarchar](10) COLLATE SQL_Latin1_General_CP1_CS_AS NULL,
[EmailAddress] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CS_AS NULL,
[ModifiedDate] [datetime] NOT NULL DEFAULT (getdate()) )
GO
INSERT INTO [Person].[NewContact] ( [ContactID], [Title], [FirstName],
[LastName], [Suffix], [EmailAddress], [ModifiedDate] )
select ContactID, Title, FirstName, LastName, Suffix, EmailAddress,
ModifiedDate = DATEADD(mi, ContactID, ModifiedDate)
FROM Person.Contact
GO
CREATE INDEX IX_NewContact_LastName ON Person.NewContact(LastName);
GO
CREATE INDEX IX_NewContact_ModifiedDate ON Person.NewContact(ModifiedDate)
GO
This creates a copy of the contacts table with all the NVARCHAR columns set to case-sensitive.
It also creates indexes on the LastName column and DateModified column.
I tested a series of SELECT statements against the table to establish a baseline.
The SELECT statements and the results are:
|
Statement | Base index | Upper Index | CI Index |
1. | SELECT * FROM Person.NewContact WHERE LastName = 'Mcanich' | Index seek, query cost = 0.007, returned one row | Index seek, query cost = 0.007, returned one row | Index seek, query cost = 0.007, returned one row |
2. | SELECT * FROM Person.NewContact WHERE LastName = 'mcanich' | Index seek, query cost = 0.007, returned zero rows | Index seek, query cost = 0.007, returned zero rows | Index seek, query cost = 0.007, returned zero rows |
3. | SELECT * FROM Person.NewContact WHERE LastName = 'MCANICH' COLLATE SQL_Latin1_General_CP1_CI_AI | Index scan, query cost = .127, returned one row | Index scan, query cost = 0.129, returned one row | Index seek of CI index, query cost = 0.007, returned one row |
4. | SELECT * FROM Person.NewContact WHERE UPPER(LastName) = UPPER('McAnich') | Index scan, query cost = 0.08, returned one row | Index seek of "Upper" index, query cost = 0.007, returned one row | Index seek of "Upper" index, query cost = 0.007, returned one row |
5. | SELECT * FROM Person.NewContact WHERE LastNameUpper = UPPER('McAnich') | (Failed) | Index seek of "Upper" index, query cost = 0.007, returned one row | Index seek of "Upper" index, query cost = 0.007, returned one row |
6. | SELECT * FROM Person.NewContact WHERE LastNameCI = 'mcanICh' | (Failed) | (Failed) | Index seek of CI index, query cost = 0.007, returned one row |
Next I created a computed column with the UPPER function and then built an index on it.
ALTER TABLE Person.NewContact
ADD LastNameUpper AS UPPER(LastName)
GO
CREATE INDEX IX_NewContact_LastNameUpper ON Person.NewContact(LastNameUpper)
GO
The results of this are in the "Upper Index" column above. In Query #5 when
we use the computed column it does an index seek on the new index just like I'd hoped
it would. The really interesting result is Query #4. That also uses
the new index and does an index seek even though we aren't using the new computed
column. What the blog post says and what seems to be happening is that SQL
Server is checking for a computed column that matches the WHERE clause. It
finds it, finds that it's indexed and uses the index. Pretty cool if you ask
me!I also tested this by creating a case-insensitve computed column and testing that.
ALTER TABLE Person.NewContact
ADD LastNameCI AS LastName COLLATE SQL_Latin1_General_CP1_CI_AI
GO
CREATE INDEX IX_NewContact_LastNameCI ON Person.NewContact(LastNameCI)
GO
The results of this test are in the "CI Index" column in the table above.
This performed just like the other computed column. Query #6 which explicitly
used the column performed very well. But so did Query #3 which didn't explicitly
use the new column.I also wanted to test this on datetime functions to see if I could easily query just on the date portion of the column. The queries I tested are:
|
Statement | Base index | Date Only Index |
1. | SELECT * FROM Person.NewContact WHERE ModifiedDate >= '5/1/2003' AND ModifiedDate < '5/2/2003' | Index seek, query cost = 0.03, eleven rows returned | Index seek, query cost = 0.03, eleven rows returned |
2. | SELECT * FROM Person.NewContact WHERE ModifiedDate = '5/1/2003' | Index seek, query cost = 0.006, zero rows returned | Index seek, query cost = 0.006, zero rows returned |
3. | SELECT * FROM Person.NewContact WHERE CONVERT(VARCHAR(10), ModifiedDate, 101) = '05/01/2003' | Index scan, query cost = 0.07, eleven rows returned | Index scan, query cost = 0.07, eleven rows returned |
4. | SELECT * FROM Person.NewContact WHERE CONVERT(DATETIME, CONVERT(VARCHAR(10), ModifiedDate, 101), 101) = '5/1/2003' | Index scan, query cost = 0.07, eleven rows returned | Index seek on new index, query cost = 0.04, eleven rows returned |
5. | SELECT * FROM Person.NewContact WHERE ModifiedDateOnly = '5/1/2003' | (Failed) | Index seek on new index, query cost = 0.04, eleven rows returned |
6. | SELECT * FROM Person.NewContact WHERE CAST(CONVERT(VARCHAR(10), ModifiedDate, 101) AS DATETIME) = '5/1/2003' | Index scan, query cost = 0.07, eleven rows returned | Index scan, query cost = 0.07, eleven rows returned |
7. | SELECT * FROM Person.NewContact WHERE CONVERT(DATETIME, CONVERT(VARCHAR(10), ModifiedDate, 101)) = '5/1/2003' | Index scan, query cost = 0.07, eleven rows returned | Index scan, query cost = 0.07, eleven rows returned |
Next I added a computed column and an index on that column.
ALTER TABLE Person.NewContact
ADD ModifiedDateOnly AS CONVERT(DATETIME, CONVERT(VARCHAR(10), ModifiedDate, 101), 101)
GO
CREATE INDEX XI_NewContact_ModifiedDateOnly ON Person.NewContact(ModifiedDateOnly)
GO
This created a computed column that had the time removed from ModifiedDate.
When I converted back to DATETIME I had to specifiy the format number (101) or SQL
Server complained that the function wasn't deterministic. Deterministic functions
always return the same value from the same input value and database state.
For example, GETDATE() isn't deterministic. The biggest improvement came in Query #4. Since the function around ModifiedDate matched the computed column it used the computed column and its index. It didn't use it in Query #6 where the outer CONVERT was replaced with a CAST function. When test Query #4 but added a bunch of white space into the WHERE clause it still used the index. That tells it isn't doing a simple hash of the text in the WHERE clause. When I changed from VARCHAR(10) to any other length it no longer used the new index. Query #7 also didn't use the new index. The only difference in that query was the outer CONVERT function didn't have a format number.
If you have existing application code that wraps functions around an indexed column and then doesn't use the index you may be able to use this approach to improve those queries. If the functions are consistent you should be able to add computed columns and realize immediate benefits.
No comments :
Post a Comment