SQL and the like

Dave Ballantyne's blog. Freelance SQL Server database designer and developer at Clear Sky SQL

August 2011 - Posts

Unleashing my inner Viking, SQLBits and more…

In case you didn't know SQLBits 9 ,”Query across the Mersey”, is taking place in Liverpool from September 21st to October 1st.  3 days of the usual excellent quality presentations, networking interspersed with the odd beer or two should all make for an informative and entertaining conference.  Rumor has it that a) Places are going fast and b) That Chris Testa-o'neill’s  (blog|twitter) band will be supporting the Fab Beatles for the thursday night entertainment,  if it is half as much fun as the “Crappy Code Games” held by Fusion IO in Brighton then it will be a night to remember.

I shall be speaking on Saturday on the subject of “Whats new in Denali : TSQL “,  so this will be the new language enhancements that I will be spending the next X years of my career writing as well as meta-data discovery etc….

Immediately following SqlBits, the UK Sql Server community is coming together to hold the first , hopefully, of many,  SQL-Relay.  13 SqlServer user groups are running across the country , sign up to your local one.  More details of dates, locations and confirmed speakers are at SqlServerFAQ.

8925046_s 

Finally, I shall be “releasing my inner viking” by speaking at SQLRally Nordic on 8th and 9th of November in Bålsta, Sweden.  This will be my first international speaking engagement and I shall be delivering my “Cursor are Evil” presentation.  Again, there is an excellent line-up of speakers and sessions, including members of the SQLCat team and other Microsofties, MVPs and other recognized speakers.  Another cracking event, that I hope to see you at.

NYA knappar 200x150-01

Rolling your own index lookups

Over the past few weeks, I’ve been training developers in what they really need to know about SQL Server to produce efficient and robust SQL code.  One of the areas we looked at was the “Index Lookup” aka “Bookmark Lookup”.  An index lookup occurs when a non-clustered index seek is used to find the required rows, but to fully resolve the query data that is not held in the index is needed, and so a “Lookup” of the row in the table data is instigated.  If the table has a clustered index then the clustered key is used to drive the lookup, else a rowid is used.

If you execute this :

select LastName,BusinessEntityID
from Person.Person
where LastName = 'Krane'

Then the execution plan will be

image

The query has been fully resolved using the non-clustered index “IX_Person_LastName_FirstName_MiddleName”.  BusinessEntityId is the clustering key,  this means that it will be implicitly included within any non-clustered index on Person.Person.

If we add “FirstName” to our selected columns , the plan will be

 image

Exactly the same as before, the FirstName column has been returned by using the data within the index.  However, if we add the column ‘RowGuid’ into our query:

select LastName,Firstname,BusinessEntityID,rowguid
from Person.Person
where LastName = 'Krane'

image

We have an additional Key Lookup Operator added.  The data for RowGuid is not contained within the index itself and so the engine has had to find and pull it from the base table data.

The question that I was asked about this is “Why is the Lookup so slow ?”.  The answer to that is, “Its not, you just have to appreciate the amount of work that is required”.  The point is that by looking up in the non-clustered index,  you may well find that all the required data is on a single 8k page.  Let us assume that you have 100 rows of data on this one single page.  Now, for each of those rows the engine will have to instigate a lookup.  So, as there are 100 rows thats 100 lookups,  the engine will now have to find and read 100 pages of table data.  So, ‘slow’ it is not, doing a lot more work it is.

The way to mitigate this lookup overhead is to create a “covering” index.  This means adding the required data, in the case above rowguid, explicitly into the index and so avoiding the lookup.  In 2005, the INCLUDE clause was added to SQLServer, so that although the data is held within the index,  it does not form part of the key and so the overhead of maintaining the column with the key itself is avoided.

There are some scenarios where you may be prevented from doing this for operational reason,  DML performance or schema ownership issues , are just two but , if the existing indexes are ‘aligned’ then we can still avoid the overhead of reading the base data.  We will still have to lookup *some* data, but as the clustered index is the widest set of data we can still improve performance by lookup up in a different non-clustered index.

Lets first build some test data and define some indexes.

Create Table Lookup
(
Id integer identity Primary Key,
Status integer not null,
ChildId integer not null,
ReadData integer not null,
Filler char(512) not null
)
go

with cteData
as(
Select top(3000000)
ROW_NUMBER() over (Order by (select null)) as Rown
from sys.columns a
cross join sys.columns b
cross join sys.columns c
)
insert into Lookup(Status,ChildId,ReadData,Filler)
select Rown%100000,
abs(CHECKSUM(newid())),
Rown%10000,
'djdjdjdjdjd'
from cteData

go
create index idxLookupStatus on Lookup(status) include (childid)
go
create index idxChild on Lookup(childid) include (ReadData)
go
Our starting query is
 
select ReadData
from Lookup
where Status = 10000

which produces a plan of
 
image
And running with 'set statistics io on’ , we can see that a total of 133 logical reads have been issued.
 
Interestingly, running the query
 
select KeyLookup.ReadData
from Lookup
join Lookup KeyLookup
on Lookup.Id = KeyLookup.Id
where Lookup.Status = 10000

produces the plan of

image

and again 133 logical reads are used,  so there really is not much difference between a key lookup and clustered index seek.  They are pretty much synonymous here.  In our table we have the data ‘Filler’ which is not required to satisfy our query, but is still begin read by both the above two methods,  is there a way not to read that ?
 
If you look at the index definitions above, you will see that idxchild could be used to satisfy  our query rather than doing the lookup.  We know the childid we require as that is INCLUDE’d into idxLookupStatus which we are already seeking upon.  We could use that to seek inside the idxChild index in which readdata is included.  So what if we formulated to exploit this relationship in the data ?
 
select KeyLookup.ReadData
from Lookup
join Lookup KeyLookup
on Lookup.Id = KeyLookup.Id
and Lookup.ChildId = KeyLookup.ChildId
where Lookup.Status = 10000

image

103 Logical IO’s a nice drop of 25%.