UPDATE and DELETE TOP and ORDER BY : Part2
I earlier posted about using derived tables with UPDATE
and DELETE to be able to update or delete the TOP n rows (http://sqlblogcasts.com/blogs/simons/archive/2009/05/22/DELETE-TOP-x-rows-avoiding-a-table-scan.aspx).
In many large scale systems archiving is essential and so being able to
efficiently archive data is essential, but more often than not your selectivity
of what to archive is not based on the primary key.
Whats nice is that, if you have an index that supports this, you can still
use the same behaviour as demonstrated in the other article to efficiently
delete the rows you want. I have often had indexes on large tables just to
support the archiving process.
Take this example populated table.
drop table Orders
go
create table Orders (Id int primary key, PadingCol char
(100),
orderDate datetime, status int)
declare @i int
set @i=1
SET NOCOUNT ON
set statistics time
off
set statistics io off
while
(@i<100000)
begin
if
@@TRANCOUNT = 0
begin transaction
insert
into Orders values
(@i,'x', DATEADD(day,rand()* 1000,GETDATE()), CAST(rand()+.2 as int));
if @i
% 2000 =
0
commit transaction
set
@i=@i+1
end
if @@TRANCOUNT >
0
begin
transaction
go
create
index IX_Orders_status on t1Orders(status,
orderDate)
Note the index created on the status and OrderDate.
If we want to archive the oldest closed orders (status 0) you can do the
following
delete
orders
from
(select top (100) *
from orders
where status =
0
order by orderDate)
orders
This works because an index
is sorted by the keys, but in this situation the index keys are status and then orderDate.
Well because we are filtering on an exact key value the rest of the index values
for will be sorted by the second key which is the orderDate.
If you make the filter on status be multivalue i.e IN(0,1) the plan changes
to require a TOP N sort.

or with parallism

These both have very expensive sort operations
What happens if we change the index and only have OrderDate in the key
drop index
Orders.IX_Orders_Status
go
create
index IX_Orders_orderDate on Orders(orderDate)
include (status )
If we now do the delete we get the same plan, sort of.
The key thing is the index scan, this now has a predicate in it. Which does
the status filtering. but because the scan is in OrderDate order when it passes
the data to the TOP operator it is already sorted.

Whats interesting is that if you apply a filter on status
that is not a single value i.e. IN(0,1) you still get a very optimal plan. IF
and its a big if. The data you are look for is found early on in the scan. In
this scenario (if the data were real), the oldest orders are are likley to be
old and so will be found early on in the scan. If for instance you were looking for a specific
customer, the scan might have scan a lot of rows to find 100 that match. In
that case you might want to consider the previous index structure. This is highlighted if you run
the delete with the last index, you will find many pages read. The number will be much higher
than before because to delete from the clustered index it has to seek to
the row to delete the row and this results in pages being read. Whats more the nonclustered indexes
also need to be deleted.
-