UPDATE and DELETE TOP and ORDER BY : Part2

I earlier posted about using derived tables with UPDATE and DELETE to be able to update or delete the TOP n rows (http://sqlblogcasts.com/blogs/simons/archive/2009/05/22/DELETE-TOP-x-rows-avoiding-a-table-scan.aspx).

In many large scale systems archiving is essential and so being able to efficiently archive data is essential, but more often than not your selectivity of what to archive is not based on the primary key.

Whats nice is that, if you have an index that supports this, you can still use the same behaviour as demonstrated in the other article to efficiently delete the rows you want. I have often had indexes on large tables just to support the archiving process.

Take this example populated table. 

drop table Orders

go

create table Orders (Id int primary key, PadingCol char (100), orderDate datetime, status int)

declare @i int

set @i=1

SET NOCOUNT ON

set statistics time off

set statistics io off

while (@i<100000)

begin

  if @@TRANCOUNT = 0

    begin transaction

  insert into Orders values (@i,'x', DATEADD(day,rand()* 1000,GETDATE()), CAST(rand()+.2 as int));

  if @i % 2000 = 0

    commit transaction

  set @i=@i+1

end

if @@TRANCOUNT > 0

  begin transaction

go

create index IX_Orders_status on t1Orders(status, orderDate)

Note the index created on the status and OrderDate.

If we want to archive the oldest closed orders (status 0) you can do the following

delete orders

from (select top (100) *

        from orders

        where status = 0

        order by orderDate) orders       

 This works because an index is sorted by the keys, but in this situation the index keys are status and then orderDate. Well because we are filtering on an exact key value the rest of the index values for will be sorted by the second key which is the orderDate.

If you make the filter on status be multivalue i.e IN(0,1) the plan changes to require a TOP N sort.

or with parallism

These both have very expensive sort operations

What happens if we change the index and only have OrderDate in the key

drop index  Orders.IX_Orders_Status

go

create index IX_Orders_orderDate on Orders(orderDate) include (status )

If we now do the delete we get the same plan, sort of.

The key thing is the index scan, this now has a predicate in it. Which does the status filtering. but because the scan is in OrderDate order when it passes the data to the TOP operator it is already sorted.

Whats interesting is that if you apply a filter on status that is not a single value i.e. IN(0,1) you still get a very optimal plan. IF and its a big if. The data you are look for is found early on in the scan. In this scenario (if the data were real), the oldest orders are are likley to be old and so will be found early on in the scan. If for instance you were looking for a specific customer, the scan might have scan a lot of rows to find 100 that match. In that case you might want to consider the previous index structure. This is highlighted if you run the delete with the last index, you will find many pages read. The number will be much higher than before because to delete from the clustered index it has to seek to the row to delete the row and this results in pages being read. Whats more the nonclustered indexes also need to be deleted.


-
Published Friday, May 22, 2009 6:01 PM by simonsabin

Comments

Sunday, May 24, 2009 6:22 AM by Uri Dimant

# re: UPDATE and DELETE TOP and ORDER BY : Part2

Hi Simon

What do you think about

sqlcat.com/.../fast-ordered-delete.aspx

Sunday, May 24, 2009 4:20 PM by simonsabin

# re: UPDATE and DELETE TOP and ORDER BY : Part2

Thats the article that started me down this road with the first post sqlblogcasts.com/.../DELETE-TOP-x-rows-avoiding-a-table-scan.aspx

The difference is using a derived table rather than a view. So you don't need the overhead of creating a view.

Thursday, May 28, 2009 12:37 AM by SimonS Blog on SQL Server Stuff

# Myth : I don't need point in time recovery so I don't need the transaction log

Wrong, wrong, wrong. This post was in response to a post in the forums &quot; How do I delete the log

Saturday, March 6, 2010 9:54 AM by SimonS Blog on SQL Server Stuff

# How to archive data from a table to a local or remote database in SQL 2005 and SQL 2008

Often you have the need to archive data from a table. This leads to a number of challenges 1. How can

Wednesday, March 10, 2010 1:27 PM by Simon Sabin UK SQL Consultant's Blog

# How to archive data from a table to a local or remote database in SQL 2005 and SQL 2008

Saturday, June 4, 2011 6:50 AM by Delete order | Monaschilling

# Delete order | Monaschilling

Pingback from  Delete order | Monaschilling