SQL and the like

Dave Ballantyne's blog. Freelance SQL Server database designer and developer at Clear Sky SQL

SQL 101 : Without “ORDER BY”, order is not guaranteed.

It’s as simple as that. If you don’t specify ORDER BY, then the order of your result set is not guaranteed. Oh sure , there may be situations where the data looks like it is being returned ordered, but this is not really the case. So, for example :

CREATE TABLE #Example
    (
    row_id  INTEGER
    );
GO    
INSERT  #Example 
select top(10) ROW_NUMBER() over (order by (Select null))
from syscolumns
go
SELECT  E.row_id
FROM    #Example E;

On a number of occasions on different forums, the suggestion has been made that row_id will be in order. Certainly it is here (or if not, let me know ). Is that enough to say "It is in order" ?

insert into #Example(row_id) values(-1)
go
SELECT  E.row_id
FROM    #Example E;

Now, that’s out of order. Ahh , you may say, "That’s a heap table. A clustered key will define the order. After all, the data is already in that order, so no need for an order by. It’s going to start at row 1 and scan all the way through". NO, parallelism can cause multiple streams to start at multiple points in the table. Connor Cunningham goes through that scenario here .

Ok , what if you have a small number of rows and parallelism is not an issue. Now we don’t need the order by.

How about this ?

Drop TABLE #Example
go
CREATE TABLE #Example
    (
    row_id  INTEGER primary key,
    uuid    uniqueidentifier not null
    );
GO    
Create unique index idxExample on #Example(uuid)
go

INSERT  #Example 
select top(10) 
       ROW_NUMBER() over (order by (Select null)),
       NEWID()
from syscolumns
go
SELECT  *
FROM    #Example E;

That’s going to scan the clustered key , right ? Nope, at least not on my machine.

OrderScan

It produces a scan of the NON-clustered index. A discussion of why is not for now, but an illustration of my previous point.

So , to avoid using ORDER BY , we need:

A) A non heap table

B) To avoid parallelism

C) A clustered index

D) No non clustered indexes

During a recent discussion Paul White posted this example

CREATE TABLE dbo.Example
    (
    row_id  INTEGER IDENTITY NOT NULL PRIMARY KEY CLUSTERED,
    data    AS CONVERT(VARCHAR(11), row_id) PERSISTED NOT NULL UNIQUE
    );
GO    
INSERT  dbo.Example (data)
DEFAULT VALUES;
GO 10
SELECT  E.row_id, E.data
FROM    dbo.Example E;

Again that is now out of order due to the index to maintain the uniqueness on the data column. If I need to add a column like this to a table, the issue of non – ordered data would be low down on my list of thoughts.

All of these scenarios, you can, perhaps legitimately, argue don’t apply to your system and that you *know* you can meet all of these conditions. Will that always be true? Are you really happy with a table that you can’t add an index to? Even so, the engine may well change in the future and completely destroy your assumption. Then what will happen? Will you be able to confidently find and fix all the statements? That’s a big price to pay to save an extra few keystrokes now.

Just add ORDER BY now, and sleep safe at night.

Comments

GrumpyOldDBA said:

I'm not quite sure what your point is here as it's been this way for a very long time, maybe SQL 7 onwards - can't remember for certain.

One point however is that sorts can be resource hungry and if you don't need a sort, maybe your client can sort for example, then don't use them. I did find that a sort of data which is sorted already will still get sorted again taking resource.

# June 29, 2010 10:53 AM

Dave Ballantyne said:

The only point im making is that , if you want an ordered result set you *have* to use order by.  I posted this as ,once again, someone on a forum stated that the clustered index will define the order of a result set. I wanted one page to reference...

# June 29, 2010 11:30 PM

paulbarbin said:

Great reference.  You are absolutely correct, and this is definitely 101 stuff. I've been preaching it to developers for decades(?).  But I appreciate the academics.

My personal experience shows the execution plan pretty much dictates the order if ORDER BY isn't given.  That is, if a clustered index scan comes back in clustered index order.  Nonclustered index scans come back in nonclustered index order.

I didn't realize parallelism would change the order but that certainly makes sense.  

Finally, I thought perhaps a heap might always return the same order since I "THOUGHT" a table scan would just go in RowId order (i.e. the order they were inserted).  As any self respecting DBA, I would NEVER rely on this!

But when troubleshooting a developer's code lately, I saw an issue that appeared to have been caused by an unordered resultset.  I am yet to actually prove it and I guess I was wondering two things. Is there anyway to reproduce the different ordering reliably?  And what is the technical reason why the order is different if it's a heap (temp table too)?

Thanks, mainly just academic, but I'd really like to know.

# November 29, 2011 9:53 PM

Dave Ballantyne said:

Hi Paul,  sorry for the lateness of my reply.

One possible way to reproduce is to delete a row and then insert a row,  this *should* then use the same row slot on the page and cause an unordered result set.

drop  table #t1

go

create table #t1

(

id integer identity

)

go

insert into #t1 default values

go 2

select * from #t1

go

delete from #t1 where id =1

go

insert into #t1 default values

go

select * from #t1

As for heaps, well they are just a collection of rows with no order.  Any scan will start at page 0 and go through to page N following the page link chain...

Hope that helps

Dave

# January 2, 2012 6:02 PM