Friday, May 18, 2007 11:37 AM tonyrogerson

Using TOP and ORDER BY with UNION / UNION ALL

Got asked this from a friend, I have a column treeID that has values 1 - 5 and for each value I require 20 random rows, so 20 for 1, 20 for 2 etc... Is there a query that will do this other than calling the stored procedure 5 times?

First thought is to use 5 queries and a UNION ALL, however the syntax does not support it ->

select top 20 somedata

from #blah

order by newid()

union all

select top 20 somedata = somedata + 1

from #blah

order by newid()

Gives this error ->

Msg 156, Level 15, State 1, Line 4

Incorrect syntax near the keyword 'union'.

To get round this problem we can use derived tables, think of these as a self-contained table, a bit like a temporary table within the query.

select somedata

from (

    select top 20 somedata

    from #blah

    order by newid() ) as d

union all

select *

from (

    select top 20 somedata = somedata + 1

    from #blah

    order by newid() ) as d

The NEWID() gives a different GUID per row, unlike CURRENT_TIMESTAMP for instance which is evaluated once regardless of how many rows (you always get the same value), NEWID() evaluates for every row.

You can probably do something with a Common Table Expression as well, any takers?

Congrats to Wanderer (see comments) on the first to give me a CTE example, nice!...

WITH Top20Blah (SomeDataOut) as (

    select top 20 somedata

    from #blah

    order by newid() )

select 1, *

from Top20Blah

union all

select 2, *

from Top20Blah

union all

select 3, *

from Top20Blah

union all

select 4, *

from Top20Blah

union all

select 5, *

from Top20Blah


It just gets better, here is a CROSS APPLY example (see comments below) from Adam Machanic, I never even thought about CROSS APPLY - perfect use for it.

select p.somedata, x.treeid

from (   select 1

         union all

         select 2

         union all

         select 3

         union all

         select 4

         union all

         select 5

        ) x (treeid)

    cross apply (

           select top 20 somedata

           from #blah

           where treeid = x.treeid

           order by checksum(newid())

        ) p

 

Got another I've just worked out myself...

select *

from (

    select somedata,

           treeid,

           row_number() over ( partition by treeid order by newid() ) as rownumber

    from #blah

    ) as r

where r.rownumber <= 20

order by treeid

 

Filed under:

Comments

# re: Using TOP and ORDER BY with UNION / UNION ALL

Friday, May 18, 2007 2:49 PM by Wanderer

Something like this?

WITH Top20Blah (SomeDataOut) as (select top 20 somedata from #blah order by newid() )

select * from Top20Blah

union all select * from Top20Blah

union all select * from Top20Blah

union all select * from Top20Blah

union all select * from Top20Blah

.... some sample data created, and the test:

set nocount on

declare @int bigint,@count bigint,@Path tinyint

set @int = 0

set @count = 0

create table #blah (somedata bigint)

while @count < 100 begin

set @count = @count + 1

select

@Path =

(case

when newid() < newid() then 1

when newid() = newid() then 2

when newid() > newid() then 3

else 2 --nulls are generated, and = is virtually (actually?) impossible

end)

if @Path = 1 set @int = @count + @int

if @Path = 2 set @Int = @count + @int

if @Path = 3 set @int = @count - @int

--leave out / to remove pain of decimals etc.

insert into #blah (SomeData) values (@int)

end;

WITH Top20Blah (SomeDataOut) as (select top 20 somedata from #blah order by newid() )

select * from Top20Blah

union all select * from Top20Blah

union all select * from Top20Blah

union all select * from Top20Blah

union all select * from Top20Blah

drop table #blah

# re: Using TOP and ORDER BY with UNION / UNION ALL

Friday, May 18, 2007 3:01 PM by AdamMachanic

This is a perfect application for CROSS APPLY:

select p.somedata

from

(

 select 1

 union all

 select 2

 union all

 select 3

 union all

 select 4

 union all

 select 5

) x (treeid)

cross apply

(

   select top 20 somedata

   from yourTree

   where yourtree.treeid = x.treeid

   order by checksum(newid())

) p

By the way, you'll notice I used CHECKSUM(NEWID()) rather than just NEWID() -- I was given a tip to use it by Itzik Ben-Gan, and verified through some testing that he was correct; it produces a better distribution than just NEWID(), for whatever reason.

# re: Using TOP and ORDER BY with UNION / UNION ALL

Friday, May 18, 2007 3:23 PM by Wanderer

I'm clearly not implementing your idea correctly, Adam. Could you review and correct my example below, so I understand what you mean.

Tony, btw, on my laptop, XP SP2, SQL 2005 SP2a, the union all and the CTE version create same access path for my sample data.

-------sample data in #blah, and then Tony's, my and Adam's SQL----

set nocount on

declare @int bigint,@count bigint,@Path tinyint

set @int = 0

set @count = 0

create table #blah (somedata bigint)

while @count < 100 begin

set @count = @count + 1

select

@Path =

(case

when newid() < newid() then 1

when newid() = newid() then 2

when newid() > newid() then 3

else 2 --nulls are generated, and = is virtually (actually?) impossible

end)

if @Path = 1 set @int = @count + @int

if @Path = 2 set @Int = @count + @int

if @Path = 3 set @int = @count - @int

--leave out / to remove pain of decimals etc.

insert into #blah (SomeData) values (@int)

end

;

--CTE version

WITH Top20Blah (SomeDataOut) as (select top 20 somedata from #blah order by newid() )

select * from Top20Blah

union all select * from Top20Blah

union all select * from Top20Blah

union all select * from Top20Blah

union all select * from Top20Blah

--cross-apply version (with error, since it only returns exact matches)

select p.somedata

from

(select 1

union all

select 2

union all

select 3

union all

select 4

union all

select 5

) x (treeid)

cross apply

(  select top 20 somedata

  from #blah

  where #blah.SomeData = x.treeid

  order by checksum(newid())

) p

--

--Tony's initial union, done for 5 iterations as spec'd

select somedata

from ( select top 20 somedata from #blah order by newid() ) as d

union all

select *

from ( select top 20 somedata from #blah order by newid() ) as d

union all

select *

from ( select top 20 somedata from #blah order by newid() ) as d

union all

select *

from ( select top 20 somedata from #blah order by newid() ) as d

union all

select *

from ( select top 20 somedata from #blah order by newid() ) as d

drop table #blah

# re: Using TOP and ORDER BY with UNION / UNION ALL

Friday, May 18, 2007 3:43 PM by Wanderer

ahhhh - ok, got it.

I tested the 3, and not only is Adam's neater, but it also out performs the other two (on my sample data). I would guess that as the number of 'sets' required grew, Adam's would continue to improve relative to the others, since the CTE and the Derived table version create table scan's for each select, then union (concatenation operator in plan) the results, whereas Adam's Cross apply only lists one table scan, with a nested loops operator.

CTE sub-query cost: 0.0790287

Derived table unions sub-query costs: 0.0790287

Cross Apply: 0.0635376

# re: Using TOP and ORDER BY with UNION / UNION ALL

Friday, May 18, 2007 4:05 PM by AdamMachanic

How did you get the CTE version working?

Here's my version of the complete code, which I think is closer to the actual requirement (I added and populated an additional TreeId column in the table) -- but this breaks the CTE:

-----

set nocount on

declare @int bigint,@count bigint,@Path tinyint,@ti int

set @ti = 1

set @int = 0

set @count = 0

create table #blah (somedata bigint, treeid int)

while @count < 500 begin

set @count = @count + 1

if @count % 100 = 0

set @ti = @ti + 1

select

@Path =

(case

when newid() < newid() then 1

when newid() = newid() then 2

when newid() > newid() then 3

else 2 --nulls are generated, and = is virtually (actually?) impossible

end)

if @Path = 1 set @int = @count + @int

if @Path = 2 set @Int = @count + @int

if @Path = 3 set @int = @count - @int

--leave out / to remove pain of decimals etc.

insert into #blah (SomeData, TreeId) values (@int, @ti)

end

;

--CTE version

WITH Top20Blah (SomeDataOut, treeid) as (select top 20 somedata, treeid from #blah order by newid() )

select * from Top20Blah

where treeid = 1

union all select * from Top20Blah

where treeid = 2

union all select * from Top20Blah

where treeid = 3

union all select * from Top20Blah

where treeid = 4

union all select * from Top20Blah

where treeid = 5

--cross-apply version (with error, since it only returns exact matches)

select p.somedata

from

(select 1

union all

select 2

union all

select 3

union all

select 4

union all

select 5

) x (treeid)

cross apply

(  select top 20 somedata

 from #blah

 where #blah.treeid = x.treeid

 order by checksum(newid())

) p

--

--Tony's initial union, done for 5 iterations as spec'd

select somedata

from ( select top 20 somedata from #blah where treeid = 1 order by newid() ) as d

union all

select *

from ( select top 20 somedata from #blah where treeid = 2 order by newid() ) as d

union all

select *

from ( select top 20 somedata from #blah where treeid = 3 order by newid() ) as d

union all

select *

from ( select top 20 somedata from #blah where treeid = 4 order by newid() ) as d

union all

select *

from ( select top 20 somedata from #blah where treeid = 5 order by newid() ) as d

drop table #blah

-----

# re: Using TOP and ORDER BY with UNION / UNION ALL

Friday, May 18, 2007 4:13 PM by Wanderer

My CTE wasn't doing the 'treeid' - missed that bit, I was writing the CTE ((my first ever)) to do what Tony's initial example did - i.e. select top 20 somedata from blah

I'll try and find some time to playaround with this again, and see if I can make the CTE do the actual requirement, although I already agree the cross apply is superiod (see the performance numbers)

# re: Using TOP and ORDER BY with UNION / UNION ALL

Friday, May 18, 2007 4:20 PM by tonyrogerson

What do you think to this guys?

select *

from (

   select somedata,

          treeid,

          row_number() over ( partition by treeid order by newid() ) as rownumber

   from #blah

   ) as r

where r.rownumber <= 20

order by treeid

# re: Using TOP and ORDER BY with UNION / UNION ALL

Friday, May 18, 2007 5:32 PM by Wanderer

We certainly have a new leader :P

relative costs (CTE version still not working, btw):

CTE: 38% (0.11)

CrossApply:22% (0.07)

Derived: 28% (0.08)

Row_Number: 12% (0.03)

Nice one

# re: Using TOP and ORDER BY with UNION / UNION ALL

Wednesday, May 23, 2007 10:57 PM by angelia

problem solved -  thank you!

# Sat?rlar?n ba? harfine g?re veritaban?ndan rastgele veri ?ekmek - Ceviz Forum

Pingback from  Sat?rlar?n ba? harfine g?re veritaban?ndan rastgele veri ?ekmek - Ceviz Forum