29 June 2006 08:00
tonyrogerson
Explanation and examples of using UNION and UNION ALL includes derived tables usage
Often people forget or misunderstand the difference between UNION and the UNION ALL keywords in a query.
UNION ALL
SELECT
NOW = GETDATE()
UNION
ALL
SELECT
NOW = GETDATE()
This gives us the entire 'set' of data, both queries are executed, note, they are executed serially and not in parallel, the order the queries are executed is not determined; so, you never get a situation where the top query is parallelised with the bottom query; the query components themselves may be executed in parallel and combined as the last step.
Output (notice there are two rows that are identical, this is because GETDATE() gives a consistent value across the entire query rather than consistency at each query within the UNION construct.
NOW
-----------------------
2006-06-29 08:01:48.937
2006-06-29 08:01:48.937
Say you don't want duplicates, well that's where the [ALL] option on the UNION came in, [ALL] gives all rows from all queries combined, whereas leaving the [ALL] out will do a distinct as the final stage on the result of all the sub-queries:-
UNION
SELECT
NOW = GETDATE()
UNION
SELECT
NOW = GETDATE()
Output (notice there is just a single row now).
NOW
-----------------------
2006-06-29 08:04:37.107
There are a number of ways of doing a DISTINCT on the final results set and it depends what query you are executing, some ways the optimiser uses are Merge Join (UNION), Stream Aggregate (Aggregate) with a preceeding Sort step.
ORDER BY
The ORDER BY only operates on the outer most query, for instance this looks like you are ordering on the final sub-query, and yes - it does look confusing:-
SELECT
NOW = GETDATE()
UNION
ALL
SELECT
NOW = GETDATE()
ORDER
BY NOW -- This relates to the merged 'final' query
Using TOP
You almost always want to use ORDER BY and TOP together, however this does not work:-
SELECT
TOP 1 name
FROM sysobjects
ORDER BY name
UNION
ALL
SELECT
name
FROM sysobjects
ORDER
BY name
It gives the error message:
Msg 156, Level 15, State 1, Line 4
Incorrect syntax near the keyword 'UNION'.
To get round this we can use what is called a derived table, a derived table can be thought of as a materialised temporary table that has no joins with the outer query (like a sub-query has).
SELECT
name
FROM (
SELECT TOP 1 name
FROM sysobjects
ORDER BY name
) AS d (name )
UNION
ALL
SELECT
name
FROM sysobjects
ORDER BY name
The derived table bit is shown below, basically you can think of it being executed in isolation from the rest of the query, a bit like using SELECT .. INTO #T and uses #T... but the great thing about derived tables is that the optimiser understands what you are trying to do and may give a better query plan than if you'd have created a #T or table variable.
(
SELECT TOP 1 name
FROM sysobjects
ORDER BY name
) AS d (name )
Using Derived Tables
Say you want to do a TOP or further query the results of your UNION query, you can do this by wrapping the UNION into a derived table and do your query on the derived table :-
SELECT
TOP 10 name
FROM (
SELECT name
FROM (
SELECT TOP 1 name
FROM sysobjects
ORDER BY name
) AS d (name )
UNION ALL
SELECT name
FROM sysobjects
) AS final_d
WHERE
name LIKE 's%'
ORDER BY name
Remember what a derived table is - think of it as a materialsed temporary table within your query.
Getting cool! So, this opens up a lot of power for creating a highly scalable and performant system, you can in some instances combine a lot of temporary table work into a single statement.
You can now see other stuff, you can use GROUP BY, anything you could possibly want, another example...
SELECT
type, COUNT(*), MAX( name ), MIN( name )
FROM (
SELECT name, type
FROM (
SELECT TOP 1 name, type
FROM sysobjects
ORDER BY name
) AS d ( name, type )
UNION ALL
SELECT name, type
FROM sysobjects
) AS final_d
WHERE
name LIKE 's%'
GROUP BY type
Filed under: SQL Server