SQL and the like

Dave Ballantyne's blog. Freelance SQL Server database designer and developer at Clear Sky SQL

March 2010 - Posts

Cross Apply Ambiguity

Cross apply (and outer apply)  are a very welcome addition to the TSQL language.  However, today after a few hours of head scratching, I have found an simple issue which could cause big big problems.

What would you expect from this statement ?

select * 
  from sys.objects b    
  join sys.objects a
    on a.object_id = object_id

No prizes for guessing SQL server errors with “Ambiguous column name 'object_id'”.

What would you expect from this statement ?

Select * 
  from sys.objects a
  cross apply( Select * 
                from sys.objects b where b.object_id = object_id) as c

Surprisingly, perhaps, the result is a cross join of sys.objects.  Well, what happened there ?

If you look at the apply statement, within the where clause, only one of the conditions is qualified with a table name.  This meant that is has be interpreted as “b.object_id = b.object_id” causing the cross apply to have no join the the parent sys.objects table and causing the cross join.

The fix is , obviously, simple

Select * 
  from sys.objects a
  cross apply( Select * 
                from sys.objects b where b.object_id = a.object_id) as c

So why no “Ambiguous column name ” error ?  I’ve raised a connect item on this issue here.

Output = MAXDOP 1

It is widely know that data modifications on table variables do not support parallelism, Peter Larsson has a good example of that here .  Whilst tracking down a performance issue,  I saw that using the OUTPUT clause also causes parallelism to not be used.

By way of example,  first lets create two tables with a simple parent and child (one to one) relationship, and then populate them with 100,000 rows.

Drop table Parent
Drop table Child
go
create table
Parent
(
id integer identity Primary Key,
data1 char(255)
)

Create Table Child
(
id integer Primary Key
)
go
insert into
Parent(data1)
Select top 1000000 NULL
from sys.columns a cross join sys.columns b

insert into Child
Select id from Parent
go

If we then execute

update Parent 
set data1 =''
from Parent
join Child on Parent.Id = Child.Id
where Parent.Id %100 =1
and Child.id %100 =1

We should see an execution plan using parallelism such as

Parrelel1

 

However,  if the OUTPUT clause is now used

update Parent 
set data1 =''
output inserted.id
from Parent
join Child on Parent.Id = Child.Id
where Parent.Id %100 =1
and Child.id %100 =1

 

The execution plan shows that Parallelism was not used

Parrelel2

Make of that what you will, but i thought that this was a pretty unexpected outcome.

 

Update : Laurence Hoff has mailed me to note that when the OUTPUT results are captured to a temporary table using the INTO clause,  then parallelism is used.  Naturally if you use a table variable then there is still no parallelism