Cross apply (and outer apply) are a very welcome addition to the TSQL language. However, today after a few hours of head scratching, I have found an simple issue which could cause big big problems.
What would you expect from this statement ?
select *
from sys.objects b
join sys.objects a
on a.object_id = object_id
No prizes for guessing SQL server errors with “Ambiguous column name 'object_id'”.
What would you expect from this statement ?
Select *
from sys.objects a
cross apply( Select *
from sys.objects b where b.object_id = object_id) as c
Surprisingly, perhaps, the result is a cross join of sys.objects. Well, what happened there ?
If you look at the apply statement, within the where clause, only one of the conditions is qualified with a table name. This meant that is has be interpreted as “b.object_id = b.object_id” causing the cross apply to have no join the the parent sys.objects table and causing the cross join.
The fix is , obviously, simple
Select *
from sys.objects a
cross apply( Select *
from sys.objects b where b.object_id = a.object_id) as c
So why no “Ambiguous column name ” error ? I’ve raised a connect item on this issue here.
It is widely know that data modifications on table variables do not support parallelism, Peter Larsson has a good example of that here . Whilst tracking down a performance issue, I saw that using the OUTPUT clause also causes parallelism to not be used.
By way of example, first lets create two tables with a simple parent and child (one to one) relationship, and then populate them with 100,000 rows.
Drop table Parent
Drop table Child
go
create table Parent
(
id integer identity Primary Key,
data1 char(255)
)
Create Table Child
(
id integer Primary Key
)
go
insert into Parent(data1)
Select top 1000000 NULL
from sys.columns a cross join sys.columns b
insert into Child
Select id from Parent
go
If we then execute
update Parent
set data1 =''
from Parent
join Child on Parent.Id = Child.Id
where Parent.Id %100 =1
and Child.id %100 =1
We should see an execution plan using parallelism such as
However, if the OUTPUT clause is now used
update Parent
set data1 =''
output inserted.id
from Parent
join Child on Parent.Id = Child.Id
where Parent.Id %100 =1
and Child.id %100 =1
The execution plan shows that Parallelism was not used
Make of that what you will, but i thought that this was a pretty unexpected outcome.
Update : Laurence Hoff has mailed me to note that when the OUTPUT results are captured to a temporary table using the INTO clause, then parallelism is used. Naturally if you use a table variable then there is still no parallelism