February 2010 - Posts
I currently have a work lenovo laptop next to me. I
don't use it because the FN and CTLR keys are the wrong way round. This wouldn't
be a pain if I only live in Lenovo world but I don't. My other laptops and my
desktop keyboard have the CTRL key in the right place.
Well Lenovo recently asked for feedback on this unique user experience and
apparantly they have listened http://lenovoblogs.com/yamato/?p=518&language=en.
From now on you will be able to swap the logic for these keys round. You can't
swap the keys just what they do. Not sure if that will make the situation
worse.
I guess you could resort to tipex (or whiteout for our north american friends
as Ami Levin informed me last week).
When performance tuning I often use a combination of
Execution Plans, SET STATISTICS IO and SET STATISTICS TIME. what I've noticed
lately is that when using TIME and actual execution plans the teim and cpu
reports is very large. If I turn off actual execution plans the duration
drastically reduces. I can understand this as the execution plan has to be put
together and sent to the client, but it is very noticeable.
Whats more is that the extra duration seesm to be different for different
query plans.
So word of warning if you are doing some performance tuning and are trying to
compare CPUT and duration , you don't want to have execution plans turned on as
this will skew the result.
Saving a hierarchy of data from an XML document into a
set of tables is a problem when the only relationship between nodes in the
hierarchy is the position of the node. SQL doesn't like implied things like
this. It likes things to be explicit.
This means that when you save the child nodes you are stuck as to how you can
get the appropriate primary key value of the parent node.
With OPENXML you have metaproperties but with xquery you don't. This means
you have to do some tricky SQL using row_number.
The downside with this approach is that it breaks the xquery processing and
so might not perform as well compared to doing a simple cross apply.
In the code below we store the results from the insert into the table
variable along with the source generated id. Note the use of (select 1) this is
a bit of a hack. Realy you need to order by some determinstic set of columns but
if none exist, I am working on the assumption that doing this trick will result
in the row_number matching the order in the document. This may be a risky
assumption and one that needs thorough testing.
The other trick being used is the use of MERGE. The use of MERGE allows us to
populate the table variable with values from the source query as well as the
inserted tables. With a normal INSERT statement you can only use the latter, the
inserted table.
Finally we join back to this lookup data so that we can save teh correct
Foreign key value in the child table.
create table
Orders (OrderId int identity(1,1)
primary key, Customer varchar(100), OrderDate datetime)
create table
OrderItems (OrderDetailId int identity(1,1),
OrderId int
not null,ItemNo varchar(100),
Qty int)
go
declare @orderImport xml
set @orderImport=
'<?xml
version="1.0" encoding="UTF-8" ?>
<Source
>
<Body>
<Order Customer="Smith
jones" OrderDate="1-1-2001">
<Item
ItemNo="Darma Scales" Qty="-85" />
<Item
ItemNo="Darma Scales" Qty="85" />
</Order>
<Order Customer="Smith
jones" OrderDate="1-1-2001">
<Item
ItemNo="Sharp Knives" Qty="-68" />
<Item
ItemNo="Sharp Knives" Qty="68" />
</Order>
</Body>
</Source>'
declare @OrderLookup table (InOrderId int primary key, NewOrderId int not
null)
merge into
Orders
using
(Select ROW_NUMBER() over (order by (select 1))
Id
,header.value('@Customer','varchar(100)') Customer
,header.value('@OrderDate','datetime') OrderDate
from @orderImport.nodes('/Source/Body/Order') doc(header)) src on 1=2
when not
matched by target then
insert values (Customer, OrderDate)
output inserted.OrderId, src.Id into @OrderLookup(NewOrderId, InOrderId);
insert into
OrderItems
Select
ol.NewOrderId,
Item.value('@ItemNo','varchar(36)'),
Item.value('@Qty','int')
from
(select header.query('Item') header, ROW_NUMBER() over (order by (select 1))
Id
from @orderImport.nodes('/Source/Body/Order') doc(header))doc
cross apply
doc.header.nodes('Item') item(Item)
join @OrderLookup ol on ol.InOrderId = Id
go
select *
from
Orders
select *
from
OrderItems
Having utility functions that do helpful stuff is great,
just look at sp_who2, sp_help, sp_helpindex as examples.
What if you want to write your own. Well you can you just stick the procedure
in master and prefix it with sp. You can now call the procedure from any
database without calling master..procedure.
The real worth of these comes when you access system views like sys.objects
and sys.tables. However unfortunately these are scoped to the database of the
procedure and NOT the context database. This is a change from the previous
system views in SQL 2000 and before. With those you can reference them in a
global stored proc and they would assume the context of the database they are
called from.
To show you the behaviour run the following code. It creates a stored proc in
the master database for use elsewhere. It accesses sys.objects and sysobjects to
show the difference in behaviour.
You will see that the sys.objetcs query always returns the count of objects
from master. whereas the count of sysobjects returns the current database
sysobjects count.
use
master
go
drop
procedure
sp_getObjectCount
go
create
procedure
sp_getObjectCount
as
select
db_name(),
'sys.objects' sysview ,COUNT(1) from sys.objects
select
db_name(),
'sysobjects' sysview ,COUNT(1) from sysobjects
go
select
db_name(),
'sys.objects' sysview ,COUNT(1) from sys.objects
go
exec
master..sp_getObjectCount
go
exec
msdb..sp_getObjectCount
go
use
msdb
exec
sp_getObjectCount
go
How do you get the best
of both worlds well the only way I know is to use the undocumented procedure
sp_MS_MarksystemObject This flags the procedure to
behave how you expect.
Note: THIS IS AN UNDOCUMENTED
PROCEDURE SO USE WITH CAUTION
use
master
exec
sp_MS_MarksystemObject 'sp_getObjectCount'
go
exec
msdb..sp_getObjectCount
When you execute the procedure you know get the same
results.
I get increasingly frustrated with the lack of
visibility of included columns in management studio and from the system stored
procedures sp_...
This is a query that returns all indexes and there key and include
columns
select
SCHEMA_NAME
(o.SCHEMA_ID) SchemaName
,o.name
ObjectName,i.name IndexName
,i.type_desc
,LEFT(list, ISNULL(splitter-1,len(list))) Columns
,
SUBSTRING(list, indCol.splitter +1, 100)
includedColumns--len(name) - splitter-1)
columns
,
COUNT(1) over (partition by o.object_id)
from
sys.indexes i
join
sys.objects o on i.object_id = o.object_id
cross
apply (select NULLIF(charindex('|',indexCols.list),0) splitter ,
list
from (select cast((
select case when sc.is_included_column = 1 and sc.ColPos = 1 then '|' else '' end +
case when sc.ColPos > 1
then ', ' else '' end + name
from (select sc.is_included_column,
index_column_id, name
, ROW_NUMBER()
over (partition by sc.is_included_column
order by sc.index_column_id)
ColPos
from sys.index_columns
sc
join sys.columns c
on sc.object_id =
c.object_id
and sc.column_id
= c.column_id
where sc.index_id = i.index_id
and sc.object_id = i.object_id )
sc
order by sc.is_included_column
,ColPos
for xml
path (''), type) as varchar(max)) list)indexCols
) indCol
order
by SchemaName,
ObjectName, IndexName
Don't forget to register for the SQL Social event on the
16th March
We are very lucky that the following international speakers are in town to do
Devweek (www.devweek.com) and have offered
to do an usergroup evening.
Itzik Ben-Gan (http://www.solidq.com/insidetsql/books/insidetsql2008/),
Greg Low (http://www.sqldownunder.com/)
Davide Mauri (Smart
Business Intelligence Solutions with Microsoft SQL Server 2008)
Bill Vaughn (http://www.betav.com/blog/billva/)
To register contact me or go to http://sqlsocial.com/events.aspx
I've just posted about filtering with joins is difficult to optimize (http://sqlblogcasts.com/blogs/simons/archive/2010/02/16/Filtering-by-joining-is-difficult-to-optimise.aspx).
This post is along the same lines and is about the opposite, when joins don't
filter as expected but rather they expand the result set.
As with the filtering post, this join situation I am going to describe is
also difficult to identify and the database tuning advisor won't be able to help
you either.
Someone asked me the other day if I had ever come across the situaiton of
someone defining a join by using a cross join and then putting criteria in the
where clause. I hadn't but it make sense if the person has come form the old way
of writing queries where you specify all the tables in the from clause and then
define the join criteria in the where clause.
select
from
tableA, tableB, tableC
where
tableA.col1 =
tableB.col1
and
tableB.col2 =
tableC.col2
select
from
tableA
cross
join tableB
cross
join tableC
where
tableA.col1 =
tableB.col1
and
tableB.col2 =
tableC.col2
select
from
tableA
join
tableB on tableA.col1 = tableB.col1
join
tableC on tableB.col2 = tableC.col2
All these are the same.
The danger of using cross join is that you might lose a join clause and that
can be really bad, and thats what I'm going to discuss.
If you are doing a distinct or a group by in your query then you will never
know if you have inadvertantly added a cartesian join into your query and thus
generated more rows than you expect.
Look at this query to find the the max row size for a table
select
SUM(c.max_length)
from
msdb.sys.tables
t
join
msdb.sys.columns c on t.object_id =
t.object_id
where
t.name = 'sysjobhistory'
This will give a very large number, which makes you think. But why?
So the join is wrong, I've used the wrong column, this is effectively a cross
join. You might think well thats just a bug but what about
select
t.name ,
i.name, sum(s.user_lookups)
from
msdb.sys.tables
t
join
msdb.sys.indexes i on t.object_id =
i.object_id
join
msdb.sys.dm_db_index_usage_stats s on s.object_id =
t.object_id
and s.index_id = i.index_id
group
by t.name
, i.name
Whats wrong with that?
Well the issue is that msdb.sys.dm_db_index_usage_stats Isn't just for msdb.
The key columns on it are database_id, object_id and index_id. Because we
haven't added database_id is means we've got erroneous data coming being summed.
The danger is that you might start off not grouping and then see duplicates and
think you have to group. So you do and it masks the issue.
select
DB_NAME(database_id), * from msdb.sys.dm_db_index_usage_stats
where
object_id in (select object_id
from msdb.sys.dm_db_index_usage_stats
group by object_id
having COUNT (distinct
database_id)> 1)
This query will show you any iobjects with the same id in multiple
databases.
What this highlights is the need to make sure we get our joins correct. As a
starting point on one side of a query all the join columns should make up all
the columns of a unique constraint(or primary key). If that is not the case then
you will end up with a partial catesian join. This results in the number of rows
being processed internally within the query being very very large. If you run a
query and find the estimate and actuals wildly off then consider your joins.
How do you do that? Well what I do is draw my tables and joins and then write
down the primary key and unique key columns, above the tables. I can then see
clearly when I'm not joining correctly.
Imagine you have 3 tables, a Team table, a Player table and a Goal table. The
player table has a surrogate key of Team and the Players registration code. This
allows a player to move teams. The Goal table holds the Team and the Playercode
and the datetime of the Goal. If we join these together without thinking we can
end up with the joins as we have below.

This picture shows how you can use my technique to resolve the problem. By
putting the primary key columns above the tables we can see that the join
between the Player and Goal tables isn't matching the primary key on either
side. In this situation what is missing is including the TeamId on the join.
Now this is a very simple solution. Normally this occurs when you have lots
of tables and generally when you have composite keys. So if you find yourself
grappling with why your query seems to be getting its estimates completely wrong
then consider that you have your joins wrong and try this
technique.
I announced yesterday that we would be running a SQL
Social event in March with Itzik Ben Gan, Greg Low and Davide Mauri.
If that wasn't good enough we also now have Bill Vaughn coming along as well.
Bill has been working with SQL for years and has authored numerous books
covering a huge spread of topics covering areas such as SQL Server, ADO,
Reporting Services, Windows Mobile and Visual Studio
http://www.amazon.com/gp/search/ref=sr_nr_n_2?rh=i%3Astripbooks%2Cn%3A%211000%2Ck%3Awilliam+vaughn+%2Cn%3A5&bbn=1000&keywords=william+vaughn+&ie=UTF8&qid=1266428360&rnid=1000
To register for the event just use the contact details on the event page www.sqlsocial.com or use the contact form on
my blog.
Session submission for SQLBits VI is now open.
So if you have a topic related to the performance and scalability of
any part of the SQL Server stack that you think people might be interested in
then please submit it:
To submit a session do the following
Even if you have spoken before you need to complete your
profile by agreeing to the speaker agreement. If you haven't uploaded please
upload a photo.
Get submitting and we'll see you in London in 2 months
time.
Joining data is one of the most complex parts of
SQL optimisation. Especially when you are dealing with large tables and you are
trying to visualise what set processing the query engine will do.
You might say you don't need to do all that, you can just use the Database
Tuning Adviser. Well you're wrong. The database tuning advisor only helps out
indexing a database. It doesn't help out when you have bad SQL. Do you have an
app that fires a query for each line in an order, or calculates the order totals
for all customers but only returns one of them
All these situations are about database querying design. The two examples
above are quite simple to see and fix, but what happens when you just have a
very large query that takes ages.You've got covering indexes, and key aligned
indexes as much as possible but its still slow and you are only returning a few
rows. Well what is most likely happening is that within the query you are
processing more data than you need to. Trying to resolve this is very complex
and oftens comes down to joining.
There are a number of ways to reduce the number of rows processed during
a query,
1. use a where clause with a literal
2. join to another table
option 1 is very easy for the engine to process and can often mean the rows
never make it out of the storage operator, index seek, index scan. The second is
more difficult. If you have two tables with 1 million rows and when joined you
only get 10 records. The Query processor has to likely process all rows in both
tables.You maybe lucky and have key aligned indexes (indexes where the key
columns are in the same order and match those of the join clause) which means
you can do a merge join, if you haven;t got key aligned indexes then you will
likely have a hash join. In both case you still have to read all the data. What
happens when you add a 3rd table into the equation. that makes things more
complex.
The summary is that even though you only need 10 rows if you are joining
tables to filter it is likely you have to process all dat in both/all tables. If
you can't do anything about the query/table design you need lots of memory and
if your data is greater than your memory you need fast storage
More Posts
Next page »