February 2010 - Posts

I currently have a work Lenovo laptop next to me. I don't use it because the Fn and Ctrl keys are the wrong way round. This wouldn't be a pain if I only lived in Lenovo world, but I don't. My other laptops and my desktop keyboard have the Ctrl key in the right place.

Well, Lenovo recently asked for feedback on this unique user experience and apparently they have listened: http://lenovoblogs.com/yamato/?p=518&language=en. From now on you will be able to swap the logic for these keys round. You can't swap the physical keys, just what they do. Not sure if that will make the situation worse.

I guess you could resort to Tipp-Ex (or whiteout for our North American friends, as Ami Levin informed me last week).

Posted by simonsabin | 3 comment(s)

When performance tuning I often use a combination of execution plans, SET STATISTICS IO and SET STATISTICS TIME. What I've noticed lately is that when using TIME with actual execution plans turned on, the time and CPU reported are very large. If I turn off actual execution plans the duration drastically reduces. I can understand this, as the execution plan has to be put together and sent to the client, but it is very noticeable.

What's more, the extra duration seems to be different for different query plans.

So, a word of warning: if you are doing some performance tuning and are trying to compare CPU and duration, you don't want to have execution plans turned on as this will skew the results.
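As a minimal sketch of the workflow (the table name is just a placeholder), capture the timings with actual execution plans switched off:

```sql
-- Make sure "Include Actual Execution Plan" is OFF in SSMS before running this
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*)            -- the query being tuned (placeholder)
FROM dbo.SomeLargeTable;   -- hypothetical table

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The CPU and elapsed times appear on the Messages tab; gather the plan in a separate run once the timings have been recorded.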


Posted by simonsabin | 3 comment(s)

Saving a hierarchy of data from an XML document into a set of tables is a problem when the only relationship between nodes in the hierarchy is the position of the node. SQL doesn't like implied things like this. It likes things to be explicit.

This means that when you save the child nodes you are stuck as to how you can get the appropriate primary key value of the parent node.

With OPENXML you have metaproperties, but with XQuery you don't. This means you have to do some tricky SQL using ROW_NUMBER.

The downside with this approach is that it breaks up the XQuery processing and so might not perform as well as a simple cross apply.

In the code below we store the results from the insert into a table variable, along with an id generated from the source. Note the use of (select 1): this is a bit of a hack. Really you should order by some deterministic set of columns, but as none exist I am working on the assumption that this trick will result in the ROW_NUMBER matching the order of the nodes in the document. This may be a risky assumption and one that needs thorough testing.

The other trick being used is MERGE. Using MERGE allows us to populate the table variable with values from the source query as well as from the inserted table. With a normal INSERT statement you can only use the latter, the inserted table.

Finally we join back to this lookup data so that we can save the correct foreign key value in the child table.

create table Orders (OrderId int identity(1,1) primary key, Customer varchar(100), OrderDate datetime)

create table OrderItems (OrderDetailId int identity(1,1), OrderId int not null,ItemNo varchar(100), Qty int)



declare @orderImport xml


set @orderImport =
'<?xml version="1.0" encoding="UTF-8" ?>
<Source>
  <Body>
    <Order Customer="Smith jones" OrderDate="1-1-2001">
      <Item ItemNo="Darma Scales" Qty="-85" />
      <Item ItemNo="Darma Scales" Qty="85" />
    </Order>
    <Order Customer="Smith jones" OrderDate="1-1-2001">
      <Item ItemNo="Sharp Knives" Qty="-68" />
      <Item ItemNo="Sharp Knives" Qty="68" />
    </Order>
  </Body>
</Source>'

declare @OrderLookup table (InOrderId int primary key, NewOrderId int not null)


merge into Orders

using (Select ROW_NUMBER() over (order by (select 1)) Id

            ,header.value('@Customer','varchar(100)') Customer

            ,header.value('@OrderDate','datetime') OrderDate

       from @orderImport.nodes('/Source/Body/Order') doc(header)) src  on 1=2

when not matched by target then

insert values (Customer, OrderDate)

output  inserted.OrderId, src.Id into @OrderLookup(NewOrderId, InOrderId);


insert into OrderItems (OrderId, ItemNo, Qty)
select ol.NewOrderId
      ,item.Item.value('@ItemNo','varchar(100)') ItemNo
      ,item.Item.value('@Qty','int') Qty
from (select header.query('Item') header, ROW_NUMBER() over (order by (select 1)) Id
        from @orderImport.nodes('/Source/Body/Order') doc(header)) doc
cross apply doc.header.nodes('Item') item(Item)
join @OrderLookup ol on ol.InOrderId = doc.Id


select * from Orders

select * from OrderItems


Posted by simonsabin | 8 comment(s)

Having utility functions that do helpful stuff is great, just look at sp_who2, sp_help, sp_helpindex as examples.

What if you want to write your own? Well, you can: just stick the procedure in master and prefix its name with sp_. You can now call the procedure from any database without prefixing it with master..

The real worth of these comes when you access system views like sys.objects and sys.tables. Unfortunately these are scoped to the database of the procedure and NOT the context database. This is a change from the system views in SQL 2000 and before. With those you could reference them in a global stored proc and they would assume the context of the database they were called from.

To show you the behaviour run the following code. It creates a stored proc in the master database for use elsewhere. It accesses sys.objects and sysobjects to show the difference in behaviour.

You will see that the sys.objects query always returns the count of objects from master, whereas the sysobjects query returns the count from the current database.

use master
go

if object_id('dbo.sp_getObjectCount') is not null
    drop procedure sp_getObjectCount
go

create procedure sp_getObjectCount
as
select db_name(), 'sys.objects' sysview, COUNT(1) from sys.objects
select db_name(), 'sysobjects' sysview, COUNT(1) from sysobjects
go


select db_name(), 'sys.objects' sysview ,COUNT(1) from sys.objects


exec master..sp_getObjectCount


exec msdb..sp_getObjectCount


use msdb

exec sp_getObjectCount



How do you get the best of both worlds? Well, the only way I know is to use the undocumented procedure sp_MS_marksystemobject. This flags the procedure to behave how you expect.




use master

exec sp_MS_MarksystemObject 'sp_getObjectCount'


exec msdb..sp_getObjectCount

When you execute the procedure you now get the same results from both system views.

Posted by simonsabin | 2 comment(s)

I get increasingly frustrated with the lack of visibility of included columns in Management Studio and from the system stored procedures sp_...

This is a query that returns all indexes and their key and include columns:

select SCHEMA_NAME (o.SCHEMA_ID) SchemaName

   ,o.name ObjectName,i.name IndexName


   ,LEFT(list, ISNULL(splitter-1,len(list))) Columns

   , SUBSTRING(list, indCol.splitter + 1, 100) includedColumns

   , COUNT(1) over (partition by o.object_id) IndexCount

from sys.indexes i

join sys.objects o on i.object_id = o.object_id

cross apply (select NULLIF(charindex('|',indexCols.list),0) splitter , list

              from (select cast((

                           select case when sc.is_included_column = 1 and sc.ColPos = 1 then '|' else '' end +

                                  case when sc.ColPos  > 1 then ', ' else '' end + name

                             from (select sc.is_included_column, index_column_id, name

                                        , ROW_NUMBER() over (partition by sc.is_included_column

                                                             order by sc.index_column_id) ColPos

                                    from sys.index_columns  sc

                                    join sys.columns        c on sc.object_id = c.object_id

                                                             and sc.column_id = c.column_id

                                   where sc.index_id = i.index_id

                                     and sc.object_id = i.object_id ) sc

                    order by sc.is_included_column


                      for xml path (''), type) as varchar(max)) list)indexCols ) indCol

order by SchemaName, ObjectName, IndexName


Posted by simonsabin | 6 comment(s)

Don't forget to register for the SQL Social event on the 16th March

We are very lucky that the following international speakers are in town for DevWeek (www.devweek.com) and have offered to do a user group evening.

    Itzik Ben-Gan (http://www.solidq.com/insidetsql/books/insidetsql2008/), 

    Greg Low (http://www.sqldownunder.com/)

    Davide Mauri (Smart Business Intelligence Solutions with Microsoft SQL Server 2008)

    Bill Vaughn (http://www.betav.com/blog/billva/)

To register contact me or go to http://sqlsocial.com/events.aspx


Posted by simonsabin | 2 comment(s)

I've just posted about how filtering with joins is difficult to optimise (http://sqlblogcasts.com/blogs/simons/archive/2010/02/16/Filtering-by-joining-is-difficult-to-optimise.aspx). This post is along the same lines but about the opposite: when joins don't filter as expected but rather expand the result set.

As with the filtering post, the join situation I am going to describe is also difficult to identify, and the Database Tuning Advisor won't be able to help you either.

Someone asked me the other day if I had ever come across the situation of someone defining a join by using a cross join and then putting the criteria in the where clause. I hadn't, but it makes sense if the person has come from the old way of writing queries, where you specify all the tables in the from clause and then define the join criteria in the where clause.


from tableA, tableB, tableC

where tableA.col1 = tableB.col1

and tableB.col2 = tableC.col2


from tableA

cross join tableB

cross join tableC

where tableA.col1 = tableB.col1

and tableB.col2 = tableC.col2


from tableA

join tableB on tableA.col1 = tableB.col1

join tableC on tableB.col2 = tableC.col2

All these are the same.

The danger of using cross join is that you might lose a join clause, and that can be really bad; that's what I'm going to discuss.

If you are doing a distinct or a group by in your query then you will never know if you have inadvertently added a cartesian join into your query and thus generated more rows than you expect.

Look at this query to find the max row size for a table:

select SUM(c.max_length)

from msdb.sys.tables t

join msdb.sys.columns c on t.object_id = t.object_id

where t.name = 'sysjobhistory'

This will give a very large number, which makes you think. But why?
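(Spoiler: the join predicate compares t.object_id with itself, so every column in msdb is summed. A sketch of the corrected query joins sys.columns on its own object_id:)

```sql
select SUM(c.max_length)
from msdb.sys.tables  t
join msdb.sys.columns c on c.object_id = t.object_id  -- was t.object_id = t.object_id
where t.name = 'sysjobhistory'
```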

So the join is wrong: I've used the wrong column, so this is effectively a cross join. You might think, well, that's just a bug. But what about this:

select t.name , i.name, sum(s.user_lookups)

from msdb.sys.tables t

join msdb.sys.indexes i on t.object_id = i.object_id

join msdb.sys.dm_db_index_usage_stats s on s.object_id = t.object_id

                                       and s.index_id = i.index_id

group by t.name , i.name

What's wrong with that?

Well, the issue is that msdb.sys.dm_db_index_usage_stats isn't just for msdb. The key columns on it are database_id, object_id and index_id. Because we haven't included database_id, we've got erroneous data being summed. The danger is that you might start off not grouping, then see duplicates and think you have to group. So you do, and it masks the issue.
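A sketch of the corrected query restricts the usage stats to msdb's database_id, so the join columns now cover the view's full key:

```sql
select t.name, i.name, sum(s.user_lookups)
from msdb.sys.tables  t
join msdb.sys.indexes i on i.object_id = t.object_id
join sys.dm_db_index_usage_stats s on s.object_id   = t.object_id
                                  and s.index_id    = i.index_id
                                  and s.database_id = db_id('msdb')  -- the missing key column
group by t.name, i.name
```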

select DB_NAME(database_id), * from msdb.sys.dm_db_index_usage_stats

where object_id in (select object_id

                      from msdb.sys.dm_db_index_usage_stats

                  group by object_id

                    having COUNT (distinct database_id)> 1)

This query will show you any objects with the same id in multiple databases.

What this highlights is the need to make sure we get our joins correct. As a starting point, on one side of a join the join columns should make up all the columns of a unique constraint (or primary key). If that is not the case then you will end up with a partial cartesian join. This results in the number of rows being processed internally within the query being very, very large. If you run a query and find the estimates and actuals wildly off then consider your joins.

How do you do that? Well, what I do is draw my tables and joins and then write the primary key and unique key columns above the tables. I can then see clearly when I'm not joining correctly.

Imagine you have three tables: a Team table, a Player table and a Goal table. The Player table has a key made up of the Team and the player's registration code. This allows a player to move teams. The Goal table holds the Team, the player code and the datetime of the Goal. If we join these together without thinking we can end up with the joins as we have below.

This picture shows how you can use my technique to resolve the problem. By putting the primary key columns above the tables we can see that the join between the Player and Goal tables isn't matching the primary key on either side. In this situation what is missing is the TeamId on the join.

Now this is a very simple example. Normally this occurs when you have lots of tables, and generally when you have composite keys. So if you find yourself grappling with why your query seems to be getting its estimates completely wrong, then consider that you have your joins wrong and try this technique.
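As a sketch of the example (the table and column names here are my own invention for illustration):

```sql
-- Hypothetical schema: Player's key is (TeamId, PlayerCode) so a player can
-- move teams; Goal records which player scored, for which team, and when
create table Team   (TeamId int primary key, TeamName varchar(100));
create table Player (TeamId int references Team, PlayerCode varchar(10),
                     primary key (TeamId, PlayerCode));
create table Goal   (TeamId int, PlayerCode varchar(10), GoalTime datetime);

-- Joining on PlayerCode alone is a partial cartesian join: a player who has
-- moved teams matches Goal rows for every team they have played for
select count(*)
from Goal g
join Player p on p.PlayerCode = g.PlayerCode
             and p.TeamId     = g.TeamId;   -- adding TeamId completes the key
```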

Posted by simonsabin | 1 comment(s)

I announced yesterday that we would be running a SQL Social event in March with Itzik Ben Gan, Greg Low and Davide Mauri.

If that wasn't good enough we also now have Bill Vaughn coming along as well.

Bill has been working with SQL for years and has authored numerous books covering a huge spread of topics, including SQL Server, ADO, Reporting Services, Windows Mobile and Visual Studio.


To register for the event just use the contact details on the event page www.sqlsocial.com or use the contact form on my blog.


Posted by simonsabin | 1 comment(s)

Session submission for SQLBits VI is now open.

So if you have a topic related to the performance and scalability of any part of the SQL Server stack that you think people might be interested in then please submit it:

To submit a session do the following:

Even if you have spoken before, you need to complete your profile by agreeing to the speaker agreement. If you haven't uploaded a photo, please upload one.


Get submitting and we'll see you in London in two months' time.

Posted by simonsabin | with no comments

Joining data is one of the most complex parts of SQL optimisation. Especially when you are dealing with large tables and you are trying to visualise what set processing the query engine will do.

You might say you don't need to do all that, you can just use the Database Tuning Advisor. Well, you're wrong. The Database Tuning Advisor only helps with indexing a database. It doesn't help when you have bad SQL. Do you have an app that fires a query for each line in an order, or calculates the order totals for all customers but only returns one of them?

All these situations are about database querying design. The two examples above are quite simple to see and fix, but what happens when you just have a very large query that takes ages? You've got covering indexes, and key-aligned indexes as much as possible, but it's still slow and you are only returning a few rows. Well, what is most likely happening is that within the query you are processing more data than you need to. Trying to resolve this is very complex and often comes down to joining.

There are a number of ways to reduce the number of rows processed during a query,

1. use a where clause with a literal

2. join to another table

Option 1 is very easy for the engine to process and can often mean the rows never make it out of the storage operator (index seek, index scan). The second is more difficult. If you have two tables with 1 million rows each that, when joined, only return 10 records, the query processor likely has to process all rows in both tables. You may be lucky and have key-aligned indexes (indexes where the key columns are in the same order and match those of the join clause), which means you can do a merge join; if you haven't got key-aligned indexes then you will likely have a hash join. In both cases you still have to read all the data. What happens when you add a 3rd table into the equation? That makes things more complex.
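As a sketch (the table and index names are hypothetical), key-aligned indexes on the join column are what let the optimiser pick a merge join:

```sql
-- Hypothetical tables with 1 million rows each, joined on Col1
create index IX_BigTableA_Col1 on dbo.BigTableA (Col1);
create index IX_BigTableB_Col1 on dbo.BigTableB (Col1);

-- With both indexes keyed on Col1 the optimiser can merge join the two
-- ordered streams; without them it will typically hash join. Either way,
-- every row in both tables is still read.
select a.Col1
from dbo.BigTableA a
join dbo.BigTableB b on b.Col1 = a.Col1;
```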

The summary is that even though you only need 10 rows, if you are joining tables to filter, it is likely you have to process all data in both/all tables. If you can't do anything about the query/table design you need lots of memory, and if your data is greater than your memory you need fast storage.

Posted by simonsabin | 3 comment(s)
More Posts Next page »