May 2009 - Posts

Paul Randal has a great survey this week. You have to decide on the ONE thing that is important when performance tuning.

It would have been good to have a second question to ask what your job role is. I would expect a correlation between job role and the answer.

http://www.sqlskills.com/BLOGS/PAUL/post/Weekly-survey-whats-the-most-important-thing-when-performance-tuning.aspx

I won't say what I decided; I was torn. I think indexing is essential, as you can often index your way out of poorly written queries, but then again application code is also important: as they say, "you can't polish a turd". And if you don't have enough memory then your performance won't be great either.

So which one will you choose?


-
Posted by simonsabin | with no comments

Wrong, wrong, wrong.

This post was in response to a post in the forums "How do I delete the log file?" http://social.msdn.microsoft.com/Forums/en-US/sqldatabaseengine/thread/69c8ecc6-6414-43b1-8c08-21fdf2ca9c54

The transaction log for a SQL Server database is not like the logs many applications produce, which are just a history of what the application has done, say for debugging. Whilst it is a record of the changes you make to your data, it is also a critical element in how the data in your database is kept consistent, i.e. not corrupt.

Let's start by going over a few basics. A SQL Server database is made up of tables, and the data for a table is stored on pages. These pages are stored in the data files (mdf/ndf).

When you make changes to rows in a table, the pages containing those rows are read from disk into memory. The changes are recorded in the transaction log and the pages are then changed to reflect the changes you've made. The pages are only changed IN MEMORY, not in the data files. If at this point your server dies, the changes that are currently only in memory and not in the data files would appear to be lost. However, they are in the transaction log. So when SQL Server starts, a recovery process runs; you can see it in the errorlog. This process looks for transactions in the transaction log that didn't get hardened to the data files and applies those changes to the data files. So you don't lose your data.
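If you are curious, you can watch this happen by reading the errorlog after a restart. A minimal sketch from a query window (sp_readerrorlog with no arguments simply lists the current errorlog):

exec sp_readerrorlog
-- look for the per-database recovery messages, along the lines of
-- "n transactions rolled forward" and "n transactions rolled back"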

There is an additional process called the lazy writer, which frees pages from cache when the server is under memory pressure. This means that the changes you have made to pages in memory may be hardened to the data files. Great, you might think, that's what I want. However, this can also happen in the MIDDLE of a transaction, which means you could have changes in your data files that have yet to be committed. In this case, if SQL Server crashes mid transaction, when the recovery process runs it looks for changes that were recorded in the transaction log and hardened to disk but for which the transaction never completed. These changes are then rolled back, thus leaving your data files in a consistent state.

So even if you don't need point in time recovery you still need your transaction log.

The key thing is that if you don't need point in time recovery then you should set your recovery model to simple. What this does is free the space in the transaction log used by committed transactions, and so your transaction log shouldn't grow.
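Switching is a one liner (the database name here is a placeholder for your own):

alter database MyDatabase set recovery simple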

One note: if you have a single transaction that requires 10GB of transaction log space then using simple recovery won't help. For example, updating every row in a very large table in one statement is one transaction, and so the transaction log needs to be big enough to hold all of the changes. If space on the drive holding the transaction log is a problem then you need to look at batching your changes into smaller transactions that, once committed, can be truncated from the transaction log. See my posts about DELETE and UPDATE with TOP http://sqlblogcasts.com/blogs/simons/archive/2009/05/22/UPDATE-and-DELETE-TOP-and-ORDER-BY---Part2.aspx
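To give an idea of what that batching looks like, here is a minimal sketch (the table name, column and batch size are made up for the example). Each batch is its own transaction, so in the simple recovery model the log space it used can be reused once the batch has committed and a checkpoint has run:

declare @rows int
set @rows = 1
while @rows > 0
begin
  -- delete one small batch per loop iteration
  delete top (10000)
  from dbo.OrderHistory
  where OrderDate < '20080101'

  set @rows = @@ROWCOUNT  -- stop once a batch deletes nothing
end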


-
Posted by simonsabin | with no comments

To say the type system in SQL is lax is not quite correct; it's actually lax, inconsistent and very annoying.

The most common issue I come across is the handling of integers and decimals.

If I declare a variable as decimal(10,4) and assign it the value 10/4, you could expect a number of things to happen:

1. The code doesn't run because you are casting an integer to a decimal

2. The code runs and stores 2.5 in the variable

3. The code runs, does the calculation based on integers (10/4), realises truncation will occur (as the destination type is decimal) and raises a warning

4. The code calculates 10/4 as integers, resulting in an integer value of 2, and sticks that in the variable.

In a nice type-safe system, i.e. C#, one would get 1. The result of integer/integer is an integer and you can't store an integer in a decimal. This is nice because, to get the value stored, you have to do an explicit cast. That makes you think, and realise that you probably have to cast the 10 or the 4 to get the required decimal value of 2.5 returned.

In TSQL however you get 4.
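You can see this for yourself with a quick test (nothing clever here, just the behaviour described above):

declare @d decimal(10,4)
set @d = 10/4      -- integer division happens first, giving 2
select @d          -- returns 2.0000
set @d = 10.0/4    -- making one operand a decimal forces decimal arithmetic
select @d          -- returns 2.5000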

If you want to look at inconsistencies then look at ISNULL and COALESCE. ISNULL returns the same data type as its first parameter, whereas COALESCE works out the return type by applying data type precedence across all the arguments passed to it.

declare @i int
declare @s varchar(100)

select ISNULL(@s,'100') + '100', ISNULL(@s, 100) + '100'
      ,COALESCE(@s,'100') + '100', COALESCE(@s, 100) + '100'
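With @s being NULL, the first three expressions should all return '100100', because in each case the expression ends up typed as varchar and the + is string concatenation. The last one, COALESCE(@s, 100) + '100', is typed as int by data type precedence, so the '100' is converted to a number and you get 200.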

This is sort of where I got stuck today. I was using the RADIANS function in a conversion from longitude and latitude to an OS grid reference. This is a shocking formula, and one I have JavaScript and C# versions of but not a TSQL one. I couldn't for the life of me figure out why the TSQL result was vastly different.

So I debugged the code, thank you SQL 2008 (although I did find a nice feature I will blog about later).

What I found was that this line wasn't returning the same value in TSQL as in C#; it was returning 0 in TSQL, and even I know 49 degrees isn't 0 radians.

declare @lat0 numeric(29,15)= radians(49)

Gosh, I thought I had found a bug in TSQL. Then I had a eureka moment (you're probably already there) and changed the code to

declare @lat0 numeric(29,15)= radians(49.0)

and it returned the correct value. On looking at BOL it does say that RADIANS returns the same data type as the value passed in. Which to me sounds mad, because even for a value of 360 degrees the radians value is only about 6.28, and so the decimal places are very, very important. What's more, any whole number of degrees below about 57 converts to less than 1 radian, so if you only specify an integer for the degrees, as in my case above, you will get 0.

Anyway, lesson learnt. From now on, as well as always changing 10/2 to 10.0/2, I will be making sure any literals passed to functions like this also have .0 tagged on the end.
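A quick way to see it side by side; the last column is my own addition, casting to float, which sidesteps worrying about the scale of a decimal literal:

select RADIANS(49)                 -- integer in, integer out: returns 0
     , RADIANS(49.0)               -- decimal in, decimal out: non-zero
     , RADIANS(CAST(49 as float))  -- float in, float out: roughly 0.8552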


-

If you like to be on the bleeding edge and want to try out the R2 release of SQL Server 2008 during the CTP cycle make sure you register here

http://www.microsoft.com/sqlserver/2008/en/us/r2.aspx Spaces are limited so get in quick


-
Posted by simonsabin | 1 comment(s)

I earlier posted about using derived tables with UPDATE and DELETE to be able to update or delete the TOP n rows (http://sqlblogcasts.com/blogs/simons/archive/2009/05/22/DELETE-TOP-x-rows-avoiding-a-table-scan.aspx).

In many large scale systems archiving is essential, and so being able to archive data efficiently matters, but more often than not your selectivity of what to archive is not based on the primary key.

What's nice is that, if you have an index that supports this, you can still use the same approach as demonstrated in the other article to efficiently delete the rows you want. I have often had indexes on large tables just to support the archiving process.

Take this example populated table. 

drop table Orders
go
create table Orders (Id int primary key, PadingCol char(100), orderDate datetime, status int)

declare @i int
set @i=1
SET NOCOUNT ON
set statistics time off
set statistics io off

while (@i<100000)
begin
  if @@TRANCOUNT = 0
    begin transaction
  insert into Orders values (@i,'x', DATEADD(day,rand()* 1000,GETDATE()), CAST(rand()+.2 as int));
  if @i % 2000 = 0
    commit transaction
  set @i=@i+1
end
if @@TRANCOUNT > 0
  commit transaction
go
create index IX_Orders_status on Orders(status, orderDate)

Note the index created on status and orderDate.

If we want to archive the oldest closed orders (status 0), we can do the following:

delete orders
from (select top (100) *
        from orders
        where status = 0
        order by orderDate) orders

This works because an index is sorted by its keys, and in this case the index keys are status and then orderDate. Because we are filtering on an exact value of the leading key, the matching index rows are already sorted by the second key, orderDate, so no extra sort is needed.

If you make the filter on status multi-valued, i.e. IN(0,1), the plan changes to require a TOP N sort.

or, with parallelism, an equivalent parallel plan.

Both of these plans have a very expensive sort operation.
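For reference, the statement behind those plans is the same delete pattern as before; only the status filter changes:

delete orders
from (select top (100) *
        from orders
        where status in (0, 1)
        order by orderDate) orders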

What happens if we change the index and only have orderDate in the key?

drop index Orders.IX_Orders_status
go
create index IX_Orders_orderDate on Orders(orderDate) include (status)

If we now do the delete we get the same plan, sort of.

The key thing is the index scan: this now has a predicate in it, which does the status filtering. But because the scan is in orderDate order, by the time the data reaches the TOP operator it is already sorted.

What's interesting is that if you apply a filter on status that is not a single value, i.e. IN(0,1), you still get a very optimal plan. IF, and it's a big if, the data you are looking for is found early on in the scan. In this scenario (if the data were real) the oldest orders are likely to be closed and so will be found early on in the scan. If, for instance, you were looking for a specific customer, the scan might have to read a lot of rows to find 100 that match; in that case you might want to consider the previous index structure. This is highlighted if you run the delete with the last index: you will find many pages are read. The number will be much higher than before because, to delete from the clustered index, SQL Server has to seek to each row to delete it, and this results in pages being read. What's more, the matching rows in the nonclustered indexes also need to be deleted.
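If you want to see the page reads for yourself, turn IO statistics back on (the setup script above switched them off) and rerun the single-status delete:

set statistics io on
go
delete orders
from (select top (100) *
        from orders
        where status = 0
        order by orderDate) orders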


-
Posted by simonsabin | 6 comment(s)

The simplest way to delete the top X rows is to use a derived table.

delete t1
from (select top (10000) *
        from t1 order by a) t1

This is one of the tricks I teach in my Advanced TSQL for Performance and Scalability course.

The SQLCAT team posted a solution to deleting the TOP x rows from a table using a view http://blogs.msdn.com/sqlcat/archive/2009/05/21/fast-ordered-delete.aspx. The view isn't needed; you can just use a derived table as above.

You can do similar things with UPDATE:

update t1
  set b = 'y'
  from (select top (100) b
        from t1
        order by a ) t1

If you want to know what was updated or deleted then you can use the OUTPUT clause.

 

delete t1
output deleted.a
from (select top (100) *
        from t1 order by a) t1

 

update t1
  set b = 'y'
OUTPUT inserted.a
  from (select top (100) *
        from t1
        order by a ) t1
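For archiving you normally want to keep the rows you remove, and OUTPUT ... INTO lets you do that in the same statement. A minimal sketch, assuming columns a and b are an int and a char(1) (the real types depend on your table):

declare @archived table (a int, b char(1))

delete t1
output deleted.a, deleted.b into @archived (a, b)
from (select top (100) *
        from t1 order by a) t1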

 


-

As the title says, the May 2009 edition of SQL Server 2008 Books Online is now available for download.

http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=765433f7-0983-4d7a-b628-0a98145bcb97


-
Posted by simonsabin | with no comments

Trevor Dwyer, an ex SQL Server MVP, has started blogging at http://sqlblogcasts.com/blogs/sqlartist/default.aspx

His mind is full of all sorts of amazing stuff so I'm looking forward to seeing what he is going to blog about.


-
Posted by simonsabin | with no comments

The slides and demos for Darren’s session can be found here http://www.sqlis.com/post/SSIS-Tips-Tricks-(Presentation).aspx

 

For the list of events coming up go to http://sqlserverfaq.com

 

Here are some answers to the questions asked at the start.

 

Replication file sizes

From my investigation, the multiple files produced in the snapshot are sized based on the statistics for the primary key column, i.e. based on row counts. If your rows have differing sizes, i.e. contain different size documents, or your statistics are out of date, then you could end up with different size files. The numbers of rows will always differ because the values used to segment the data are those in the statistics, and they might not fall on an exact boundary. Look at the output from DBCC SHOW_STATISTICS (table, pk_constraint).
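For example (the table and constraint names here are placeholders for your own):

DBCC SHOW_STATISTICS ('dbo.Orders', 'PK_Orders')

The histogram in the last result set should show the key values used to segment the data.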

 

WMI security

There is a great section in MSDN on WMI remote access http://msdn.microsoft.com/en-us/library/aa389290(VS.85).aspx

As expected there are a number of aspects. WMI uses RPC, which uses port 135; this port is often locked down because it has been a common attack vector for network worms (http://us.trendmicro.com/us/threats/enterprise/glossary/n/network-viruses/index.php). Alternatively, DCOM permissions might not allow remote users to start the WMI service.

The section in MSDN should help you through the issues.

 

A big thank you to Redgate for sponsoring the event, and I hope to see you again.

 


-
Posted by simonsabin | with no comments

Often when you paste TSQL code from Word, an email or off the web, you find blank lines have been inserted into the code.

I think the issue is due to how paragraph and non-paragraph breaks, i.e. <p> and <br>, are handled. They have different meanings in the HTML/Word world, where <p> (normally) has extra space after it. In plain text there is no way to express that difference, so to give the extra space after paragraphs the conversion sticks in another carriage return.

There is a simple solution: you can get rid of these blank lines by doing a search and replace.

Search for \n:b*\n and replace with \n, making sure "Use regular expressions" is checked.

The :b* allows for the odd tab or space on the blank lines


-
Posted by simonsabin | 2 comment(s)