September 2010 - Posts

Rowversion vs Timestamp !! what’s the word then, Microsoft?

I like to promote all things SQL Server whenever I can, so when the subject of allocating some time to maintenance of our applications was raised I was quick to suggest we could replace the numerous timestamp columns with rowversion and future-proof our application in that area.

Here’s the latest from BOL for SQL 2008 R2:

“Avoid using this feature …. “ – well, no problem; I duly sent out an email to our developers explaining how we should be using rowversion and not timestamp… doh! Did I feel stupid when they asked how to add something that wasn’t there (GUI) or got changed by SQL Server (scripts).

Here’s the GUI

[screenshot: the SSMS table designer’s list of data types]

Look, there it isn’t!! Not a rowversion in sight .. no worries, we’ll script it in then, that works …

create table dbo.testing
( numkey int,
Thedate datetime,
dbTimestamp rowversion
);

--Here’s the table scripted from SSMS after I’ve created it.

/****** Object:  Table [dbo].[testing]     ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[testing](
      [numkey] [int] NULL,
      [Thedate] [datetime] NULL,
      [dbTimestamp] [timestamp] NOT NULL
) ON [PRIMARY]

GO

oh no it doesn’t – you can script your table with rowversion but it gets changed to Timestamp.
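If you want to see what’s going on under the covers, a quick look at the catalog views tells the same story – this is just a minimal check against the dbo.testing table created above, and it shows the metadata only knows about timestamp:

-- what did SQL Server actually record for the column's type?
select c.name as column_name, t.name as type_name
from sys.columns c
join sys.types t on t.user_type_id = c.user_type_id
where c.object_id = object_id('dbo.testing');

-- dbTimestamp comes back as "timestamp", which is why SSMS scripts it that way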

Now I don’t think this is very good at all; however, it is consistent, inasmuch as this is standard behaviour in SQL 2005, SQL 2008 and SQL 2008 R2 – so that’s all right then!

It’s amazing what I didn’t know about dbcc !!

As part of a series of processes I manipulate (cleanse) a copy of production for use in a test/support environment; one of the things I do is switch to simple recovery and shrink the t-log. This process is automated overnight and is intended to present a restored environment the following morning. Anyway, as part of the process I use dbcc updateusage to ensure that the t-log gets fully shrunk after putting the database into simple recovery. For reasons I’m not going to list, I want the ldf file to be as small as possible.

To make sure the shrink works I use  dbcc updateusage(0) with count_rows;   if you’ve ever had to deal with a log that refuses to shrink or buffers which refuse to clear you’ll understand.
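For context, the overnight sequence looks roughly like this – a sketch only, the database and log file names here are illustrative, not the real ones:

-- switch the restored copy to simple recovery, correct the usage/row counts,
-- then shrink the log file right down
alter database TestCopy set recovery simple;

use TestCopy;
dbcc updateusage (0) with count_rows;   -- 0 = the current database

dbcc shrinkfile (TestCopy_log, 1);      -- target size in MB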

What I wasn’t aware of was that although dbcc updateusage will load some data into the buffer cache, the with count_rows option will attempt to load the entire dataset into cache. This had the effect of essentially clearing the buffer cache for my production database .. not good.

How did I track what was happening? Well, I repeated the process on a DR server, monitoring the buffer cache as I went; the SQL for this is below. If your database is bigger than your memory this most likely isn’t something you want to do – very interesting, but unless you’re actually looking for this you won’t know. BOL has no mention that I can see of the downside of using this command – fortunately my database is small; had it been a couple of terabytes the impact might have been very severe.

-- buffer cache usage per database ( excluding the resource database, id 32767 )
select db_name(database_id) as db_name,
       count(page_id) as number_pages,
       count(page_id)/128 as mb
from sys.dm_os_buffer_descriptors
where database_id != 32767
group by database_id
order by database_id;
Posted by GrumpyOldDBA with no comments

IF Exists vs Count(*) – just when you thought it was safe ….

Here are a couple more of the topics I covered in my presentation “Gone in 60 Nano seconds”. It’s tricky sometimes formatting posts to display correctly on the blog, so for most of the material from the presentation I’m going to link to the pages on my web site.

As I’m going to put all the topics there, the previous post about adding columns is included. I’ve put up the example of count(*) seriously outperforming an if exists, and I’ve expanded upon the calculated columns topic. In the presentation I talked about how I found out I had this issue; well, I’ve put in the performance graphs which first alerted me and tried, as best as I can remember, to cover what else I said.
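The detail and timings are on the linked page, but for anyone wondering what’s being compared, here’s the shape of the two patterns – a sketch only, the table, column and variable names are made up:

declare @CustomerID int, @HasOrders bit;
set @CustomerID = 42;
set @HasOrders = 0;

-- pattern 1: if exists - can stop at the first matching row
if exists ( select 1 from dbo.Orders where CustomerID = @CustomerID )
    set @HasOrders = 1;

-- pattern 2: count(*) compared to zero - counts every matching row first
if ( select count(*) from dbo.Orders where CustomerID = @CustomerID ) > 0
    set @HasOrders = 1;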

The main gist of the presentation was to talk about performance without an index in sight – as I remarked, there are only so many indexes you can add to a database, and at some stage you have to examine the options of changing how a query is written to gain performance.

Here are the links to the next two topics and the previous one.

http://www.grumpyolddba.co.uk/gonein60ns/IfExistsvsCount.htm

http://www.grumpyolddba.co.uk/gonein60ns/calculatedcolumns.htm

http://www.grumpyolddba.co.uk/gonein60ns/NoMoreColumns.htm

 

As I talked a lot about the concepts of partitioning data (not just partitioned tables), I hope to put all I said into one or two documents by the end of the week. There are about 30 or 40 pages in total, so it takes a little while to format them correctly into html – please bear with me.

Bye Bye Timestamp it was nice knowing you

In passing I discovered that the timestamp data type is deprecated in SQL Server 2008 R2. If you’re still using timestamps you’d better do a find and replace for rowversion.

Posted by GrumpyOldDBA with no comments

Don’t add that column

This is the first of the topics from my presentation “Gone in 60 Nano seconds”.

 

As a quick summary this is why additional columns can cause degraded performance.

Background

  • I put this brief list of points together after a carefully tuned query was degraded by the addition of a column to one of the tables and into the join.
  • Performance tuning is more than just adding indexes, and the performance of an application can be affected by many factors, but above all you want to be able to give your users a happy experience.
  • Queries which perform slowly frustrate users and give your application (i.e. the database, i.e. SQL Server) a bad name, which we don’t want. Users constantly hitting refresh for slow-running and/or heavy-io queries compound the problem further.
  • When I put these points forward, one response was to ask what to do if there was a valid business reason to add a column to a table. My view is that there never is a valid “business reason”, because you the developer are in control of the schema and have the choice of how to handle the data, or more likely the extra attribute(s).
  • It’s never going to be clear cut; all I wanted was that before adding extra columns to a table some thought was applied to the possible adverse effects of the action.

1.    Most important – if a query has been optimised with an index for a table, and an additional column from that table is added to the query, then the index no longer covers the query.

There are two possible outcomes:

a) a table scan

b) a key lookup, which means at best two reads of the table for each row. This may not sound so bad, but the lookup will always be a clustered index seek/scan, and any clustered index seek/scan always reads the entire row into cache even if only one bit value is required. If a table is wide and has a large number of rows the impact can be such that the entire query takes many times longer, even to the extent of timing out.

This may seem extreme, but consider this simple example: a company table with only 1.6 million rows, fairly narrow with a max row size of 1076 bytes.

A bit field is added; the actual table size does not alter much.

Picking an index that has a max row size of 60 bytes, one page read ( one io ) will contain approx 140 rows.

If we don't use this index and need a clustered index seek ( best option ), 140 rows will actually require 20 reads ( 7 rows per page ), so instead of 8KB of data being read we will have read 160KB.

A bookmark lookup will use a typical index plus the clustered index – in this case 41 reads. However, there's always the possibility of a scan: 1.8GB of data ( 229,000 page reads ) – a long way from 1 io and 8KB.

A client had exactly this type of issue after adding a bit field to indicate a row being deleted: queries were changed to include " and deleted = 0 " but indexes were not changed. This is what inspired my SQL Bits presentation “Can I index a bit?”

( btw, a NOT in the WHERE clause will always produce a full scan )
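To make point 1 concrete, here’s a hedged sketch of the deleted-flag scenario – the table, column and index names are invented purely for illustration:

-- an index that covers the original query
create nonclustered index ix_company_name
    on dbo.company ( CompanyName )
    include ( Town );

-- covered: a single index seek satisfies this
select CompanyName, Town
from dbo.company
where CompanyName = 'Some Company';

-- add the new bit column to the query and the index no longer covers it;
-- the plan now needs a key lookup per row, or falls back to a scan
select CompanyName, Town, deleted
from dbo.company
where CompanyName = 'Some Company'
  and deleted = 0;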

2.    The column may not produce a useful index

Generally columns added to an existing table will contain something to be checked or retrieved. If the new column isn't backfilled with varied values for the existing rows – take a Sales table with, say, 11 million rows as an example – and the existing rows get a default value or, worse still, null, then we have 11 million rows with one value and only the slowly added new rows having real values.

The optimiser will ignore an index on this column – generally a secondary index requires better than 90% selectivity for the optimiser to use it ( there are always exceptions of course ).
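If you want to see the selectivity the optimiser is working from, the statistics on the index tell you – a sketch here, with dbo.Sale and the index name as made-up examples:

-- density = 1 / ( number of distinct values ), so a figure near 1
-- means very few distinct values and an index of little use on its own
dbcc show_statistics ( 'dbo.Sale', 'ix_Sale_deleted' ) with density_vector;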

3. The table grows so that there are fewer rows per page.

Rows fit within an 8KB page, therefore a row of approx 1,000 bytes will fit 8 per page; 11 million rows will occupy 1,375,000 pages.

If the row size changes so that we can only fit 7 rows per page we'll need 1,571,429 pages, an increase of approx 1.5GB ( 1 page = 8KB ).

A select returning 1,000 rows will now need 143 io compared to 125 io previously. This may not seem much, but sometimes it's the small things that hurt.
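If you want to check the actual row density on one of your own tables, the physical stats DMF gives you rows per page directly – a sketch, using the illustrative dbo.Sale table name from above:

-- average record size, page count and rows per page for the leaf level
select index_id,
       page_count,
       record_count,
       avg_record_size_in_bytes,
       record_count / nullif(page_count, 0) as rows_per_page
from sys.dm_db_index_physical_stats ( db_id(), object_id('dbo.Sale'), null, null, 'DETAILED' )
where index_level = 0;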

4. The column will probably be nullable

Indexes on nullable columns ( or including nullable columns ) are 3 bytes a row wider. On our example Sale table this means any index including the added column will be approx 32MB larger just for the null overhead.

This is pretty small stuff in database terms; when I presented on index analysis I was working with tables of 400 million rows and more, and there a null on an indexed column made a size difference of 1.1GB – scalability is important.

5. The table just gets larger

Adding an integer will increase our example Sale table by approx 42MB, a datetime 84MB, a guid 168MB ( you should never use guids as keys if there is any other option available ).

Adding columns has the potential to increase the io on every query using that table.

6. May cause locking and blocking

It may also increase the number of locks taken and may cause escalation to table locking.

Every access of a table/row requires locks, every lock requires resource, and there's only so much resource. At some point SQL Server may escalate locking to conserve resource: if additional columns cause table and/or index growth ( as in point 3 ), the number of locks may tip over the escalation threshold and a table lock occurs – and shared read locks at table level stop updates and deletes.
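If you want to watch this happening, the lock DMV shows how many locks a session is holding as a query runs – a sketch only, run from another connection with the session id of interest in place of the literal below:

-- count locks held per resource type and mode for a given session ( 53 is illustrative )
select resource_type,
       request_mode,
       count(*) as lock_count
from sys.dm_tran_locks
where request_session_id = 53
group by resource_type, request_mode
order by lock_count desc;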

 

Just a few things to consider if you're thinking of adding a column.

 

© www.grumpyolddba.co.uk  September 2010

Posted by GrumpyOldDBA with no comments

SSRS – Do you back up ReportServerTempDB ??

For one reason or another, one of our dev SSRS ( 2005 ) servers did an unexpected restart and came back up with the ReportServerTempDB database marked as suspect.

Interesting! There are no backups taken on this box, but I figured anything TempDB-like would just be recreated if I restarted SQL Server. Yeah, I know I should know this, but I suppose it shows just how resilient and reliable SQL Server is – I run SSRS on laptops and test servers at home and none of these systems get 100% clean shutdowns.

Anyway, I soon discovered that ReportServerTempDB doesn’t get recreated – a quick BING brought up a couple of forum posts which explained that you should drop the database, create a new database, put back the role, run a setup script to recreate the objects and reset the permissions.

First up, there was no script anywhere on my install DVD, but there were a couple of copies in a Service Pack and a CU. Alas, having done all this I kept getting messages about a missing column. After some deliberation I identified the table with the missing column – well, without the column, to be absolutely precise – and the table was different between my prod SSRS server and the rebuilt database.

So I just scripted out the table and permissions from the prod server, ran the script against the dev box, and I was back in business.

Other options on my list, not tried, were a backup and restore from another server and a complete database generate-script from another server.

So there you have it – make sure you back up ReportServerTempDB; you may need a backup one day. Should I have known this? Absolutely. Am I the only one? Possibly not.
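For what it’s worth, adding it to the regular backup routine is a one-liner – a sketch, the disk path here is purely illustrative:

backup database ReportServerTempDB
to disk = N'D:\Backups\ReportServerTempDB.bak'
with init;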

Posted by GrumpyOldDBA with 2 comment(s)

Presenting to VBUG Bracknell UK 7th Sept

http://vbug.co.uk/events/default.aspx

I'm making a presentation on some aspects of my work for Regus – http://www.regus.co.uk/ for the UK or http://www.regus.com/ for the USA. Regus are the world's leading provider of office space, sliced and diced any way you might like.

At some point a DBA recognises there are only so many indexes you can add and only so much hardware you can throw at a database, and must get down to the finer points of T-SQL. I'm hoping to present some interesting aspects of how sometimes a solution is anything but obvious. As always the presentation will be real-world based; some of the subjects I hope to cover include dealing with archiving through partitioning and how you go about applying partitioning to a production database. I hope to give some interesting insights into partitioning you won't find in BOL – and this isn't for a data warehouse either.

I have an example of a production issue resolved with calculated columns, but this goes just beyond a simple table change. I have some interesting solutions for queries that the optimiser can't seem to resolve. I have a nice paper on why you shouldn't add columns to a table in an existing database. I also plan to show how to delete 10GB of data from a running database in full recovery ( on my single-cpu laptop ) in a matter of a couple of seconds. ( It runs in a few milliseconds on my server but will no doubt take a little longer on my laptop. )

You could also win an exclusive GrumpyOldDBA Mug.

I've been looking at the use of the CLR vs T-SQL for functions that don't use DML – as yet I've only a couple of examples, but the results were interesting.

Posted by GrumpyOldDBA with no comments