June 2007 - Posts

Continuing the new blogs being added to sqlblogcasts.com, I'd like to welcome Ray into the blogging world

http://sqlblogcasts.com/blogs/rayb/archive/2007/06/28/microsoft-sql-server-2005.aspx

We also have Andrew http://sqlblogcasts.com/blogs/acalvett/default.aspx and Leo http://sqlblogcasts.com/blogs/leopasta/default.aspx (Love the title)

I look forward to seeing you all at the sqlblogsters dinner http://sqlblogcasts.com/blogs/tonyrogerson/archive/2007/06/22/sqlblogsters-first-quarterly-meal-in-london.aspx

 



-
Posted by simonsabin | with no comments

I've just published my RSS feed on FeedBurner. The link is http://feeds.feedburner.com/SimonsSqlServerStuff

Currently there are this many subscribers.



-
Posted by simonsabin | with no comments
I have finally got round to looking into the error reports and found a few bugs in this component.

I have uploaded a new install for this component, including the fixes. The component now correctly validates that the right inputs are attached to it.

The key to using this component is to attach two outputs from the upstream component: the error output and the normal output. The normal output should be connected to the passthrough input of the enhanced error component, and the error output connected to the error input.

The enhanced error component uses the normal output during the pre-execution phase to map the column LineageIds to column names. After that, the normal output is not touched; it just passes through the component, hence the name of the input.

I hope this explains how the component should be used.



-

I've decided it's time to move on from Totaljobs Group to work for myself. That leaves an opportunity for someone to join Totaljobs Group and be responsible for a really challenging database architecture.

Totaljobs Group is the largest internet job board company in the UK and has been growing rapidly for the past 6 years. We are responsible for 9 job sites in the UK and internationally, with more planned in the future.

SQL Server is at the heart of the job sites, providing many functions including the core job and candidate searching services. You will be responsible for:

  • Developing the database architecture
  • Mentoring the SQL team 
  • Liaising with the business to develop technical solutions
  • Recommending and championing the use of SQL Server to enhance the services TJG offers

Totaljobs Group is an early adopter of new technology where it benefits the business: SQL Server 2005 was in production shortly after RTM, and there is already a commitment to using SQL Server 2008 (Katmai).

If you like developing innovative solutions in SQL Server, want to see how you can get the most out of SQL Server, and want to work in a fast-moving, friendly, relaxed environment, then this is the job for you.

If you're interested, you can apply for the job here



-
Posted by simonsabin | with no comments

I've got a 50% off voucher for SQL Server 2005 certification exams; the only downside is that it has to be used by the end of the week.

If you want it let me know and I'll send you the code.

First come, first served.



-
Posted by simonsabin | with no comments

If you are lucky enough to be running Enterprise Edition of SQL Server, you may have looked, or want to look, into online index rebuilds. Unfortunately, having blobs in your index blocks you from being able to do an online rebuild of that index. Crucially, that means that if your table is clustered, you cannot rebuild the clustered index on that table online.

You can still reorganise an index that contains blobs.
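To illustrate, a minimal sketch assuming a hypothetical dbo.Article table whose clustered index PK_Article covers an nvarchar(max) column:

-- Fails on SQL Server 2005 because the clustered index includes LOB data:
ALTER INDEX PK_Article ON dbo.Article REBUILD WITH (ONLINE = ON);

-- Reorganising the same index is still allowed, and is itself an online operation:
ALTER INDEX PK_Article ON dbo.Article REORGANIZE;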

For this reason my recommendation is to consider separating out your blob data into another table. For instance, if you stored articles in your database you would have an article table, in which you store the contents of the article along with other information about the article. Many queries are likely to query the information and not the article contents. If you stored the article contents in the same table as the other information, you wouldn't be able to rebuild that table's clustered index online.

So in this scenario you would be best off with an Article and ArticleContents table.
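A minimal sketch of that split, with hypothetical column names:

CREATE TABLE dbo.Article (
    ArticleId   int           NOT NULL PRIMARY KEY CLUSTERED,
    Title       nvarchar(200) NOT NULL,
    PublishedOn datetime      NOT NULL
);

CREATE TABLE dbo.ArticleContents (
    ArticleId int           NOT NULL PRIMARY KEY CLUSTERED
        REFERENCES dbo.Article (ArticleId),
    Contents  nvarchar(max) NOT NULL  -- the blob lives here, out of the main table
);

-- The clustered index on Article now contains no blobs, so this works online:
ALTER INDEX ALL ON dbo.Article REBUILD WITH (ONLINE = ON);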

This mechanism is also of benefit if you are using a full text index. Having a separate table allows you to isolate full text indexing to that table. In addition, it allows you to drop, recreate and even repopulate your data without impacting your main table with the core information in it.



-

Tony's blogged about the use of heaps: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2007/06/24/row-fragmentation-hopscotch-heap-v-clustered-and-io-cost.aspx. Whilst this highlights data on a page being out of sequence, and thus yo-yoing when you scan the data, it doesn't mean you should always have a clustered index.

The stats in Tony's blog only look that way because the first (heap) query is a full table scan. With a clustered index you are not doing a table scan, you are only doing a partial scan, so the reads are different. If you remove the WHERE clause, forcing a table scan in both cases, the heap is quicker because it is more compact and has to read fewer pages.

You can also address these issues with a covering index. A non-clustered index is effectively its own special table with a clustered index, whose leaf pages also contain the row ID or primary key columns.

Also, the issue here is highlighted by a scan: if you are performing an operation that does a seek, heaps aren't bad. And clustered indexes result in the whole row being read, so if you only need a few columns they can also be worse than a heap with a covering index.
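As a sketch of that last point, with hypothetical table, column and index names, a covering index (using SQL 2005 included columns) can answer a query entirely from its own leaf pages, whether the underlying table is a heap or clustered:

CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, TotalValue);  -- non-key columns stored in the leaf

-- No bookmark lookup needed; every column comes from the index itself:
SELECT CustomerId, OrderDate, TotalValue
FROM dbo.Orders
WHERE CustomerId = 42;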

Also, having a clustered index is beneficial if your key size is less than 8 bytes, because the clustering key is put in the leaf of non-clustered indexes, whereas for a heap the row ID is put there, and the row ID is 8 bytes.

Finally, with clustered indexes you will end up with page splits. These can result in page fragmentation across the database file; the worst case would be that in order to read 8 pages (8KB each), 8 extents (64KB each) would have to be read, because each page is on a different extent. ALTER INDEX REORGANISE and REBUILD address this fragmentation: the former moves pages so their order on disk matches their logical order, and also compacts pages; the latter rebuilds the index entirely. REORGANISE is an online operation.
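On SQL Server 2005 you can see this fragmentation for yourself; a minimal sketch, assuming a hypothetical database and table:

SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID('MyDatabase'), OBJECT_ID('dbo.Orders'), NULL, NULL, 'LIMITED');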

Following this post, Tony and I have had a very long discussion. The bottom line is that if there is one thing you should learn, it's the internals of indexes. This would be my starter list: heaps and B-trees, covering indexes, bookmark lookups, included columns, page fragmentation, page splitting ...



-

We are currently implementing a data warehouse, and we recently worked on the design of the time dimension.

Firstly it was agreed that day and time should be split out. Combining the two into one dimension is nuts: your dimension would be huge and unworkable.

So, considering the day dimension, what should the key be?

There are a number of options

1. No surrogate key; just use the date (with no time component, as we don't have a date datatype yet)
2. Use a generated surrogate key i.e. an identity column
3. Convert the datetime to an integer. As the decimal portion of a datetime represents the time, if you remove it then you have a number that represents the day.
4. Use an integer representation of the date as a yyyymmdd string, i.e. 1 July 2007 becomes 20070701

I am sure there are others; we just considered the above.

The following are the pros and cons of each.

1.
This is simple because you don't have to translate the date in your ETL (provided you don't have time in your datetimes). However, a datetime is 8 bytes compared to a 4-byte integer.

2.
This requires you to look up the relevant key in your ETL, and the key has no meaning when used in queries, so to get the date you always have to join to the time dimension. This option can use a smallint, but do you really need it?

3.
This is simple because you can obtain the integer and remove the time component easily by doing floor(cast(datecol as float)). However, you are basing your key on the internal representation of a datetime in SQL Server. An example of the problems this causes is using the number in Excel: Excel's day numbering starts at 31 Dec 1899 rather than 1 Jan 1900, and Excel also treats 1900 as a leap year when it wasn't, which means your dates can be two days out. In addition, to get the date information you need, you have to join to the time dimension or convert the number back to a datetime.

4.
This is also simple to calculate, by doing cast(convert(char(8), @d, 112) as int). OK, so the overhead is greater than option 3's, but the calculation still only takes 0.00059ms compared to 0.00017ms for option 3.
The big benefit is that the key is a meaningful value, which makes it much easier to query.
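For reference, a minimal sketch of options 3 and 4 side by side, assuming @d holds the source datetime:

DECLARE @d datetime
SET @d = '20070701 14:30:00'

-- Option 3: strip the time portion of the internal representation
SELECT cast(floor(cast(@d as float)) as int)   -- days since 1 Jan 1900

-- Option 4: integer form of the yyyymmdd string (convert style 112)
SELECT cast(convert(char(8), @d, 112) as int)  -- 20070701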

So option 4 is my preferred route, and having spoken to a number of fellow MVPs, they all agreed that whilst meaningful surrogate keys are something to be avoided, this is one case where the rule should be broken.



-

The code and slides from my presentation today (Service Broker - Asynchronous processing in SQL) are now online.

The code is on CodePlex: http://www.codeplex.com/MultiThreadedSQL

The slides are at http://sqlblogcasts.com/files/folders/servicebrokerpresentations/entry1885.aspx

and a video of Service Broker restarting after I stopped SQL Server is here: http://sqlblogcasts.com/files/folders/servicebrokerdemos/entry1884.aspx



-

I have just finished my presentation on asynchronous/parallel processing using Service Broker at the Microsoft Technology Conference in Ireland. I was a bit tentative after deciding last night, after a few Guinnesses and some whisky (thanks Barry), to add a new feature to my presentation: failure.

Yep, I thought it would be great if I did a demo of how Service Broker handled failure and performed reliably. How can I simulate failure, I thought? Well, the best I could come up with at short notice was to shut down SQL Server with no wait whilst Service Broker was processing.

Anyone that has done presentations will realise that effectively killing SQL Server mid-demo is probably not the wisest idea. Anyway, the presentation was going well, and so I went for it.

In my demo, Service Broker is configured with activation to run a stored procedure that processes the messages and uploads the files.
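For reference, the shape of that setup, with hypothetical queue and procedure names:

ALTER QUEUE dbo.UploadQueue
WITH ACTIVATION (
    STATUS = ON,
    PROCEDURE_NAME = dbo.usp_ProcessUploadMessages,  -- reads the queue and uploads the files
    MAX_QUEUE_READERS = 4,
    EXECUTE AS OWNER
)

-- The failure I simulated, run mid-processing:
SHUTDOWN WITH NOWAIT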

So I kicked off my process, which results in ~180 messages (files) sitting in the queue to be uploaded. I then showed that Service Broker was processing some messages, and then shut down SQL Server.

Once it's shut down, my app dies because it can't access the DB.

I then restart SQL and pray.

Eventually the databases recover and Service Broker starts processing the messages again without any intervention.

Wow, it worked. When I tried it last night I was quite shocked that it worked.

A video of this is available here: http://sqlblogcasts.com/files/folders/servicebrokerdemos/entry1884.aspx



-