January 2009 - Posts

Tracking contention on the SAN - testing Times

One hotly disputed topic in SAN performance is the matter of contention. This may materialise in the fabric due to oversubscription or excessive fan-in. A Fibre Channel fabric is a network which connects servers and storage through switches, just as an Ethernet network connects us to the internet and to each other in the office.

Ultimately there will be fewer ports on the actual storage than there are servers, and this is where problems may arise.
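To put a rough number on that, here is a minimal sketch of the fan-in arithmetic using the figures from the simplified drawing described below (10 servers with 2 HBAs each, two connections into the storage, everything at a nominal 4 gigabit). The function name and parameters are purely illustrative, and real fabrics have inter-switch links and protocol overheads that this ignores.

```python
def fan_in_ratio(servers, hbas_per_server, hba_gbit, storage_ports, storage_port_gbit):
    """Return (host-side bandwidth, storage-side bandwidth, ratio), nominal Gbit/s."""
    host = servers * hbas_per_server * hba_gbit
    storage = storage_ports * storage_port_gbit
    return host, storage, host / storage


# Figures from the simplified core-edge drawing; nominal link speeds only.
host, storage, ratio = fan_in_ratio(
    servers=10, hbas_per_server=2, hba_gbit=4, storage_ports=2, storage_port_gbit=4
)
print(f"Host side: {host} Gbit/s, storage side: {storage} Gbit/s, fan-in {ratio:.0f}:1")
```

In other words the servers could, in theory, ask for ten times what the storage ports can deliver. Whether that ever matters depends on how many of them are busy at the same moment.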

[Image: basic diagram of a core-edge network]

·         This is a highly simplified drawing illustrating the principles of a core-edge network. There are 10 servers, each with, let us say, 2 x 4Gb HBAs. Network speeds always take the lowest common denominator, so the two connections into the storage will also be at 4Gb. ( This is an oversimplification. )

·         Essentially there’s more potential bandwidth on the server side than the storage can support; the best example I can think of is the security scans at an airport: 20 check-in desks but only 2 body scanners.

·         Another area of contention can be the actual disks or LUNs. There are different ways of configuring SAN storage: LUNs may be mapped to physical sets of disks or mapped virtually to an unspecified number of spindles. ( We won’t go into this here. ) If you share disks/LUNs there is always the possibility of contention. You can try this at home if you’ve created more than one partition on your laptop or PC hard drive: run a job or task, then repeat it whilst copying a large file or files to the other partition.

·         It is possible that the backplane bandwidth may not be sufficient. Most storage has a number of ports; these might be allocated as 8Gb, 4Gb and 1Gb and may relate to types of disks or certain trays. 8 x 8Gb ports suggests 64Gb of bandwidth, however this may not be the case – check with your friendly vendor.

·         Finally, the HBAs and switches may suffer due to buffering. SQL Server tends to push out lots of small io, which is different from a fileserver, and there may be latency within the HBA and/or switches.

·         So how can you tell? Well, I’ve been testing by running a job at regular intervals. It’s up to you what you choose, but ideally it should provide consistent run times and results, and be repeatable and portable. It should, in my view, be a SQL Server test using SQL code.

·         Here’s what you might see as results from an hourly test. Red designates failure due to an overrun of time.

·         The results are hypothetical, but one might deduce that 18:00 shows users going home, the red areas may indicate when backups occur, and the 13:00 run is better; lunchtime maybe?

·         If your application is 7 x 24 then this type of result is bad news.

·         Bear in mind this is from a test, so the results themselves are not affected by lunch breaks or going home, but most likely the SAN is.

·         Some typical contention I have encountered over the years has come from Exchange; this manifested as increased io latency with no change in application load. Yet another subject area!

Date – start time    Duration ( h:m:s )
12/01/2009 08:05 00:34:35
12/01/2009 07:05 00:36:32
12/01/2009 06:05 00:33:42
12/01/2009 05:05 00:44:58
12/01/2009 04:05 00:37:19
12/01/2009 03:05 00:43:32
12/01/2009 02:05 00:54:59
12/01/2009 01:05 00:55:00
12/01/2009 00:05 00:54:59
11/01/2009 23:05 00:54:59
11/01/2009 22:05 00:54:59
11/01/2009 21:05 00:54:59
11/01/2009 20:05 00:46:09
11/01/2009 19:05 00:34:32
11/01/2009 18:05 00:23:43
11/01/2009 17:05 00:27:58
11/01/2009 16:05 00:28:39
11/01/2009 15:05 00:34:25
11/01/2009 14:05 00:32:45
11/01/2009 13:05 00:29:47
11/01/2009 12:05 00:33:36
11/01/2009 11:05 00:29:41
11/01/2009 10:05 00:33:51
11/01/2009 09:05 00:36:48
11/01/2009 08:05 00:31:22
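If you log your test runs in this sort of format, picking out the overruns is easy to automate. The following is only a sketch: it uses a handful of rows copied from the table above, assumes the dates are day/month/year, and uses a 45 minute limit that I have made up for illustration, since the limit you choose will depend on your own test.

```python
from datetime import datetime, timedelta

# A few rows copied from the results table: start date, start time, duration (h:m:s).
RESULTS = """\
12/01/2009 02:05 00:54:59
12/01/2009 01:05 00:55:00
11/01/2009 20:05 00:46:09
11/01/2009 18:05 00:23:43
11/01/2009 13:05 00:29:47
"""

# Hypothetical limit: the run time that counts as an overrun is an assumption.
LIMIT = timedelta(minutes=45)


def parse_results(text):
    """Turn each 'dd/mm/yyyy hh:mm h:m:s' line into a (start, duration) pair."""
    rows = []
    for line in text.splitlines():
        date_part, time_part, duration_part = line.split()
        start = datetime.strptime(f"{date_part} {time_part}", "%d/%m/%Y %H:%M")
        hours, minutes, seconds = (int(x) for x in duration_part.split(":"))
        rows.append((start, timedelta(hours=hours, minutes=minutes, seconds=seconds)))
    return rows


def overruns(rows, limit):
    """Return only the runs whose duration exceeds the limit."""
    return [(start, duration) for start, duration in rows if duration > limit]


for start, duration in overruns(parse_results(RESULTS), LIMIT):
    print(f"{start:%d/%m/%Y %H:%M} overran: {duration}")
```

Grouping the overruns by hour of day over a few weeks of runs would be one way to see whether the slow periods line up with backups or other scheduled activity on the SAN.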
Posted by GrumpyOldDBA with 1 comment(s)

SQL Server 2008 Information

I was searching for a particular document for SQL 2008 and thought I'd list the links to what I found. There are some especially good white papers on TechNet; I can particularly recommend the T-SQL enhancements and the indexed view white papers:

SQL Server 2008 White Papers         http://www.microsoft.com/sqlserver/2008/en/us/white-papers.aspx

Technet SQL 2008 White Papers      http://technet.microsoft.com/en-us/library/bb418496.aspx

A list of blogs and other sites, sadly no Grumpy Old DBA   http://msdn.microsoft.com/en-us/sqlserver/bb671054.aspx



Posted by GrumpyOldDBA with no comments

Testing Times - mdf fragmentation

Now I've never been convinced that file level fragmentation is irrelevant on a SAN. At least two sources have assured me that this is the case, but then they also assured me of a number of other points, most of which it appears I have proved to be basically incorrect. < grin >

I've followed with great interest a series of posts by Linchi Shea  http://sqlblog.com/blogs/linchi_shea/ concerning SAN fragmentation. If you don't subscribe to this blog then you're really missing a great source of technical knowledge; without it I would have struggled ( well, I still struggled, but that's another story ) to get the storage teams to consider HBA queue depth, for a start.

So I decided to come up with the insane fragmentation test: assume that there is no DBA and you're creating a number of databases on your server. Here's what I did:

I created three databases and then ran my test 1, which creates and populates a 1 million row table of approx 8.5GB, in each database simultaneously. The population of the databases caused auto growth and, true to form, Windows + SQL Server managed to totally fragment all three databases despite ample free space on the LUN.

On a serious note, this should be taken as a warning as to what can happen if you don't manage database growth. Here's the output showing my fragmented databases:

[Image: Fragmentation insanity]

The three databases are called Stresful, rubbish1 and rubbish2.

  • I then emptied the datafiles of data and ran my scripts, which populated the three databases with 1 million rows of data each. I did this three times.
  • I dropped all the databases, created non fragmented databases and repeated the test three times.
  • The three runs against the fragmented databases took on average 106 mins.
  • The three runs on non fragmented databases took on average 16 mins.
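To put those two averages side by side, here is a small back-of-envelope sketch. The slowdown factor comes straight from the figures above. The throughput numbers rest on an assumption that is not confirmed anywhere in these results: that each run loaded roughly the same 8.5GB per database as the original population test, counted here as 8.5 x 1024 MB.

```python
FRAGMENTED_MINUTES = 106    # average of three runs against fragmented databases
UNFRAGMENTED_MINUTES = 16   # average of three runs against non fragmented databases

# Assumption: about 8.5GB loaded into each of the three databases per run.
DATA_MB = 3 * 8.5 * 1024


def throughput_mb_per_s(data_mb, minutes):
    """Average load rate in MB/s for a run of the given length."""
    return data_mb / (minutes * 60)


slowdown = FRAGMENTED_MINUTES / UNFRAGMENTED_MINUTES
print(f"Slowdown on fragmented files: {slowdown:.1f}x")
print(f"Fragmented:     {throughput_mb_per_s(DATA_MB, FRAGMENTED_MINUTES):.1f} MB/s")
print(f"Non fragmented: {throughput_mb_per_s(DATA_MB, UNFRAGMENTED_MINUTES):.1f} MB/s")
```

So the fragmented runs took roughly 6.6 times as long. The absolute MB/s figures should be treated with caution given the assumption about data volume, but the ratio between them does not depend on it.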

It's not for me to name the particular SAN that I used for this test, but it is a serious enterprise bit of kit and is claimed to be able to scale with every condition known to man and maybe a few more! To the best of my knowledge the HBAs are 4Gb; there's more than one of course, because this test was run on a cluster.

  • Logically, as this was all data inserts, I knew the performance would be bad. I intend to repeat the test with a smaller number of fragments and to test updates as well as inserts; see my other posts.
  • We don't always have just one database on our servers and we don't always have a separate LUN for each mdf file. Much as some of my tests showed that creating multiple files ( not filegroups ) for your database could degrade performance, I would suggest the movement of the disk heads is the factor here.

Linchi only did a test on single files; maybe running multiple tests would produce a different result.

Maybe the particular SAN I'm testing against is different. Sadly, like probably most DBAs, I don't have access to multiple hardware platforms; it's only due to a data centre migration that I've been able to run these tests. I'm attempting to show that the migration does not bring any degradation of performance compared to our current data centre.

I should give a quick few words of thanks to Tony Rogerson http://sqlblogcasts.com/blogs/tonyrogerson/  who has fielded a number of my questions and who has also run SAN benchmarks  http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/09/22/1089.aspx 

I hope this post formats correctly - always a bit of a gamble when adding images!



Posted by GrumpyOldDBA with no comments