I've decided to implement a new blogging infrastructure based on WordPress; the community server stuff I'm using has got a bit tired and WordPress offers a wealth of plug-ins to do pretty much everything I want.

So, see http://dataidol.com/tonyrogerson for my new blog; it's not just a move, I've broadened my coverage to encompass all things data. The first couple of posts talk about short-stroking hard disks (something I've presented on a number of times now), and I've also got some Erlang content.

Anyway, enjoy!

T

Last October {2011} we held the first ever "SQL Relay", a title chosen because it was held over five consecutive days, with each of the UK regional SQL Server User Groups hosting an evening event in relay fashion.

These events were a tremendous success; we all had a lot of FUN hosting them, and we received some excellent feedback from both regular and first-time attendees of the User Group events.

 

Having known about the impending release of SQL Server 2012 for a while now, the same UG leaders and a few new ones have been working tirelessly behind the scenes to organise the sequel to "SQL Relay", imaginatively titled "SQL Relay 2012" – hey, we're DBAs, not marketing gurus!

 

Once again we will be holding five days of regional events. The various user group leaders have been working very hard in the background; this time we are providing five full-day events – and they are FREE.

 

These full-day events will comprise a morning overview of SQL Server 2012 given by Microsoft speakers and partners. The afternoon sessions will focus on deeper technical content on the 2012 features, delivered by UK SQL Server MVPs (Microsoft Most Valuable Professionals). The evening events are a great way to complete the day, or to join us if you weren't fortunate enough to attend the morning or afternoon sessions; their style and content will vary from location to location – more details to follow soon.

 

Each of the five events has its own registration, and there is a limit on the number of attendees each venue can hold, so be sure to register as early as possible to secure your place. You can attend as many or as few sessions as you wish – just indicate your preference when registering.

 

You may be asking "How much does it cost to attend?" These are FREE events, thanks to the HUGE generosity of our sponsors.

When and where are these events being held?

21st May Edinburgh
http://sqlserverfaq.com/?eid=378
Microsoft Edinburgh, Waverley Gate, 2-4 Waterloo Place, Edinburgh, EH1 3EG

22nd May Manchester
http://sqlserverfaq.com/?eid=373
The Co-operative, CIS Tower, Miller Street,
Manchester, M60 0AL

23rd May Birmingham
http://sqlserverfaq.com/?eid=357
Lakeside Centre, Aston University's Conference Centre, Birmingham, B4 7ET

24th May Bristol
http://sqlserverfaq.com/?eid=391
Avon Gorge Hotel, Sion Hill, Clifton, Bristol, BS8 4LD

30th May London
http://sqlserverfaq.com/?eid=377
Microsoft London (Cardinal Place), 100 Victoria Street
London SW1E 5JL


Aside from learning from the excellent speakers throughout the day, you will have the chance to network, as well as chances to win prizes.

 

Why not bring along a friend or colleague? Many of you work in teams or have SQL professional friends, so we have a special invitation and competition for you.

Simply register and indicate a friend you’d like to invite – if both of you attend you also have a chance to both win a fabulous {book} prize.


Would you like to know more about these events? Fear not – we will be providing further details over the next few weeks in some follow-up emails.

We like to ensure you have the information right at your fingertips, so we have also put all the relevant information on the events at these locations:

FACEBOOK: https://www.facebook.com/SQLRelay2012 & https://www.facebook.com/SQLRelay2012/events
LINKEDIN GROUPS: http://www.linkedin.com/groups?gid=4153765&trk=hb_side_g & http://www.linkedin.com/groups?gid=2904068&trk=hb_side_g
TWITTER: Simply follow this hash tag for all the latest news on the event #sqlrelay

Want to attend one of the user group events? Visit our group website www.sqlserverfaq.com for a full listing of all our events; we also have a map of all UK user groups at http://tsqltidy.com/newmap/map.htm

 

 

It is often said that hard disk drive storage is significantly cheaper per GiByte than solid state devices – within the database space this is wholly inaccurate. People need to look at the cost of the complete solution, not a single component part in isolation from what is really required to meet the business requirement.

Buying a single Hitachi Ultrastar 600GB 3.5” SAS 15Krpm hard disk drive costs approximately £239.60 (http://scan.co.uk, 22nd March 2012), compared to £2,316.54 for an OCZ 600GB Z-Drive R4 CM84 PCIe (http://scan.co.uk, 22nd March 2012). I’ve not included the FusionIO ioDrive because there is no public pricing available for it – something I never understand; when companies do this I immediately wonder what they are hiding. Luckily, in FusionIO’s case the product is proven, though it is expensive compared to OCZ’s enterprise offerings.

On the face of it the single 15Krpm hard disk has a price per GB of £0.39 and the SSD £3.86; this is what you will see in the press and what salespeople will use when comparing the two technologies – do not be fooled by this bullshit, people!

What is the requirement? The database will have a static size of 400GB, kept static through archiving so that growth and trimming balance out. The client requires resilience. Several hundred call-centre staff will query the database; each query reads a small amount of data, but there is no hot spot, so the randomness spreads across the entire 400GB. Estimates predict approximately 4,000 IOps at peak times, and because it’s a call-centre system IO latency is important and must remain below 5ms per IO. The balance between read and write is 70% read, 30% write.

The requirement is now defined and we have three of the most important pieces of the puzzle – space required, estimated IOps and maximum latency per IO.

Something to consider with regard to SQL Server: write activity requires synchronous IO to the storage media, specifically for the transaction log; the writing thread waits until the IO has completed and been hardened off before it can continue execution. The requirement states that 30% of the system activity will be writes, so we can expect a high amount of synchronous activity.

The hardware solution needs to be defined; there are two possible solutions, hard disk or solid state based. The real question now is how many hard disks are required to achieve the IO throughput, the latency and the resilience – and ditto for the solid state.

Hard Drive solution

In a test on an HP DL380 (P410i controller) using IOMeter against a single 15Krpm 146GB SAS drive, throughput was measured with a transfer size of 8KiB against a 40GiB file on a freshly formatted disk. The partition is the only one on the disk, so the 40GiB file sits on the outer edge of the drive, where more sectors can be read before head movement is required:

For 100% sequential IO at a queue depth of 16 with 8 worker threads: 43,537 IOps at an average latency of 2.93ms (340 MiB/s); for 100% random IO at the same queue depth and worker threads: 3,733 IOps at an average latency of 34.06ms (34 MiB/s).

The same test was done on the same disk but with a 130GiB test file: for 100% sequential IO at a queue depth of 16 with 8 worker threads, 43,537 IOps at an average latency of 2.93ms (340 MiB/s); for 100% random IO at the same queue depth and worker threads, 528 IOps at an average latency of 217.49ms (4 MiB/s).

From these results it is clear that random performance gets worse as the disk fills up – I’m currently writing an article on short stroking which will cover this in detail.

Given the workload is random in nature, the random performance of the single drive when only 40 GiB of the 146 GB is used gives close to the IOps required, but the latency is way out.

Luckily I have tested 6 x 146GB 15Krpm SAS drives in a RAID 0 using the same test methodology. For the same 130 GiB test above, the performance boost is near linear for each drive added: throughput goes up by about 5 MiB/sec and 700 IOps per drive, and latency reduces by nearly 50% per drive added (172 ms, 94 ms, 65 ms, 47 ms, 37 ms, 30 ms). This is because the same 130GiB is spread out more as you add drives (130 / 1, 130 / 2, 130 / 3, etc.), so implicit short stroking occurs – there is less of the file on each drive, so less head movement is required.

The best latency is still 30 ms, but we now have the IOps required – although that’s on a 130GiB file and not the 400GiB we need.

A reality check here: the workload’s randomness is more likely to be 50/50 rather than a full 100%, but the above highlights the effect randomness has on the drive, and the more a drive fills with data the worse the effect becomes.

For argument’s sake let us assume that for the given workload we need 8 disks to do the job; for resilience we will actually need 16, because we need to RAID 1+0 them in order to get both the throughput and the resilience – RAID 5 would degrade performance.
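As a rough sanity check on that number, using the scaling figures from the RAID 0 test above (roughly 700 random IOps gained per drive added):

4,000 IOps required / ~700 IOps per 15Krpm drive ≈ 6 drives, rounded up to 8 for latency headroom
8 data drives x 2 (RAID 1+0 mirroring) = 16 drives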

Cost for hard drives: 16 x £239.60 = £3,833.60

For the hard drives we will need disk controllers and a separate external disk array, because the likelihood is that the server itself won’t take the drives. A quick spec from DELL for a PowerVault MD1220, which gives dual pathing with 16 x 146GB 15Krpm 2.5” disks, comes in at £7,438.00 – note it’s probably more once we add the two controller cards to sit in the server, racking, etc.

Minimum cost taking the DELL quote as an example is therefore: {Cost of Hardware} / {Storage Required}

£7,438.60 / 400 = £18.595 per GB

£18.59 per GB is a far cry from the £0.39 we had been told by the salesman and the myth. Yes, the storage array is composed of 16 x 146GB disks in RAID 10 (therefore 8 usable), giving an effective usable capacity of 1,168GB, but the actual storage requirement is only 400GB – the extra disks have had to be purchased purely to get the IOps up.

Solid State Drive solution

A single card significantly exceeds the IOps and latency required; for resilience, two will be needed.

( £2,316.54 * 2 ) / 400 = £11.58 per GB

With the SSD solution only two PCIe sockets are required, no external disk units, no additional controllers, no redundant controllers etc.

Conclusion

I hope that by showing you an example the myth that hard disk drives are cheaper per GB than solid state has now been dispelled – £11.58 per GB for SSD compared to £18.59 for hard disk.

I’ve not even touched on the running costs, compare the costs of running 18 hard disks, that’s a lot of heat and power compared to two PCIe cards!

Just a quick note: I've left a fair amount of information out due to this being a blog! If in doubt, email me :)

I’ll also deal with the myth that SSDs wear out at a later date – that concern is way overdone now; it had substance five years ago, but not today.

When comparing SSD against hard drive performance it really makes me cross when folk think comparing an array of SSDs connected at 3Gbits/sec against hard drives connected at 6Gbits/sec is somehow valid. In a paper from DELL (http://www.dell.com/downloads/global/products/pvaul/en/PowerEdge-PowerVaultH800-CacheCade-final.pdf) on increasing database performance using the DELL PERC H800 with solid state drives, they compare four SSDs connected at 3Gbits/sec against ten 10Krpm drives connected at 6Gbits/sec [Tony slaps forehead while shouting DOH!].

It is true that in the case of hard drives it probably doesn’t make much difference whether the link is 3Gbit or 6Gbit: SAS and SATA are both point-to-point protocols rather than a shared-bus architecture like SCSI, so the hard drive doesn’t share bandwidth and probably can’t get near the 600MBytes/sec throughput that 6Gbit gives unless you are doing contiguous reads. In my own tests on a single 15Krpm SAS disk using IOMeter (8 worker threads, queue depth of 16 with a stripe size of 64KiB, an 8KiB transfer size on a drive formatted with an allocation size of 8KiB, 100% sequential read) I only get 347MiBytes per second sustained throughput at an average latency of 2.87ms per IO, equating to 44.5K IOps. OK, if that was 3Gbits it would be less – around 280MiBytes per second – oh, but wait a minute [...fingers tap desk]
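For reference, sustained throughput here is simply IOps multiplied by the transfer size, which is where those figures come from:

44,500 IOps x 8 KiB ≈ 356,000 KiB/sec ≈ 347 MiB/sec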

You’ll struggle to find in the commodity space an SSD that doesn’t have the SATA 3 (6GBits) interface, SSD’s are fast not only low latency and high IOps but they also offer a very large sustained transfer rate, consider the OCZ Agility 3 it so happens that in my masters dissertation I did the same test but on a difference box, I got 374MiBytes per second at an average latency of 2.67ms per IO equating to 47.9K IOps – cost of an 240GB Agility 3 is £174.24 (http://www.scan.co.uk/products/240gb-ocz-agility-3-ssd-25-sata-6gb-s-sandforce-2281-read-525mb-s-write-500mb-s-85k-iops), but that same drive set in a box connected with SATA 2 (3Gbits) would only yield around 280MiBytes per second thus losing almost 100MiBytes per second throughput and a ton of IOps too.

So why the hell are “enterprise” vendors still only connecting SSDs at 3Gbits? Well, my conspiracy theory is that they have no interest in you moving to SSD because they’d lose so much money. The argument for sticking with SATA 2 doesn’t wash: SATA 3 has been out for some time now and all the commodity kit you buy uses it.

Consider the cost not in terms of price per GB but price per IOps: SSDs absolutely thrash hard drives on that measure. It used to be true that hard drives thrashed SSDs on price per GB, but is that still the case? I’m not so sure – a 300GByte 2.5” 15Krpm SAS drive costs £329.76 ex VAT (http://www.scan.co.uk/products/300gb-seagate-st9300653ss-savvio-15k3-25-hdd-sas-6gb-s-15000rpm-64mb-cache-27ms), which equates to £1.09 per GB, compared to a 480GB OCZ Agility 3 costing £422.10 ex VAT (http://www.scan.co.uk/products/480gb-ocz-agility-3-ssd-25-sata-6gb-s-sandforce-2281-read-525mb-s-write-410mb-s-30k-iops), which equates to £0.88 per GB.

OK, I compared an “enterprise” hard drive with a “commodity” SSD, so things get a little more complicated here: most “enterprise” SSDs are SLC and most commodity ones are MLC; SLC gives better performance and wear characteristics. I’ll talk about that another day.

For now though, don’t get sucked in by vendor marketing: SATA 2 (3Gbit) just doesn’t cut it; SSDs need 6Gbit to breathe, and even that they are pushing. Alas, SSDs are connected using SATA, and all the controllers I’ve seen thus far from HP and DELL only do SATA 2 – deliberate? Well, I’ll let you decide on that one.

Learn how you can load over 44 million rows with an average length of 241 bytes into SQL Server at a rate of over 530K rows per second, using kit that costs less than £2K.

https://skydrive.live.com/?mkt=en-gb#cid=DD00BC6E00F55EDF&id=DD00BC6E00F55EDF%21473

The paper is the end result of my two year masters in Business Intelligence at the University of Dundee.

Later this week I'll be putting http://www.reportingbrick.com live which will be a continuation of the paper where I'll post further research on the subject as time progresses.

This Friday (20th Jan 2012) between 2pm and 3:45pm I'll  be at the School of Computing, University of Dundee to demo the kit and answer any questions in person.

Updated 29th May 2012 for SQL Server 2012 (below)

Microsoft have changed the licence model for multiple passive machines: you can now have only one secondary passive failover per primary – see http://download.microsoft.com/download/7/3/C/73CAD4E0-D0B5-4BE5-AB49-D5B886A5AE00/SQL_Server_2012_Licensing_Guide_Apr2012.pdf

Personally I don't agree with this move; as with previous editions, you should be able to have multiple passive failovers.

-------------

If you were in any doubt at all that you need to license Standby / Passive Failover servers then the White Paper “Do Not Pay Too Much for Your Database Licensing” will settle those doubts.

I’ve had debate before people thinking you can only have a single instance as a standby machine, that’s just wrong; it would mean you could have a scenario where you had a 2 node active/passive cluster with database mirroring and log shipping (a total of 4 SQL Server instances) – in that set up you only need to buy one physical license so long as the standby nodes have the same or less physical processors (cores are irrelevant).

So next time your supplier suggests you need a licence for your standby box, tell them you don’t, and educate them by pointing them to the white paper.

For clarity I’ve copied the extract below from the White Paper.

Extract from “Do Not Pay Too Much for Your Database Licensing”

Standby Server

Customers often implement standby server to make sure the application continues to function in case primary server fails. Standby server continuously receives updates from the primary server and will take over the role of primary server in case of failure in the primary server.

Following are comparisons of how each vendor supports standby server licensing.

SQL Server
Customers does not need to license standby (or passive) server provided that the number of processors in the standby server is equal or less than those in the active server.

Oracle DB
Oracle requires customer to fully license both active and standby servers even though the standby server is essentially idle most of the time.

IBM DB2
IBM licensing on standby server is quite complicated and is different for every editions of DB2. For Enterprise Edition, a minimum of 100 PVUs or 25 Authorized User is needed to license standby server.

 

The following graph compares prices based on a database application with two processors (dual-core) and 25 users with one standby server.

[chart snipped] 

Note   All prices are based on newest Intel Xeon Nehalem processor database pricing for purchases within the United States and are in United States dollars. Pricing is based on information available on vendor Web sites for Enterprise Edition.

Microsoft SQL Server Enterprise Edition
25 users (CALs) x $164 / CAL + $8,592 / Server = $12,692 (no need to license standby server)

Oracle Enterprise Edition (base license without options)
Named User Plus minimum (25 Named Users Plus per Core) = 25 x 2 = 50 Named Users Plus x $950 / Named Users Plus x 2 servers = $95,000

IBM DB2 Enterprise Edition (base license without feature pack)
Need to purchase 125 Authorized User (400 PVUs/100 PVUs = 4 X 25 = 100 Authorized User + 25 Authorized Users for standby server) = 125 Authorized Users x $1,040 / Authorized Users = $130,000

 

When considering buying a laptop that’s going to cost me around £5,000 I really need to justify the purchase from a business perspective. My Lenovo W700 has served me very well for the last two years; it’s an extremely good machine and as solid as a rock (and as heavy), but alas it is limited to 8GB of RAM.

As SQL Server 2012 approaches, and with my interest in working in the Business Intelligence space over the next year or two, it is clear I need a powerful machine on which I can run a full infrastructure, albeit virtualised.

My requirements

For High Availability / Disaster Recovery research and demonstration

Machine for a domain controller
Four machines in a shared disk cluster (SQL Server Clustering active – active etc.)
Five  machines in a file share cluster (SQL Server Availability Groups)

For Business Intelligence research and demonstration

Not entirely sure how many machines I want to run here, but it would need to cover the entire BI stack in an enterprise setting: SharePoint, SQL Server, etc.

For Big Data Research

I have a fondness for the NoSQL approach to scalability and dealing with large volumes so I need a number of machines to research VoltDB, Hadoop etc.

As you can see, the requirements for a SQL Server consultant to service their clients well are considerable. Will 8GB suffice? Alas, no – it will no longer do. I’m a very strong believer that in order to do your job well you must expense it; short cuts only cost you time. Waiting 5 minutes instead of an hour for something to run not only saves my time but my clients’ time: I can do things quicker and, more importantly, I can demonstrate concepts.

My W700 with the 8GB of RAM and SSDs cost me around £3.5K two years ago. To be honest I’ve not got the full use I wanted out of it, but the machine has had the power when I’ve needed it – it’s served me and my clients well.

Alienware now do a model (the M18x) with 32GB of RAM – yes, 32GB in a laptop! Dual drives so I can whack a couple of really good SSDs in there, a quad-core i7 with hyper-threading, and a decent clock speed.

I can reduce the cost of the memory by getting it from Crucial, so instead of £1.5K for 32GB it will be around £900, and I can also save on the SSDs. The beauty of the M18x is that it has USB 3.0, SATA 3 and, really importantly, eSATA; running VMs will never be easier – I can have a removable SSD with my VMs on it and plug it into my home machine or laptop. An ideal world!

The initial outlay of £5K is peanuts compared to the benefits I’ll give my clients: I will be able to present real enterprise concepts, and I’ll also be able to give training on those concepts with real, albeit virtualised, machines.

This October, the UK SQL Server community is inviting anyone with an interest in SQL Server to a series of free evening user group events being held around the country.

Supported by SQLServerFaq, the 3rd-6th October 2011 sees the first annual SQLRelay with many of the UK’s SQL Server user groups coming together to run 13 evening meetings across the country in 4 days.

Everyone’s Welcome

Whether you’re a developer or administrator, professional or amateur, we hope there’s an event near you where you will not only get to meet and knowledge-share with like-minded people but also get to see a well-known SQL Server community member present. Our aim as user group leaders is to make you feel part of the already popular UK SQL Server community where we believe even the most novice of people can teach an old hand a thing or two.

If you’re already a member of a local user group why not bring a friend or colleague along to one of our SQLRelay meetings in October? Every user group taking part has arranged to have either a Microsoft SQL Server team member or a Microsoft Most Valuable Professional (MVP) as a speaker.

SQLRelay Agenda

The UK SQL Server user group week kicks off in Liverpool at SQLBits, the semi-annual SQL Server conference, from the 29th September to 1st October. If you’re already attending then make sure you visit Community Corner where you can meet the user group leaders as well as their members. The SQLRelay team will be wearing instantly identifiable outfits and be available to give you more information about their events and answer your questions. For more information on SQLBits please visit www.sqlbits.com.

Following SQLBits, between Monday 3rd and Wednesday 5th of October, the 13 regional user groups will be holding free evening meetings very similar to their normal sessions, the only difference being that we’ve made sure each meeting will have a well-known SQL Server community member presenting.

Monday 3rd October
Manchester, Surrey, Birmingham, Kent

Tuesday 4th
Leeds, Bristol, London, Hertfordshire

Wednesday 5th
Edinburgh, Southampton, Exeter, Cardiff

We also welcome a new user group that will start in October in the County of Essex, more details will follow.

On Thursday 6th October the SQLRelay week will finish with an event at Microsoft’s Cardinal Place venue in London, with an internationally recognised speaker presenting. This is your chance to meet the attendees, speakers and organisers from all of the regional events, as well as our yet-to-be-named guest speaker. You can be assured that they are well known in the international SQL Server community, and as soon as we can confirm more about their appearance we will. In addition to our speakers there will also be a prize draw, with a chance to win the cool prizes provided by our sponsors.

How to Register

Details of each event’s date, time, location and agenda can be found on the UK SQL Server community web site www.sqlserverfaq.com, where you can also register to attend each event.

Finally, we hope having received this email you feel welcomed by the UK SQL Server community and that you’ll join in our excitement about having a week of SQL Server community events.

Regards

The UK SQL Server User Group Leaders

CPU Saturation – an over-performing IO subsystem

It’s not often that you see the IO subsystem able to out-drive the cores in the box. To demonstrate this behaviour the TestGuid_HeapInsert table (see http://sqlblogcasts.com/blogs/tonyrogerson/archive/2011/07/19/use-guid-s-uniqueidentifier-for-keys-when-storing-data-on-ssd-s-part-2-initial-database-schema.aspx for the set-up scripts) will be used – a 50 million row heap, 54GBytes in size – and processor affinity will be used to lock SQL Server to a specific number of the available cores.
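The tests below were run with affinity restricted to between 1 and 6 cores. As an illustrative sketch only (the exact commands used aren't shown in the original scripts), the restriction can be applied either through the affinity mask or, on SQL Server 2008 R2 and later, with ALTER SERVER CONFIGURATION:

-- Illustrative: limit SQL Server to the first two cores (mask 3 = binary 11)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'affinity mask', 3;
RECONFIGURE;

-- Or, from SQL Server 2008 R2 onwards:
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0 TO 1;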

Note: the equipment for this test is described here: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2011/07/22/use-guid-s-uniqueidentifier-for-keys-when-storing-data-on-ssd-s-part-3-initial-base-line-with-iometer-and-first-test-group.aspx. In summary: Windows 2008 R2 x64, SQL Server 2008 R2 x64, 16GB of RAM (but SQL Server is limited to 1GB), and the processor is an AMD Phenom II X6 Six Core 1090T Black Edition.

Interestingly this behaviour does not show up when using the COUNT(*) function without any WHERE clause – performance is consistent regardless of the number of cores used; however, once you have a Compute Scalar in the plan, the core scalability issue described here kicks in.

select COUNT(*) from TestGuid_HeapINSERT with ( index = 0 )

Average Disk Bytes/read              404,396
Average Disk Sec/read                  0.001
Average Disk Reads/sec                2,806
Average Read Bytes/sec               1,134,827,393

The SQL below has been used; the index hint is there to simplify the experiment and make sure a table scan is performed:

select MAX( cast( expandrow as bigint ) ) from TestGuid_HeapINSERT with ( index = 0 )

Cores | Avg Disk Bytes/Read | Avg Disk Sec/Read | Avg Disk Reads/Sec | Avg Read Bytes/Sec | Logical Reads | Read-Ahead Reads | CPU Time (ms) | Elapsed (ms)
1 | 411,619 | 0.000 | 622 | 256,306,080 | 7,142,864 | 7,142,864 | 211,225 | 228,679
2 | 406,215 | 0.001 | 1,244 | 506,625,778 | 7,142,864 | 7,142,864 | 216,249 | 116,102
3 | 407,495 | 0.001 | 1,833 | 746,940,222 | 7,142,864 | 7,142,754 | 220,336 | 78,957
4 | 409,191 | 0.001 | 2,392 | 978,956,526 | 7,142,864 | 7,142,864 | 223,797 | 60,129
5 | 408,588 | 0.001 | 2,795 | 1,142,225,279 | 7,142,864 | 7,142,864 | 225,248 | 51,388
6 | 408,588 | 0.001 | 3,270 | 1,324,789,338 | 7,142,864 | 7,142,864 | 229,429 | 44,631


Reviewing the figures above: Average Disk Bytes/Read, Average Disk Sec/Read, Logical Reads, Read-Ahead Reads and CPU Time (ms) remain fairly constant; however, Average Disk Reads/Sec and Average Read Bytes/Sec step up with each core, and Elapsed (ms) drops dramatically with the second core, with diminishing savings as further cores are added.

Looking at the change each additional core brings (relative to the row above):

Cores | Avg Disk Reads/Sec | Avg Read Bytes/Sec | Elapsed (ms)
2 | 622 | 250,319,698 | -112,577
3 | 589 | 240,314,444 | -37,145
4 | 559 | 232,016,304 | -18,828
5 | 403 | 163,268,753 | -8,741
6 | 475 | 182,564,059 | -6,757

 

It should be noted that this problem isn’t actually specific to the IO subsystem; it also occurs if you have enough of your table in the buffer pool – essentially a single core cannot get through the data quickly enough.

Relating to the real world

The figures above show that a query with no parallelism, and which therefore runs on a single core, takes 512% of the duration the box is capable of when all cores are used, and uses only 19% of the available Read Bytes/sec maximum.
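Those two percentages come straight from the table above:

228,679 ms (1 core) / 44,631 ms (6 cores) ≈ 5.12, i.e. the single-core run takes 512% of the all-core duration
256,306,080 bytes/sec (1 core) / 1,324,789,338 bytes/sec (6 cores) ≈ 0.19, i.e. only 19% of the available read bandwidth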

In past versions of SQL Server we got used to turning parallelism off because it more often than not extended the duration of the query; that was fixed in SQL Server 2008, where parallelism now works well. However, legacy systems and code carry a perpetual myth that needs breaking: the MAXDOP restrictions of old should be removed and SQL Server just left to get on with it.
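If you want to check or change the server-wide setting, here is a quick illustrative snippet (not part of the original test scripts); the cap can also be controlled per query with the MAXDOP hint:

-- Illustrative: inspect/clear the server-wide cap (0 = let SQL Server decide)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 0;
RECONFIGURE;

-- Or control an individual query instead:
select MAX( cast( expandrow as bigint ) )
from TestGuid_HeapINSERT with ( index = 0 )
option ( maxdop 1 );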

It is true that on a system with a number of concurrent connections this problem will balance itself out; however, what about the sequential overnight batch jobs that so many of us have?

No direct advice here (yet) – maybe that will come when I finally draw my conclusions around these SSD benchmarks – but you need to be aware that this problem exists, and as IO subsystems perform better (which mine does) these issues will need addressing.

Benchmarks (SSD based)

In the following tests I’ve tried to be realistic in terms of what a real-world query would do – joins! Ordinarily you’d have several tables joining together; this is where you take the performance hit if a) you’ve an IO subsystem with high read latency and b) you have lots of fragmentation.

Benchmarks will be performed on SSD and then on rotational disk; the first lot of benchmarks focus on read queries, and writing will be covered at a later date, when I will discuss a hybrid approach to storage design.

There are three versions of the table, each held on its own filegroup so that there is no object interleaving to interfere, giving a clear and clean set of numbers. The three versions are: a) a table populated with 50 million single INSERTs that has a clustered index on the guid column (TestGuid_INSERT); b) a table populated from a) in a single insert so there is no fragmentation (TestGuid_SingleINSERT); and c) a table populated with 50 million single INSERTs that is a heap with a single non-clustered index on the guid column (TestGuid_HeapINSERT). A padding column of char(1024) is used to pad the row, thus causing more page splitting.

The database set up script can be found in the blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2011/07/19/use-guid-s-uniqueidentifier-for-keys-when-storing-data-on-ssd-s-part-2-initial-database-schema.aspx
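For readers who don’t want to follow the link, a minimal sketch of roughly what the heap variant looks like; the column names come from the queries later in this post, but treating expandrow as the char(1024) padding column is my assumption (the ~1,047 byte record size fits a uniqueidentifier plus char(1024)) – the linked script is authoritative:

-- Illustrative sketch only, not the original set-up script
create table TestGuid_HeapINSERT (
    guidcol   uniqueidentifier not null primary key nonclustered,
    expandrow char(1024) not null   -- padding column (assumed name/type)
);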

Base lining the test kit

To give the benchmarks a backdrop, the capabilities of the kit need to be established; in this instance a number of measures are taken by placing the kit under specific load types. IOMeter has been used to test raw grunt in terms of IOs per second, throughput and read latency.

Kit

CPU and Memory

£154.16 - Gigabyte GA-890FXA-UD7 AMD 890FX (Socket AM3)
          PCI-Express DDR3 Motherboard

£170.20 - AMD Phenom II X6 Six Core 1090T Black Edition
          3.20GHz (Socket AM3)

£123.25 - 8GB Team Xtreem LV DDR3 PC3-16000 (9-11-9-27)
          Dual Channel Low-Voltage kit

£123.25 - 8GB Team Xtreem LV DDR3 PC3-16000 (9-11-9-27)
          Dual Channel Low-Voltage kit

Sub-Total: £570.86

IO Sub System

£108.36 - 80GB OCZ Vertex 2 SATA II 2.5" SSD Solid State Disk
          (285MB/sec read 275MB/sec write)

 £50.67 - OCZSSDPX-1HSDL4P OCZ HSDL 3.5 INCH IBIS 4 port HSDL Card

£442.85 - 240GB OCZ IBIS 3.5-inch HSDL High-Speed Data Link SSD
          (read 740MB/s - write 720MB/s)

£442.85 - 240GB OCZ IBIS 3.5-inch HSDL High-Speed Data Link SSD
          (read 740MB/s - write 720MB/s)

Sub-Total: £1,044.73

Build details

Operating System: Windows 2008 R2 x64

SQL Server 2008 R2 (10.50.1617) x64

C: on OCZ Vertex 2

E: as software RAID 0 over the two IBIS drives, formatted with an allocation size of 64KBytes; the two IBIS drives are internally RAID 0 with the default 64KB stripe.

Although there is 16GB of RAM in the box, SQL Server 2008 R2 is fixed at min/max 1024MB to put pressure on the buffer pool and thus stretch the IO system.
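For anyone wanting to reproduce that, a quick illustrative snippet (not taken from the original set-up) of how the min/max server memory gets pinned:

-- Illustrative: pin the buffer pool at 1GB
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'min server memory (MB)', 1024;
EXEC sp_configure 'max server memory (MB)', 1024;
RECONFIGURE;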

IOMeter tests on a 50GB file

The test file is 104,857,600 sectors of 512 bytes (50GB).

Three transfer sizes have been chosen, 64KB (an extent), 8 KB (a page) and 256KB (read ahead reading multiple extents).

For the 8KB and 64KB tests I’ve given figures for both ends of the sequential and random read spectrum; for the 256KB test, 50/50 was chosen because that is more realistic.

Transfer Size (KB) | Read or Write | % Rand | % Seq | Outstanding IO | IOs/sec | MBytes/sec | Avg I/O (ms) | Max I/O (ms) | CPU %
64 | Read | 100 | 0 | 32 | 18644 | 1165 | 1.71 | 10.23 | 27.26
64 | Read | 100 | 0 | 64 | 21305 | 1331 | 3 | 5.91 | 26.68
64 | Read | 100 | 0 | 86 | 21523 | 1345 | 3.99 | 6.75 | 18.76
64 | Read | 0 | 100 | 32 | 22718 | 1419 | 1.4 | 2.39 | 36.57
64 | Read | 0 | 100 | 64 | 20122 | 1257 | 3.17 | 9.6 | 34.18
64 | Read | 0 | 100 | 86 | 21233 | 1327 | 4.05 | 6.83 | 44.39
8 | Read | 100 | 0 | 32 | 65627 | 512 | 0.48 | 13.05 | 30.49
8 | Read | 100 | 0 | 64 | 60915 | 475 | 1.05 | 3.13 | 46.24
8 | Read | 100 | 0 | 86 | 75519 | 590 | 1.13 | 22.73 | 32.95
8 | Read | 0 | 100 | 32 | 79506 | 621 | 0.4 | 1.9 | 29.79
8 | Read | 0 | 100 | 64 | 79039 | 617 | 0.81 | 2.52 | 55.61
8 | Read | 0 | 100 | 86 | 67485 | 527 | 1.27 | 19.27 | 21.24
256 | Read | 50 | 50 | 32 | 5211 | 1302 | 6.14 | 10.83 | 26.64
256 | Read | 50 | 50 | 64 | 5121 | 1280 | 12.46 | 21.5 | 31.56
256 | Read | 50 | 50 | 86 | 5200 | 1300 | 16.5 | 34.92 | 27.79

 

Populating the data

The clustered index table took 15 hours 4 minutes to populate with its 50 million inserts; the 50 million inserts into the heap took 7 hours 55 minutes.
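The population scripts themselves are in the linked set-up post; purely for context, the single-insert population is essentially a loop of this shape (my own illustrative sketch, not the original script):

-- Illustrative sketch only: 50 million single-row inserts, one NEWID() per row
declare @i int = 0;
while @i < 50000000
begin
    insert TestGuid_HeapINSERT ( guidcol, expandrow )
        values ( newid(), cast( @i as char(1024) ) );
    set @i = @i + 1;
end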

The table below shows the results of sys.dm_db_index_physical_stats. You can clearly see that TestGuid_INSERT (which has the clustered index and was built from 50 million single inserts) is heavily fragmented. The heap contains almost exactly the same number of pages as the leaf level of TestGuid_SingleINSERT, which has no fragmentation – that is what I’d expect to see. The non-clustered index (index_id 2), which is the primary key (nonclustered), is extremely badly fragmented, which again is expected because the insert order is purely random.
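For reference, the figures were gathered with sys.dm_db_index_physical_stats in DETAILED mode (so that every index level and the page-space/record-size columns are populated); something along these lines, with the column list trimmed to what is shown below:

select object_name( ps.object_id ) as table_name,
       ps.index_id, ps.index_depth, ps.index_level,
       ps.avg_fragmentation_in_percent, ps.fragment_count,
       ps.avg_fragment_size_in_pages, ps.page_count,
       ps.avg_page_space_used_in_percent, ps.record_count,
       ps.min_record_size_in_bytes, ps.max_record_size_in_bytes, ps.avg_record_size_in_bytes
from sys.dm_db_index_physical_stats( db_id(), null, null, null, 'DETAILED' ) as ps
where object_name( ps.object_id ) like 'TestGuid%'
order by table_name, ps.index_id, ps.index_level;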

 

table | index id | index depth | index level | avg fragmentation % | fragment count | avg fragment size (pages) | page count | avg page space used % | record count | min record size (bytes) | max record size (bytes) | avg record size (bytes)
TestGuid_HeapInsert | 0 | 1 | 0 | 3.4 | 31478 | 226.9 | 7142864 | 91 | 50000000 | 1047 | 1047 | 1047
TestGuid_HeapInsert | 2 | 4 | 0 | 99.2 | 243713 | 1.0 | 243713 | 68 | 50000000 | 25 | 25 | 25
TestGuid_HeapInsert | 2 | 4 | 1 | 99.9 | 1087 | 1.0 | 1087 | 69 | 243713 | 23 | 23 | 23
TestGuid_HeapInsert | 2 | 4 | 2 | 50.0 | 4 | 1.0 | 4 | 84 | 1087 | 23 | 23 | 23
TestGuid_HeapInsert | 2 | 4 | 3 | 0.0 | 1 | 1.0 | 1 | 1 | 4 | 23 | 23 | 23
TestGuid_SingleINSERT | 1 | 4 | 0 | 0.7 | 48827 | 146.3 | 7142858 | 91 | 50000000 | 1047 | 1047 | 1047
TestGuid_SingleINSERT | 1 | 4 | 1 | 0.0 | 47938 | 1.0 | 47938 | 46 | 7142858 | 23 | 23 | 23
TestGuid_SingleINSERT | 1 | 4 | 2 | 0.0 | 321 | 1.0 | 321 | 46 | 47938 | 23 | 23 | 23
TestGuid_SingleINSERT | 1 | 4 | 3 | 0.0 | 1 | 1.0 | 1 | 99 | 321 | 23 | 23 | 23
TestGuid_INSERT | 1 | 4 | 0 | 99.1 | 10625949 | 1.0 | 10625950 | 61 | 50000000 | 1047 | 1047 | 1047
TestGuid_INSERT | 1 | 4 | 1 | 100.0 | 47661 | 1.0 | 47661 | 69 | 10625950 | 23 | 23 | 23
TestGuid_INSERT | 1 | 4 | 2 | 99.6 | 224 | 1.0 | 224 | 66 | 47661 | 23 | 23 | 23
TestGuid_INSERT | 1 | 4 | 3 | 0.0 | 1 | 1.0 | 1 | 69 | 224 | 23 | 23 | 23

 

Simple Performance Comparison (Test group 1)

A series of tests will be performed and blogged over the coming weeks after which a conclusion will be drawn.

The first test simply takes 50,000 of the 50 million rows, equally spaced throughout the table (see SQL below), and then joins them back into the big table in order to do a MAX. This tests the performance of random lookups; basically this test is real world and what I’d expect in most SQL Server databases with modest indexing strategies and maintenance.

--
--          TEST 1 (on fragmented INSERT with clustered index)
--

checkpoint
go
dbcc dropcleanbuffers
dbcc freeproccache
go

select guidcol
into #lookup
from (      select guidcol, ROW_NUMBER() over( order by guidcol ) as rn
            from TestGuid_INSERT ) as d
where d.rn % 1000 = 0
go

/*
perfmon:
      avg disk bytes/read 8192
      avg disk sec/read   0.000
      avg disk reads/sec  48,887
      avg read bytes/sec  400,501,565

Table 'TestGuid_INSERT'. Scan count 1, logical reads 10673614, physical reads 25690, read-ahead reads 10709354, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 73134 ms,  elapsed time = 214243 ms.
*/

checkpoint
go
dbcc dropcleanbuffers
dbcc freeproccache
go

select max( tg.expandrow )
from #lookup as l
      inner join TestGuid_INSERT as tg on tg.guidcol = l.guidcol
go

/*
Table 'TestGuid_INSERT'. Scan count 0, logical reads 351705, physical reads 6, read-ahead reads 196291, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#lookup'. Scan count 7, logical reads 156, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2231 ms,  elapsed time = 1756 ms.
*/

--
------

drop table #lookup

 

 

--
--          TEST 2 (on fragmented INSERT with heap)
--

checkpoint
go
dbcc dropcleanbuffers
dbcc freeproccache
go

select guidcol
into #lookup
from (      select guidcol, ROW_NUMBER() over( order by guidcol ) as rn
            from TestGuid_HeapINSERT ) as d
where d.rn % 1000 = 0
go

/*
perfmon:
      avg disk bytes/read 8211
      avg disk sec/read   0.000
      avg disk reads/sec  5,749
      avg read bytes/sec  47,213,526

Table 'TestGuid_HeapINSERT'. Scan count 1, logical reads 244803, physical reads 444, read-ahead reads 278989, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 22074 ms,  elapsed time = 25424 ms.
*/

checkpoint
go
dbcc dropcleanbuffers
dbcc freeproccache
go

select max( tg.expandrow )
from #lookup as l
      inner join TestGuid_HeapINSERT as tg on tg.guidcol = l.guidcol
go

/*
Table 'TestGuid_HeapINSERT'. Scan count 0, logical reads 345645, physical reads 7, read-ahead reads 185834, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#lookup'. Scan count 7, logical reads 156, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2571 ms,  elapsed time = 1624 ms.
*/

--
------

drop table #lookup

 

 

 

--
--          TEST 3 (on single (non-fragmented) INSERT with clustered index)
--

checkpoint
go
dbcc dropcleanbuffers
dbcc freeproccache
go

select guidcol
into #lookup
from (      select guidcol, ROW_NUMBER() over( order by guidcol ) as rn
            from TestGuid_SingleINSERT ) as d
where d.rn % 1000 = 0
go

/*
perfmon:
      avg disk bytes/read 297,763
      avg disk sec/read   0.001
      avg disk reads/sec  3,221
      avg read bytes/sec  959,224,428

Table 'TestGuid_SingleINSERT'. Scan count 1, logical reads 7190799, physical reads 21110, read-ahead reads 7190779, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 44960 ms,  elapsed time = 62299 ms.
*/

checkpoint
go
dbcc dropcleanbuffers
dbcc freeproccache
go

select max( tg.expandrow )
from #lookup as l
      inner join TestGuid_SingleINSERT as tg on tg.guidcol = l.guidcol
go

/*
Table 'TestGuid_SingleINSERT'. Scan count 0, logical reads 353439, physical reads 6, read-ahead reads 203706, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#lookup'. Scan count 7, logical reads 156, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2386 ms,  elapsed time = 1675 ms.
*/

 

 

Findings of Comparison (Test group 1)

Population of #lookup

 

table | Avg Disk Bytes/read | Avg Disk Sec/Read | Avg Disk Reads/sec | Avg Read Bytes/sec | Logical Reads | Physical Reads | Read-ahead | CPU (ms) | Elapsed (ms)
TestGuid_INSERT | 8,192 | 0.000 | 48,887 | 400,501,565 | 10,673,614 | 25,690 | 10,709,354 | 73,134 | 214,243
TestGuid_HeapINSERT | 8,211 | 0.000 | 5,749 | 47,213,526 | 244,803 | 444 | 278,989 | 22,074 | 25,424
TestGuid_SingleINSERT | 297,763 | 0.001 | 3,221 | 959,224,428 | 7,190,799 | 21,110 | 7,190,779 | 44,960 | 62,299

 

The heap clearly comes out on top because SQL Server uses the non-clustered index to get the list of guids, so there is significantly less data to read.

Interestingly, for the fragmented table the read-ahead isn’t reading anything greater than a single page; this is what you’d expect, because read-ahead works best on contiguous data – hence the contiguous TestGuid_SingleINSERT table has a high average number of bytes per read.

MAX( )

 

table | Logical Reads | Physical Reads | Read-ahead | CPU (ms) | Elapsed (ms)
TestGuid_INSERT | 351,705 | 6 | 196,291 | 2,231 | 1,756
TestGuid_HeapINSERT | 345,645 | 7 | 185,834 | 2,571 | 1,624
TestGuid_SingleINSERT | 353,439 | 6 | 203,706 | 2,386 | 1,675

 

Given the 50,000 rows in #lookup joined against the 50 million row table, the results are similar in all cases, regardless of how fragmented the data is.

Initial findings would suggest that random row lookups on data stored on SSD have uniform seek performance regardless of fragmentation and the size of the table, because of the nature of the join used – in this case Nested Loops.

Plan for lookup against TestGuid_INSERT

  |--Stream Aggregate(DEFINE:([Expr1005]=MAX([partialagg1006])))

       |--Parallelism(Gather Streams)

            |--Stream Aggregate(DEFINE:([partialagg1006]=MAX([GUIDTest_SSD].[dbo].[TestGuid_INSERT].[expandrow] as [tg].[expandrow])))

                 |--Nested Loops(Inner Join, OUTER REFERENCES:([l].[guidcol], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)

                      |--Table Scan(OBJECT:([tempdb].[dbo].[#lookup] AS [l]))

                      |--Clustered Index Seek(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_INSERT].[PK__TestGuid__AB46272D7F60ED59] AS [tg]), SEEK:([tg].[guidcol]=[tempdb].[dbo].[#lookup].[guidcol] as [l].[guidcol]) ORDERED FORWARD)

Plan for lookup against TestGuid_HeapINSERT

  |--Stream Aggregate(DEFINE:([Expr1006]=MAX([partialagg1007])))

       |--Parallelism(Gather Streams)

            |--Stream Aggregate(DEFINE:([partialagg1007]=MAX([GUIDTest_SSD].[dbo].[TestGuid_HeapINSERT].[expandrow] as [tg].[expandrow])))

                 |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1003], [Expr1011]) OPTIMIZED WITH UNORDERED PREFETCH)

                      |--Compute Scalar(DEFINE:([Expr1010]=BmkToPage([Bmk1003])))

                      |    |--Nested Loops(Inner Join, OUTER REFERENCES:([l].[guidcol], [Expr1009]) WITH UNORDERED PREFETCH)

                      |         |--Sort(ORDER BY:([l].[guidcol] ASC))

                      |         |    |--Table Scan(OBJECT:([tempdb].[dbo].[#lookup] AS [l]))

                      |         |--Index Seek(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_HeapINSERT].[PK__TestGuid__AB46272C1273C1CD] AS [tg]), SEEK:([tg].[guidcol]=[tempdb].[dbo].[#lookup].[guidcol] as [l].[guidcol]) ORDERED FORWARD)

                      |--RID Lookup(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_HeapINSERT] AS [tg]), SEEK:([Bmk1003]=[Bmk1003]) LOOKUP ORDERED FORWARD)

Plan for lookup against TestGuid_SingleINSERT

  |--Stream Aggregate(DEFINE:([Expr1005]=MAX([partialagg1006])))

       |--Parallelism(Gather Streams)

            |--Stream Aggregate(DEFINE:([partialagg1006]=MAX([GUIDTest_SSD].[dbo].[TestGuid_SingleINSERT].[expandrow] as [tg].[expandrow])))

                 |--Nested Loops(Inner Join, OUTER REFERENCES:([l].[guidcol], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)

                      |--Table Scan(OBJECT:([tempdb].[dbo].[#lookup] AS [l]))

                      |--Clustered Index Seek(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_SingleINSERT].[PK__TestGuid__AB46272D08EA5793] AS [tg]), SEEK:([tg].[guidcol]=[tempdb].[dbo].[#lookup].[guidcol] as [l].[guidcol]) ORDERED FORWARD)

 

In the next blog instalment I’ll look at how the IO system can outperform the cores in the box, creating a CPU bottleneck; we’ll then return to more tests and finally a conclusion.

Any comments feel free to email me privately tonyrogerson@torver.net
