Learn how you can load over 44 million rows with an average length of 241 bytes into SQL Server at a rate of over 530K rows per second using kit that costs less that £2K.

https://skydrive.live.com/?mkt=en-gb#cid=DD00BC6E00F55EDF&id=DD00BC6E00F55EDF%21473

The paper is the end result of my two year masters in Business Intelligence at the University of Dundee.

Later this week I'll be putting http://www.reportingbrick.com live which will be a continuation of the paper where I'll post further research on the subject as time progresses.

This Friday (20th Jan 2012) between 2pm and 3:45pm I'll  be at the School of Computing, University of Dundee to demo the kit and answer any questions in person.

If you were in any doubt at all that you need to license Standby / Passive Failover servers then the White Paper “Do Not Pay Too Much for Your Database Licensing” will settle those doubts.

I’ve had debate before people thinking you can only have a single instance as a standby machine, that’s just wrong; it would mean you could have a scenario where you had a 2 node active/passive cluster with database mirroring and log shipping (a total of 4 SQL Server instances) – in that set up you only need to buy one physical license so long as the standby nodes have the same or less physical processors (cores are irrelevant).

So next time your supplier suggests you need a license for your standby box tell them you don’t and educate them by pointing them to the white paper.

For clarity I’ve copied the extract below from the White Paper.

Extract from “Do Not Pay Too Much for Your Database Licensing

Standby Server

Customers often implement standby server to make sure the application continues to function in case primary server fails. Standby server continuously receives updates from the primary server and will take over the role of primary server in case of failure in the primary server.

Following are comparisons of how each vendor supports standby server licensing.

SQL Server
Customers does not need to license standby (or passive) server provided that the number of processors in the standby server is equal or less than those in the active server.

Oracle DB
Oracle requires customer to fully license both active and standby servers even though the standby server is essentially idle most of the time.

IBM DB2
IBM licensing on standby server is quite complicated and is different for every editions of DB2. For Enterprise Edition, a minimum of 100 PVUs or 25 Authorized User is needed to license standby server.

 

The following graph compares prices based on a database application with two processors (dual-core) and 25 users with one standby server.

[chart snipped] 

Note   All prices are based on newest Intel Xeon Nehalem processor database pricing for purchases within the United States and are in United States dollars. Pricing is based on information available on vendor Web sites for Enterprise Edition.

Microsoft SQL Server Enterprise Edition
25 users (CALs) x $164 / CAL + $8,592 / Server = $12,692 (no need to license standby server)

Oracle Enterprise Edition (base license without options)
Named User Plus minimum (25 Named Users Plus per Core) = 25 x 2 = 50 Named Users Plus x $950 / Named Users Plus x 2 servers = $95,000

IBM DB2 Enterprise Edition (base license without feature pack)
Need to purchase 125 Authorized User (400 PVUs/100 PVUs = 4 X 25 = 100 Authorized User + 25 Authorized Users for standby server) = 125 Authorized Users x $1,040 / Authorized Users = $130,000

 

When considering buying a laptop that’s going to cost me around £5,000 I really need to justify the purchase from a business perspective; my Lenovo W700 has served me very well for the last 2 years, it’s an extremely good machine and as solid as a rock (and as heavy), alas though it is limited to the 8GB.

As SQL Server 2012 approaches and with my interest in working in the Business Intelligence space over the next year or two it is clear I need a powerful machine that I can run a full infrastructure though virtualised.

My requirements

For High Availability / Disaster Recovery research and demonstration

Machine for a domain controller
Four machines in a shared disk cluster (SQL Server Clustering active – active etc.)
Five  machines in a file share cluster (SQL Server Availability Groups)

For Business Intelligence research and demonstration

Not entirely sure how many machine I want to run here, but it would be to cover the entire BI stack in an enterprise setting, sharepoint, sql server etc.

For Big Data Research

I have a fondness for the NoSQL approach to scalability and dealing with large volumes so I need a number of machines to research VoltDB, Hadoop etc.

As you can see the requirements for a SQL Server consultant to service their clients well is considerable; will 8GB suffice, alas no, it will no longer do. I’m a very strong believer that in order to do your job well you must expense it, short cuts only cost you time, waiting 5 minutes instead of an hour for something to run not only saves me time but my clients time, I can do things quicker and more importantly I can demonstrate concepts.

My W700 with the 8GB of RAM and SSD’s cost me around £3.5K two years ago, to be honest I’ve not got the full use I wanted out of it but the machine has had the power when I’ve needed it, it’s served me and my clients well.

Alienware now do a model (the M18x) with 32GB of RAM; yes 32GB in a laptop! Dual drives so I can whack a couple of really good SSD’s in there, a quad core with hyper threading i7 and a decent speed.

I can reduce the cost of the memory by getting it from Crucial, so instead of £1.5K for 32GB it will be around £900, I can also cost save on the SSD as well. The beauty about the M18x is that it is USB3.0, SATA 3 and also really importantly has eSATA, running VM’s will never be easier, I can have a removeable SSD with my VM’s on it and can plug it into my home machine or laptop – an ideal world!

The initial outlay of £5K is peanuts compared to the benefits I’ll give my clients, I will be able to present real enterprise concepts, I’ll also be able to give training on those real enterprise concepts and with real, albeit virtualised machines.

This October, the UK SQL Server community is inviting anyone with an interest in SQL Server to a series of free evening user group events being held around the country.

Supported by SQLServerFaq, the 3rd-6th October 2011 sees the first annual SQLRelay with many of the UK’s SQL Server user groups coming together to run 13 evening meetings across the country in 4 days.

Everyone’s Welcome

Whether you’re a developer or administrator, professional or amateur, we hope there’s an event near you where you will not only get to meet and knowledge-share with like-minded people but also get to see a well-known SQL Server community member present. Our aim as user group leaders is to make you feel part of the already popular UK SQL Server community where we believe even the most novice of people can teach an old hand a thing or two.

If you’re already a member of a local user group why not bring a friend or colleague along to one our SQLRelay meetings in October? Every user group taking part has been organised to have either a Microsoft SQL Server team member or a Microsoft Most Valuable Professional (MVP) as a speaker.

SQLRelay Agenda

The UK SQL Server user group week kicks off in Liverpool at SQLBits, the semi-annual SQL Server conference, from the 29th September to 1st October. If you’re already attending then make sure you visit Community Corner where you can meet the user group leaders as well as their members. The SQLRelay team will be wearing instantly identifiable outfits and be available to give you more information about their events and answer your questions. For more information on SQLBits please visit www.sqlbits.com.

Following SQLBits, between Monday 3rd and Wednesday 5th of October, the 13 regional user groups will be holding free evening meetings very similar to their normal sessions, the only difference being that we’ve made sure each meeting will have a well-known SQL Server community member presenting.

Monday 3rd October
Manchester, Surrey, Birmingham, Kent

Tuesday 4th
Leeds, Bristol, London, Hertfordshire

Wednesday 5th
Edinburgh, Southampton, Exeter, Cardiff

We also welcome a new user group that will start in October in the County of Essex, more details will follow.

On Thursday 6th October the SQLRelay week will finish with an event at Microsoft’s Cardinal Place venue in London, with an internationally recognised speaker speaking. This is your chance to meet the attendees, speakers and organisers from all of the regional events as well as our yet to be named guest speaker. However, you can be assured that they are a well-known in the international SQL Server community and as soon as we can confirm more about their appearance we will. In addition to our speakers there will also be a prize draw where there is a chance to win the cool prizes provided by our sponsors.

How to Register

Details of each events date, time, location and agenda can be found on the UK SQL Server community web site www.sqlserverfaq.com where you can also register to attend each event.

Finally, we hope having received this email you feel welcomed by the UK SQL Server community and that you’ll join in our excitement about having a week of SQL Server community events.

Regards

The UK SQL Server User Group Leaders

CPU Saturation – an over performing IO subsystem

It’s not often that you see the IO subsystem able to out drive the cores in the box. To demonstrate this behaviour the TestGuid_HeapInsert table (see http://sqlblogcasts.com/blogs/tonyrogerson/archive/2011/07/19/use-guid-s-uniqueidentifier-for-keys-when-storing-data-on-ssd-s-part-2-initial-database-schema.aspx for set up scripts), a 50 million row table (a heap) 54GBytes in size will be used, processor affinity will be used to lock SQL Server to a specific number of available cores.

Note: the equipment for this test is described here: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2011/07/22/use-guid-s-uniqueidentifier-for-keys-when-storing-data-on-ssd-s-part-3-initial-base-line-with-iometer-and-first-test-group.aspx. In summary, Windows 2008 R2 x64, SQL 2008 R2 x64, 16GB of RAM but SQL Server is limited to 1GB, processor is a AMD Phenom II X6 Six Core 1090T Black Edition.

Interestingly this behaviour will not show up when using the COUNT(*) function without any WHERE clause, performance is consistent regardless of the number of cores used, however, once you have a Compute Scalar in the plan then the core scalability issue described kicks in.

select COUNT(*) from TestGuid_HeapINSERT with ( index = 0 )

Average Disk Bytes/read              404,396
Average Disk Sec/read                  0.001
Average Disk Reads/sec                2,806
Average Read Bytes/sec               1,134,827,393

The SQL below has been used; the index hint used to simplify the experiment and make sure a table scan is being performed:

select MAX( cast( expandrow as bigint ) ) from TestGuid_HeapINSERT with ( index = 0 )

Cores

Avg Disk Bytes/Read

Avg Disk Sec/Read

Avg Disk Reads/Sec

Avg Read Bytes/Sec

Logical Reads

Read-Ahead Reads

CPU Time (ms)

Elapsed (ms)

1

411,619

0.000

622

256,306,080

7,142,864

7,142,864

211,225

228,679

2

406,215

0.001

1,244

506,625,778

7,142,864

7,142,864

216,249

116,102

3

407,495

0.001

1,833

746,940,222

7,142,864

7,142,754

220,336

78,957

4

409,191

0.001

2,392

978,956,526

7,142,864

7,142,864

223,797

60,129

5

408,588

0.001

2,795

1,142,225,279

7,142,864

7,142,864

225,248

51,388

6

408,588

0.001

3,270

1,324,789,338

7,142,864

7,142,864

229,429

44,631


Reviewing the figures above Average Disk Bytes per read, Average Disk Sec/Read, Logical Reads, Read-ahead reads and CPU Time (ms) remain fairly constant, however Average Disk Reads/Sec and Average Read Bytes/Sec are stepped and Elapsed (ms) reduces dramatically with a second core then savings diminish as cores are added.

Cores

Avg Disk Reads/Sec

Avg Read Bytes/Sec

Elapsed (ms)

2

622

250,319,698

-112,577

3

589

240,314,444

-37,145

4

559

232,016,304

-18,828

5

403

163,268,753

-8,741

6

475

182,564,059

-6,757

 

It should be noted, this problem isn’t actually specific to the IO subsystem, and the problem occurs if you have enough of your table in the buffer pool too, essentially the core cannot get the data through quick enough.

Relating to the real world

The figures above show that given a query that has no parallelisation and therefore runs on a single core the duration will be 512% of the capability of the box should all cores be used and will use only 19% of the availability Read Bytes/sec maximum.

In past versions of SQL Server we have been used to turning parallelism off because it more often than not extended the duration of the query, which was fixed in SQL Server 2008 where the parallelism does now work well. However, legacy systems and code and a perpetual myth that needs breaking is that MAXDOP should be removed and just let SQL Server get on with it.

It is true that on a system with a number of concurrent connections that this problem will balance itself out, however, what about the sequential overnight batch jobs that so many of us have?

No direct advice here (yet), may be that will come when I finally do my conclusions around these SSD benchmarks, but you need to be aware that this problem is here and as IO subsystems perform better (which mine does) then these issues will need addressing.

Benchmarks (SSD based)

In the following test I’ve tried to be realistic in terms of what a real world query would do - joins! Ordinarily you’d have several tables joining together; this is where you take the performance hit if a) you’ve an IO subsystem that has high read latency and b) lots of fragmentation.

Benchmarks will be performed on SSD and then on rotational disk, the first lot of benchmarks will focus on read queries, writing will be done at a later date where I will discuss the hybrid approach of storage design.

There are three versions of the table each held on its own file group so that there is no object interleaving to interfere and give a clear and clean set of numbers. The three versions are a) table populated with 50 million single INSERTS that has a clustered index on the guid column (TestGuid_INSERT), b) a table populated from the table from a) but as a single insert so there is no fragmentation (TestGuid_SingleINSERT) and finally c) table populated with 50 million single INSERTS that is a heap and has a single non-clustered index on the guid column (TestGUID_HeapINSERT), a padding column of char(1024) is used to pad the row thus causing more page splitting.

The database set up script can be found in the blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2011/07/19/use-guid-s-uniqueidentifier-for-keys-when-storing-data-on-ssd-s-part-2-initial-database-schema.aspx

Base lining the test kit

To give benchmarks a back drop the capabilities of the kit needs to be established, in this instance a number of measures need to be established by placing the kit under specific load types. IOMeter has been used to test raw grunt in terms of IO’s per second, throughput and read latency.

Kit

CPU and Memory

£154.16 - Gigabyte GA-890FXA-UD7 AMD 890FX (Socket AM3)
          PCI-Express DDR3 Motherboard

£170.20 - AMD Phenom II X6 Six Core 1090T Black Edition
          3.20GHz (Socket AM3)

£123.25 - 8GB Team Xtreem LV DDR3 PC3-16000 (9-11-9-27)
          Dual Channel Low-Voltage kit

£123.25 - 8GB Team Xtreem LV DDR3 PC3-16000 (9-11-9-27)
          Dual Channel Low-Voltage kit

Sub-Total: 570.86

IO Sub System

£108.36 - 80GB OCZ Vertex 2 SATA II 2.5" SSD Solid State Disk
          (285MB/sec read 275MB/sec write)

 £50.67 - OCZSSDPX-1HSDL4P OCZ HSDL 3.5 INCH IBIS 4 port HSDL Card

£442.85 - 240GB OCZ IBIS 3.5-inch HSDL High-Speed Data Link SSD
          (read 740MB/s - write 720MB/s)

£442.85 - 240GB OCZ IBIS 3.5-inch HSDL High-Speed Data Link SSD
          (read 740MB/s - write 720MB/s)

Sub-Total: 1044.73

Build details

Operating System: Windows 2008 R2 x64

SQL Server 2008 R2 (10.50.1617) x64

C: on OCZ Vertex 2

E: as software RAID 0 over the two IBIS drives and formatted with an allocation size of 64Kbytes, the two IBIS drives internally RAID 0 with the default 64KB stripe.

Although there is 16GB of RAM in the box SQL Server 2008 R2 is fixed at min/max 1024 to put pressure on the buffer pool and thus stretch the I/O system.

IOMeter tests on a 50GB file

104857600 512 sectors

Three transfer sizes have been chosen, 64KB (an extent), 8 KB (a page) and 256KB (read ahead reading multiple extents).

For the 8B and 64KB I’ve given figures for both ends of the sequential and random read spectrum, for the 256KB test 50/50 was chosen because that is more realistic.

Transfer Size

Read or Write

% Rand

% Seq

Outstanding IO

IO's

/sec

Mbytes /sec

Avg I/O (ms)

Max I/O (ms)

CPU %

64

Read

100

0

32

18644

1165

1.71

10.23

27.26

64

Read

100

0

64

21305

1331

3

5.91

26.68

64

Read

100

0

86

21523

1345

3.99

6.75

18.76

64

Read

0

100

32

22718

1419

1.4

2.39

36.57

64

Read

0

100

64

20122

1257

3.17

9.6

34.18

64

Read

0

100

86

21233

1327

4.05

6.83

44.39

8

Read

100

0

32

65627

512

0.48

13.05

30.49

8

Read

100

0

64

60915

475

1.05

3.13

46.24

8

Read

100

0

86

75519

590

1.13

22.73

32.95

8

Read

0

100

32

79506

621

0.4

1.9

29.79

8

Read

0

100

64

79039

617

0.81

2.52

55.61

8

Read

0

100

86

67485

527

1.27

19.27

21.24

256

Read

50

50

32

5211

1302

6.14

10.83

26.64

256

Read

50

50

64

5121

1280

12.46

21.5

31.56

256

Read

50

50

86

5200

1300

16.5

34.92

27.79

 

Populating the data

The clustered index with 50 million inserts took 15 hours 4 minutes to populate the table; the 50 million inserts into a heap took 7 hours 55 minutes.

The table below shows the results of sys.dm_db_index_physical_stats, you can clearly see that TestGuid_INSERT (having the clustered index) and built from 50 million inserts is somewhat fragmented. The Heap contains almost exactly the same number of pages as the leaf level of the TestGuid_SingleINSERT which has no fragmentation – that is what I’d expect to see. The non-clustered index (index_id 2) which is the primary key (nonclustered) is extremely badly fragmented which again is what is expected because the insert order is purely random.

 

index id

index depth

index level

avg fragmentation in percent

fragment count

avg fragment size in pages

page count

avg page space used in percent

record count

min record size in bytes

max record size in bytes

avg record size in bytes

TestGuid_HeapInsert

0

1

0

3.4

31478

226.9

7142864

91

50000000

1047

1047

1047

TestGuid_HeapInsert

2

4

0

99.2

243713

1.0

243713

68

50000000

25

25

25

TestGuid_HeapInsert

2

4

1

99.9

1087

1.0

1087

69

243713

23

23

23

TestGuid_HeapInsert

2

4

2

50.0

4

1.0

4

84

1087

23

23

23

TestGuid_HeapInsert

2

4

3

0.0

1

1.0

1

1

4

23

23

23

 

 

 

 

 

 

 

 

 

 

 

 

 

TestGuid_SingleINSERT

1

4

0

0.7

48827

146.3

7142858

91

50000000

1047

1047

1047

TestGuid_SingleINSERT

1

4

1

0.0

47938

1.0

47938

46

7142858

23

23

23

TestGuid_SingleINSERT

1

4

2

0.0

321

1.0

321

46

47938

23

23

23

TestGuid_SingleINSERT

1

4

3

0.0

1

1.0

1

99

321

23

23

23

 

 

 

 

 

 

 

 

 

 

 

 

 

TestGuid_INSERT

1

4

0

99.1

10625949

1.0

10625950

61

50000000

1047

1047

1047

TestGuid_INSERT

1

4

1

100.0

47661

1.0

47661

69

10625950

23

23

23

TestGuid_INSERT

1

4

2

99.6

224

1.0

224

66

47661

23

23

23

TestGuid_INSERT

1

4

3

0.0

1

1.0

1

69

224

23

23

23

 

Simple Performance Comparison (Test group 1)

A series of tests will be performed and blogged over the coming weeks after which a conclusion will be drawn.

The first test is simply taking 50,000 of the 50 million rows; each row is equally spaced throughout the table (see SQL below) and then used to join back in to the big table in order to do a MAX. This tests performance of random look ups, basically this test is a real world and what I’d expect in most SQL Server databases with modest indexing strategies and maintenance.

--

--          TEST 1 (on fragmented INSERT with clustered index)

--

 

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select guidcol

into #lookup

from (      select guidcol, ROW_NUMBER() over( order by guidcol ) as rn

            from TestGuid_INSERT ) as d

where d.rn % 1000 = 0

go

/*

perfmon:

      avg disk bytes/read 8192

      avg disk sec/read   0.000

      avg disk reads/sec  48,887

      avg read bytes/sec  400,501,565

 

Table 'TestGuid_INSERT'. Scan count 1, logical reads 10673614, physical reads 25690, read-ahead reads 10709354, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 73134 ms,  elapsed time = 214243 ms.

 

*/

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select max( tg.expandrow )

from #lookup as l

      inner join TestGuid_INSERT as tg on tg.guidcol = l.guidcol

go

 

/*

 

Table 'TestGuid_INSERT'. Scan count 0, logical reads 351705, physical reads 6, read-ahead reads 196291, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#lookup'. Scan count 7, logical reads 156, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 2231 ms,  elapsed time = 1756 ms.

 

*/

 

--

------

 

drop table #t, #lookup

 

 

--

--          TEST 2 (on fragmented INSERT with heap)

--

 

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select guidcol

into #lookup

from (      select guidcol, ROW_NUMBER() over( order by guidcol ) as rn

            from TestGuid_HEapINSERT ) as d

where d.rn % 1000 = 0

go

/*

perfmon:

      avg disk bytes/read 8211

      avg disk sec/read   0.000

      avg disk reads/sec  5,749

      avg read bytes/sec  47,213,526

 

Table 'TestGuid_HeapINSERT'. Scan count 1, logical reads 244803, physical reads 444, read-ahead reads 278989, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 22074 ms,  elapsed time = 25424 ms.

  

*/

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select max( tg.expandrow )

from #lookup as l

      inner join TestGuid_HEapINSERT as tg on tg.guidcol = l.guidcol

go

 

/*

 

Table 'TestGuid_HeapINSERT'. Scan count 0, logical reads 345645, physical reads 7, read-ahead reads 185834, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#lookup'. Scan count 7, logical reads 156, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 2571 ms,  elapsed time = 1624 ms.

 

*/

 

--

------

 

drop table #t, #lookup

 

 

 

--

--          TEST 3 (on single (non-fragmented) INSERT with clustered index)

--

 

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select guidcol

into #lookup

from (      select guidcol, ROW_NUMBER() over( order by guidcol ) as rn

            from TestGuid_SingleINSERT ) as d

where d.rn % 1000 = 0

go

/*

perfmon:

      avg disk bytes/read 297,763

      avg disk sec/read   0.001

      avg disk reads/sec  3,221

      avg read bytes/sec  959,224,428

 

Table 'TestGuid_SingleINSERT'. Scan count 1, logical reads 7190799, physical reads 21110, read-ahead reads 7190779, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 44960 ms,  elapsed time = 62299 ms.

  

*/

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select max( tg.expandrow )

from #lookup as l

      inner join TestGuid_SingleINSERT as tg on tg.guidcol = l.guidcol

go

 

/*

 

Table 'TestGuid_SingleINSERT'. Scan count 0, logical reads 353439, physical reads 6, read-ahead reads 203706, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#lookup'. Scan count 7, logical reads 156, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 2386 ms,  elapsed time = 1675 ms.

 

*/

 

 

Findings of Comparison (Test group 1)

Population of #lookup

 

Avg Disk Bytes/read

Avg Disk Sec/Read

Avg Disk Reads/sec

Avg Read Bytes/sec

Logical Reads

Physical Reads

Read-ahead

CPU (ms)

Elapsed (ms)

TestGuid_INSERT

8,192

0.000

48,887

400,501,565

10,673,614

25,690

10,709,354

73,134

214,243

TestGuid_HeapINSERT

8,211

0.000

5,749

47,213,526

244,803

444

278,989

22,074

25,424

TestGuid_SingleINSERT

297,763

0.001

3,221

959,224,428

7,190,799

21,110

7,190,779

44,960

62,299

 

The Heap clearly comes out on top because SQL Server is using the non-clustered index to get the list of rowguid’s to use so significantly less data to read.

Interestingly the read-ahead isn’t reading anything greater than a single page, this is what you’d expect because the read-ahead works best on contiguous data hence the contiguous TestGuid_SingleINSERT table has a high average bytes per read.

MAX( )

 

Logical Reads

Physical Reads

Read-ahead

CPU (ms)

Elapsed (ms)

TestGuid_INSERT

351,705

6

196,291

2,231

1,756

TestGuid_HeapINSERT

345,645

7

185,834

2,571

1,624

TestGuid_SingleINSERT

353,439

6

203,706

2,386

1,675

 

Given 50,000 rows (#lookup) against the 50 million row table the results of the join are similar in all cases regardless of how fragmented the data is.

Initial findings would suggest that random row lookups on data stored on SSD have uniform seek performance regardless of fragmentation and the size of the table because the nature of the join used in this case Nested Loops.

Plan for lookup against TestGuid_INSERT

  |--Stream Aggregate(DEFINE:([Expr1005]=MAX([partialagg1006])))

       |--Parallelism(Gather Streams)

            |--Stream Aggregate(DEFINE:([partialagg1006]=MAX([GUIDTest_SSD].[dbo].[TestGuid_INSERT].[expandrow] as [tg].[expandrow])))

                 |--Nested Loops(Inner Join, OUTER REFERENCES:([l].[guidcol], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)

                      |--Table Scan(OBJECT:([tempdb].[dbo].[#lookup] AS [l]))

                      |--Clustered Index Seek(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_INSERT].[PK__TestGuid__AB46272D7F60ED59] AS [tg]), SEEK:([tg].[guidcol]=[tempdb].[dbo].[#lookup].[guidcol] as [l].[guidcol]) ORDERED FORWARD)

Plan for lookup against TestGuid_HeapINSERT

  |--Stream Aggregate(DEFINE:([Expr1006]=MAX([partialagg1007])))

       |--Parallelism(Gather Streams)

            |--Stream Aggregate(DEFINE:([partialagg1007]=MAX([GUIDTest_SSD].[dbo].[TestGuid_HeapINSERT].[expandrow] as [tg].[expandrow])))

                 |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1003], [Expr1011]) OPTIMIZED WITH UNORDERED PREFETCH)

                      |--Compute Scalar(DEFINE:([Expr1010]=BmkToPage([Bmk1003])))

                      |    |--Nested Loops(Inner Join, OUTER REFERENCES:([l].[guidcol], [Expr1009]) WITH UNORDERED PREFETCH)

                      |         |--Sort(ORDER BY:([l].[guidcol] ASC))

                      |         |    |--Table Scan(OBJECT:([tempdb].[dbo].[#lookup] AS [l]))

                      |         |--Index Seek(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_HeapINSERT].[PK__TestGuid__AB46272C1273C1CD] AS [tg]), SEEK:([tg].[guidcol]=[tempdb].[dbo].[#lookup].[guidcol] as [l].[guidcol]) ORDERED FORWARD)

                      |--RID Lookup(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_HeapINSERT] AS [tg]), SEEK:([Bmk1003]=[Bmk1003]) LOOKUP ORDERED FORWARD)

Plan for lookup against TestGuid_SingleINSERT

  |--Stream Aggregate(DEFINE:([Expr1005]=MAX([partialagg1006])))

       |--Parallelism(Gather Streams)

            |--Stream Aggregate(DEFINE:([partialagg1006]=MAX([GUIDTest_SSD].[dbo].[TestGuid_SingleINSERT].[expandrow] as [tg].[expandrow])))

                 |--Nested Loops(Inner Join, OUTER REFERENCES:([l].[guidcol], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)

                      |--Table Scan(OBJECT:([tempdb].[dbo].[#lookup] AS [l]))

                      |--Clustered Index Seek(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_SingleINSERT].[PK__TestGuid__AB46272D08EA5793] AS [tg]), SEEK:([tg].[guidcol]=[tempdb].[dbo].[#lookup].[guidcol] as [l].[guidcol]) ORDERED FORWARD)

 

In the next blog instalment I’ll look at how the I/O system can outperform the cores in the box creating a CPU bottleneck, we’ll then return to more tests and finally a conclusion.

Any comments feel free to email me privately tonyrogerson@torver.net

In relation to my blog post on guids and ssds (part 1) the script below forms the basis of the permanent tables in the database.

TestGuid_INSERT forms a table that is 50 million rows and around 85.5GBytes in size.

The tests I will be doing as part of the benchmarking are real world tests for instance an INNER JOIN between the TestGuid_INSERT and a lookup table with 50K guids in it.

If you can think of any real world tests you want me to run then email me tonyrogerson@torver.net and I'll include in the benchmarks, better still - create the test yourself and be ready to run the benchmarks to better understand the game changer I'm talking about.

I've created the contiguous table on its own file group so I know the data is as contiguous as I can get it.

The 1GB start size and increment on the original test was deliberate to make it even more on-drive fragmented (it is already on SSD because of the mapping that goes on).

CREATE DATABASE [GUIDTest_SSD] ON  PRIMARY

( NAME = N'GUIDTest_SSD', FILENAME = N'E:\S2008R2\GUIDTest_SSD.mdf' , SIZE = 1GB , FILEGROWTH = 1GB )

 LOG ON

( NAME = N'GUIDTest_SSD_log', FILENAME = N'E:\S2008R2\GUIDTest_SSD_log.ldf' , SIZE = 1GB , FILEGROWTH = 1GB)

GO

alter database GUIDTest_SSD set recovery simple

go

USE GUIDTest_SSD

go

select * from sys.dm_io_virtual_file_stats( DB_ID(), null )

go

--    population script (2:55)

create table TestGuid_INSERT (

      guidcol uniqueidentifier not null default( newid() ) primary key clustered,

      expandrow char(1024) not null

      );

declare @i int = 1;

set nocount on;

while @i <= 50000000

begin

      insert TestGuid_INSERT ( guidcol, expandrow ) values( NEWID(), cast( @i as CHAR(1024) ) )

      set @i = @i + 1

end

go

 

select * from sys.dm_io_virtual_file_stats( DB_ID(), null )

go

USE [master]

GO

ALTER DATABASE [GUIDTest_SSD] ADD FILEGROUP [CONTIG_TBL]

GO

ALTER DATABASE [GUIDTest_SSD] ADD FILE ( NAME = N'GUIDTest_SSD_CONTIG', FILENAME = N'E:\S2008R2\GUIDTest_SSD_CONTIG.ndf' , SIZE = 100GB , FILEGROWTH = 1024KB ) TO FILEGROUP [CONTIG_TBL]

GO

USE GUIDTest_SSD

go

 

create   table TestGuid_SingleINSERT (

      guidcol uniqueidentifier not null default( newid() ) primary key clustered,

      expandrow char(1024) not null

      ) ON [CONTIG_TBL];

select * from sys.dm_io_virtual_file_stats( DB_ID(), null )

go

insert TestGuid_SingleINSERT ( guidcol, expandrow )

      select guidcol, expandrow

      from TestGuid_INSERT

      order by guidcol, expandrow

go

select * from sys.dm_io_virtual_file_stats( DB_ID(), null )

go   

USE [master]

GO

ALTER DATABASE [GUIDTest_SSD] ADD FILEGROUP [HEAP_TBL]

GO

ALTER DATABASE [GUIDTest_SSD] ADD FILE ( NAME = N'GUIDTest_SSD_HEAP_TBL', FILENAME = N'E:\S2008R2\GUIDTest_SSD_HEAP_TBL.ndf' , SIZE = 100GB , FILEGROWTH = 1024KB ) TO FILEGROUP [HEAP_TBL]

GO

USE GUIDTest_SSD

go

--    population script (2:55)

create table TestGuid_HeapINSERT (

      guidcol uniqueidentifier not null default( newid() ) primary key nonclustered,

      expandrow char(1024) not null

      ) on [heap_tbl];

declare @i int = 1;

set nocount on;

select * from sys.dm_io_virtual_file_stats( DB_ID(), null )

while @i <= 50000000

begin

      insert TestGuid_HeapINSERT ( guidcol, expandrow ) values( NEWID(), cast( @i as CHAR(1024) ) )

      set @i = @i + 1

end

go

select * from sys.dm_io_virtual_file_stats( DB_ID(), null )

go

Using GUID’s (uniqueidentifier) for surrogate keys has always and still is frowned on by database designers and optimisation consultants alike and rightly so, the fragmentation and randomness of how rows are stored and accessed can cripple performance with traditional storage systems.

People on all sides are acutely aware of the benefits GUID’s bring for distributed processing whereby the middle tier is contained on many servers or the database is spread across many physical machines.

Solid State Drives have been a game changer, they remove problems associated with traditional rotational disk technology because they not only offer uniform seek times across the storage but also the read latency for fully random reads is consistent and extremely low, they do have a weakness – cells burn out on write intensive workloads, Multiple Layer Cells (MLC) this around 10K cycles, for Single Layer Cells (SLC) this around 100K cycles and for the newer Phase Change Memory (PCM) this around 100M (http://www.fastestssd.com/articles/pcm-based-storage-a-game-changer/).

This article seeks to demonstrate that we as database designers and optimisation specialists need to adjust our approach and realise SSD’s provide a unique opportunity to scale out as well as up the database and offer less complicated database implementations especially as now like never before there is a greater need to spread data across many physical machines. The article is split into two parts, the first gives background for SSD, and the second the actual benchmarks.

What is really important IO size or query duration?

We should only be concerned with query duration, queries should return in an acceptable duration depending on the query being performed, user expectation and number of concurrent queries executing. Query speed is the outcome from a number of parameters, the rather simplistic formula goes something like queryduration = ( iolatency + iothroughput + cputime + blocking )* concurrentusers.

I’ve presented a number of times now on the commodity “reporting brick” I’ve built around an AMD Phenom 1090T, 16GB RAM and 2 x 240GB OCZ IBIS drives in a RAID 0, I’ve shown how all 6 cores are needed and top out in order to get the full potential out of the two IBIS drives, when limited to 1 or 2 cores instead of 1.3GBytes per second of throughput, the throughput drops off to a pitiful (though good still) 130-260Mbytes per second.

SO – are we really bothered about the IO subsystem when it can outperform the CPU cores?

Query speed against data held on rotational disk is not consistent; it is subject to the non-uniform seek latency that is dramatically affected if the data is fragmented or held randomly across the sectors on the disk.

Let’s have a look at the hardware that relates to performance that is a) the IO sub-system for getting the data off and b) the system RAM for the SQL Server buffer pool.

Hardware

Costs and Classification of Solid State Drives

SSD’s can be classified into commodity or enterprise; SATA attached or PCI attached.

Commodity SSD tends to be the cheaper MLC which has a lower write life, MLC is also found in some enterprise SSDs but SLC is predominate because of its higher write life. Cell burn issues and therefore the life of the drive extended by a number of techniques – in the enterprise space they are simply a lot better and there is also more redundancy on the stated drive size.

Costs of SSD can vary dramatically, for enterprise PCI based SSD is a must because of the reduced latency and the hardware is specifically designed around SSD rather than having to use a legacy protocol such as SATA and drivers there of which is designed around rotational disk technology and the problems associated with it for performance optimisation.

The OCZ Z-Drive product line (http://www.ocztechnology.com/ocz-z-drive-r3-p84-pci-express-ssd.html) is currently the best price per Gigabyte; you can buy a 1TB card for £4K (http://www.amazon.co.uk/OCZ-Z-Drive-PCI-Express-SSD-p84/dp/B003EEMIH2/ref=sr_1_4?ie=UTF8&qid=1310737655&sr=8-4) but hold out and wait for the R3 which is due shortly; a single card will give you 1GBytes per second throughput and capable of 135K IOs/second – the read latency will also be a couple of milliseconds at those levels as well – that’s fully random IO!

Just a thought to throw in the pot: consider that when you overwrite a piece of data on the SSD you aren’t necessarily overwriting, you are marking previous cells as deleted and writing information, now, write performance degrades if TRIM hasn’t caught up and reset the cells; so, would it not be logical to buy an SSD that has far more storage capacity than you actually need, for instance, if you only need 240GB for your database, buy one that is twice that, that would reduce the possibility of you hitting cells that you’d have to do a full erase cycle in order to write over.

Costs and Classification of Rotational Drives

We’ve had rotational speeds of 5.4K, 7.2K, 10K and then in 2001 technology hit a wall at 15K rpm, that’s 10 years ago! Even now the size of 15Krpm disks only go up to 147GB, at 10Krpm we have disks up to 900GB. The cost for a 15Krpm 147GB is around £185 ex VAT, the average access time a whopping 3.4 milliseconds.

How many disks would you need to create a disk array that could keep up with a pair of OCZ IBIS Drives in a RAID 0 delivering over 80K 64Kbyte 100% random reads per second at a latency of 1.2milliseconds?

Rotational disks are just fine for the transaction log because so long as it’s the only thing being written to on that array then the seek times are going to be extremely low.

Cost of memory and Limits

I recently cost up a DL385 G7 server, it will hold up to 256GB of RAM; 16GB of RAM cost £575 so for 256GB it would set you back £9.2K.

Summary of Part 1

Enterprise SSDs are still expensive when looked at from a price per GByte, however, from a read/write latency per thousands of IOs/second they are extremely cheap.

Operational care should be taken if the data access pattern is high overwriting (UPDATE) rather than mostly INSERT because you may quickly burn the SSD out. Design considerations come into play here, you can easily implement a type 2 Slowly Changing Dimension (SCD) in the operational system.

In part 2 we will look at the benchmarks from various real world tests against a 50 million row 85 GBytes table which has over 99% fragmentation in the leaf and other levels of the index. You may well be surprised at what you will see and it’s a game changer in terms of the way we currently think.

This thing is lovely, very nice bluetooth keyboard that has nice feedback on the keypress, there is no mouse but you can use the stylus or get yourself a bluetooth mouse, me, I've opted for a Microsoft ARC mouse which is a delight to use, the USB doors are a pain to open for the first time if like me you don't have any finger nails.

It came as a suprise that the slate shows four processors, Dual Core with multi-threading, I didn't really look at the processor I was more interested in the amount of memory and the SSD; you don't get the full 4GB even with the 64 bit version of Windows 7 installed (which I immediately upgraded to Ultimate through my MSDN subscription).

The box is extremely responsive - extremely, it loads Winword in literally a second.

I've got office 2010 and onenote 2010 on there now; one problem is that on applying all (43) windows updates since the upgrade the machine is still sat on step 3 of 3 on the start up configuring updates screen after about an hour, you can't turn this machine off without using a paper clip to reset it and as I have just found you need a paper clip :). Installing Windows 7 SP1 was effortless.

One of the first things I did on it was to reduce the size of the font, by default its set at 125%, my eye sight is ok :) so I've set that back down.

Amazon Kindle for the PC works really well, plenty of text on the screen when viewed portrait, the case it comes with also allows the slate to stand up in various positions - portrait, horizontal - seems stable enough.

The wireless works well, seems to have a better signal than my other two laptop machines which is good news.

The gadget passed the pose test at work :).

I use offline files to keep a copy of all my work stuff locally, I'm not sure what it is, well, its probably my server but whenever I try and sync it runs for a couple of minutes then fails with network name no longer contactable, funnily enough its fine from my big laptop so I can only guess this may be a driver type issue on the EP121 itself - very odd and very annoying.

I do a lot of presenting and need to plug into a VGA project because most sites that's all that is offered, the EP121 has a mini-hdmi output which is great except for this scenario, hdmi is digital, vga is analogue, you will struggle to find a cost effective solution, I found HDFury and also a device HP do, however, a better solution appears to be getting a USB graphics adapter for instance the one I've ordered is the ClimaxDigital USB 2.0 to DVI,VGA or HDMI Adaptor which gives everything I need - VGA and DVI output and great resolution as well - ok, so fingers crossed because I'm presenting next Wednesday in Edinburgh and not taking my 300kg lenovo w700 (I'm sure my back just sighed in relief) - it certainly works really well on my LED TV, the install was simple - it just works!

One of the several reasons for buying this piece of kit was to use it on my LED TV to remote into my main machine to check stuff whilst sat in my living room, also to watch webcasts and lecture videos in comfort away from my office, because of the wireless speed and limitation I'm opting for a USB network adapter from Belkin - that will also allow me to take advantage of my home gigabit network, there are only 2 usb ports on the slate so I'm going to knock up a hub so connecting it in is straight forward and simple, I'm also going to purchase a second power supply so I don't have to faff about with that either.

I now have the developer x64 edition of SQL Server 2008 R2, yes everything :) - about 16GB left to play with on the machine now but that will be fine, I'll put AdventureWorks on there so I can play and demo stuff which is all I'm after from this, my development machine is significantly more powerful and meets my storage needs too.

Travel test this weekend and next week, I'm in Dundee for my final exam for the masters degree.

The Reporting Brick concept is not really anything new, it starts the walk toward bringing the work Jim Gray and Tom Barclay et al did on CyberBricks up-to-date in terms of current kit.

A reporting brick is simply a box built from commodity kit utilising commodity SSD, namely the OCZ IBIS drives to gain extremely high levels of performance for a fraction of the cost required for typical server and san installs today.

I'll write up over the next few months as I work further on the concept, for now the deck attached summarises some of the ideas around it, the deck was presented at last nights London SQL Server User Group, I will be presenting it again in Edinburgh on the 29th June and other locations later in the year.

Deck: Commodity Kit.pptx

 

More Posts Next page »