22 July 2011 16:03 tonyrogerson

Use GUID’s (uniqueidentifier) for keys when storing data on SSD’s (Part 3 - Initial Base line with IOMeter and First Test Group)

Benchmarks (SSD based)

In the following test I’ve tried to be realistic in terms of what a real world query would do - joins! Ordinarily you’d have several tables joining together; this is where you take the performance hit if a) you’ve an IO subsystem that has high read latency and b) lots of fragmentation.

Benchmarks will be performed on SSD and then on rotational disk, the first lot of benchmarks will focus on read queries, writing will be done at a later date where I will discuss the hybrid approach of storage design.

There are three versions of the table each held on its own file group so that there is no object interleaving to interfere and give a clear and clean set of numbers. The three versions are a) table populated with 50 million single INSERTS that has a clustered index on the guid column (TestGuid_INSERT), b) a table populated from the table from a) but as a single insert so there is no fragmentation (TestGuid_SingleINSERT) and finally c) table populated with 50 million single INSERTS that is a heap and has a single non-clustered index on the guid column (TestGUID_HeapINSERT), a padding column of char(1024) is used to pad the row thus causing more page splitting.

The database set up script can be found in the blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2011/07/19/use-guid-s-uniqueidentifier-for-keys-when-storing-data-on-ssd-s-part-2-initial-database-schema.aspx

Base lining the test kit

To give benchmarks a back drop the capabilities of the kit needs to be established, in this instance a number of measures need to be established by placing the kit under specific load types. IOMeter has been used to test raw grunt in terms of IO’s per second, throughput and read latency.

Kit

CPU and Memory

£154.16 - Gigabyte GA-890FXA-UD7 AMD 890FX (Socket AM3)
          PCI-Express DDR3 Motherboard

£170.20 - AMD Phenom II X6 Six Core 1090T Black Edition
          3.20GHz (Socket AM3)

£123.25 - 8GB Team Xtreem LV DDR3 PC3-16000 (9-11-9-27)
          Dual Channel Low-Voltage kit

£123.25 - 8GB Team Xtreem LV DDR3 PC3-16000 (9-11-9-27)
          Dual Channel Low-Voltage kit

Sub-Total: 570.86

IO Sub System

£108.36 - 80GB OCZ Vertex 2 SATA II 2.5" SSD Solid State Disk
          (285MB/sec read 275MB/sec write)

 £50.67 - OCZSSDPX-1HSDL4P OCZ HSDL 3.5 INCH IBIS 4 port HSDL Card

£442.85 - 240GB OCZ IBIS 3.5-inch HSDL High-Speed Data Link SSD
          (read 740MB/s - write 720MB/s)

£442.85 - 240GB OCZ IBIS 3.5-inch HSDL High-Speed Data Link SSD
          (read 740MB/s - write 720MB/s)

Sub-Total: 1044.73

Build details

Operating System: Windows 2008 R2 x64

SQL Server 2008 R2 (10.50.1617) x64

C: on OCZ Vertex 2

E: as software RAID 0 over the two IBIS drives and formatted with an allocation size of 64Kbytes, the two IBIS drives internally RAID 0 with the default 64KB stripe.

Although there is 16GB of RAM in the box SQL Server 2008 R2 is fixed at min/max 1024 to put pressure on the buffer pool and thus stretch the I/O system.

IOMeter tests on a 50GB file

104857600 512 sectors

Three transfer sizes have been chosen, 64KB (an extent), 8 KB (a page) and 256KB (read ahead reading multiple extents).

For the 8B and 64KB I’ve given figures for both ends of the sequential and random read spectrum, for the 256KB test 50/50 was chosen because that is more realistic.

Transfer Size

Read or Write

% Rand

% Seq

Outstanding IO

IO's

/sec

Mbytes /sec

Avg I/O (ms)

Max I/O (ms)

CPU %

64

Read

100

0

32

18644

1165

1.71

10.23

27.26

64

Read

100

0

64

21305

1331

3

5.91

26.68

64

Read

100

0

86

21523

1345

3.99

6.75

18.76

64

Read

0

100

32

22718

1419

1.4

2.39

36.57

64

Read

0

100

64

20122

1257

3.17

9.6

34.18

64

Read

0

100

86

21233

1327

4.05

6.83

44.39

8

Read

100

0

32

65627

512

0.48

13.05

30.49

8

Read

100

0

64

60915

475

1.05

3.13

46.24

8

Read

100

0

86

75519

590

1.13

22.73

32.95

8

Read

0

100

32

79506

621

0.4

1.9

29.79

8

Read

0

100

64

79039

617

0.81

2.52

55.61

8

Read

0

100

86

67485

527

1.27

19.27

21.24

256

Read

50

50

32

5211

1302

6.14

10.83

26.64

256

Read

50

50

64

5121

1280

12.46

21.5

31.56

256

Read

50

50

86

5200

1300

16.5

34.92

27.79

 

Populating the data

The clustered index with 50 million inserts took 15 hours 4 minutes to populate the table; the 50 million inserts into a heap took 7 hours 55 minutes.

The table below shows the results of sys.dm_db_index_physical_stats, you can clearly see that TestGuid_INSERT (having the clustered index) and built from 50 million inserts is somewhat fragmented. The Heap contains almost exactly the same number of pages as the leaf level of the TestGuid_SingleINSERT which has no fragmentation – that is what I’d expect to see. The non-clustered index (index_id 2) which is the primary key (nonclustered) is extremely badly fragmented which again is what is expected because the insert order is purely random.

 

index id

index depth

index level

avg fragmentation in percent

fragment count

avg fragment size in pages

page count

avg page space used in percent

record count

min record size in bytes

max record size in bytes

avg record size in bytes

TestGuid_HeapInsert

0

1

0

3.4

31478

226.9

7142864

91

50000000

1047

1047

1047

TestGuid_HeapInsert

2

4

0

99.2

243713

1.0

243713

68

50000000

25

25

25

TestGuid_HeapInsert

2

4

1

99.9

1087

1.0

1087

69

243713

23

23

23

TestGuid_HeapInsert

2

4

2

50.0

4

1.0

4

84

1087

23

23

23

TestGuid_HeapInsert

2

4

3

0.0

1

1.0

1

1

4

23

23

23

 

 

 

 

 

 

 

 

 

 

 

 

 

TestGuid_SingleINSERT

1

4

0

0.7

48827

146.3

7142858

91

50000000

1047

1047

1047

TestGuid_SingleINSERT

1

4

1

0.0

47938

1.0

47938

46

7142858

23

23

23

TestGuid_SingleINSERT

1

4

2

0.0

321

1.0

321

46

47938

23

23

23

TestGuid_SingleINSERT

1

4

3

0.0

1

1.0

1

99

321

23

23

23

 

 

 

 

 

 

 

 

 

 

 

 

 

TestGuid_INSERT

1

4

0

99.1

10625949

1.0

10625950

61

50000000

1047

1047

1047

TestGuid_INSERT

1

4

1

100.0

47661

1.0

47661

69

10625950

23

23

23

TestGuid_INSERT

1

4

2

99.6

224

1.0

224

66

47661

23

23

23

TestGuid_INSERT

1

4

3

0.0

1

1.0

1

69

224

23

23

23

 

Simple Performance Comparison (Test group 1)

A series of tests will be performed and blogged over the coming weeks after which a conclusion will be drawn.

The first test is simply taking 50,000 of the 50 million rows; each row is equally spaced throughout the table (see SQL below) and then used to join back in to the big table in order to do a MAX. This tests performance of random look ups, basically this test is a real world and what I’d expect in most SQL Server databases with modest indexing strategies and maintenance.

--

--          TEST 1 (on fragmented INSERT with clustered index)

--

 

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select guidcol

into #lookup

from (      select guidcol, ROW_NUMBER() over( order by guidcol ) as rn

            from TestGuid_INSERT ) as d

where d.rn % 1000 = 0

go

/*

perfmon:

      avg disk bytes/read 8192

      avg disk sec/read   0.000

      avg disk reads/sec  48,887

      avg read bytes/sec  400,501,565

 

Table 'TestGuid_INSERT'. Scan count 1, logical reads 10673614, physical reads 25690, read-ahead reads 10709354, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 73134 ms,  elapsed time = 214243 ms.

 

*/

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select max( tg.expandrow )

from #lookup as l

      inner join TestGuid_INSERT as tg on tg.guidcol = l.guidcol

go

 

/*

 

Table 'TestGuid_INSERT'. Scan count 0, logical reads 351705, physical reads 6, read-ahead reads 196291, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#lookup'. Scan count 7, logical reads 156, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 2231 ms,  elapsed time = 1756 ms.

 

*/

 

--

------

 

drop table #t, #lookup

 

 

--

--          TEST 2 (on fragmented INSERT with heap)

--

 

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select guidcol

into #lookup

from (      select guidcol, ROW_NUMBER() over( order by guidcol ) as rn

            from TestGuid_HEapINSERT ) as d

where d.rn % 1000 = 0

go

/*

perfmon:

      avg disk bytes/read 8211

      avg disk sec/read   0.000

      avg disk reads/sec  5,749

      avg read bytes/sec  47,213,526

 

Table 'TestGuid_HeapINSERT'. Scan count 1, logical reads 244803, physical reads 444, read-ahead reads 278989, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 22074 ms,  elapsed time = 25424 ms.

  

*/

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select max( tg.expandrow )

from #lookup as l

      inner join TestGuid_HEapINSERT as tg on tg.guidcol = l.guidcol

go

 

/*

 

Table 'TestGuid_HeapINSERT'. Scan count 0, logical reads 345645, physical reads 7, read-ahead reads 185834, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#lookup'. Scan count 7, logical reads 156, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 2571 ms,  elapsed time = 1624 ms.

 

*/

 

--

------

 

drop table #t, #lookup

 

 

 

--

--          TEST 3 (on single (non-fragmented) INSERT with clustered index)

--

 

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select guidcol

into #lookup

from (      select guidcol, ROW_NUMBER() over( order by guidcol ) as rn

            from TestGuid_SingleINSERT ) as d

where d.rn % 1000 = 0

go

/*

perfmon:

      avg disk bytes/read 297,763

      avg disk sec/read   0.001

      avg disk reads/sec  3,221

      avg read bytes/sec  959,224,428

 

Table 'TestGuid_SingleINSERT'. Scan count 1, logical reads 7190799, physical reads 21110, read-ahead reads 7190779, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 44960 ms,  elapsed time = 62299 ms.

  

*/

checkpoint

go

dbcc dropcleanbuffers

dbcc freeproccache

go

 

select max( tg.expandrow )

from #lookup as l

      inner join TestGuid_SingleINSERT as tg on tg.guidcol = l.guidcol

go

 

/*

 

Table 'TestGuid_SingleINSERT'. Scan count 0, logical reads 353439, physical reads 6, read-ahead reads 203706, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#lookup'. Scan count 7, logical reads 156, physical reads 0, read-ahead reads 5, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

 SQL Server Execution Times:

   CPU time = 2386 ms,  elapsed time = 1675 ms.

 

*/

 

 

Findings of Comparison (Test group 1)

Population of #lookup

 

Avg Disk Bytes/read

Avg Disk Sec/Read

Avg Disk Reads/sec

Avg Read Bytes/sec

Logical Reads

Physical Reads

Read-ahead

CPU (ms)

Elapsed (ms)

TestGuid_INSERT

8,192

0.000

48,887

400,501,565

10,673,614

25,690

10,709,354

73,134

214,243

TestGuid_HeapINSERT

8,211

0.000

5,749

47,213,526

244,803

444

278,989

22,074

25,424

TestGuid_SingleINSERT

297,763

0.001

3,221

959,224,428

7,190,799

21,110

7,190,779

44,960

62,299

 

The Heap clearly comes out on top because SQL Server is using the non-clustered index to get the list of rowguid’s to use so significantly less data to read.

Interestingly the read-ahead isn’t reading anything greater than a single page, this is what you’d expect because the read-ahead works best on contiguous data hence the contiguous TestGuid_SingleINSERT table has a high average bytes per read.

MAX( )

 

Logical Reads

Physical Reads

Read-ahead

CPU (ms)

Elapsed (ms)

TestGuid_INSERT

351,705

6

196,291

2,231

1,756

TestGuid_HeapINSERT

345,645

7

185,834

2,571

1,624

TestGuid_SingleINSERT

353,439

6

203,706

2,386

1,675

 

Given 50,000 rows (#lookup) against the 50 million row table the results of the join are similar in all cases regardless of how fragmented the data is.

Initial findings would suggest that random row lookups on data stored on SSD have uniform seek performance regardless of fragmentation and the size of the table because the nature of the join used in this case Nested Loops.

Plan for lookup against TestGuid_INSERT

  |--Stream Aggregate(DEFINE:([Expr1005]=MAX([partialagg1006])))

       |--Parallelism(Gather Streams)

            |--Stream Aggregate(DEFINE:([partialagg1006]=MAX([GUIDTest_SSD].[dbo].[TestGuid_INSERT].[expandrow] as [tg].[expandrow])))

                 |--Nested Loops(Inner Join, OUTER REFERENCES:([l].[guidcol], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)

                      |--Table Scan(OBJECT:([tempdb].[dbo].[#lookup] AS [l]))

                      |--Clustered Index Seek(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_INSERT].[PK__TestGuid__AB46272D7F60ED59] AS [tg]), SEEK:([tg].[guidcol]=[tempdb].[dbo].[#lookup].[guidcol] as [l].[guidcol]) ORDERED FORWARD)

Plan for lookup against TestGuid_HeapINSERT

  |--Stream Aggregate(DEFINE:([Expr1006]=MAX([partialagg1007])))

       |--Parallelism(Gather Streams)

            |--Stream Aggregate(DEFINE:([partialagg1007]=MAX([GUIDTest_SSD].[dbo].[TestGuid_HeapINSERT].[expandrow] as [tg].[expandrow])))

                 |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1003], [Expr1011]) OPTIMIZED WITH UNORDERED PREFETCH)

                      |--Compute Scalar(DEFINE:([Expr1010]=BmkToPage([Bmk1003])))

                      |    |--Nested Loops(Inner Join, OUTER REFERENCES:([l].[guidcol], [Expr1009]) WITH UNORDERED PREFETCH)

                      |         |--Sort(ORDER BY:([l].[guidcol] ASC))

                      |         |    |--Table Scan(OBJECT:([tempdb].[dbo].[#lookup] AS [l]))

                      |         |--Index Seek(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_HeapINSERT].[PK__TestGuid__AB46272C1273C1CD] AS [tg]), SEEK:([tg].[guidcol]=[tempdb].[dbo].[#lookup].[guidcol] as [l].[guidcol]) ORDERED FORWARD)

                      |--RID Lookup(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_HeapINSERT] AS [tg]), SEEK:([Bmk1003]=[Bmk1003]) LOOKUP ORDERED FORWARD)

Plan for lookup against TestGuid_SingleINSERT

  |--Stream Aggregate(DEFINE:([Expr1005]=MAX([partialagg1006])))

       |--Parallelism(Gather Streams)

            |--Stream Aggregate(DEFINE:([partialagg1006]=MAX([GUIDTest_SSD].[dbo].[TestGuid_SingleINSERT].[expandrow] as [tg].[expandrow])))

                 |--Nested Loops(Inner Join, OUTER REFERENCES:([l].[guidcol], [Expr1007]) OPTIMIZED WITH UNORDERED PREFETCH)

                      |--Table Scan(OBJECT:([tempdb].[dbo].[#lookup] AS [l]))

                      |--Clustered Index Seek(OBJECT:([GUIDTest_SSD].[dbo].[TestGuid_SingleINSERT].[PK__TestGuid__AB46272D08EA5793] AS [tg]), SEEK:([tg].[guidcol]=[tempdb].[dbo].[#lookup].[guidcol] as [l].[guidcol]) ORDERED FORWARD)

 

In the next blog instalment I’ll look at how the I/O system can outperform the cores in the box creating a CPU bottleneck, we’ll then return to more tests and finally a conclusion.

Any comments feel free to email me privately tonyrogerson@torver.net

Comments

No Comments