Monday, July 18, 2011 7:38 AM
Use GUIDs (uniqueidentifier) for keys when storing data on SSDs (Part 1 - SSD Backgrounder)
Using GUIDs (uniqueidentifier) for surrogate keys has always been, and still is, frowned upon by database designers and optimisation consultants alike, and rightly so: the fragmentation and randomness of how rows are stored and accessed can cripple performance on traditional storage.
People on all sides are acutely aware of the benefits GUIDs bring for distributed processing, where the middle tier is spread over many servers or the database is spread across many physical machines.
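To make the pattern concrete, here is a minimal sketch of the kind of GUID surrogate key being discussed; the table and constraint names are mine, purely for illustration:

CREATE TABLE dbo.Orders
(
    OrderID     uniqueidentifier NOT NULL
                CONSTRAINT DF_Orders_OrderID DEFAULT NEWID()
                CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,
    OrderDate   datetime2        NOT NULL,
    CustomerRef nvarchar(50)     NOT NULL
);
-- NEWID() values are random, so new rows land all over the clustered index;
-- on rotational disk that means page splits, fragmentation and random IO.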
Solid State Drives have been a game changer; they remove the problems associated with traditional rotational disk technology because they not only offer uniform seek times across the storage, but the read latency for fully random reads is also consistent and extremely low. They do have a weakness: cells burn out on write-intensive workloads. For Multi-Level Cell (MLC) flash this is around 10K write cycles, for Single-Level Cell (SLC) around 100K cycles and for the newer Phase Change Memory (PCM) around 100M (http://www.fastestssd.com/articles/pcm-based-storage-a-game-changer/).
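To put those cycle counts in perspective, here is a rough back-of-envelope sketch of how much host data a drive can absorb before the cells wear out; the capacity, cycle count and write amplification figures are assumptions for illustration, not measurements:

-- endurance ≈ capacity * program/erase (P/E) cycles / write amplification
DECLARE @capacity_gb int   = 240,     -- drive size
        @pe_cycles   int   = 10000,   -- MLC-class cycle count
        @write_amp   float = 5.0;     -- pessimistic write amplification
SELECT (@capacity_gb * CAST(@pe_cycles AS bigint)) / @write_amp / 1024.0
       AS approx_tb_of_host_writes;   -- roughly 470TB on these assumptions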
This article seeks to demonstrate that we as database designers and optimisation specialists need to adjust our approach and realise that SSDs provide a unique opportunity to scale the database out as well as up, and to offer less complicated database implementations, especially as there is now, like never before, a greater need to spread data across many physical machines. The article is split into two parts: the first gives background on SSDs, and the second the actual benchmarks.
What is really important: IO size or query duration?
We should only be concerned with query duration; queries should return within an acceptable time depending on the query being performed, user expectation and the number of concurrent queries executing. Query speed is the outcome of a number of parameters; a rather simplistic formula goes something like queryduration = ( iolatency + iothroughput + cputime + blocking ) * concurrentusers.
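If you want to see two of those components for an individual query, SQL Server will report the IO and CPU cost for you; a minimal example, with a placeholder table name:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*) FROM dbo.SomeLargeTable;  -- logical/physical reads and CPU/elapsed
                                          -- time appear in the Messages output

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;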
I’ve presented a number of times now on the commodity “reporting brick” I’ve built around an AMD Phenom 1090T, 16GB of RAM and 2 x 240GB OCZ IBIS drives in RAID 0. I’ve shown how all 6 cores are needed, and top out, in order to get the full potential out of the two IBIS drives; when limited to 1 or 2 cores, instead of 1.3GBytes per second the throughput drops off to a pitiful (though still good) 130-260MBytes per second.
So, are we really bothered about the IO subsystem when it can outperform the CPU cores?
Query speed against data held on rotational disk is not consistent; it is subject to the non-uniform seek latency that is dramatically affected if the data is fragmented or held randomly across the sectors on the disk.
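You can see how fragmented a given index actually is from the index physical stats DMV; a quick sketch, again with a placeholder table name:

SELECT index_id, index_level, avg_fragmentation_in_percent, page_count
FROM   sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SomeLargeTable'),
                                      NULL, NULL, 'DETAILED');
-- 'DETAILED' reports every level of each index, at the cost of a full scan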
Let’s have a look at the hardware that relates to performance, that is a) the IO sub-system for getting the data off storage and b) the system RAM for the SQL Server buffer pool.
Classification of Solid State Drives
SSDs can be classified as commodity or enterprise, and as SATA attached or PCI attached.
Commodity SSD tends to be the cheaper MLC, which has a lower write life; MLC is also found in some enterprise SSDs, but SLC predominates because of its higher write life. Cell burn issues are mitigated, and therefore the life of the drive extended, by a number of techniques; in the enterprise space these are simply a lot better and there is also more redundancy over and above the stated drive size.
Costs of SSD can vary dramatically. For the enterprise, PCI-based SSD is a must because of the reduced latency and because the hardware is specifically designed around SSD, rather than having to use a legacy protocol such as SATA and its drivers, which were designed around rotational disk technology and the performance problems associated with it.
The OCZ Z-Drive product line (http://www.ocztechnology.com/ocz-z-drive-r3-p84-pci-express-ssd.html) is currently the best price per Gigabyte; you can buy a 1TB card for £4K (http://www.amazon.co.uk/OCZ-Z-Drive-PCI-Express-SSD-p84/dp/B003EEMIH2/ref=sr_1_4?ie=UTF8&qid=1310737655&sr=8-4), but hold out and wait for the R3 which is due shortly; a single card will give you 1GByte per second of throughput and is capable of 135K IOs/second, with a read latency of only a couple of milliseconds at those levels as well – and that’s fully random IO!
Just a thought to throw in the pot: consider that when you overwrite a piece of data on the SSD you aren’t necessarily overwriting; you are marking the previous cells as deleted and writing the information elsewhere. Write performance degrades if TRIM hasn’t caught up and reset those cells, so would it not be logical to buy an SSD that has far more storage capacity than you actually need? For instance, if you only need 240GB for your database, buy one that is twice that; that would reduce the possibility of hitting cells that need a full erase cycle before they can be written over.
Classification of Rotational Drives
We’ve had rotational speeds of 5.4K, 7.2K and 10K rpm, and then in 2001 the technology hit a wall at 15K rpm; that’s 10 years ago! Even now the size of 15Krpm disks only goes up to 147GB, while at 10Krpm we have disks up to 900GB. The cost of a 15Krpm 147GB disk is around £185 ex VAT, with an average access time of a whopping 3.4 milliseconds.
How many disks would you need to create a disk array that could keep up with a pair of OCZ IBIS drives in RAID 0 delivering over 80K 64KByte 100% random reads per second at a latency of 1.2 milliseconds?
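As a hedged back-of-envelope answer, assuming a 15Krpm drive can service somewhere in the region of 180-200 random reads per second (an assumption for illustration, not a benchmark):

-- spindles needed to match ~80,000 random reads per second
SELECT CEILING(80000.0 / 200) AS disks_at_200_iops,   -- 400 drives
       CEILING(80000.0 / 180) AS disks_at_180_iops;   -- 445 drives

That is several shelves of 15Krpm spindles, before you even consider the controllers needed to drive them, just to match two PCI SSD cards.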
Rotational disks are just fine for the transaction log because, so long as it’s the only thing being written to on that array, the seek times are going to be extremely low.
Cost of memory
I recently costed up a DL385 G7 server; it will hold up to 256GB of RAM. 16GB of RAM costs £575, so 256GB would set you back £9.2K.
Summary of Part 1
Enterprise SSDs are still expensive when looked at on price per GByte; however, looked at on read/write latency and thousands of IOs/second, they are extremely cheap.
Operational care should be taken if the data access pattern is heavily overwriting (UPDATE) rather than mostly INSERT, because you may quickly burn the SSD out. Design considerations come into play here; you can easily implement a type 2 Slowly Changing Dimension (SCD) in the operational system, along the lines of the sketch below.
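A minimal type 2 SCD sketch; the table and column names are illustrative only. Rather than overwriting attribute values in place, the current version row is closed off with a small update and a new version row is inserted, so the bulk of the write activity hitting the SSD is INSERTs:

CREATE TABLE dbo.Customer_SCD
(
    CustomerVersionID uniqueidentifier NOT NULL
                      CONSTRAINT DF_Customer_SCD_Version DEFAULT NEWID()
                      CONSTRAINT PK_Customer_SCD PRIMARY KEY,
    CustomerKey       int           NOT NULL,
    CustomerName      nvarchar(100) NOT NULL,
    ValidFrom         datetime2     NOT NULL,
    ValidTo           datetime2     NULL,
    IsCurrent         bit           NOT NULL
);

DECLARE @CustomerKey int = 42, @NewName nvarchar(100) = N'New trading name';

UPDATE dbo.Customer_SCD                      -- close off the current version
SET    ValidTo = SYSDATETIME(), IsCurrent = 0
WHERE  CustomerKey = @CustomerKey AND IsCurrent = 1;

INSERT dbo.Customer_SCD (CustomerKey, CustomerName, ValidFrom, ValidTo, IsCurrent)
VALUES (@CustomerKey, @NewName, SYSDATETIME(), NULL, 1);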
In part 2 we will look at the benchmarks from various real world tests against a 50 million row, 85 GByte table which has over 99% fragmentation in the leaf and other levels of the index. You may well be surprised at what you will see, and it’s a game changer in terms of the way we design and optimise our databases.