Benchmarking Hardware and Environments
I suppose it would be fair to say I didn't pick up the "Grumpy Old DBA" tag for no reason. Along with the usual paranoia of a Production DBA, a distrust of almost anyone who wants to go near a database, and a possibly over-protective nature regarding databases, I have tended to become more cynical and distrustful during my time in IT, especially when I'm told I wouldn't understand, or when I believe I'm being fobbed off by other teams: networks, storage, app and web tier folks, developers; in fact almost everyone in my quest for performance or troubleshooting.
A year or so ago I was responsible for the hardware upgrade of a production cluster, and this led to a conundrum: our existing cluster was a Dell R900, a four-socket quad-core. What should we replace it with?
The DBA will obviously want as much bang for buck as possible, but as well as the cost of the hardware there's the obvious cost of the SQL licenses to consider, along with cpu clock speed, number of cores and so on. This was Enterprise SQL, so any notion of a 40-core box was likely to meet some resistance!
The actual configuration of the chosen server is not the point of this blog post, but for the record I chose a dual-socket 8-core server: 16 cores, plus HT, so 32 threads.
So here's the rub: going from 4 sockets to 2, how could I present a justification for this "downsize" of a critical server, the backbone of the company, which runs a global system across more than 100 countries 24 x 7 and handles an annual income measured in billions (USD)?
(Actually, one benefit was that the hosting cost of the 2-way box was 50% less than the Dell R900's; in ballpark figures this meant that most servers we replaced would pay for themselves within a 2 to 3 year period.)
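As a back-of-the-envelope sketch of that payback calculation: the only figure from the text is the 50% hosting saving; the cost numbers below are entirely hypothetical, chosen just to show the shape of the sums.

```python
# Hypothetical illustration of the payback sums. Only the "hosting cost
# 50% less" claim comes from the post; the dollar figures are invented.
old_hosting_per_year = 20_000   # assumed annual hosting cost of the 4-socket box (USD)
new_hosting_per_year = old_hosting_per_year * 0.5   # 2-way box hosts for 50% less
hardware_cost = 25_000          # assumed purchase price of the new server (USD)

annual_saving = old_hosting_per_year - new_hosting_per_year
payback_years = hardware_cost / annual_saving
print(f"Payback period: {payback_years:.1f} years")  # 2.5 years with these inputs
```

With those made-up inputs the new box pays for itself in 2.5 years, which is the sort of figure that gets a "downsize" past the finance people.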
I started to look around for a software product I could run on any server (or PC) which would produce a comparable set of figures.
I chose Passmark Software (https://passmark.com/) and purchased a copy of PerformanceTest; at $27 I think this is quite a bargain.
For the record I have no connections or affiliations with Passmark Software.
Now I could produce benchmarks for cpu, memory and disk.
The other advantage is that there is a wealth of published data on the passmark.com website.
So I started producing figures from our servers, and for the cpu benchmarks I compared back to the published results on the website. This part is important, as I'll explain later.
Old server cpu benchmark: 11,025
New server cpu benchmark: 20,907
Just to add a point of reference, I use a single-Xeon quad-core (no HT) workstation; this has a cpu benchmark of 5,944.
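Turning those raw marks into the relative figures I actually put in front of management is simple arithmetic; all three marks below are the ones quoted above.

```python
# cpu marks quoted in the post
old_server = 11_025   # Dell R900, four-socket quad-core
new_server = 20_907   # dual-socket 8-core replacement
workstation = 5_944   # single-Xeon quad-core, no HT

improvement = (new_server - old_server) / old_server * 100
print(f"New server vs old: +{improvement:.0f}%")                  # ~90% uplift
print(f"Old server vs workstation: {old_server / workstation:.1f}x")  # ~1.9x
```

So the half-as-many-sockets box delivers roughly a 90% higher cpu mark than the four-socket R900 it replaced, which is the headline number for the justification.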
The memory benchmarks showed even larger percentage improvements.
Disk benchmarks are trickier, I'd say, and as the SAN wasn't changing there would be no improvement, but I benchmarked the storage too.
Having benchmarked my server, I needed to see if my result matched other reported values for a dual-socket box with the same Xeon processors... and it did.
It's important not to be overwhelmed by the tables of stats. I'd suggest that unless your SQL Server runs most of its workload in parallel, clock speed is possibly more important than the number of cores. With hindsight I would actually have bought a dual 6-core, which has a higher clock speed than the dual 8-core.
We did in fact replace a number of servers in other systems with dual 6-cores, which have a cpu benchmark of 22,000.
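Using the marks above, you can see the dual 6-core's higher clock more than made up for the missing cores, despite having four fewer of them:

```python
# cpu marks quoted in the post
dual_8_core = 20_907   # the server I originally chose
dual_6_core = 22_000   # the higher-clocked dual 6-core we used elsewhere

edge = (dual_6_core - dual_8_core) / dual_8_core * 100
print(f"Dual 6-core beats dual 8-core by about {edge:.0f}%")  # ~5%
```

A roughly 5% higher mark from 12 cores instead of 16, and fewer cores to license, which is why clock speed deserves a close look before core count.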
Now, I previously said it was important both to refer to published benchmarks and to keep figures for your existing estate. As part of our hardware replacements we bought a new cluster; when the benchmarks were run, the cpu value of 19,072 was well below what was expected (it may well have been even lower initially, as the process to resolve it was very protracted, to say the least).
Initially my findings were dismissed out of hand; this is one reason to buy a "known" software product, and one with published results!
I'll perhaps blog about my experiences with benchmarking using SQL Server and how I got a similar reception.
This particular server had more cores and a higher clock speed, and once sorted (it was an issue within the BIOS) we got the right value: 26,200.
We're just building a new cluster for another system and I ran my checks; these showed that the cpu mark on the two nodes differed by about 33%. Yet another BIOS setting. Again my test results were disputed, but I'm used to this, so I just forwarded my document, which holds data for a very large number of our servers.
So should the DBA apply due diligence? What is the scope of responsibility for the DBA? In my time I've been heavily involved in storage: after being told "You wouldn't understand" when I questioned the performance of a SAN, I promptly put myself through two certifications on Fibre Channel Architecture with SNIA. I'm reasonable with the o/s, having started with NT 4 training courses many years ago, although a technical interview recently showed my knowledge of Windows clustering was slightly lacking :) And how about the integration with the various application tiers and the network?
Anyway, I'd suggest that getting a set of benchmark values for your hardware is important, if for nothing else than to check all is as expected.
I wonder how many DBAs have been able to show any hard figures for the impact of Hyperthreading or Turbo Boost?
Here are some figures for a dual-socket 8-core server:
Both cpus, with Turbo Boost and HT: 21,491
Both cpus, no Turbo, no HT: 16,916
Both cpus, no HT: 15,995
Single cpu, with Turbo Boost and HT: 14,107
No, I can't explain all of the figures, and I was only able to perform very limited testing.
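I can't explain every number either, but the relative differences fall straight out of the marks listed above:

```python
# cpu marks from the table above (dual-socket 8-core server)
both_turbo_ht   = 21_491   # both cpus, Turbo Boost and HT on
both_plain      = 16_916   # both cpus, no Turbo, no HT
single_turbo_ht = 14_107   # single cpu, Turbo Boost and HT on

gain = (both_turbo_ht - both_plain) / both_plain * 100
print(f"Turbo Boost + HT together: +{gain:.0f}%")          # ~27%
share = single_turbo_ht / both_turbo_ht * 100
print(f"Single cpu achieves {share:.0f}% of the dual mark")  # ~66%
```

So on this box, Turbo Boost and HT together were worth roughly 27% on the cpu mark, and pulling a socket still left about two thirds of the throughput; hard figures rather than vendor hand-waving.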
Finally, a couple of examples of why this is important.
I was able to discover that the volumes of SAN presented to one server were not as expected.
SAN storage is a subject in itself, but in very simple terms there are classes of san storage available to me at differing performance and cost (faster = more expensive). The storage presented was of lower performance than I expected; only because I had sets of benchmarks was I able to pick up on this.
And a somewhat obscure matter: on a particular server model, if you populate all the memory slots the memory speed drops down a notch, e.g. 1866 runs at 1600. A benchmark showed two servers specified with the same memory speed returning different values; investigation showed that the memory config in the lower-scoring server wasn't as I had specified. We're back to due diligence again, you see, and although the memory speed step-down isn't massive, it will impact the throughput for SQL Server.
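As a rough upper bound on that step-down (the real SQL Server impact depends on how memory-bound the workload is), the DIMM speeds quoted above give:

```python
# DIMM speeds from the example above (MT/s). This is a crude ceiling on
# the bandwidth loss; actual workload impact will usually be smaller.
rated  = 1866   # speed the memory was specified at
actual = 1600   # speed it runs at with all slots populated

loss = (rated - actual) / rated * 100
print(f"Memory clock down by about {loss:.0f}%")  # ~14%
```

A roughly 14% drop in memory clock won't show up on every query, but on a scan-heavy SQL Server workload it is exactly the kind of silent degradation only a benchmark will catch.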
And yes, SQL tests and benchmarks are very important, but it's equally important to make sure the hardware platform itself is optimal.
These are all physical servers. I know there are benchmarks available for virtualised environments, and I'd suggest the same types of tests should apply there too: who's to know whether a BIOS setting in your ESXi or Hyper-V host is degrading performance?