July 2011 - Posts

Time for new licensing models?
24 July 11 10:41 PM | GavinPayneUK | with no comments

For several years now we’ve become accustomed in medium and large environments to using per-Processor licensing models. The main driver has been that in the Internet era counting, let alone identifying, our end users has become almost impossible, so assigning each of them a per-User licence is no longer practical.

In the SQL Server world the per-Processor model gave us an “all you can eat” option whereby as long as the physical CPUs were adequately licensed you could have as many CPU cores, as much server memory, or as many end users as you like, as well as a varying number of instances of SQL Server depending on your edition.

All you can eat

Per-Processor licences are expensive, there’s no denying it, but these days the amount of server resource we can use with a single processor licence is massive compared to what it was just two or three years ago. I regularly work with servers with 24 or 32 logical CPUs and 64GB of memory, and these are now considered almost entry level. With Intel and AMD’s CPUs coming with 6, 8 or 10 cores, plus Hyper-Threading, that’s quite a few logical CPUs per socket.

Microsoft are different to some of their competitors in that they only require us to licence the physical sockets in use, whereas other database vendors are interested in the number of cores that are in use too, often pushing the licensing price up a lot.

VMware’s new licensing model

VMware, the virtualisation software company, probably has its software deployed on servers as large as, or nearly as large as, those running SQL Server, and almost certainly in greater numbers. Their licensing previously did include CPU core limits, although to be fair the server hardware of the time meant those limits weren’t often reached.

In recent weeks VMware have released version 5 of their premier virtualisation platform, vSphere. Along with a range of new features for medium and large enterprises they’ve also released a new licensing model. Now not only is CPU a licence-constrained resource but so is memory.

VMware have introduced the concept of vRAM, a total amount of memory which can be assigned to virtual machines at any point in time. Now as you buy more licences not only does the number of CPU cores you can use increase but so does the amount of memory you can use.
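
To make the pooling concrete, here is a purely illustrative calculation; the actual entitlement figures vary by vSphere edition, so treat these numbers as hypothetical. If each CPU licence carried a 48GB vRAM entitlement, a two-socket host would need two licences and would contribute 96GB of vRAM to the pool. Power on virtual machines with a combined 120GB of configured memory and a third licence would be needed, even though there is no third socket to put it against.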

The future of licensing?

For now, VMware’s licensing model change is being seen as “bad” because it’s different, and because it potentially means users will have to pay more money to use the server hardware they’ve only just installed vSphere 4.1 on. There was a similar response when SQL Server 2008 R2 Enterprise Edition dropped support for an unlimited number of virtualised instances. Was that the beginning of Microsoft’s reaction to “all you can eat” licensing on modern server hardware?

The reality is that, from the vendors’ perspective, something has to be done to maintain revenues as server capabilities increase at the rate they have in recent years. Microsoft have obviously been able to afford to keep offering an “all you can eat” per-Processor model, albeit with a change in the definition of unlimited, but it will be interesting to see how the next generation of on-premise licensing models adapts to the multi-core, memory-rich servers we’re regularly deploying today.

For more information on the recent changes to the VMware licensing model please see the link here.

Ultra-high CPU Usage
06 July 11 10:45 PM | GavinPayneUK | 1 comment(s)

In the last couple of weeks I've seen a few database servers where sqlserver.exe was almost constantly at 100% CPU usage; usually they were servers I’d never seen before, which didn’t help!

If you've had your environment designed or tuned by the SQLCAT team then you’d probably be happy that you were getting every ounce of capability out of your hardware. The reality for most people is that 100% CPU usage is bad and indicates something somewhere is wrong; at the same time, your users will probably all have stopped working as well.


Sometimes the initial reaction is to either find or buy more CPUs or, more tactically still, restart SQL Server. Neither is a long-term solution: one just means more CPU resource gets burnt when the problem re-occurs, while the other is a very short-term fix giving perhaps a few minutes’ grace until it does.

Where to begin looking

The first place I look when I see a system in an ultra-high CPU state is outside of SQL Server, at what Windows and the hardware are doing.  I perform a quick check of some vital system stats:

  • Windows Pages/sec, % Pagefile utilisation and Available Memory
  • Disk latencies, IOPs and KB/sec
  • NIC % utilisation
Sometimes I’ll also look at % Privileged Time if only to tell me whether to check out anti-virus, DoubleTake or other system software.
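
If I’m already connected to the instance, a quick alternative is to ask SQL Server itself about the state of server memory. Below is a minimal sketch, assuming SQL Server 2008 or later where the sys.dm_os_sys_memory DMV is available:

    -- How much physical memory and page file does Windows have left?
    -- (sys.dm_os_sys_memory is available from SQL Server 2008 onwards)
    SELECT  total_physical_memory_kb / 1024     AS total_memory_mb,
            available_physical_memory_kb / 1024 AS available_memory_mb,
            total_page_file_kb / 1024           AS total_page_file_mb,
            available_page_file_kb / 1024       AS available_page_file_mb,
            system_memory_state_desc
    FROM    sys.dm_os_sys_memory;

A system_memory_state_desc reporting low physical memory is a strong hint that the Perfmon counters above deserve a much closer look.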

Then I’ll look at the SQL Server errorlog.  This was where I found evidence of a recent cause of ultra-high CPU usage:

SQL Server’s memory being paged

On one server the backup software had a memory leak and was using over 3GB of memory.  This caused Windows to page out everything and anything it could find, including SQL Server’s memory.  Events were written to the SQL Server errorlog saying that “A significant part of sql server process memory has been paged out” and at the same time the % pagefile utilisation counter was up at 60%.  Stopping the backup service freed up over 3GB of memory and almost immediately the CPU usage by sqlserver.exe dropped from 100% to…10%.
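
If you want to find that message without scrolling through the log viewer, the errorlog can be searched from a query window. A quick sketch using xp_readerrorlog, which is undocumented so its parameters may vary between versions:

    -- Search the current errorlog (0) of the SQL Server type (1)
    -- for the working set trim message quoted above
    EXEC master.dbo.xp_readerrorlog 0, 1, N'paged out';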


The second cause I’ve seen a couple of times recently also had, from the outside, very little to do with CPU cycles, but it’s an area we all know can have a huge impact on SQL Server’s performance:

Missing Indexes

After the two checks I mentioned above, the next thing I do is open Activity Monitor within Management Studio.  In an ideal world I’d use PerfStats to perform a much more detailed capture and then analyse it in SQLNexus, but usually by this time you start hearing comments like “the system is down” and “no one can work”.

Within 60 seconds you can start to see trends in the most expensive queries, whether ordered by CPU, executions or reads.  I then get the execution plans of those queries into SQL Sentry’s Plan Explorer.
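
If Activity Monitor itself is struggling on a busy box, the same information can be pulled straight from the plan cache. A minimal sketch, assuming SQL Server 2005 or later:

    -- Top 10 cached statements by total CPU time, with their plans
    SELECT TOP (10)
            qs.total_worker_time / 1000 AS total_cpu_ms,
            qs.execution_count,
            qs.total_logical_reads,
            SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                ((CASE qs.statement_end_offset
                      WHEN -1 THEN DATALENGTH(st.text)
                      ELSE qs.statement_end_offset
                  END - qs.statement_start_offset) / 2) + 1) AS statement_text,
            qp.query_plan
    FROM    sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    ORDER BY qs.total_worker_time DESC;

The query_plan column opens as a graphical plan in Management Studio, and the same plan XML is what Plan Explorer consumes.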

Plan Explorer does what I need it to do in a hurry: it finds and highlights the most expensive operations in those queries, and at this point I will often find:

  • Missing clustered indexes
  • Key/RID lookups
  • Table and index scans

In some situations the creation of four or five carefully selected “emergency” indexes based on the observations from Plan Explorer can also drop the CPU usage from 100% to single-digit values.  These can include adding clustered indexes to heaps to reduce locking and latch contention, implementing covering indexes to stop lookups, or converting expensive table scans into index seeks.
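
The missing-index DMVs give a rough cross-check of what Plan Explorer highlights; their suggestions are hints rather than designs, and the table and index names below are purely hypothetical. A sketch for SQL Server 2005 and later:

    -- Rank the optimiser's missing-index suggestions by a crude benefit estimate
    SELECT TOP (10)
            migs.avg_total_user_cost * migs.avg_user_impact
                * (migs.user_seeks + migs.user_scans) AS est_benefit,
            mid.statement AS table_name,
            mid.equality_columns,
            mid.inequality_columns,
            mid.included_columns
    FROM    sys.dm_db_missing_index_group_stats AS migs
    JOIN    sys.dm_db_missing_index_groups AS mig
            ON migs.group_handle = mig.index_group_handle
    JOIN    sys.dm_db_missing_index_details AS mid
            ON mig.index_handle = mid.index_handle
    ORDER BY est_benefit DESC;

    -- A hypothetical "emergency" covering index of the kind described above
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Emergency
        ON dbo.Orders (CustomerId)
        INCLUDE (OrderDate, TotalValue);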

If that doesn’t help then the next area I check is:

Statistics and Index Housekeeping

As we know, SQL Server’s query optimiser loves having up-to-date statistics, while its IO processes hate logical and physical fragmentation.  Some systems only re-index weekly, and a database with missing clustered indexes or heavy page split rates can become badly fragmented in a matter of hours, so after a few days performance can really struggle.  In one case a check of the stats data using DBCC showed that running sp_updatestats was well overdue; fortunately, 90 seconds after running it the CPU utilisation had come down from 100% to something much lower.  Re-indexing a production system isn’t something I’d normally want to do, but again, if the data shows it needs running and the system is already “dead” to the business users it might be a decision you take.
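
A quick way to see how stale things are before reaching for sp_updatestats is to query the statistics metadata. A minimal sketch for SQL Server 2005 and later; as above, think twice before running heavyweight maintenance on a live production system:

    -- When were the statistics on each user table last updated?
    SELECT  OBJECT_NAME(s.object_id) AS table_name,
            s.name                   AS stats_name,
            STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM    sys.stats AS s
    WHERE   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
    ORDER BY last_updated;

    -- The remedy used above: refresh statistics across the whole database
    EXEC sp_updatestats;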


Your Mileage Will Vary Along With Your Troubleshooting Methods

The observations and solutions I’ve written about above were point-in-time situations and remedies which I may never see again; however, they are useful checks for the next time I see a poorly performing system.  They’re not the only causes of ultra-high CPU usage, and my remedies weren’t the only solutions to the issues we faced.  There are also much better ways to diagnose high CPU usage more accurately, but if sharing my experiences means someone doing a Google search in a hurry finds some benefit from them, I’ll feel this post was worthwhile.