Consolidation, capacity planning and the art of asking questions

I was recently asked to investigate a performance issue with a Sql server that had gradually started to die. Performance was dire and connectivity had become sporadic and services were reguarly stopping and needed to be restarted.

The event log was littered with errors like the following being issued every second:

External dump process return code 0x20000001.

 
  * MSASN1                         76190000  761A1FFF  00012000
2010-06-29 17:22:37.95 spid8s      * USERENV                        76920000  769E1FFF  000c2000
2010-06-29 17:22:37.95 spid8s      * WINMM                          76AA0000  76ACCFFF  0002d000
2010-06-29 17:22:37.95 spid8s      * opends60                       333E0000  333E6FFF  00007000
2010-06-29 17:22:37.95 spid8s      * NETAPI32                       71C40000  71C96FFF  00057000
2010-06-29 17:22:37.95 spid8s      * BatchParser                    520C0000  520DEFFF  0001f000
2010-06-29 17:22:37.95 spid8s      * comctl32                       77420000  77522FFF  00103000
2010-06-29 17:22:37.96 spid8s      * odbcint                        00900000  00916FFF  00017000
2010-06-29 17:22:37.96 spid8s      * psapi                          76B70000  76B7AFFF  0000b000
2010-06-29 17:22:37.96 spid8s      * instapi10                      00A80000  00A89FFF  0000a000
2010-06-29 17:22:37.96 spid8s      * sqlevn70                       4F610000  4F80DFFF  001fe000

2010-06-29 17:22:37.96 spid8s      * -------------------------------------------------------------------------------
2010-06-29 17:22:37.96 spid8s      * Short Stack Dump
2010-06-29 17:22:37.96 spid8s      00000000 Module(UNKNOWN+00000000)
2010-06-29 17:22:37.96 spid8s      01792A21 Module(sqlservr+00792A21)
2010-06-29 17:22:37.96 spid8s      015CCD18 Module(sqlservr+005CCD18)
2010-06-29 17:22:37.96 spid8s      015CCB91 Module(sqlservr+005CCB91)
2010-06-29 17:22:37.96 spid8s      015CB9B7 Module(sqlservr+005CB9B7)
2010-06-29 17:22:37.96 spid8s      015CB817 Module(sqlservr+005CB817)
2010-06-29 17:22:37.96 spid8s      015CBA7B Module(sqlservr+005CBA7B)
2010-06-29 17:22:37.96 spid8s      015BE5A5 Module(sqlservr+005BE5A5)
2010-06-29 17:22:37.96 spid8s      015BE6D6 Module(sqlservr+005BE6D6)
2010-06-29 17:22:37.96 spid8s      015BE38F Module(sqlservr+005BE38F)
2010-06-29 17:22:37.96 spid8s      0112F47D Module(sqlservr+0012F47D)
2010-06-29 17:22:37.96 spid8s      0112E2FA Module(sqlservr+0012E2FA)
2010-06-29 17:22:37.96 spid8s      015943B9 Module(sqlservr+005943B9)
2010-06-29 17:22:37.97 spid8s      0112E8E8 Module(sqlservr+0012E8E8)
2010-06-29 17:22:37.97 spid8s      781329BB Module(MSVCR80+000029BB)
2010-06-29 17:22:37.97 spid8s      78132A47 Module(MSVCR80+00002A47)
2010-06-29 17:22:37.97 spid8s      Stack Signature for the dump is 0x50D4088A
2010-06-29 17:22:38.37 spid8s      External dump process return code 0x20000001.

Upon investigation I found that there were 5 active instances that were not intensly loaded. This would normally not be a problem for a 4 core server. But then I examined the RAM for the server and there was only 2Gb. One of the instances (the latest to be utilised) had over a 1000 active user connections and NONE of the instances had max server memory set.It was clear was what happening after looking at the paging activity. Basically there was  memory contention between the various Sql Server instances which had brought the server to its knees.

This server had gradually had more and more instances allocated to it and because the CPU had not maxed out it was considered to have more headroom. No one ever thought of looking at RAM or even the Disk or  Network subsystems to see if they were sufficient to the database that was going to be hosted. The last instance to be consolidated which actually turned out to be the biggest, was litterally the straw that broke the camels back.

Now I know we are all pushed to find cost savings and consolidate as much as possible onto existing hardware but the learning from all of this is to ask questions when you get requested to host a new Instance (or even a DB) on an existing server. Its not enough to work off predicted CPU and database size. You also need to ask questions like how many users is it going to support, what kind of disk throughput is it going to experience, whats the network load likely to be and most importantly what kind of memory signature is it likely to have.

Microsoft recommends setting max server on all Sql Servers but this is definatly the case when you have multiple instances running on the server. Doing this kind of exercise will alert you immediatly to any issues regarding the amount of actual physical memory and make sure you never end up running 5 instances on 2Gb of ram.

@Blakmk

Published 01 July 2010 18:36 by blakmk

Comments

# Twitter Trackbacks for Consolidation, capacity planning and the art of asking questions - The Tao Of Sql Server [sqlblogcasts.com] on Topsy.com

Pingback from  Twitter Trackbacks for                 Consolidation, capacity planning and the art of asking questions - The Tao Of Sql Server         [sqlblogcasts.com]        on Topsy.com

# Introducing Greenest Host | Uncategorized | Information about Web Hosting, Web hosting service!

Pingback from  Introducing Greenest Host | Uncategorized | Information about Web Hosting, Web hosting service!