Ever wondered how to calculate estimated data loss (time) for AlwaysOn? The AlwaysOn dashboard shows the metric quite nicely, but there seems to be a lack of documentation about where the numbers come from. Here's a script that calculates the data loss (lag) so you can set up alerts based on your DR SLAs:

WITH DR_CTE ( replica_server_name, database_name, last_commit_time )
AS
(
    -- last commit time for each database on the DR (remote) replicas
    select ar.replica_server_name, dcs.database_name, rs.last_commit_time
    from master.sys.dm_hadr_database_replica_states rs
    inner join master.sys.availability_replicas ar on rs.replica_id = ar.replica_id
    inner join sys.dm_hadr_database_replica_cluster_states dcs
        on dcs.group_database_id = rs.group_database_id and rs.replica_id = dcs.replica_id
    where ar.replica_server_name != @@servername
)
-- compare against the last commit time on the local (primary) replica
select ar.replica_server_name,
       dcs.database_name,
       rs.last_commit_time,
       DR_CTE.last_commit_time 'DR_commit_time',
       datediff(ss, DR_CTE.last_commit_time, rs.last_commit_time) 'lag_in_seconds'
from master.sys.dm_hadr_database_replica_states rs
inner join master.sys.availability_replicas ar on rs.replica_id = ar.replica_id
inner join sys.dm_hadr_database_replica_cluster_states dcs
    on dcs.group_database_id = rs.group_database_id and rs.replica_id = dcs.replica_id
inner join DR_CTE on DR_CTE.database_name = dcs.database_name
where ar.replica_server_name = @@servername
order by lag_in_seconds desc
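
To turn this into an alert, one approach is to wrap the same logic in an Agent job that raises an error whenever the lag breaches your SLA and let the job notification do the rest. A minimal sketch, assuming a hypothetical threshold of 300 seconds:

-- Sketch: raise an error when estimated data loss exceeds an assumed DR SLA of 300 seconds.
-- Intended to run on the primary replica from a SQL Agent job; the job/operator notification provides the alert.
declare @threshold_seconds int = 300;   -- hypothetical SLA value
declare @max_lag int;

WITH DR_CTE ( replica_server_name, database_name, last_commit_time )
AS
(
    select ar.replica_server_name, dcs.database_name, rs.last_commit_time
    from master.sys.dm_hadr_database_replica_states rs
    inner join master.sys.availability_replicas ar on rs.replica_id = ar.replica_id
    inner join sys.dm_hadr_database_replica_cluster_states dcs
        on dcs.group_database_id = rs.group_database_id and rs.replica_id = dcs.replica_id
    where ar.replica_server_name != @@servername
)
select @max_lag = max(datediff(ss, DR_CTE.last_commit_time, rs.last_commit_time))
from master.sys.dm_hadr_database_replica_states rs
inner join master.sys.availability_replicas ar on rs.replica_id = ar.replica_id
inner join sys.dm_hadr_database_replica_cluster_states dcs
    on dcs.group_database_id = rs.group_database_id and rs.replica_id = dcs.replica_id
inner join DR_CTE on DR_CTE.database_name = dcs.database_name
where ar.replica_server_name = @@servername;

if @max_lag > @threshold_seconds
    raiserror('AlwaysOn estimated data loss exceeds the DR SLA', 16, 1);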

Posted by blakmk

Alright, this is actually a follow-up to Gethyn Ellis' post SELECT * FROM SQLBLOGGERS WHERE LOCATION = ‘UK’, where he compiled a list of UK bloggers. So I thought I'd put together a list of SQL folk who tweet, but rather than make the list static I will just point you towards a Twitter list, which I will keep up to date:

http://twitter.com/#!/blakmk/sqlserver-uk

It actually summarises people's titles pretty well when viewed through DABR:

http://dabr.co.uk/lists/blakmk/sqlserver-uk

I will keep this list updated so you are welcome to follow if you find it useful. If anyone feels left out, contact me and I will happily add you to the list.

Posted by blakmk | 1 comment(s)

I came across an interesting situation where a developer was trying to connect to a named instance using a DNS alias without specifying the instance name. Coincidentally, though, he remembered to include the port number and, miraculously, it worked. So it appears that SQL Server will accept connections to a specific instance based purely on its port number.

While it may not seem particularly useful, I can imagine it being used in the following situations:

  • To mirror to a server with a different instance name (but the same port number)
  • To hide the complexity of instance names from end users and just rely on the port number (and optionally a DNS alias)
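
If you need to confirm which port an instance is actually listening on before relying on this, one quick check from the instance itself is the connections DMV (a sketch; only TCP connections populate the column):

-- Sketch: the TCP port(s) the current instance is accepting connections on
select distinct local_tcp_port
from sys.dm_exec_connections
where local_tcp_port is not null;
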
Posted by blakmk | 1 comment(s)

Ever since I started working with named SQL Server instances using static ports, they have been a pain.

Connection strings need to be recoded to include the port syntax:

  • <ServerName>[\InstanceName],<Port>
  • <IP Address>[\InstanceName],<Port>

This can be a bit of a drag when you have multiple config files on an application server, for example. Setting up client-side aliases avoids the need to manually recode connection strings. However, if you have multiple servers and client machines to manage, this can also be a pain: you don't want to manually set up every alias on every machine. I was recently searching for a way to transfer these aliases around and was assisted by Joe Stefanelli, who informed me that the aliases are actually stored in the registry key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo

To transfer these entries to another machine:

Navigate to that location in regedit and use the File -> Export... menu option to export the branch to a .REG file. Then copy that file to the new machine and use File -> Import... to load the keys you just exported into the new machine's registry.
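
If you just want to see which aliases are already defined on a machine before exporting them, you can also read that key from within SQL Server. A sketch using the undocumented xp_regenumvalues extended procedure (note that 32-bit client tools on 64-bit Windows keep their aliases under the equivalent Wow6432Node key instead):

-- Sketch: list the client aliases defined on this machine (undocumented procedure, use for inspection only)
EXEC master.dbo.xp_regenumvalues
    'HKEY_LOCAL_MACHINE',
    'SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo';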

Mark Broadbent (blog | twitter) also gave me a few ideas about how to take this registry file and roll it out via Group Policy. As Mark points out, these aliases can also be used for DR purposes by repointing the alias at a new server. A pretty neat trick, I thought.

Posted by blakmk

I was recently invited to a client site where I hadn't been in a couple of months. I could smell the neglect as soon as I examined the server:

  • Error logs rife with deadlock events
  • Horrific SQL in stored procedures
  • Agent jobs failing
  • Database maintenance jobs disabled due to lack of space to perform index rebuilds
  • Databases that had been shrunk several times, leading to excessive fragmentation in the tables' indexes

Contrary to what you might be thinking, this wasn't a company staffed by morons running a non-critical application; it actually had some pretty talented developers (including a database specialist) building and managing a high-throughput OLTP system. However, a developer is a developer, and the mindset is different to that of a DBA.

No decent DBA in their right mind would disable SQL Server maintenance jobs or shrink a LIVE database without immediately rebuilding the indexes. Steps would have been taken to order extra space as it ran low. Inefficient stored procedures would have been rejected before they entered the LIVE environment.

What I'm trying to say is that Dev guys and Ops people have different priorities: while a Dev is happy to churn out as much code as possible, an Ops DBA is busy conserving and preserving the revenue that is coming in, along with maximising the investment in server hardware. I also don't think it's totally a personality thing; I have quite happily fitted into both roles at different times, but rarely within the same company. If you wear a hat, wear it totally...

Posted by blakmk | 1 comment(s)
While figuring out a consolidation strategy for SQL Server, I was concerned that the memory issues (http://blog.mischel.com/2008/10/14/copying-large-files-on-windows/) affecting large file copies would also affect SQL Server backups to a CIFS location. The reason I wanted to do this was to consolidate the existing backup server onto the new hardware: it was ageing and limited in its throughput. As the backup disk was SAN based, it should be easy to unplug from the old server and plug into the new one. However, knowing consolidation projects as intimately as I do, I knew it could take a while to consolidate the remaining SQL Servers onto the new box, and I didn't want backups from the old servers to starve the new server of memory while that happened.

So I ran some tests on the new server, which had:

  • Windows Server 2008
  • 48 GB RAM
  • 24 cores

 

First I created a network share and backup folder. I then backed up 40 GB from a SQL Server 2005 instance running on Windows Server 2003. The new server had just rebooted and there was 40 GB of memory headroom. Monitoring the RAM usage, I observed only a 1.8 GB increase in RAM use and a 100 MB increase in the page file.
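
For reference, the test backups were plain native backups straight to the UNC path, along these lines (the database and share names here are made up):

-- Sketch: native SQL Server backup to a CIFS/UNC share (hypothetical names)
BACKUP DATABASE [MyBigDatabase]
TO DISK = N'\\NewServer\SQLBackups\MyBigDatabase.bak'
WITH INIT, STATS = 10;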

 

I then checked the impact that the NetBackup tape backup software had when copying the file to tape, and it appears that this process only took up 200 MB.

 

Pushing a file from the 2008 server consumed 1 MB of RAM and pulling a file consumed 30 MB.

 

Job done. While I can see there is still an impact from copying and backing up files to a server, it is not huge compared to the file sizes involved. As long as I leave an appropriate amount of free memory and enable lock pages in memory, I am not likely to see major implications from this.

Emil Fridriksson also has a couple of extra tweaks that can be made if the problem becomes quite severe.

When I visit a client site to troubleshoot deadlocks, it is always a cause of annoyance when I cannot go through historical deadlock events to determine their frequency and causes. I either have to talk the client through enabling trace flag 1222 or explain how to set up Profiler to capture deadlock graphs. Even then, it is not possible to go back retrospectively and examine deadlocks that occurred before that point.
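
For completeness, enabling the trace flag server-wide (until the next restart) is a one-liner:

-- Write deadlock graphs to the SQL Server error log, server-wide, until the instance restarts
DBCC TRACEON (1222, -1);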

I recently came across this article by Jonathan Kehayias about how to retrieve historical deadlock graphs using SQL Server 2008 Extended Events. These scripts have proved invaluable for me in going back in time.

You may need to upgrade to Service Pack 2 to get it to work correctly, but here's a copy of the final script:

declare @xml xml

select @xml = target_data
from sys.dm_xe_session_targets
join sys.dm_xe_sessions on event_session_address = address
where name = 'system_health'

select CAST(
           REPLACE(
               REPLACE(XEventData.XEvent.value('(data/value)[1]', 'varchar(max)'),
                       '<victim-list>', '<deadlock><victim-list>'),
               '<process-list>', '</victim-list><process-list>')
       as xml) as DeadlockGraph
FROM (select @xml as TargetData) AS Data
CROSS APPLY TargetData.nodes ('RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData (XEvent)

An often neglected part of consolidation is the network configuration of a server. If you have mirroring set up this is particularly important, because the network throughput can be of the same magnitude as the transaction log throughput. Generally, if I am limited to four NICs, I will team two and make them available for general application access, allocate one to an external backup device (iSCSI or a CIFS UNC path), and dedicate one to the connection to the mirrored server. This kind of configuration is for a very basic setup: if there are replication topologies in place, or a scaled-out BI infrastructure on a bigger server, I would also look at dedicated NICs for that kind of traffic. I always set my network cards to full duplex.

To get around the issue of legacy applications that don't support the failover partner syntax with mirroring, I often use the DNS aliases that I talk about in my previous blog post. When I'm utilising mirroring over a WAN and want to reduce the traffic sent over the very precious WAN link, I will disable encryption and utilise Riverbed hardware, as mentioned here. To force the routing of traffic via a dedicated DR connection, I make an entry in the Windows hosts file for the mirrored server.
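
If you do go down the unencrypted route, the endpoint change itself is small. A sketch only: the endpoint name below is an assumption (check sys.database_mirroring_endpoints for the real one), and the change needs to be made on both partners:

-- Sketch: disable encryption on the database mirroring endpoint so WAN accelerators can work on the traffic
ALTER ENDPOINT [Mirroring]
FOR DATABASE_MIRRORING (ENCRYPTION = DISABLED);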

While this is not meant to be an exhaustive list, it does illustrate a few best practices I've discovered. If anyone has any more tricks they would like to share, please let me know.

Posted by blakmk | 2 comment(s)

Following on from my comments about CPU configuration, I have decided to restate my position as "it depends...".

Having done some digging in the SQL community, I found that Glenn Berry (Twitter | Blog) suggests that hyper-threading should be enabled for the new Xeon 55xx, 56xx, 65xx and 75xx (Nehalem) processors, and he pointed me towards the TPC-E benchmarks for evidence of their performance.

According to the benchmarks in this article (http://www.qdpma.com/Benchmarks_TPCH.html), OLTP systems (TPC-E) see around a 30% performance gain from hyper-threading, while data warehousing workloads (TPC-H) benefit by only around 10%.

Still concerned about the potential issues with hyper-threading, I decided to investigate further. I found this great article by Joe Chang suggesting that many of the problems with the original hyper-threaded NetBurst processors have gone away with the advent of Nehalem. He simply suggested setting MAXDOP to 4 as a general rule.
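
Setting that server-wide is straightforward (shown here as a sketch; pick the value that suits your own workload):

-- Cap the server-wide max degree of parallelism at 4
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;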

Given that there is such a great performance gain to be had from hyper-threading with OLTP, I do intend to turn it on for servers using the new Nehalem chipset. For DW servers I will still leave it disabled, as I have spent many an hour tracking down various issues with CXPACKET waits and intra-query parallel thread deadlocks, and I still don't trust the technology enough to go there at this point.

@Blakmk

 

 

Posted by blakmk | 8 comment(s)

So I only seem to get inspired when I'm totally immersed in a project. Having been busy for a while with a few projects requiring me either to consolidate or to upgrade servers to SQL Server 2008 and 2008 R2, I finally decided to put my resistance aside and blog. This particular post is devoted to CPU configuration.

CPU affinity

One of the first things I like to do when I begin to consolidate a server is to calculate how many CPUs each instance needs allocated to it. Often I create a spreadsheet similar to the following to make this more visible:

             NUMA node 0       NUMA node 1       NUMA node 2
             CPU 0    CPU 1    CPU 2    CPU 3    CPU 4    CPU 5
Instance 1   Y        Y
Instance 2                     Y        Y
Instance 3                                       Y        Y
Instance 4                     Y        Y        Y        Y

On hardware with 16-plus processors, my preference is to reserve CPU 0 for exclusive use by the operating system. Whenever I set CPU affinity for a server, I will also set I/O affinity to the same CPUs; I have not yet found a reason to change this.
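
On SQL Server 2008 R2 the binding itself can be done with ALTER SERVER CONFIGURATION (on earlier versions the 'affinity mask' and 'affinity I/O mask' sp_configure options do the same job). A sketch only, using the hypothetical Instance 4 from the spreadsheet above:

-- Sketch (SQL Server 2008 R2 syntax): bind this instance to CPUs 2 to 5,
-- matching the "Instance 4" row in the example spreadsheet
ALTER SERVER CONFIGURATION
SET PROCESS AFFINITY CPU = 2 TO 5;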

Now, I know some of you may find CPU binding a bit stingy, but there are a few reasons why I do this:

Less is more

I like to set CPU and I/O affinity on consolidated servers and servers with many CPUs (12-16 or more). My rule for this is that sometimes less is more. We tend to think that the more resources we give something, the faster it will run. Sometimes not.

I have seen batch jobs complete four times faster on old servers with 4 CPUs than on modern servers with 16. Why? NUMA and multiplexing. There is an overhead when accessing memory from foreign NUMA nodes (discussed in more detail below). Modern processors work most effectively when they are allocated a similar kind of workload, one where they can easily utilise and reuse the Level 1, 2 and 3 caches. Each time a new task is allocated to a processor, the cache has to be reloaded. It can also be the case (and often is) that the processor is not available to give its full attention to the process in hand.

The more you get the more you want

I have often seen with consolidation projects that developers quickly forget about being efficient with code. With the advent of new hardware, all the abysmal code that runs like a dog gets forgotten. For a while the problems go away, but instead of being contained on its own server, the problem infects other databases through its contention for resources. That's why I like CPU affinity: I can easily silo off performance issues and again force developers to tune application issues as they arise.

 

Level 2 & 3 Cache

Careful examination of TPC benchmarks will give you a real insight into the effects of Level 2 and Level 3 cache. In essence, this cache gives the processor ultra-fast memory access for the temporary storage of calculations and lookup data. A larger cache in this area will often give better results for SQL Server processing than faster processors. Modern HP blade servers have even sacrificed Level 2 cache for the sake of much larger Level 3 caches. I believe this is a worthwhile sacrifice and worth considering when looking for the ultimate bang for your buck.

Non Uniform Memory Access (NUMA)

Most modern hardware now has NUMA enabled. The subject of NUMA is quite large, but in short, NUMA governs the way in which memory banks are shared between groups of cores. Memory access between NUMA nodes is quite slow, so the advantage of having SQL Server processes access memory on the local NUMA node is significant. Using CPU affinity to bind SQL instances to a specific NUMA node, or group of nodes, increases the likelihood that the memory accessed will be local, and therefore increases performance.

When configuring CPU affinity you should always keep in mind which NUMA nodes relate to which CPUs.
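
One quick way to see that mapping on a live instance is the scheduler DMV (a sketch; hidden and offline schedulers are filtered out):

-- Sketch: which CPUs belong to which NUMA node, as SQL Server sees them
select parent_node_id as numa_node,
       cpu_id,
       scheduler_id,
       status
from sys.dm_os_schedulers
where status = 'VISIBLE ONLINE'
order by parent_node_id, cpu_id;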

 

Hyperthreading

One of my clients recently asked me whether it was good to use hyper-threading with SQL Server 2008 R2, and while my initial answer was no, I decided to do a little research. Some people I spoke to mentioned it would not be an issue with modern chipsets, and others veered away from the subject. As usual with SQL Server, the answer came back "it depends...". Ultimately, without any compelling reason to use hyper-threading, my advice to anyone would be to avoid it.

UPDATE

See this blog post about my latest opinion on this issue:

 

 

 

@blakmk

 
