Ever wondered how to calculate the estimated data loss (time) for AlwaysOn? The Always On dashboard shows the metric quite nicely, but there seems to be a lack of documentation about where the numbers come from. Here's a script that calculates the data loss (lag) so you can set up alerts based on your DR SLAs:
-- The CTE captures the last commit time on the DR (remote) replica for each database
WITH DR_CTE (replica_server_name, database_name, last_commit_time)
AS
(
    select ar.replica_server_name, dcs.database_name, rs.last_commit_time
    from master.sys.dm_hadr_database_replica_states rs
    inner join master.sys.availability_replicas ar
        on rs.replica_id = ar.replica_id
    inner join sys.dm_hadr_database_replica_cluster_states dcs
        on dcs.group_database_id = rs.group_database_id
        and rs.replica_id = dcs.replica_id
    where ar.replica_server_name != @@servername
)
-- Compare the local replica's last commit time with the DR replica's to get the lag
select ar.replica_server_name, dcs.database_name,
    rs.last_commit_time, DR_CTE.last_commit_time 'DR_commit_time',
    datediff(ss, DR_CTE.last_commit_time, rs.last_commit_time) 'lag_in_seconds'
from master.sys.dm_hadr_database_replica_states rs
inner join master.sys.availability_replicas ar
    on rs.replica_id = ar.replica_id
inner join sys.dm_hadr_database_replica_cluster_states dcs
    on dcs.group_database_id = rs.group_database_id
    and rs.replica_id = dcs.replica_id
inner join DR_CTE on DR_CTE.database_name = dcs.database_name
where ar.replica_server_name = @@servername
order by lag_in_seconds desc
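To turn this into an alert, one approach (purely a sketch, not a finished monitoring solution) is to capture the output of the script above into a temp table, here assumed to be called #lag with the same columns, and raise an error whenever the lag breaches your SLA; a SQL Agent job with an alert or operator notification can then pick it up:
-- Illustrative threshold of 300 seconds; adjust to your DR SLA
if exists (select 1 from #lag where lag_in_seconds > 300)
    raiserror('AlwaysOn estimated data loss (lag) has breached the DR SLA', 16, 1);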
Alright, this is actually a follow-up to Gethyn Ellis' post SELECT * FROM SQLBLOGGERS WHERE LOCATION = ‘UK’, where he compiled a list of UK bloggers. So I thought I'd summarise a list of SQL folk that tweet, but rather than make the list static I will just point you towards the list, which I will keep up to date:
http://twitter.com/#!/blakmk/sqlserver-uk
It actually summarises people's titles pretty well when viewed through DABR:
http://dabr.co.uk/lists/blakmk/sqlserver-uk
I will keep this list updated so you are welcome to follow if you find it useful. If anyone feels left out, contact me and I will happily add you to the list.
I came across an interesting situation where a developer was trying to connect to a named instance using a DNS alias without specifying the instance name. Coincidentally, though, he remembered to include the port number and, miraculously, it worked. So it appears that SQL Server accepts connections to a specific instance based on its port number alone.
While it may not seem particularly useful, I can imagine it being used in the following situations:
- To mirror to a server with a different instance name (but the same port number)
- To hide the complexity of instance names from end users and just rely on the port number (and optionally a DNS alias), as in the example below
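For example, a client can connect with nothing more than the alias (or host name) and the port; the alias and port below are made up purely for illustration:
Server=SalesDbAlias,50001;Database=Sales;Integrated Security=SSPI;
As long as 50001 is the static port that the named instance is listening on, the connection reaches the right instance without the instance name ever appearing in the string.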
Ever since I have been working with named SQL instances using static ports, it has been a pain. Connection strings need to be recoded to include the port syntax:
- <ServerName>[\InstanceName],<Port>
- <IP Address>[\InstanceName],<Port>
This can be a bit of a drag when you have multiple config files on an application server, for example. Setting up client-side aliases avoids the need to manually recode connection strings.
However, if you have multiple servers and client machines to manage, this can be a pain in itself; you don't want to manually set up every alias on every machine. I was recently searching for a way to transfer these aliases around and was assisted by Joe Stefanelli, who informed me the aliases are actually stored in the registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo
To transfer these entries to another
machine:
Navigate to that location in regedit and use the
File -> Export... menu option to export the branch to a .REG file. Then copy
that file to the new machine and use File -> Import... to load the keys you
just exported into the new machine's registry.
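For reference, the exported .REG file ends up looking something like the following; the alias name, server and port are made up, and DBMSSOCN simply indicates a TCP/IP alias:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo]
"SalesDbAlias"="DBMSSOCN,ACTUALSERVER\\SQL2008,50001"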
Mark Broadbent (blog | twitter) also gave me a few ideas about how to take this registry file and roll it out via group policy. As Mark informs me, these aliases can also be used for DR purposes by repointing the alias at a new server. A pretty neat trick, I thought.
I was recently invited to a client site where I hadn't been in a couple of months. I could smell the neglect as soon as I examined the server:
- Error logs rife with deadlock events
- Horrific SQL in stored procedures
- Agent jobs failing
- Database maintenance jobs disabled due to lack of space to perform index rebuilds
- Databases that had been shrunk several times, leading to excessive fragmentation in the tables' indexes
Contrary to what you may be thinking, this wasn't a company staffed by morons running a non-critical application; it actually had some pretty talented developers (including a database specialist) building and managing a high-throughput OLTP system. However, a developer is a developer, and the mindset is different to a DBA's.
No decent DBA in their right mind would disable SQL Server maintenance jobs or shrink a LIVE database without immediately rebuilding the indexes. Steps would have been taken to order extra space as it ran low. Inefficient stored procedures would have been rejected before they entered the LIVE environment.
What I'm trying to say is that Dev guys and Ops people have different priorities: while a Dev is happy to churn out as much code as possible, an Ops DBA is busy conserving and preserving the revenue that is coming in, along with maximising the investment in server hardware. I also don't think it's totally a personality thing; I have quite happily fitted into both roles at different times, but rarely within the same company. If you wear a hat, wear it totally....
While figuring out a consolidation strategy for SQL Server, I was concerned that the memory issues (http://blog.mischel.com/2008/10/14/copying-large-files-on-windows/) affecting large file copies would also affect SQL Server backups to a CIFS location. The reason I wanted to do this was to consolidate the existing backup server onto the new hardware. It was ageing and limited in its throughput. As the backup disk was SAN based, it should be easy to unplug from the old server and plug into the new server. However, knowing consolidation projects as intimately as I do, I knew it could take a while to consolidate the remaining SQL Servers onto the new box. I didn't want the backups from the old servers to starve the new server of memory while this happened.
So I ran some tests on the new server, which was running:
- Windows Server 2008
- 48GB RAM
- 24 cores
First I created a network share and backup folder. I then ran a 40GB backup from a SQL 2005 / Windows 2003 server to the share. The new server had just rebooted and there was 40GB of memory headroom. Monitoring the RAM usage, I only observed a 1.8GB increase in RAM and a 100MB increase in page file.
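The test itself was nothing more exotic than a native backup to the UNC path of the new share, along these lines (database, server and share names are made up):
-- Back up across the network to the CIFS/UNC share on the new server
BACKUP DATABASE SalesDB
TO DISK = '\\NEWSERVER\SQLBackups\SalesDB_Full.bak'
WITH INIT, STATS = 10;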
I then checked the impact that the NetBackup tape backup software had in copying the file to tape, and it appears that this process only took up 200MB.
Pushing a file from the 2008 server consumed 1MB, and pulling a file consumed 30MB of RAM.
Job done. While I can see there is still an impact from copying and backing up files to a server, it's not huge compared to the file size it's working with. As long as I leave an appropriate amount of free memory and lock pages in memory, I am not likely to see major implications from this.
Emil Fridriksson also has a
couple of extra tweaks that can be made if the
problem becomes quite severe.
When I visit a client site to troubleshoot deadlocks, it is always a source of annoyance when I cannot go through historical deadlock events to determine their frequency and causes. I either have to talk the client through enabling trace flag 1222 (shown below) or explain how to set up Profiler to capture specific deadlock graphs. Even then it's not possible to go back retrospectively and examine deadlocks that occurred before that point.
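For reference, enabling the trace flag globally is a one-liner (add -T1222 as a startup parameter if you want it to survive a restart):
-- Write deadlock details to the SQL Server error log for all sessions
DBCC TRACEON (1222, -1);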
I recently came across this article by Jonathan Kehayias about how to retrieve historical deadlock graphs from the 2008 Extended Events system_health session. These scripts have proved invaluable for me in going back in time.
You may need to upgrade to Service Pack 2 to get it to work correctly, but here's a copy of the final script:
-- Grab the ring buffer target of the built-in system_health Extended Events session
declare @xml xml

select @xml = target_data
from sys.dm_xe_session_targets
join sys.dm_xe_sessions on event_session_address = address
where name = 'system_health'

-- Shred out the xml_deadlock_report events, patching the XML so each graph
-- is wrapped in a well-formed <deadlock> element
select CAST(
        REPLACE(
            REPLACE(XEventData.XEvent.value('(data/value)[1]', 'varchar(max)'),
                '<victim-list>', '<deadlock><victim-list>'),
            '<process-list>', '</victim-list><process-list>')
        as xml) as DeadlockGraph
FROM (select @xml as TargetData) AS Data
CROSS APPLY TargetData.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData (XEvent)
An often neglected part of consolidation is the network configuration of a server. If you have mirroring set up this is particularly important, because the network throughput can be of the same order of magnitude as the transaction log throughput.
Generally, if I am limited to four NICs, I will team two and make them available for general application access, allocate one to an external backup device (iSCSI or a CIFS UNC path) and one to a dedicated connection to the mirrored server. This kind of configuration is for a very basic setup. If there are replication topologies in place, or a scaled-out BI infrastructure on a bigger server, I would also look at dedicated NICs for that kind of traffic. I always set my network cards to Full Duplex.
To get round the issue of legacy applications that don't support the failover partner syntax with mirroring, I often use the DNS aliases that I talked about in my previous blog post. When I'm utilising mirroring over a WAN and I want to reduce the traffic sent over the very precious WAN link, I will disable encryption and utilise Riverbed hardware, as mentioned here. To force the routing of traffic via a dedicated DR connection, I make an entry in the Windows hosts file for the mirrored server.
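The hosts entry itself is just a static name-to-IP mapping; the name and address below are made up, the point being that the address belongs to the dedicated DR NIC rather than the general application network:
# C:\Windows\System32\drivers\etc\hosts on the principal
10.10.20.5    DRSQL01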
This is not meant to be an exhaustive list, but it does illustrate a few best practices I've discovered. If anyone has any more tricks they would like to share, please let me know.
Following on from my comments about CPU configuration, I have decided to restate my position to "it depends....".
When I asked around the SQL community, Glenn Berry (Twitter | Blog) suggested that hyper-threading should be enabled for the new Xeon 55xx, 56xx, 65xx and 75xx (Nehalem) processors, and pointed me in the direction of the TPC-E benchmarks for evidence of their performance. According to the benchmarks in this article http://www.qdpma.com/Benchmarks_TPCH.html, OLTP systems (TPC-E) gain around 30% from hyper-threading, while data warehousing workloads (TPC-H) benefited by only 10%.
Still concerned about the potential issues with hyper-threading, I decided to investigate further. I found this great article by Joe Chang suggesting that many of the problems with the original hyper-threaded NetBurst processors have gone away with the advent of Nehalem. He just suggested setting MAXDOP to 4 as a general rule.
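If you want to apply that advice instance-wide, it is a straightforward sp_configure change; the value of 4 is just Joe's rule of thumb, so test against your own workload:
-- Set max degree of parallelism to 4 for the whole instance
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;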
Given that there is such a significant performance gain to be had from hyper-threading with OLTP, I do intend to turn it on for servers containing the new Nehalem chipset. For DW servers I will still leave it disabled, as I have spent many an hour tracking down various issues with CXPACKET waits and intra-query parallel thread deadlocks. I still don't trust the technology enough to go there at this point.
@Blakmk
So I only seem to get inspired when I'm totally immersed in a project. Having been busy for a while with a few projects requiring me to either consolidate or upgrade servers to 2008 and 2008 R2, I finally decided to put my resistance aside and blog. This particular post is devoted to CPU configuration.
CPU affinity
One of the first things I like to do when I begin to consolidate a server is to calculate how many CPUs each instance needs allocated to it. Often I create a spreadsheet similar to the following to make this more visible:
           | NUMA Node 0   | NUMA Node 1   | NUMA Node 2   |
           | CPU 0 | CPU 1 | CPU 2 | CPU 3 | CPU 4 | CPU 5 |
Instance 1 |   Y   |   Y   |       |       |       |       |
Instance 2 |       |       |   Y   |   Y   |       |       |
Instance 3 |       |       |       |       |   Y   |   Y   |
Instance 4 |       |       |   Y   |   Y   |   Y   |   Y   |
On hardware with 16-plus processors, my preference is to reserve CPU 0 for exclusive use by the operating system. Whenever I set CPU affinity for a server, I will also set I/O affinity to the same CPU. I have not yet found a reason to change this.
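As an illustration, on SQL Server 2008 R2 binding an instance to one of the NUMA nodes in the spreadsheet above looks like this (the node and CPU numbers are examples only; on SQL 2008 the equivalent is the 'affinity mask' option of sp_configure):
-- Bind this instance to NUMA node 1 (CPUs 2 and 3 above)
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY NUMANODE = 1;
-- Or, equivalently, by CPU range
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 2 TO 3;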
Now I know some of you may find CPU binding a bit stingy, but there are a few reasons why I do this:
Less is more
I like to set CPU and I/O affinity on consolidated servers and servers with many CPUs (12-16 or more). My ultimate rule for this is that sometimes less is more. We tend to think that the more resources we give something, the faster it will run. Sometimes not.
I have seen batch jobs complete four times faster on old servers with 4 CPUs than on modern servers with 16. Why? NUMA and multiplexing. There is an overhead when accessing memory from foreign NUMA nodes (discussed in more detail below). Modern processors work most effectively when they are allocated a similar kind of workload, one where they can easily utilise and reuse the Level 1/2 & 3 cache. Each time a new task is allocated to a processor, it has to reload its cache. It can also be the case (and often is) that the processor may not be available to give its full attention to the process in hand.
The more you get the more you want
I have often seen on consolidation projects that developers quickly forget about being efficient with code. With the advent of new hardware, all the abysmal code that runs like a dog gets forgotten. For a while the problems go away, but instead of being contained on its own server, the problem has infected other databases through its contention for resources. That's why I like CPU affinity: I can easily silo off performance issues and again force developers to tune application issues as they arise.
Level 2 & 3 Cache
Careful examination of TPC benchmarks will give you a real insight into the effects of Level 2 and 3 cache. In essence, this cache gives the processor ultra-fast memory access for the temporary storage of calculations and lookup data. A larger cache in this area will often give better results than faster processors for SQL Server processing. Modern HP blade servers have even sacrificed Level 2 cache for the sake of much larger Level 3 caches. I believe this is a worthwhile sacrifice and worth considering when looking for the ultimate bang for your buck.
Non Uniform Memory Access (NUMA)
Most modern hardware now has NUMA enabled. The subject of NUMA is quite large, but in short, NUMA governs the way in which memory banks are shared between groups of cores. Memory access between NUMA nodes is quite slow, so the advantage of having SQL Server processes access memory on the local NUMA node is significant. Using CPU affinity to bind SQL instances to a specific NUMA node, or group of nodes, increases the likelihood that the memory they touch will be local, thereby increasing performance.
When configuring CPU affinity you should always keep in mind which NUMA nodes relate to which CPUs, for example using the query below.
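A quick way to check that mapping on a given instance is to look at the visible schedulers; this is just a convenience query against sys.dm_os_schedulers:
-- Which CPUs belong to which NUMA node, as the instance sees them
SELECT parent_node_id AS numa_node, cpu_id, scheduler_id
FROM sys.dm_os_schedulers
WHERE [status] = 'VISIBLE ONLINE'
ORDER BY parent_node_id, cpu_id;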
Hyperthreading
One of my clients recently asked me whether it was good to use hyper-threading with SQL 2008 R2, and while my initial answer was no, I decided to do a little research. Some people I spoke to mentioned it would not be an issue with modern chipsets, and others veered away from the subject. As usual with SQL Server, the answer came back: it depends..... Ultimately, without any compelling reason to use hyper-threading, my advice to anyone would be to avoid it.
UPDATE
See this blog
post about my latest opinion on this issue:
@blakmk