May 2009 - Posts
Paul Randal has a great survey this week. You have to
decide on the ONE thing that is important when performance tuning.
It would have been good to have a second question to ask what your job role
is. I would expect a correlation between job role and the answer.
http://www.sqlskills.com/BLOGS/PAUL/post/Weekly-survey-whats-the-most-important-thing-when-performance-tuning.aspx
I won't say what I decided, I was torn. I think indexing is essential as you
can index out of many poor written queries but then again application code is
also important. As they say "you can't polish a turd", but if you don't have
enough memory then your performance won't be great either.
So which will you choose
-
Wrong, wrong, wrong.
This post was in response to a post in the forums "How
do I delete the log file?" http://social.msdn.microsoft.com/Forums/en-US/sqldatabaseengine/thread/69c8ecc6-6414-43b1-8c08-21fdf2ca9c54
The transaction log for a SQL Server database is not a log like the ones many
applications produce that is just history of the stuff you've done in the
application, say for debugging. Whilst it is a record of the changes you make to
your data it is a critical element in how data is kept consistent in your
database database, i.e. not corrupt.
Lets start by going over a few basics. A SQL Server is made of tables,
the data for a table is stored on a page. These pages are stored in the data
files (mdf/ndf ).
When you make changes to rows in a table, the pages containing those rows are
read from disk and put into memory. The changes are recorded in the
transaction log and are then the pages changed to reflect the chanegs you've
made. The pages are only change IN MEMORY, not in the data files. If at this
point your server dies, your changes that are currently only in memory and not
in the data files will have been lost. However they are in the transaction log.
So when SQL Server starts a recovery process is run. This can be seen in the
errorlog. This process looks for transactions that are in the transaction log
that didn't get harded to the data file and it makes the changes to the data
file. So you don't loose your data.
There is an additional process that runs called the lazy writer, what this
does is free pages from cache when under memory pressure. This means that the
changes you have made to pages in memory may be hardened to the data files.
Great you might think thats what I want. However this can also happen in the
MIDDLE of a transaction. This means that you could have changes in your data
files that have yet to be committed. In this case if SQL Server crashes mid
transaction, when the recovery process runs it looks for changes that were made
and recorded in the transaction log and hardened to disk but for which the
transaction never completed. These changes are then rolled back thus leaving
your data files in a consistent state.
So even
if you don't need point in time recovery you still need your transaction
log.
The key thing is that if you don't need point in time recovery then you
should set your recovery model to simple. What this does is frees the space in
the transaction log of committed transactions and so your transaction log
shouldn't grow.
One note: If you have a single transaction that requires 10Gb of transaction
log space then using simple recovery won't help. For example, updating every row
in a very large table in one statement, this is one transaction and so the
transaction log needs to be big enough to hold the changes. If space on the
drive for the transaction log space is problem then you need to look at batching
up your changes into smaller transactions, that once committed can be truncated
from the transaction log. See my posts about DELETE and UPDATE with TOP http://sqlblogcasts.com/blogs/simons/archive/2009/05/22/UPDATE-and-DELETE-TOP-and-ORDER-BY---Part2.aspx
-
To say the type system in SQL is lax is an not quite
correct, its actually lax, in consistent and very annoying.
The most common feature I come across is that of integers and decimals.
If I declare a variable as decimal(10,4) and assign it the value 10/4 you
could expect a number of things to happen
1. The code doesn't run because you are casting an integer to a decimal
2. The code runs and stores 2.5 in the variable
3. The code runs and does the calculation based on integers 10/4 and
realises truncation will occur (as the destination type is decimal) and raise a
warning
4. The code calculates the 10/4 as integers so results in an integer of 2 and
sticks it in the variable.
In a nice type safe system, i.e. c# one would get 1. The result of an
integer/integer is an integer and you can't store an integer in a decimal. This
is nice because to get the value stored you have to explicitly do a cast. This
makes you think and realise that you probably have to cast the 10 or the 4 to
get the required decimal value of 2.5 returned.
In TSQL however you get 4.
If you want to look at inconsistencies then look at ISNULL and COALESCE.
ISNULL returns the same type as the first parameter and COALESCE returns
the data type based on evaluating all the data types passed to
the function and evaluating the top datatype
declare @i int
declare @s varchar(100)
select ISNULL(@s,'100') + '100', ISNULL(@s, 100) + '100'
,COALESCE(@s,'100') + '100', COALESCE(@s, 100) + '100'
This is sort of where I got stuck today, I was using the Radians function to
use in a conversion from longitude and latitude to OS grid reference. This is a
shocking formula and one that I have javascript and C# versions but not a TSQL
one. I couldn't for the life of me figure out why the TSQL result was vastly
different.
So I debugged the code, thank you SQL 2008 (although I did find a nice
feature I will blog about later).
What I found was this line wasn't returning the same value in TSQL as in C#,
it was returning 0 in TSQL, and even I know 49 degrees isn't 0 radians
declare
@lat0 numeric(29,15)= radians(49)
Gosh I thought I had found bug in TSQL. Then I had a euraka moment, you're
probably already there. I changed the code to
declare
@lat0 numeric(29,15)= radians(49.0)
and it returned the correct value. On looking at BOL it does say that radians
returns the same data type as the value passed. Which to me sounds mad because
the most even for a value of 360 degrees the radians value is only 6 and so the
decimal places are very very important. Whats more any value of degrees
between 0 and 90 is < 1 and so if you only specify an integer for the
degrees, as in my case above, you will get 0.
Anyway lesson learnt. So from now on as well as always changing 10/2 to
10.0/2 I will be making sure any literals passed to functions like this will
also have .0 tagged on the end.
-
If you like to be on the bleeding edge and want to try
out the R2 release of SQL Server 2008 during the CTP cycle make sure you
register here
http://www.microsoft.com/sqlserver/2008/en/us/r2.aspx Spaces
are limited so get in quick
-
I earlier posted about using derived tables with UPDATE
and DELETE to be able to update or delete the TOP n rows (http://sqlblogcasts.com/blogs/simons/archive/2009/05/22/DELETE-TOP-x-rows-avoiding-a-table-scan.aspx).
In many large scale systems archiving is essential and so being able to
efficiently archive data is essential, but more often than not your selectivity
of what to archive is not based on the primary key.
Whats nice is that, if you have an index that supports this, you can still
use the same behaviour as demonstrated in the other article to efficiently
delete the rows you want. I have often had indexes on large tables just to
support the archiving process.
Take this example populated table.
drop table Orders
go
create table Orders (Id int primary key, PadingCol char
(100),
orderDate datetime, status int)
declare @i int
set @i=1
SET NOCOUNT ON
set statistics time
off
set statistics io off
while
(@i<100000)
begin
if
@@TRANCOUNT = 0
begin transaction
insert
into Orders values
(@i,'x', DATEADD(day,rand()* 1000,GETDATE()), CAST(rand()+.2 as int));
if @i
% 2000 =
0
commit transaction
set
@i=@i+1
end
if @@TRANCOUNT >
0
begin
transaction
go
create
index IX_Orders_status on t1Orders(status,
orderDate)
Note the index created on the status and OrderDate.
If we want to archive the oldest closed orders (status 0) you can do the
following
delete
orders
from
(select top (100) *
from orders
where status =
0
order by orderDate)
orders
This works because an index
is sorted by the keys, but in this situation the index keys are status and then orderDate.
Well because we are filtering on an exact key value the rest of the index values
for will be sorted by the second key which is the orderDate.
If you make the filter on status be multivalue i.e IN(0,1) the plan changes
to require a TOP N sort.

or with parallism

These both have very expensive sort operations
What happens if we change the index and only have OrderDate in the key
drop index
Orders.IX_Orders_Status
go
create
index IX_Orders_orderDate on Orders(orderDate)
include (status )
If we now do the delete we get the same plan, sort of.
The key thing is the index scan, this now has a predicate in it. Which does
the status filtering. but because the scan is in OrderDate order when it passes
the data to the TOP operator it is already sorted.

Whats interesting is that if you apply a filter on status
that is not a single value i.e. IN(0,1) you still get a very optimal plan. IF
and its a big if. The data you are look for is found early on in the scan. In
this scenario (if the data were real), the oldest orders are are likley to be
old and so will be found early on in the scan. If for instance you were looking for a specific
customer, the scan might have scan a lot of rows to find 100 that match. In
that case you might want to consider the previous index structure. This is highlighted if you run
the delete with the last index, you will find many pages read. The number will be much higher
than before because to delete from the clustered index it has to seek to
the row to delete the row and this results in pages being read. Whats more the nonclustered indexes
also need to be deleted.
-
The simplest way to delete the top X rows is use to use a
derived table.
delete
t1
from
(select top (10000) *
from t1 order by a) t1
This is one of the tricks I teach in my Advanced TSQL for Performance and
Scalabilty course
The SQLCAT team posted a solution to deleteing the TOP x rows from a table
using a view http://blogs.msdn.com/sqlcat/archive/2009/05/21/fast-ordered-delete.aspx.
The view isn't needed you can just use a derived table as above.
You can do similar things with update,
update
t1
set b
= 'y'
from
(select
top (100) b
from t1
order by a )
t1
If you want to know what was updated or deleted then you can use the OUTPUT
clause.
delete
t1
output
deleted.a
from
(select top (100) *
from t1 order by a) t1
update
t1
set b
= 'y'
OUTPUT
inserted.a
from
(select
top (100) *
from t1
order by a ) t1
-
Trevor Dwyer an ex SQL Server MVP has started blogging on
http://sqlblogcasts.com/blogs/sqlartist/default.aspx
His mind is full of all sorts of amazing stuff so I'm looking forward to
seeing what he is going to blog about.
-
The slides and demos
for Darren’s session can be found here http://www.sqlis.com/post/SSIS-Tips-Tricks-(Presentation).aspx
For the list of
events coming up go to http://sqlserverfaq.com
Here are some answers
to the Q&A asked at the start,
Replication
file sizes
From my
investigation the multiple files produced in the snapshot are based on the
statistics for the primary key column, base on rows. If your rows have differing
sizes i.e. contain different size documents, or your statistics are out of date
then you could end up with different size files. The numbers of rows will always
be different because the values used to segment the data are those in the
statistics and they might not fall on the exact boundary. Look at the output
from DBCC SHOW_STATISTICS (table,pk_constraint)
WMI
security
There is a
great section in MSDN on WMI remote access http://msdn.microsoft.com/en-us/library/aa389290(VS.85).aspx
As expected
there are a number of aspects including. WMI uses RPC, this uses port 135 which
is a common security flaw and used by many viruses including Nimda, CodeRed and
Slammer(http://us.trendmicro.com/us/threats/enterprise/glossary/n/network-viruses/index.php). Or DCOM permissions might not allow remote uses to start
the WMI service.
The section in
MSDN should help you through the issues.
A big thank
you to Redgate for sponsoring the event and I hope to see you again
-
Often when you paste TSQL code from Word, an email or off the web you often
find blank lines are inserted in the code.
I think the issue is due to parsing paragraphs and non pagaraph breaks i.e.
<p> and <br> They have different meaning in HTML/Word world where
<p> has extra space after it (normally). But with text there is no way to
specify any difference in text world to give the extra space after pargraphs
they stick in another carriage return.
There is a simple solution, you can get rid of these by doing a search and
replace
Search for \n:b*\n and replace \n and make
sure use regular expressions is checked
The :b* allows for the odd tab or space on the blank
lines
-
More Posts
Next page »