November 2006 - Posts

Well, a big DOH from me - the event is actually next week; I'm just wishing away my time mentioning December :).


Tony

 

Last chance to register for this evening event organised by the SQL Server User Group in partnership with Chris Webb.

Registration is essential - please visit http://sqlserverfaq.com?eid=83 to register.

Registration is at 5.30pm; the evening will commence at 6pm and finish at 9pm.

6pm - 6.15pm
Introduction, news and gossip.

6.15pm - 7pm
Visualising Information with Microsoft Visio 2007
David Parker (http://www.bvisual.net/)
Visio is a unique data visualization application, and the 2007 edition adds three important new data features: data links, data graphics and pivot diagrams.  These capabilities, along with the existing and often under-exploited tools, firmly position Visio as the foremost practical, easy-to-use visual information system that can be applied to a multitude of business scenarios. Microsoft Visio has been able to link to ODBC data sources for many years via the Database Wizard, but this has limitations such as only a single data link per shape and no support for stored procedures.  Microsoft Visio 2007 overcomes both of these with the new Link Data programmable API, making it simple for end users and developers to create data-rich diagrams that can be easily refreshed. In addition, the new Data Graphic features enable you to enhance linked shapes with icons, data bars, text callouts and colour by value. The new Pivot Diagrams can be linked to data sources, including SQL Server Analysis Services, to provide interactive drill-downs of data diagrams, allowing the user to visualise, analyse and present aggregate information in a new, appealing manner. I'll demonstrate the new features...and show how these data links can be made.

7pm - 7.45pm
Using Microsoft BI Technology with SAP
Sanjay Nayyar (IMGroup)

This session covers the various approaches to using Microsoft BI technology to provide reporting capabilities in an SAP environment, ranging from using Reporting Services in conjunction with SAP BW to loading data into SSAS from SAP ERP systems.

7.45 - 8pm
Break

8pm - 9pm
MOPPS 2007 – what’s it all about?
Andrew Sadler - Proclarity/Microsoft
Come and find out from Microsoft Technical Solutions Professional Andrew Sadler about the past, present and future of the end-to-end Microsoft BI story, featuring such highlights along the way as ProClarity, BSM, Biz#, PerformancePoint, etc…

For DDD4 we are organising Grok Talk and Park Bench lunchtime activities, and we hope to encourage delegates to participate. You can help!

The Grok Talks are open to anyone; a wiki has been set up where people can put down their name and talk and/or suggest topics for the Park Bench. If you are attending DDD4 we would like to invite you to submit a Grok Talk. As we have had a low response so far, we would appreciate it if you could blog about this opportunity or mention it to anyone you know who might be interested. As a reward, on the day an MSDN subscription will be given for the best Grok Talk.

Wiki link - http://www.thehughpage.com/DDD4LunchTime

DDD link - http://www.developerday.co.uk/ddd/default.asp

 

SQL Server 2005 Standard Edition x64 is not memory limited; the only limitation is the OS version you run it on.

There is some confusion (perhaps it's just me) on the Microsoft site under the windowsserver section; essentially it says that Windows 2003 Standard R2 can only use up to 4GB of RAM. Of course, this is an old page and hasn't been updated to reflect that there are two flavours of Windows 2003 R2 Standard Server - x32 and x64. A more recent page correctly gives the hardware limitations of the various editions and flavours, and states that Windows 2003 R2 Standard Server x64 has a memory limit of 32GB.

Anyway, to summarise: Windows 2003 R2 Standard Server x32 has a maximum of 4GB of RAM; Windows 2003 R2 Standard Server x64 has a maximum of 32GB of RAM.

Here you go...

Windows 2003 R2 Standard Server x32 and SQL Server 2005 Standard x32 - limit 4GB
Windows 2003 R2 Standard Server x64 and SQL Server 2005 Standard x64 - limit 32GB
If you just ask for Windows 2003 R2 Server then you'll get the x32 flavour and be limited to 4GB of RAM.

I'd be interested to know (if anybody has the information or has tried it) whether SQL Server 2005 Standard x32 on an x64 OS will use 32GB of RAM via AWE. Mind you, I'm not sure why you'd want to do that, other than having bought the wrong edition or having upgraded the OS from x32 to x64 but not SQL 2005 from x32 to x64.
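For what it's worth, if you did want to try the x32-on-x64 AWE route, these are the knobs involved - a minimal sketch only, I haven't run this for this post, the memory figure is just an example, and AWE also needs the "Lock pages in memory" privilege granted to the SQL Server service account:

-- standard sp_configure options; 'awe enabled' needs a service restart to take effect
exec sp_configure 'show advanced options', 1
reconfigure

exec sp_configure 'awe enabled', 1                   -- example only: enables AWE on x32 builds
exec sp_configure 'max server memory (MB)', 28672    -- example figure; leave headroom for the OS
reconfigure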

Anthony Brown talks about custom reports in Management Studio, a new feature in the SQL Server 2005 SP2 CTP.

Colin Leversuch Roberts talks about transactional replication (dealing with IDENTITY) in part 3 of a multi-part series; he also covers best practices when using defaults on columns.

Neil Hambly talks about replication too - Alternate Trigger Action.

Jasper Smith who runs www.sqldbatips.com talks about custom reports in SP2 (Database Permissions and Taskpad View).

Please visit these guys, and we welcome new blogs - especially anybody from the UK; I want www.SQLBlogCasts.com and www.SQLServerFAQ.com to be a centre of excellence for UK SQL Server skills.

Yes, I'm a gadget junkie, but I do need the speed for testing and development work.

I run two home-built servers and one desktop running x64 Windows 2003 Server here in my office. Why? Well, it allows me to truly play with stuff like database mirroring, iSCSI etc., and it also gives me some resilience when things go wrong.

Anyway, I installed my new gigabit NICs today at £13 each (D-Link DGE-530T), plus an 8 port gigabit switch (D-Link DGS-1008D) for £48 - I didn't have to change any cables either, which is good because putting those damn Cat5 connectors on is very frustrating.

If anybody had told me 5 years ago that I'd be running a gigabit network at home for £74 I'd have laughed at them - don't you just love the computing industry! You can get high performance at commodity prices now. Perhaps I'll run in fibre to see what difference there is in latency compared with my current copper... Perhaps I'd better think about upgrading my disk subsystems too, because the network now appears to be faster than the disk! In my tests I was writing at 45 MBytes per second but only utilising 25% of the bandwidth - hell, grand or what, hee hee hee :)

I've shown you the NOLOCK (READ UNCOMMITTED) timebomb; now let's look at the READ COMMITTED timebomb, how we get back inconsistent results, and why. This entry came about as a result of some confused feedback on my original NOLOCK article, but also from some good conversations I had with Gianluca Hotz and Kresimir Radosevic on the ATE stand at IT Forum 2006 in Barcelona.

 

To recap: the READ UNCOMMITTED isolation level allows you to read uncommitted data without blocking, but when a row moves because of a page split, column update etc., you may well end up counting the same row multiple times or not at all.

 

READ COMMITTED gives you a consistent view of the data - specifically committed data (or the last committed version in the case of READ_COMMITTED_SNAPSHOT) - but it isn't a point in time view. If you use READ COMMITTED and a page you are trying to read is locked then you'll block; if you use READ_COMMITTED_SNAPSHOT then you won't (you get the last committed value).
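For reference, READ_COMMITTED_SNAPSHOT is a database level switch rather than a session setting - a minimal sketch below, using the testlocking database name from the examples later on (you need the database more or less to yourself while the ALTER runs):

-- once this is on, plain READ COMMITTED readers get the last committed version
-- from the version store instead of blocking on writers
alter database testlocking set read_committed_snapshot on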

 

Here's the crunch, and here's where I didn't make myself clear in the last post because of the perspective I was writing from: even with READ COMMITTED there is a possibility of getting missing or additional counts (or aggregations), purely because of the nature of the isolation level.

 

Let’s create our test stuff…

 

create table bigtable (
    id  int not null,
    somedata char(2048) not null
)
go

set nocount on
declare @i int
set @i = 1

while @i <= 100000
begin
    insert bigtable( id, somedata ) values( @i, 'tony' )
    set @i = @i + 1
end

insert bigtable( id, somedata ) select id + 100000, somedata from bigtable
go

alter table bigtable add constraint pk_bigtable primary key clustered( id )
go

 

Let’s go through a number of tests, usual stuff – two connections required…

 

Run this in connection 1. I am using dropcleanbuffers to slow the query down so it gives me time to run the update in the second connection before this one finishes.

 

dbcc dropcleanbuffers

 

select count(*) from bigtable

 

Run this in connection 2 – be quick, it needs to run and complete while the count(*) query is running.

 

update bigtable

    set id = 500000

where id = 1

 

The COUNT(*) gives the result 200,001 even though a) there are only 200,000 rows in the table and b) we are using READ COMMITTED.

 

Let’s do the same but reverse the update.

 

Run this in connection 1. I am using dropcleanbuffers to slow the query down so it gives me time to run the update in the second connection before this one finishes.

 

dbcc dropcleanbuffers

 

select count(*) from bigtable

 

Run this in connection 2 – be quick, it needs to run and complete while the count(*) query is running.

 

update bigtable

    set id = 1

where id = 500000

 

The COUNT(*) gives the result 199,999 even though a) there are 200,000 rows in the table and b) we are using READ COMMITTED.

 

What’s going on and how does this differ from the READ UNCOMMITTED I talked about in my other blog entry?

 

The UPDATE is doing an Index Seek as part of the Clustered Index Update, so the query needed only 6 logical reads in my own test.

 

The COUNT(*) query is doing a Clustered Index Scan and causes 66,846 logical page reads.
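If you want to see those logical read figures on your own box, SET STATISTICS IO is the easiest way - a quick sketch (your numbers will differ slightly):

set statistics io on

-- the scan side: expect tens of thousands of logical reads against bigtable
select count(*) from bigtable

-- the seek side: wrap the update in a transaction and roll it back so the data is left untouched
begin tran
update bigtable set id = 500000 where id = 1
rollback

set statistics io off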

 

Imagine the table is on pages 1 through to page 66,846 and here is what happens with READ COMMITTED (and READ_COMMITTED_SNAPSHOT).

 

The COUNT(*) query starts reading from page 1, then works incrementally page by page through to 66,846; as it traverses the table it acquires an Intent Shared lock on the table itself and a shared lock on each page, but those page locks are acquired and released as it goes - they are not held for the entire duration of the statement.

 

Say we have read pages 1 to 5,000 and you now run the update statement on id = 1; that row is on page 1, which no longer has a shared lock on it, so we can and do update it. Because the update is using an index seek and only touching 6 pages, and the COUNT(*) is doing its lock acquire and release in the middle of the table, we are allowed to - and essentially do - move the row from page 1 to page 66,846. This is why we count the row again.

 

There are many ways to see the locks being acquired and released: sp_lock, or if you are running SQL Server 2005, the query below:-

 

select *

from sys.dm_tran_locks

where resource_database_id = db_id()

 

Another method is to use SQL Server Profiler. You'll need the @@SPID of each connection you want to track, otherwise too much information comes out; you'll also want to limit the COUNT(*) to, say, 5000 rows (shown below). The events to capture are Lock:Acquired and Lock:Released; just capture all the columns and don't forget to filter on the SPIDs!

 

select count(*) from ( select top 5000 * from bigtable ) as c
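To get the SPIDs to filter on, just run this in each connection you are tracing and plug the values into the Profiler column filter:

select @@spid as session_id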

 

READ_COMMITTED_SNAPSHOT does nothing for you with this particular problem; READ_COMMITTED_SNAPSHOT prevents you being blocked by writers that have pages locked, so the COUNT(*) won't be affected by the UPDATE if, for instance, you were actually doing more in the UPDATE transaction than the milliseconds of work and single row update that our test does.

 

If you are after an absolutely accurate answer we need to consider other isolation levels; READ COMMITTED just gives you a consistent view in terms of committed data.

 

Our problem here is the lock acquire/release behaviour as we traverse the table; what we really need is to keep the locks we have acquired so as to prevent data we have already read from being updated. We can do that using HOLDLOCK or the REPEATABLE READ isolation level (a sketch of the latter follows below). Again, you can witness this behaviour through SQL Server Profiler as mentioned earlier.
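Here's what the REPEATABLE READ version of the COUNT(*) looks like - a sketch only; REPEATABLE READ keeps the shared locks it takes until the end of the transaction (it doesn't add the key-range locks that HOLDLOCK/SERIALIZABLE do), which is enough to stop rows you have already counted being moved from under you:

set transaction isolation level repeatable read

begin tran

select count(*) from bigtable   -- shared locks are held until the commit, not released as the scan moves on

commit tran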

 

The big (big) problem with these approaches is that they quickly give rise to deadlocks - very quickly if your queries are reporting-type queries, basically anything that's going to cause a lot of page reads, because you are keeping the locks as you go. Let's take our example and see what effect two updates in a transaction have on our process.

 

In connection 1

 

begin tran

 

update bigtable

    set id = 500001

where id = 200000

 

In connection 2

 

set transaction isolation level read committed

 

dbcc dropcleanbuffers

 

select count(*) from bigtable with ( HOLDLOCK )

 

We have 266,680 locks acquired on our bigtable table and the connection is blocked; back in connection 1:

 

update bigtable

    set id = 500000

where id = 1

 

rollback

 

Connection 2 deadlocked…

 

Msg 1205, Level 13, State 51, Line 3

Transaction (Process ID 57) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

 

When modifying our test to use the SERIALIZABLE isolation level we get a shed load of locks - well, 100,389 in my test.

 

In connection 1

 

begin tran

 

update bigtable

    set id = 500001

where id = 200000

 

At this point there are only a few things locked:

DB                     S        GRANT
PAG    1:100266        IX       GRANT
TAB                    IX       GRANT
KEY    (21002849ce51)  X        GRANT
KEY    (40004eb85a66)  X        GRANT

 

In connection 2

 

set transaction isolation level serializable

 

dbcc dropcleanbuffers

 

select count(*) from bigtable with ( HOLDLOCK )

 

We now have 100,389 locks being held and connection 2 is blocked; back in connection 1:

 

update bigtable

    set id = 500000

where id = 1

 

rollback

 

Connection 2 deadlocked…

 

Msg 1205, Level 13, State 51, Line 3

Transaction (Process ID 57) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

 

It gets worse: each connection running the query with SERIALIZABLE or HOLDLOCK will generate that number of locks, so if you have lots of connections querying using those techniques the lock resource usage will become noticeable - no lock escalation occurs. A sketch for seeing the per-session totals is below.
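If you want to see how that multiplies up across connections, something like this against the locks DMV (SQL Server 2005) shows the per-session totals - a sketch only:

select request_session_id,
       count(*) as locks_held
from sys.dm_tran_locks
where resource_database_id = db_id()
group by request_session_id
order by locks_held desc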

 

I suppose that leads us to the new SNAPSHOT isolation level in SQL Server 2005; first we must enable the database to use this mode:-

 

alter database testlocking set allow_snapshot_isolation on

 

Let’s run up our test, first the update in connection one…

 

begin tran

 

update bigtable

    set id = 500001

where id = 200000

 

Again we have the same locks in place because of the update; now, run the query so it uses the snapshot isolation level, but capture the locks using Profiler…

 

set transaction isolation level snapshot

 

dbcc dropcleanbuffers

 

select count(*) from bigtable

 

In Profiler we get a single lock for the query - Sch-S, which is a schema stability lock rather than a normal data lock. The query does not block and we get the correct result of 200,000 rows; even running it so the COUNT(*) kicks off first and the update then moves id from 1 to 500,000 (start of table to end), we still get the correct row count and no blocking.

 

Basically, when using snapshot isolation we get the state of the data as of when our transaction first touches data - not the transaction in the sense of BEGIN TRAN, but in the sense of the SELECT COUNT(*) statement's own transaction (remember each statement runs in a transaction). The data is snapshotted (mmm - perhaps that isn't a word, lol) when we execute our query, and if we are inside a BEGIN TRAN then from that point forward that is the 'version' we see - there's a quick demonstration below.
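A quick way to convince yourself of that - a sketch using the same bigtable: under SNAPSHOT the version you see is fixed by the first statement that touches data inside the transaction, so a second COUNT(*) returns the same answer even if another connection commits a change in between.

-- in one connection (the database already has allow_snapshot_isolation on)
set transaction isolation level snapshot

begin tran

select count(*) from bigtable   -- the snapshot is established here

-- now, in a second connection, insert a row and commit it, e.g.
-- insert bigtable( id, somedata ) values( 600000, 'tony' )

select count(*) from bigtable   -- same count as before; the newly committed row is not visible

commit tran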

 

Anyway, I'm going to summarise now; perhaps I'll post some more on snapshot isolation when I get time. In a nutshell though, SQL Server copies any change into the version store in tempdb so it can keep a consistent, point in time (as of the transaction) snapshot of your data until you commit or roll back the transaction.
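You can actually watch the version store fill up while a snapshot transaction has old versions pinned - the row count from this DMV climbs as the UPDATE generates versions (a sketch; be careful on a busy box, the view can be large):

select count(*) as version_store_rows
from sys.dm_tran_version_store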

 

To summarise: READ UNCOMMITTED does not give a consistent view of the database - it's trash; you can get double counts, missing rows - the works. READ COMMITTED does give you blocking, so you can never read an uncommitted value, but neither READ COMMITTED nor READ COMMITTED SNAPSHOT gives a point in time view of your data; they use a sliding window as your data is read, so your counts can again be wrong - rows missing or counted more than once. HOLDLOCK and REPEATABLE READ will give you a point in time consistent view, however you will be very prone to deadlocks. The only way to get a real point in time consistent view without all the deadlocking is to use SNAPSHOT isolation.

 

In a real system blocking may be enough to give you correct results - I cannot tell you for sure because every schema and access pattern is different - but to be absolutely sure of a point in time correct result, use SNAPSHOT isolation, which is only available in SQL Server 2005 and above.

That grab your attention? I've still got two invitations to give away and they must be redeemed by the end of December.

Let's re-focus: that's £lotsK worth of software and support that you will receive for a year.

What do you need to do? I want people to contribute technical information, hints and tips, or advice to the SQL Server community via the UK SQL Server User Group sister site www.sqlblogcasts.com. The 2 winners will be judged by myself and a couple of other SQL people on a) quality of the content, b) usefulness and c) effort put in (so no copying from someone else, because we'll know). It doesn't need to be miles of writing nor an in-depth technical study; I'm after anything from novice right through to expert stuff: database design, SQL workings, BI - anything database related.

At the end of this month (30th November) we will announce two winners.

What to do in order to participate:-

1.   Register as a member on www.sqlblogcasts.com and write your articles as blog entries

2.   Put a link to your blog entries as comments on this article; this will automatically send me an email so I know when content has been posted.

3.   On 1st December I will post back the competition winners.

4.   Your content MUST be original work and it MUST reside on www.sqlblogcasts.com - the copyright remains with YOU and I don't care where else it's posted.

Notes: in no way am I responsible for the prizes etc. - these are invitations for MSDN subscriptions given to me by Microsoft as part of my MVP benefits.

I'm in Barcelona all next week at IT Forum; luckily it wasn't this week, as I've been full of flu.

If you want to come and say hello, or even better bring a SQL Server problem or ask me for some SQL Server advice, I'm working on the SQL Server ATE booth on these dates and times...

Wednesday, 10am -> 5.15pm; flippin heck all day - I'll need a beer after that!
Thursday, 1pm -> 4.15pm

Tony.

Using NOLOCK or READ UNCOMMITTED isolation (that’s what NOLOCK is) gives us a consistency problem. This entry came out of a conversation I had with Itzik Ben Gan when he was over a couple of weeks ago; essentially when you are using NOLOCK you can double read or completely miss rows.
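Just so we're clear that they really are the same thing, these two forms give identical behaviour against the TestConsistency table created in the set up script below - the hint is per table, the SET is per connection:

-- per-table hint
select count(*) from TestConsistency with ( nolock )

-- per-connection setting, same effect for this query
set transaction isolation level read uncommitted
select count(*) from TestConsistency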

 

The set up script:-

 

create table TestConsistency (
    id  int not null identity,
    filler char(8000) not null default( 'tony rogerson' ),
    somerandomness uniqueidentifier not null default( newid() )
)
go

set nocount on
declare @i int
set @i = 1

while @i <= 100000
begin
    insert TestConsistency ( filler ) values ( default )
    set @i = @i + 1
end
go

sp_spaceused TestConsistency
go

create unique clustered index clidx on TestConsistency( somerandomness )

 

Now we have our table we can do some testing; note I've made the example extreme just to highlight that the behaviour is there - reproducing this type of thing on a real system is difficult, but I want to make you aware it can happen.

 

Run the monitoring script below in Connection A; it basically keeps checking the reported row count and prints a message when there is a difference - we aren't inserting or deleting rows, so logically it should always report 100,000 rows in the table.

 

declare @rows_in_table int
declare @rows_now int
declare @shows tinyint

set @shows = 0
set @rows_in_table = ( select count(*)
                       from TestConsistency )

while 1 = 1
begin
    waitfor delay '00:00:00.300'
    set @rows_now = ( select count(*)
                      from TestConsistency WITH ( NOLOCK, INDEX = clidx ) )
    if @rows_in_table <> @rows_now
    begin
        print 'COUNT(*) produced ' + cast( @rows_now as varchar(10) ) + ' which is a difference of ' + cast( @rows_now - @rows_in_table as varchar(10) )
        set @shows = @shows + 1
        if @shows >= 10
            break
    end
end

 

 

In a separate connection we need to simulate update activity; this particular update simulates the situation where you are updating columns in a way that causes the row to move or causes a page split - hence I've used NEWID(), which is extreme :)

 

update TestConsistency

    set somerandomness = newid()

 

Wait a while and you will get some results; on my box I got the following output:-

 

COUNT(*) produced 99996 which is a difference of -4
COUNT(*) produced 99984 which is a difference of -16
COUNT(*) produced 99990 which is a difference of -10
COUNT(*) produced 99987 which is a difference of -13
COUNT(*) produced 99995 which is a difference of -5
COUNT(*) produced 99989 which is a difference of -11
COUNT(*) produced 100012 which is a difference of 12
COUNT(*) produced 99981 which is a difference of -19
COUNT(*) produced 99982 which is a difference of -18
COUNT(*) produced 99980 which is a difference of -20

 

Oh dear, look at that! We are missing rows or sometimes counting them more than once. Imagine if I was aggregating a trade value or something - you'd have the rows double, triple or even 'x' times counted - not good at all.

 

When you use NOLOCK or READ UNCOMMITTED you are telling SQL Server you are not interested in consistency and just want the data. I know most people use NOLOCK to get round the writers-blocking-readers problem - hell, I've told people to use it myself in the past - and most of the time you will be alright, but that behaviour is there waiting to cause you a problem.

 

In SQL Server 2005 we have READ_COMMITTED_SNAPSHOT which gives you the last committed value so you get a consistent result from your query (no multi/missing counts) and writers don’t block readers.
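If you're not sure whether a database already has it switched on, sys.databases will tell you (SQL Server 2005 and above):

select name, is_read_committed_snapshot_on
from sys.databases
where name = db_name()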

 

In SQL Server 2000 well, there are no other options except tuning the queries and making sure you aren’t reading lots of pages to get the query result.

 

Further reading: http://www.sqlmag.com/Article/ArticleID/92888/sql_server_92888.html

 

Note (17th Nov 2006): I've done a more comprehensive entry on the problems with READ COMMITTED isolation and getting missing rows/additional rows there too - http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/16/1345.aspx