SimonS Blog on SQL Server Stuff
SQL Server 2008 UK Usergroup London Launch event

As part of the SQL Server Launch wave we are holding a usergroup meeting to celebrate the launch (not RTM).

We're going to run this as an open session so you'll be able to ask us to cover the areas you want us to cover.

We have loads of giveaways, including Vista, training vouchers, Technet subscriptions and SQL Server licenses.
To register for the event go to the UK SQL Usergroup site http://www.sqlserverfaq.com/?eid=114

Registration is at 5.30, evening will commence at 6pm and finish 9pm.

6pm - 6.30pm
Round Table Discussion
Update on what's been going on and is going on in the SQL Server space.
Bring your SQL problems and ask the audience, bounce ideas - anything related to SQL Server.

The topics to cover in next parts could be any of the following, Simon and Jasper will hope to answer and demo the features you want to know about.

TSQL improvements, new data types, changes to CLR, spatial data, hierarchies, service broker, changes to tools, SSIS improvements, Integrated Full Text, Sparse Columns, Filtered Indexes, XQuery changes, Compression, Change Data Capture, Change Tracking, Intellisense, Table Valued Parameters, Script Task in SSIS, Performance Data Collector, Reporting services

6.30pm – 7.30pm 
Part 1

7.30 – 7.50pm 
Break - Beer and Pizza

7.50pm - 9.00pm
Part 2

Speaker Bios
 
Simon Sabin  http://sqlblogcasts.com/blogs/simons Jasper Smith http://sqlblogcasts.com/blogs/sqldbatips
Simon is Director of his own data consultancy company Onarc Consulting that specialises in data related systems. He is also a Developer Skills Partner with SQLSkills. He has a particular expertise in the world of search, distributed architectures, business intelligence and application development. He has worked with SQL Server since 1998 and has always focused on high performance reliable systems.
Simon was awarded as an MVP in 2006. He founded the first free SQL Server conference in the UK SQLBits in 2007 along with other MVPs in the UK. He is a regular speaker at SQL Server events as well as writing his blog and writing for SQLServerCentral and Simple-Talk.com.
He has most recently been involved in developing the SQL Server 2008 training for Microsoft.
Jasper has been working with SQL Server for around 6 years and is currently a member of the Database Services team at Nationwide Building Society in the UK. He also writes articles and code for his website : http://www.sqldbatips.com
Posted by simonsabin | with no comments
Filed under: , , ,

Don't forget the London SQL Usergroup is meeting this Thurdsay (17th April).

The agenda is as follows,
 
6pm - 6.30pm
Round Table Discussion
Update on what's been going on and is going on in the SQL Server space.
Bring your SQL problems and ask the audience, bounce ideas - anything related to SQL Server.

6.30pm – 7.30pm  Will you still want me, when I'm 64? - Christian Bolton (COEO)
An in-depth look at the move toward 64-bit computing with SQL Server covering where and why it will help, what the caveats are and the most up to date best practices gathered from global SQL Server deployments.

7.30 – 7.50pm Break


7.50pm - 9.00pm
The cursors of SQLStein - Eric Alsop (IM Group)

Nothing divides the SQL community like a cursor, so why are they still around? In this session we are going to look at how to use cursors and when they are most applicable.

To register go here http://www.sqlserverfaq.com/?eid=110

Posted by simonsabin | 1 comment(s)

If you are using CTP6 of SQL Server 2008 and want to use Integrated Full Text and find it is not working. You might get timeouts or other general errors. It could be due to an account that iFTS uses.

iFTS does some stuff out of process for security and reliability in CTP6 the run this process under a local user account FDH$<instance name> by default.

However this accounts password may expire or even may be set to change on next logon (depending on domain settings etc).

If you find iFTS is not working then go into Local Users and Groups. Select the FDH$ account and check the settings. If you need to change the password there is a sp sp_fulltext_resetfdhostaccount which you can use to sync the password in SQL once you have changed it in the User manager.

 THIS ONLY APPLIES TO CTP 6 AND PRE CTP6. I've been told the behaviour is changing for future CTPs.

 

If you've got a complex data set to produce how should you do it. Adam Machanic has posted about an Anti Pattern (bad practice) for such a situation. http://sqlblog.com/blogs/adam_machanic/archive/2007/12/15/anti-patterns-and-malpractices-volume-1-tumbling-data.aspx

He talks baout the approach of building temp tables and then using the them to produce more temp tables and finally the result. I agree in many situations this is a bad practice however as with database normalisation there is a trade off which you need to decide upon.

I personally find that building individual statements can be good for a few reasons,

  • Supporting the code.
    Derived tables, pivots, case statements and sub queries are advanced SQL and so using lots of them means the person supporting the code has to be able to understand it.

  • Debuging code
    If you have an issue with the output having the code in one bunch generally means you have to disect the query to be able to debug it. Having it already broken down eases this issue and allows for assertions to be made in the code to ensure that the data is behaving as expected.

  • Performance
    Putting everything in one query will greatly increase the options for the optimiser. I have seen this result in very bad plans, especially with parallism involved.

  • Manageability
    Doing things in one means you have a single black box, it starts and finishes. Its very difficult to see what going on. Breaking it up into smaller chunks provides better options for seeing and managing progress.

  • Memory usage
    Doing it all in one go can result in you needing a large amount memory for the whole query. Splitting it up reduces the single memory requirements.

Now I have to say that these comments are generally when building large complex data sets for reports or exporting data. In addition there is also the ETL scenario. In that case you have a decision on the granularity of your processing whether you want row by row or set based. The last generally offers best performance but less control over errors.

So whilst I agree with Adam there is an element of it depends on the situation.



-
Posted by simonsabin | with no comments

 

I've been hitting F9 in outlook all day waiting for the email to tell me if I had been reawarded. It finally came just before the letters were worn off the F9 key :)

If you want to know more about the MVP programme you can read more at the MVP site http://mvp.support.microsoft.com/Default.aspx.



-
Posted by simonsabin | 1 comment(s)

 

Sorry about the scripting and layout problems over the past week or so. Since we upgraded to community server 2007 some of the custom stuff I have broke.

It should all be now fixed.

If you find any more problems let me know.



-
Posted by simonsabin | 1 comment(s)

 

In a recent post SQL Server 2008 - HierarchyId whats the point? I talk about the scenarios that the new hieararchyId can be applied to. Here we expand on this further and also provide a solution for moving hierarchies around.

The limitation is that you can't do something like this to change the parent of a node.

declare @newParent = (select node from Org where OrgId = 1234)

update Org

   set node.parent = @newparent

 where OrdId = 4567

Similarly if you want to move all the children nodes of one node to another node you can't in a simple way.

The reason is that the hierarchyId is a path enumeration model and if node 457 has children the children have the path to node 4567 embedded in them, i.e. they reference the 4567 node by path.

Look at this situation with nodes "/", "/1/" and "/1/1/", "/1/1/1/, "/2/" and "/2/1/", "/2/1/1", "/2/2/" and "/2/2/1" represent the root and two sets of a grand parent, parent and child nodes. See diagram.

We want to move the sub tree of  "/1/1/" to the new parent "/2/", by subtree I mean all the nodes below and including "/1/1/". Those with bold outlines below

If we change the parent of "/1/1/" to "/2/" using the reparent function then we will have a conflict in our hierarchy. As there will now be two nodes with the same path.

Whats more if we don't change the path of the nodes below "/1/1/"  then these nodes will reference an invalid path (an ancestor will be missing).

So we have to reparent all the nodes in the subtree. Unfortunately that makes the issue of nodes with the same path even worse. (See nodes with a red outline)

 

So what we have to do is find the last child of our new parent, in this example "/2/2" in order that we can move the sub tree to a new path that doesn't conflict with an existing one, "/2/3/"

As we are maintaining the subtree below the node we are moving we have to reparent the child nodes using this new parent record.

Nicely the reparent function on the hierarchyId type if you specify a old parent node that is itself it will just return the new parent i.e. just changing its own path. (Note the reparent is not a mutator so you have to assign the result of the reparent function). Reparent unlike its name suggests doesn't do anything but return a node with a path based on the inputs.

So thats the logic so how do you go about doing it in code. The following includes a few more nodes to show moving multiple trees in one go.To setup the tables and data use the script at the end.

Firstly lets find the maximum for the child we want to move 

declare @NewParent hierarchyId = '/3/'

declare @OldParent hierarchyId = '/1/'

 

--Find the max child node that belongs to the new parent node. The nodes being moved will be

--appended after those that already belong to the new parent

declare @Maxchild hierarchyId = (select max(node)

                                   from HTest

                                  where @NewParent.IsDescendant(node) = 1

                                    and node.GetLevel() = @NewParent.GetLevel() + 1)

You could use GetAncestor which is equivalent to the code above, I think the above is more readble.

Unfortunately there is no method to return the child index of a node, the reason we need this is that we are going to build the path directly and so need the child index so we can add values to it.

declare @ChildCount int = (select substring(left(@MaxChild.ToString(),LEN(@MaxChild.ToString()) - charindex('/',reverse(@MaxChild.ToString()),2) ),2,10))

So now the crunch, what we need to do is find all the nodes we are going to move. We will be moving two types of node. Top level nodes and child nodes, it is the top level nodes we that need to have there position calculated. The position of the children don't change. If in the example above the child "/1/1/1/" is still the 1st child of its parent the only difference is its parent is now "/2/3/".

We therefore need to calculate the position for these parents. This can be used by using a window aggregate. The dense_rank() allows us to return the same position for the parent as well as the children. To do this for the children we use the GetAncestor and GetLevel methods to find the parent node that is moving (it actually works for the parent as well so we can use the same code in the order by). This dense_rank returns an incrementing value starting at 1. We can then add that to the max child position calculated above. In conjunction with the new parent path we have the new path for the top nodes we are moving.

We can then use Reparent method to return the new paths of the nodes.

The following is a SELECT statement that returns the reparented nodes so you can see what is going on.

--Query to calculate the new path by using the reparent function along with a calculation to work out the correct position in

--the children of the new parent.

select  node.Reparent(oldNodeParent, @NewParent.ToString() + cast(@ChildCount + NewChildPos as varchar(10)) + '/').ToString(), *

from (

select node

     , dense_rank() over (order by node.GetAncestor(node.GetLevel() - @OldParent.GetLevel())) NewChildPos

     , node.GetAncestor(node.GetLevel() - @OldParent.GetLevel()).ToString() oldNodeParent

     , value, node.ToString() nodeText, node.GetAncestor(1).ToString() Parent -- these are for debugging aren't need to calculate the new path.

from HTest

where value like '1.1.%'

or value like '1.2.%') d

This shows how the above can be used in an update statement.

update HTest

  set node = d.node.Reparent(oldNodeParent, @NewParent.ToString() + cast(@ChildCount + NewChildPos as varchar(10)) + '/').ToString()

from (

select node

     , dense_rank() over (order by node.GetAncestor(node.GetLevel() - @OldParent.GetLevel())) NewChildPos

     , node.GetAncestor(node.GetLevel() - @OldParent.GetLevel()).ToString() oldNodeParent

from HTest

--This is the criteria for selecting the nodes to move.

where value like '1.1.%'

or value like '1.2.%'

) d

where d.node = HTest.node

To setup the tables and data use the following script

set nocount on

go

use test

go

drop table Htest

go

create table HTest (node hierarchyId, value varchar(10))

go

create unique index IX_HTest_node on HTest(Node)

go

declare @h1     hierarchyId = HierarchyID::GetRoot()

declare @h1_1   hierarchyId = @h1.GetDescendant(null,null)

declare @h1_2   hierarchyId = @h1.GetDescendant(@h1_1,null)

declare @h1_3   hierarchyId = @h1.GetDescendant(@h1_2,null)

declare @h1_4   hierarchyId = @h1.GetDescendant(@h1_3,null)

declare @h1_5   hierarchyId = @h1.GetDescendant(@h1_4,null)

declare @h1_1_1 hierarchyId = @h1_1.GetDescendant(null,null)

declare @h1_1_2 hierarchyId = @h1_1.GetDescendant(@h1_1_1,null)

declare @h1_1_2_1 hierarchyId = @h1_1_2.GetDescendant(null,null)

declare @h1_1_3 hierarchyId = @h1_1.GetDescendant(@h1_1_2,null)

declare @h1_2_1 hierarchyId = @h1_2.GetDescendant(null,null)

declare @h1_2_2 hierarchyId = @h1_2.GetDescendant(@h1_2_1,null)

declare @h1_2_3 hierarchyId = @h1_2.GetDescendant(@h1_2_2,null)

declare @h1_3_1 hierarchyId = @h1_3.GetDescendant(null,null)

declare @h1_3_2 hierarchyId = @h1_3.GetDescendant(@h1_3_1,null)

declare @h1_3_3 hierarchyId = @h1_3.GetDescendant(@h1_3_2,null)

 

insert into HTest values(@h1    ,'1')

insert into HTest values(@h1_1  ,'1.1')

insert into HTest values(@h1_2  ,'1.2')

insert into HTest values(@h1_3  ,'1.3')

insert into HTest values(@h1_4  ,'1.4')

insert into HTest values(@h1_5  ,'1.5')

insert into HTest values(@h1_1_1,'1.1.1')

insert into HTest values(@h1_1_2,'1.1.2')

insert into HTest values(@h1_1_2_1,'1.1.2.1')

insert into HTest values(@h1_1_3,'1.1.3')

insert into HTest values(@h1_2_1,'1.2.1')

insert into HTest values(@h1_2_2,'1.2.2')

insert into HTest values(@h1_2_3,'1.2.3')

insert into HTest values(@h1_3_1,'1.3.1')

insert into HTest values(@h1_3_2,'1.3.2')

insert into HTest values(@h1_3_3,'1.3.3')

 

go

select *, node.ToString() from HTest

 

 

 

 



-

 

I've recently been deliverying training sessions on SQL Server 2008 and doing launch events. The training covers all aspects of SQL Server development.

For most features I can provide a strong argument for the feature with the exception of one. HierarchyId.

HierarchyId is the new data type that allows you to store a hierarchy in a single column. This is using a path enumeration model, just like the path to a file on the file system.

The benefit of HierarchyId is that the storage structure is very effiecient it uses a thing called Ord Path.

With this efficient data type querying for the children of a parent performs very well.

However the one thing that just baffles me is the amendment of hierarchies and moving of nodes in a hierarchy. It baffles me because this I foresee to be one of the major requirements for a hierarchical data, yet the data type doesn't really support it.

Ok so you will see a method Reparent(). However this doesn't do anything to cater for existing data in your hierarchy. Essentially all the methods for the data type know about  only the instance of the type (i.e. column value or variable) that the function i being used with. It doesn't do anything to check the other rows in the table.

You will find that if you want to add a node to a parent. You have to find the last child of that parent and then add the new node to the parent but after this last child. This isn't difficult but cubersome code that you will have to repeat again and again. A bit like the standard code you need for a Service Broker activation procedure.

Whats more none of the functions are set based and so if you want to add two children you have to add one at a time because to add the second you need to add it after the first.

So all in all I'm a bit disappointed by the HierarchyId type I do hope they make it more functional in future releases.

If youre interested in set based Hierarchy Id solutions I have a few but they aren;t pretty. I will blog about them over the next few days.

 

  



-
Posted by simonsabin | 1 comment(s)
Filed under: ,

 

We are venturing into afternoon usergroup meetings for thos of you that can't make it in the evenings. We are fortunate to kick off the first afternoon meeting with Rob Farley on the 7th April. Fortunate as he lives in Australia and is only in the UK for a short period of time.

Rob Farley, a mentor/trainer with Solid Quality Learning, has been consulting in IT since completing a Computer Science degree with first class honours in 1997. He lives in Adelaide, where he heads up the Adelaide SQL Server User Group and sits on the South Australian Branch Executive Committee of the Australian Computer Society. He holds several Microsoft certifications including MCSD, MCDBA, MCTS, MCPD and MCITP. He is a Microsoft Certified Trainer and is a recipient of the Microsoft MVP Award for SQL Server. He blogs at http://msmvps.com/blogs/robfarley

For more details of what Robs going to be covering go to http://sqlserverfaq.com/?eid=113 



-
Posted by simonsabin | 1 comment(s)
Filed under:

 

James Rowland Jones is kindly organising the next Usergroup meeting in the London on the 17th April.

For the event details go to http://sqlserverfaq.com/?eid=110



-
Posted by simonsabin | with no comments

 

I'm currently on the "I like LINQ to SQL" side of the fence. I think it is very much a marmite question, you love it our you hate it. But importantly you often hate it even if you haven't tried it.

I've been digging around with LINQ recently and whilst it is very nice and easy to do things. This, as with many situations, can cause problems if you don't know what is being done under the hood.

Mike Taulty is font of knowledge when it comes to LINQ and Entity Framework and he recently posted about the use of stored procedures in LINQ. (LINQ to SQL, Stored Procs, Output Params, Readers Reading)

One thing you find out is that LINQ may not execute a query on the line of code you think it will, i.e.

var query = ctx.GetCustomersByCountry("UK");

Will not execute the procedure associated with the GetCustomersByCountry method. It will only get executed when you do something with query.

Whats more when the procedure returns a resultset and you then iterate over that resultset the results are read from the server one at a time. This means that if you do complex processing in your loop you can tie up resources and a connection to your database.

For more details look at Mike's blog post (LINQ to SQL, Stored Procs, Output Params, Readers Reading)

Whilst this isn't as bad as serverside cursors in the old ADO days it is still a gotcha you have to be careful of.



-
Posted by simonsabin | 3 comment(s)
Filed under: ,

 

I recently presented on using CLR in SQL at DevWeek and commented on how TSQL Udfs just don't perform. A common scenario people in the audience were using TSQL Udfs for was splitting delimited strings.

One guy asked me afterwards what the best way to do it was, he had addresses in one field he wanted to split into x columns. On the spot I cam up with the following with uses a number of well know tehniques.

This uses the nuber table solution to split the delimited string into rows and then uses pivot to along with row_number to position the string tokens into columns. It also uses a cross join to have the process work for each row in the table.

Note: this is for a num table starting at 0.

declare @m table (id int, d varchar(100))

insert into @m values (1,'simon,sabin,sql')

insert into @m values (2,'steve,Jones,data')

 

 

select p.*

from (

select id

     , substring(d, start+2, endPos-Start-2) token

     , row_number() over(partition by id

                         order by start) n

      from (

select id

     , d

     , n start

     , charindex(',',d,n+2) endPos

               from num

        cross join (select id

                         , ',' + d +',' d

                      from @m) m

              where n < len(d)-1

                and substring(d,n+1,1) = ',')d

      ) pvt

Pivot ( max(token)for n in ([1],[2],[3],[4]))p

The great thing here is it works for multiple rows in a table and so you can replace @m with your table of choice.

You have fixed the number of columns so if you want this to be dynamic then its more complex..

I was going to wait on posting this but thought it would follow on well from Madhivanan's post http://sqlblogcasts.com/blogs/madhivanan/archive/2008/03/15/splitting-csv-to-columns.aspx



-

 

Its all about Sparse columns and they are a great feature of Katmai

Imagine you have a products table, you will have attributes for some products that don't apply to others i.e. cup size applies to bras but not to socks.

There are a number of ways of modelling this each with downsides. With sparse columns you can have a column per attribute per product. The benefit is that you are following proper modelling by not mixing meaning of a column and you have strongly type data. The other aspect is that you save space because sparse columns are stored in a similar fashion to variable length fields. Thus if the value is NULL you will save space especially on fixed type columns.

I can imagine the select and insert limits haven't been increased because you also have the ability of having a column set which combines all the sparse columns into an XML document containing them. You are abe to select this (if you do a select * and you have this you will only get this not the individual columns) but you are also able to update and insert through it.

Aaron recently posted on the increase in the column limits, 30,000 sparse columns are now possible. (http://sqlblog.com/blogs/aaron_bertrand/archive/2008/03/11/increased-limits-for-columns-indexes-statistics.aspx) He also asks why. Well this is it.



-

One of the frequently asked questions in the newsgroups is how do you convert a string representation of a binary value to the equivalent binary value and the reverse.

The latter is often required to be able to print out the binary representation.

There have been a few undocumented functions and many home brew solutions.

In SQL Server 2008 CONVERT can now do it for you. Thanks to Itzik for pointing this out to me last night.

So you can now do the following

select convert(varchar(8), 0x023454, 1)

select convert(binary(3), '0x023454', 1)

Which is great.



-