How do we go about calculating paths between two connecting nodes; say we have a table with two columns – FromNode and ToNode; our query has a starting node and a destination node and we need to calculate all possible paths between the start and destination and return them to the user. This comes from a question I answered on the Microsoft NNTP news groups.

Example

**Note, the script to build the table of nodes and data is attached at the bottom of this blog entry.

From                 To

100035      100046

100035      100327

100327      100046

etc…

The output needs to look like this, basically the relational node structure pairs flattened into an enumerated list (probably a CSV or XML in the real world).

p0          p1          p2          p3          p4          p5          edges

----------- ----------- ----------- ----------- ----------- ----------- -----------

100035      100046      -1          -1          -1          -1          1

100035      100327      100046      -1          -1          -1          2

100035      100334      100046      -1          -1          -1          2

100035      100351      100046      -1          -1          -1          2

100035      100036      100046      -1          -1          -1          2

100035      100065      100046      -1          -1          -1          2

100035      100179      100046      -1          -1          -1          2

100035      100182      100046      -1          -1          -1          2

100035      100076      100046      -1          -1          -1          2

100035      100078      100046      -1          -1          -1          2

100035      100094      100046      -1          -1          -1          2

100035      100125      100046      -1          -1          -1          2

100035      100150      100046      -1          -1          -1          2

100035      100158      100046      -1          -1          -1          2

100035      100267      100046      -1          -1          -1          2

100035      100372      100046      -1          -1          -1          2

100035      100036      100150      100046      -1          -1          3

100035      100036      100158      100046      -1          -1          3

100035      100036      100125      100046      -1          -1          3

So, how do we go about doing this in SQL Server? SQL Server 2005 introduced a new operator on the FROM clause – the APPLY operator; this allows us to take a row from the left table joining and passing a value into a table value function, the table value function runs and returns a recordset for each row passed in; you have CROSS and OUTER functionality, the CROSS is a bit confusing – to me its more like an INNER JOIN in that no rows are produced if the table valued function doesn’t return anything whereas the OUTER you always get a row from the table on the left hand side of the join.

What does that mean in English? Lets look at some examples.

Take our table valued function that really does the biz for us…

CREATE FUNCTION tvfn_get_nodes_for(

@FromNode as int,

@StartAtNode int,

@StopAtNode int )

RETURNS TABLE

AS

RETURN(

SELECT ToNode

FROM Paths

WHERE FromNode = @FromNode

AND FromNode <> @StopAtNode

AND ToNode   <> @StartAtNode

)

GO

When we run our table valued function in isolation we get 873 rows…

SELECT *

FROM tvfn_get_nodes_for( 100035, 100035, 100046 )

So, that’s the nodes that 100035 is connected to, we need to do it recursively for each path through the tree, we can use the OUTER APPLY functionality to do that, so take the following example.

declare @nodes table (

start_node int not null

)

insert @nodes values( 100036 )

insert @nodes values( 100041 )

insert @nodes values( 100042 )

insert @nodes values( 100043 )

insert @nodes values( 100044 )

SELECT n.start_node, count(*)

FROM @nodes n

OUTER APPLY tvfn_get_nodes_for( n.start_node, 100035, 100046 )

GROUP BY n.start_node

We are driving the table valued function from the @nodes table variable, the table valued function will execute 5 times, once for 100036, then 100041 etc… The output is shown below…

start_node  rows

----------- -----------

100036      738

100041      5

100042      25

100043      20

100044      24

(5 row(s) affected)

Cool or what! The biggest use I have for this CROSS APPLY stuff is for the dynamic management views to get at what SQL Server is doing – a lot of them have been implemented as table valued functions.

So that’s the guts of it, a real query that gives 5 node hops is shown below, increasing the number of hops is simple – you just add more OUTER APPLY’s and the other bits.

DECLARE @FromNode int,

@ToNode   int

SET @FromNode = 100035

SET @ToNode   = 100046

SELECT *,

edges = 5 - ( CASE WHEN p2 = -1 THEN 4 WHEN p3 = -1 THEN 3 WHEN p4 = -1 THEN 2 WHEN p5 = -1 THEN 1 ELSE 0 END )

FROM (

SELECT p0 = p1.FromNode,

p1 = COALESCE( p1.ToNode, -1 ),

p2 = COALESCE( p2.ToNode, -1 ),

p3 = COALESCE( p3.ToNode, -1 ),

p4 = COALESCE( p4.ToNode, -1 ),

p5 = COALESCE( p5.ToNode, -1 )

FROM paths AS p1

OUTER APPLY tvfn_get_nodes_for( p1.ToNode, @FromNode, @ToNode ) AS p2

OUTER APPLY tvfn_get_nodes_for( p2.ToNode, @FromNode, @ToNode ) AS p3

OUTER APPLY tvfn_get_nodes_for( p3.ToNode, @FromNode, @ToNode ) AS p4

OUTER APPLY tvfn_get_nodes_for( p4.ToNode, @FromNode, @ToNode ) AS p5

WHERE p1.FromNode = @FromNode

) AS d

WHERE ( p1 = @ToNode OR p2 = @ToNode OR p3 = @ToNode OR p4 = @ToNode OR p5 = @ToNode )

AND ( p1 NOT IN ( p2, p3, p4, p5 )

OR p1 = -1 )

AND ( p2 NOT IN ( p1, p3, p4, p5 )

OR p2 = -1 )

AND ( p3 NOT IN ( p1, p2, p4, p5 )

OR p3 = -1 )

AND ( p4 NOT IN ( p1, p2, p3, p5 )

OR p4 = -1 )

AND ( p5 NOT IN ( p1, p2, p3, p4 )

OR p5 = -1 )

ORDER BY edges

The usual recommendations apply, functions are usually resource intensive but in this situation I can’t honestly think of a better way of doing it without resorting to more linear approaches; of course – if the data was stored x.x.x.x. in the first place we wouldn’t have this problem.

Filed under: