Referencing Parent nodes in XQuery is expensive

 Recently a post on a newsgroup caught my attention. "OPENXML vs XQuery Performance", it caught my attention because I did a bucket load of performance testing of OPENXML way back when for SQL Server 2000 XML Distilled and hadn't touched XQuery really.

What the poster found was that OPENXML was faster than XQuery for large documents but slower than small documents. This made some sense in that OPENXML is just a wrapper on the MSXML objects and so there is considerable overhead in just creating an XML document to process whatever the size, XQuery however does not have that overhead (to my knowledge).

I was however interested why XQuery might be slower for large documents. I thought the issue might be related to querying attributes from parents nodes as this is really expensive in OPENXML and thought the same might be true in XQuery. The query was something like this

FROM OPENXML (@idoc, '/root/customer[someattribute="somevalue"]/order',1)
        WITH (customerID  varchar(10)'../@customerId', 
              orderDate datetime, 
              orderId int)

 As you can see its the order nodes being selected but the customerId is being retrieved off the parent customer node. The simple XQuery would look something like this.

SELECT orders.node.value('../@customerId' ,'varchar(10)') as customerId
       orders.node.value('@orderDate' ,'datetime') as orderDate
       orders.node.value('@orderId' ,'int') as orderId
 FROM @xmlDoc.nodes('/root/customer[someattribute="somevalue"]/order') orders(node)

The solution to the performance issue with OPENXML was to read the data twice and then relate the two datasets together using the ids generated by the metaproperties @mp:id and @mp:parentid.

You don't have access to metaproperties in xQuery (not that I can find) so the same approach is messy. However there is a very neat solution using the new CROSS APPLY method.

Rather than getting the order nodes, you get the customer nodes. Once you have the customer node you then get the orders for that node. Something like

SELECT customer.node.value('@customerId' ,'varchar(10)') as customerId
       orders.node.value(
'@orderDate' ,'datetime') as orderDate
       orders.node.value(
'@orderId' ,'int') as orderId
 FROM @xmlDoc.nodes('/root/customer[someattribute="somevalue"]') customer(node)
 CROSS APPLY customer.node.nodes('order') orders(node)

XQuery is really powerful and provides numerous ways to solve your problems, some will perform well some may not. If you have a problem think about your other options. In the case of this post the improvement was from "takes for ever" to 2 seconds.



-
Published Tuesday, November 14, 2006 5:32 PM by simonsabin

Comments

No Comments