Wednesday, 15 February 2012

sub-tree performance - part 3

This is the third part of the series about performance of tree operations in SQL Server in which I show how the tests were performed and more interestingly what were the results.
Part 1 - the brief introduction to the problem
Part 2 - test environment setup with all four methods explained

The complete code is also available as a github gist here.

We have now the test table and four test user defined function, each implemented using different technique, namely: an old fashioned loop, CTE, Xml and HierarchyId. First let's have a look at the execution plans. Three functions (Loop, Xml and HierarchyId) have very similar and simple plan. Here is how it looks like for the loop function:

But the plan for function using CTE is completely different and looks like that:

For a batch of all four functions in question relative cost is as follow:
  • 1% GetTreeWithLoop()
  • 97% GetTreeWithCte()
  • 1% GetTreeWithXml()
  • 1% GetTreeWithHierarchyId()
But these are only estimated execution plans, and yet to be "put to the test". To compare real performance, we need a more quantitative approach. So, I selected 6 test nodes:
  • node 2 with sub-tree of 1 and max depth of 0
  • node 5 with sub-tree of 16 and max depth of 3
  • node 8 with sub-tree of 176 and max depht of 6
  • node 11 with sub-tree of 1792 and max depth of 9
  • node 14 with sub-tree of 8192 and max depth of 12
  • node 1 with sub-tree of 69632 and max depth of 14
and computed sub-tree count for each one of them 15 times with each of 4 functions. For each node and function I recorded the average server's CPU time needed to process the request.

Here are the results:

As you may see there wasn't much difference until around 1500 nodes. Beyond that point XML version's time goes through the roof and for that reason I excluded it from further computation of relative values.

From this graph you may clearly see that the old school Loop solution can still compete with the two more modern approaches as its relative performance to the others is stable regardless of the sub-tree size. More so, it seems to be exactly in the middle between the two. You may also see that CTE based solution seems to be fastest for small trees and HierarchyId for large ones with transition point around 1500 nodes and 8 levels. But you must not forget that in this test scenario depth grows with size as well, and in business scenarios we typically have only few levels.

So let's see how the performance compares for sub-trees of different sizes but equal depths.

It is worth noticing all the solutions have almost linear relation between size and time. Again XML method despite looking all right with small sub-trees goes through the roof at the other and of the graph. And again for that reason I excluded it from further computation of relative values.

Here you may clearly see that time-wise, out of the three CTE approach is the worst yet we are talking about 0.2-0.3 second difference to the best option at any size so it still may be preferable solution due to it's clean and compact syntax.

And how about Loop vs. HierarchyId? On free level sub-trees their performance is very close. with Loop having small advantage for sub-trees up to around 2500 nodes and HierarchyId on bigger ones. So is the overhead worth implementing HierarchyId? On such shallow trees probably not, but when you look at previous graphs it looks like the new type has a lot to offer when it comes to much deeper structures.

2005 FOR XML queries on MSDN
HierarchyId on MSDN
More about HierarchyId on Technet
Excelent article about comparing performance of queries on SQL Server

No comments:

Post a Comment