Wednesday, 1 February 2012

sub-tree performance - part 1

Graph theory is omnipresent in computing and graph structures are very often used to model and solve problems. But despite being so theoretically common, graphs very often prove to be problematic. Not because they are inherently difficult, but seems so, because they are neglected and avoided whenever possible.

It is true that for many years it was hard to implement common tree functions in relational database systems but in 2005 MS SQL Server implemented Common Table Expressions (defined by SQL:1999 standard) which made truly recursive queries possible. In that same version XML data type has been added and later in version 2008 HierarchyId data type emerged, designed especially to make operations on trees easier.

Nowadays Transact-SQL gives us plenty of tools to deal with data structured as trees: sub-queries, recursive stored procedures, cursors, XML data type, temporary tables, CTE, HierarchyId data type; thus the only problem remaining is to pick the right one.

This is the first post out of a series of three based on an article I wrote on Experts-Exchange over two years ago. The time has gone by but the data and more importantly the findings are still relevant to SQL 2008 R2 as they were then for SQL 2008 and for that reason I decided to include it in my new blog.

In the next two posts I will setup a test environment and then attempt to demonstrate and compare four different approaches to get descendants of a node from a tree implemented as a self-referencing table in an SQL Server.

Part 2 - test environment setup with all four methods explained
Part 3 - testing, results and the conclusion

The complete code is also available as a github gist here.

2005 FOR XML queries on MSDN
HierarchyId on MSDN
More about HierarchyId on Technet
Excelent article about comparing performance of queries on SQL Server

No comments:

Post a Comment