您的位置:首页 > 数据库

More Trees & Hierarchies in SQL

2008-09-16 13:34 267 查看
Hierarchies are sometimes difficult to store in SQL tables...things like
trees, threaded forums, org charts and the like...and it's usually even harder
to retrieve the hierarchy once you do store it. Here's a method that's easy to
understand and maintain, and gives you the full hierarchy (or any piece of it)
very quickly and easily.

While XML handles hierarchical data quite well, relational SQL doesn't. There
are several different ways to model a hierarchical structure. The most common
and familiar is known as the adjacency model, and it usually works like this:

The table would look like this:

EmployeeIDNameBossID
1001Denis Eaton-HoggNULL
1002Bobbi Flekman1001
1003Ian Faith1002
1004David St. Hubbins1003
1005Nigel Tufnel1003
1006Derek Smalls1003
And the org chart/indented list looks like this:
Denis Eaton-Hogg
Bobbi Flekman
Ian Faith
David St. Hubbins
Nigel Tufnel
Derek Smalls
It's called the "adjacency" model because the parent (boss) data is stored in
the same row as the child (employee) data, in an adjacent column. It's a pretty
straightforward design that's easily understood by everyone...no deep relational
theory needed. You can find a person's boss easily, and you can find their
coworkers by querying the BossID column.

The trouble begins when you want to list several levels of a hierarchy. To
find the boss's boss, you would need to join the Employees table to itself, like
this: SELECT BigBoss.Name BigBoss, Boss.Name Boss, Employees.Name Employee
FROM Employees
INNER JOIN Employees AS Boss ON Employees.BossID=Boss.EmployeeID
INNER JOIN Employees BigBoss ON Boss.BossID=BigBoss.EmployeeIDAnd you'd
get the following:

BigBossBossEmployee
Denis Eaton-HoggBobbi FlekmanIan Faith
Bobbi FlekmanIan FaithDavid St. Hubbins
Bobbi FlekmanIan FaithNigel Tufnel
Bobbi FlekmanIan FaithDerek Smalls
For each level, you'd need to join the table to itself...not an attractive
option if you have 5 or more levels! It would be great if it could join itself
as many times as needed. This is called a recursive join, and though some
database products support it (Oracle has the CONNECT BY syntax) SQL Server is
not one of them.

If you look in Books Online under "expanding hierarchies" you'll find a
stored procedure that runs through an adjacency table to expand the hierarchy.
While it works, it's a procedural method that requires a stack (using a temp
table) and can take a while to run with large hierarchies. It also PRINTs out
the indented list, so you'd need to modify it to use ANOTHER temp table if you
wanted the results as a table/query.

If you've followed Joe Celko's columns or bought his books, he recommends the
nested set
model for representing trees in SQL (he's posted it on
SQL Team a few
times). It's very well detailed in the following articles, Part I, II, III, IV, and also in his book, SQL For
Smarties, and I recommend checking it out. It's very efficient and makes it
extremely easy to pull out trees/subtrees from the table.

However (you knew this was coming!) one of the issues I have with nested sets
is the complexity required to do relatively simple tasks, like adding, deleting,
or moving nodes in the tree. Even finding an employee's immediate supervisor or
subordinates requires 3 self-joins AND a subquery! Joe admits this shortcoming
in his book...and it's interesting that the solution ONLY appears in his book,
I've never seen him post it online.

Although there's a very seductive logic to nested sets, and it's easy to do
complicated tree operations with them, I find them less intuitive than the
adjacency model. It's harder for me to visualize a hierarchy or org chart with
them. You may be able to use them more easily than I can, but if you also find
them daunting, read on.

So how to represent a hierarchy, using adjacency, and avoiding recursion
wherever possible? It's pretty easy really...you build it and store it in the
table! (I've posted this method in this thread a
while back, and I'm elaborating on it here)

Here's the table definition for the Tree: CREATE TABLE Tree (
Node int NOT NULL IDENTITY(100, 1),
ParentNode int,
EmployeeID int NOT NULL,
Depth tinyint,
Lineage varchar(255) )
I'm keeping the Tree table separate for a few good reasons I'll discuss
later, but you could simply add the Depth and Lineage columns to the Employees
table above, and substitute BossID for ParentNode. (I also didn't really WANT to
use an identity column, but most people will anyway) The terms "node" and
"lineage" might seem unfamiliar, but I wanted to generalize them a little more
than "child", "parent" and "hierarchy".

Based on the Employees table, here's how the Tree will be filled:

NodeParentNodeEmployeeIDDepthLineage
100NULL1001NULLNULL
101NULL1002NULLNULL
102NULL1003NULLNULL
103NULL1004NULLNULL
104NULL1005NULLNULL
105NULL1006NULLNULL
The first thing to do is to populate the parent nodes, which is unecessary if
you use a single table, but it's easy to do in any case: UPDATE T SET T.ParentNode=P.Node
FROM Tree T
INNER JOIN Employees E ON T.Employe
4000
eID=E.EmployeeID
INNER JOIN Employees B ON E.BossID=B.EmployeeID
INNER JOIN Tree P ON B.EmployeeID=P.EmployeeID
And you'll get this:

NodeParentNodeEmployeeIDDepthLineage
100NULL1001NULLNULL
1011001002NULLNULL
1021011003NULLNULL
1031021004NULLNULL
1041021005NULLNULL
1051021006NULLNULL
This will only need to be done once, and afterwards you won't need to
maintain the BossID column in the Employees table. The next part is to find the
root node of the tree, also known as the top-level, or big boss man, etc. in an
org chart. That's the node that has no parent (Null), so we will start there and
set the Lineage column as the root: UPDATE Tree SET Lineage='/', Depth=0 WHERE ParentNode Is Null
Once that's done, we can then update the rows who are immediate children of
the root node: UPDATE T SET T.depth = P.Depth + 1,
T.Lineage = P.Lineage + Ltrim(Str(T.ParentNode,6,0)) + '/'
FROM Tree AS T
INNER JOIN Tree AS P ON (T.ParentNode=P.Node)
WHERE P.Depth>=0
AND P.Lineage Is Not Null
AND T.Depth Is NullIn fact, we can just put a loop on this to run through
all of the children/grandchildren etc. of the tree:
WHILE EXISTS (SELECT * FROM Tree WHERE Depth Is Null)
UPDATE T SET T.depth = P.Depth + 1,
T.Lineage = P.Lineage + Ltrim(Str(T.ParentNode,6,0)) + '/'
FROM Tree AS T
INNER JOIN Tree AS P ON (T.ParentNode=P.Node)
WHERE P.Depth>=0
AND P.Lineage Is Not Null
AND T.Depth Is Null

Don't worry about the loop, it runs once for each level in the hierarchy...10
loops for 10 levels or generations. For a corporation, 10 layers of management
is pretty deep; for a family tree, you could trace an American family back to
the Revolutionary War! And under normal circumstances, you'd also only have to
run this procedure once. The final result is:

NodeParentNodeEmployeeIDDepthLineage
100NULL10010/
10110010021/100/
10210110032/100/101/
10310210043/100/101/102/
10410210053/100/101/102/
10510210063/100/101/102/
You'll notice that for each node, the entire lineage back to the root is
stored. This means that finding someone's boss, or their boss' boss, doesn't
require any self-joins or recursion to create an indented list. In fact, it can
be accomplished with a single SELECT! SELECT Space(T.Depth*2) + E.Name AS Name
FROM Employees E
INNER JOIN Tree T ON E.EmployeeID=T.EmployeeID
ORDER BY T.Lineage + Ltrim(Str(T.Node,6,0))If you kept everything in one
table you would not even need the JOIN! The Depth column comes in handy for
performing the indent by using the Space() function. Using ORDER BY
Lineage...etc. will sort the org chart properly, with each subordinate nesting
underneath their parent. Sort order is maintained by Node values, and can be
changed simply by updating the node value. Inserting or deleting a new node does
not affect the rest of the tree, unlike the nested set model. The lineage column
can be maintained automatically using triggers, so moving or promoting a node is
a no-brainer.
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息