开发者

optimizing a view for hierarchical data

开发者 https://www.devze.com 2023-01-09 21:22 出处:网络
I currently have a query that contains a self-join to query all the direct and indirect managers of an employee from a table that stores company organization info using the nested sets model.In this S

I currently have a query that contains a self-join to query all the direct and indirect managers of an employee from a table that stores company organization info using the nested sets model. In this SQL notation, numbers preceded by a colon (e.g. :1) are variables:

select parent.empid, parent.depth from RelationshipMgr as node join
RelationshipMgr as parent on node.lft between parent.lft and parent.rgt
and node.empid = :1 order by parent.lft

I can trivially return only the ID of a manager n levels above the employee by adding parent.depth = node.depth - :2 to either the join condition or a where clause (side question: which is faster?).

The problem: I'm trying to turn this query into a view, and I'm not having much luck. The problem is that most or all of my variables are in the join condition of my query. My current best plan is to break those parts out into columns which I can then use the where clause on when I query the view, for example this:

select node.EmpID, parent.empid as MgrID, parent.depth as MgrDepth,
node.depth - parent.depth as MgrRelativeAltitude from RelationshipMgr as node
join RelationshipMgr as parent on node.lft between parent.lft and parent.rgt

You can see I've had to invent the MgrRelativeAltitude column to be able to find the ID of a manager n levels above the employee, but that is hardly the biggest problem. I worry that this will have serious performance problems, since SQL Server appears to do the full join as specified by the join conditions, and then filter it by the where clause, rather than intelligently using the where clause to limit the join. Is there a better way to create the view? Should I leave this as a query and forget about making a view? Would I gain anything by making it a stored procedure instead of a view?

And please don't say "premature optimization is evil"... it's not premature. The implementation I'm replacing used something like a bastardized adjacency list that had a record relating an employee to ever one of his direct and indirect managers... worst case O(n^2) reco开发者_开发知识库rds, and predictably ran into serious performance problems when we had more than about 300000 employees in the hierarchy. My new nested-sets implementation will alleviate those performance problems, except for this one query... if you do a select * on the proposed view, the results will be nearly identical to the old table I'm trying to replace, and that concerns me very much.


You are trying to determine the hierarchical relationship of non-adjacent nodes. As you've found, this is a relatively expensive run time calculation, view or regular query. Instead, if frequently run, I'd suggest creating what's known as a bridge table - either as a real table updated via trigger or as a indexed view in SQL Server 2005+ (albeit haven't tried the indexed view approach). That noted, nested set provides superior read times compared to adjacency list anyway.

The trade off is a table with significantly more rows than the source because it effectively represents relationships between all nodes that slows down writes as it updates any time nodes are added, deleted, or parent ids change. In return you can index it and achieve fast retrieval times. An optimization, if updating the bridge proves a bottleneck is accessing this through a stored procedure where the bridge serves as a cache for frequently run combinations of inputs, but calculates infrequent cases at run-time. You'll need to evaluate read vs. write frequency of the underlying node table to make that determination.

An overview of options for representing hierarchical data in a RDBMS is available here.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号