开发者

How to find distinct root-nodes of newest nodes in trees (hold in closure table)?

开发者 https://www.devze.com 2023-02-09 23:13 出处:网络
I try to store tree of messages as closure table on MySQL. Mostly learned about this method from Bill Karwin\'s presentation Models for hierarchical data. Problem is: i want to find distinct 3 root no

I try to store tree of messages as closure table on MySQL. Mostly learned about this method from Bill Karwin's presentation Models for hierarchical data. Problem is: i want to find distinct 3 root nodes (=nodes witho开发者_JS百科ut ancestors), which have most recent nodes in their trees. NB! Even if some root-node has 10 newest node in it's subtree, it would count just once.

All nodes have their modification time, for simplicity lets say that node ids are representing also time of their last modification (but we can't use ids in querys as time), first is 1st and last is 17th.

1
 2
  8
   15
   16
  17
 7
3
 4
  5
 6
9
 12
10
 14
11
 13

In closure table i have 3 columns (ancestor, descendant, depth), so tree is presented like that:

| ancestor | descendant | depth |
+----------+------------+-------+
|        1 |          1 |     0 | 
|        1 |          2 |     1 | 
|        1 |          7 |     1 | 
|        1 |          8 |     2 | 
|        1 |         15 |     3 | 
|        1 |         16 |     3 | 
|        1 |         17 |     2 | 
|        2 |          2 |     0 | 
|        2 |          8 |     1 | 
|        2 |         15 |     2 | 
|        2 |         16 |     2 | 
|        2 |         17 |     1 | 
|        3 |          3 |     0 | 
|        3 |          4 |     1 | 
|        3 |          5 |     2 | 
|        3 |          6 |     1 | 
|        4 |          4 |     0 | 
|        4 |          5 |     1 | 
|        5 |          5 |     0 | 
|        6 |          6 |     0 | 
|        7 |          7 |     0 | 
|        8 |          8 |     0 | 
|        8 |         15 |     1 | 
|        8 |         16 |     1 | 
|        9 |          9 |     0 | 
|        9 |         12 |     1 | 
|       10 |         10 |     0 | 
|       10 |         14 |     1 | 
|       11 |         11 |     0 | 
|       11 |         13 |     1 | 
|       12 |         12 |     0 | 
|       13 |         13 |     0 | 
|       14 |         14 |     0 | 
|       15 |         15 |     0 | 
|       16 |         16 |     0 | 
|       17 |         17 |     0 | 

I could get newest subtrees like that:

SELECT c.ancestor, MAX(time) AS t 
FROM closure c 
    JOIN nodes n ON (c.descendant = n.node AND c.ancestor <> n.node) 
GROUP BY c.ancestor ORDER BY t desc;

But how could i get distinct 3 root nodes having newest postings (1, 10 and 11 in this case)? Is this possible (and rational) with one query?


Edit: i put sample tables to pastebin


This thread is fairly old but I stumbled across it before coming up with a solution that does not require additional columns besides the standard ancestor & descendant, you don't even need the time, because the OP himself states the problem: you want ancestors that are descendants of no other. Below is the final query, and below that is test data to try it yourself.

select a.node_name, a.node_id
from test.hier a left outer join 
             (select coo.descendant /* coo = CHILD OF OTHER */
              from test.closure_tree coo right outer join test.closure_tree ro
                    on coo.ancestor <> ro.descendant /* ignore its self reference */
                    and coo.descendant = ro.descendant /* belongs to another node besides itself */)lo 
on a.node_id = lo.descendant
where lo.descendant is null /* wasn't found to be a child of another node besides itself */
group by a.node_name, a.node_id

Test data scripts to load up this test hierarchy:

--create table test.hier (
--  node_name varchar(10), 
--  node_id int identity (1,1) primary key
--)     

--insert into test.hier (node_name)
--values ('ROOT1')
--insert into test.hier (node_name)
--values ('ROOT2')
--insert into test.hier (node_name)
--values ('ROOT3')
--insert into test.hier (node_name)
--values ('ChildOf1')
--insert into test.hier (node_name)
--values ('ChildOf1')
--insert into test.hier (node_name)
--values ('ChildOf1')
--insert into test.hier (node_name)
--values ('ChildOf1')
--insert into test.hier (node_name)
--values ('ChildOf1')
--insert into test.hier (node_name)
--values ('ChildOf2')
--insert into test.hier (node_name)
--values ('ChildOf2')
--insert into test.hier (node_name)
--values ('ChildOf3')
--insert into test.hier (node_name)
--values ('ChildOf3')
--insert into test.hier (node_name)
--values ('ChildOf3')
--insert into test.hier (node_name)
--values ('ChildOf3')
--insert into test.hier (node_name)
--values ('LeafOf3')
--insert into test.hier (node_name)
--values ('LeafOf3')
--insert into test.hier (node_name)
--values ('LeafOf3')
--insert into test.hier (node_name)
--values ('LeafOf3')
--insert into test.hier (node_name)
--values ('LeafOf1')
--insert into test.hier (node_name)
--values ('LeafOf2')

--create table test.closure_tree (
--  ancestor int, 
--  descendant int, 
--  PRIMARY KEY (ancestor, descendant), 
--  constraint fk_test_a FOREIGN KEY (ancestor) references test.hier (node_id), 
--  constraint fk_test_d FOREIGN KEY (descendant) references test.hier (node_id)
--)

-- SELF REFERENCES 
--insert into test.closure_tree (ancestor, descendant)
--select node_id as a, node_id as d
--from test.hier

--insert into test.closure_tree (ancestor, descendant)
--select a.node_id, b.node_id 
--from test.hier a join test.hier b
--      on a.node_name = 'ROOT1' 
--      and b.node_name = 'ChildOf1'

--insert into test.closure_tree (ancestor, descendant)
--select a.node_id, b.node_id 
--from test.hier a join test.hier b
--      on a.node_name = 'ROOT2' 
--      and b.node_name = 'ChildOf2'

--insert into test.closure_tree (ancestor, descendant)
--select a.node_id, b.node_id 
--from test.hier a join test.hier b
--      on a.node_name = 'ROOT3' 
--      and b.node_name = 'ChildOf3'

--insert into test.closure_tree (ancestor, descendant)
--select a.node_id, b.node_id 
--from test.hier a join test.hier b
--      on a.node_name = 'ChildOf3' 
--      and b.node_name = 'LeafOf3'

--insert into test.closure_tree (ancestor, descendant)
--select a.node_id, b.node_id 
--from test.hier a join test.hier b
--      on a.node_name = 'ROOT3' 
--      and b.node_name = 'LeafOf3'

--insert into test.closure_tree (ancestor, descendant)
--select a.node_id, b.node_id 
--from test.hier a join test.hier b
--      on a.node_name = 'ROOT1' 
--      and b.node_name = 'LeafOf1'

--insert into test.closure_tree (ancestor, descendant)
--select a.node_id, b.node_id 
--from test.hier a join test.hier b
--      on a.node_name = 'ChildOf1' 
--      and b.node_name = 'LeafOf1'

--insert into test.closure_tree (ancestor, descendant)
--select a.node_id, b.node_id 
--from test.hier a join test.hier b
--      on a.node_name = 'ChildOf2' 
--      and b.node_name = 'LeafOf2'

--insert into test.closure_tree (ancestor, descendant)
--select a.node_id, b.node_id 
--from test.hier a join test.hier b
--      on a.node_name = 'Root2' 
--      and b.node_name = 'LeafOf2'


---- Test read of hierarchy with weird ordering for human readability
--select a.node_name, b.node_name as descendant_node_name 
--from test.hier a join test.closure_tree c
--  on a.node_id = c.ancestor
--  join test.hier b
--  on c.descendant = b.node_id
--order by right(a.node_name, 1), left(a.node_name, 1) desc


I got kind of solution. "Kind of", because i had to use additional column in nodes-table: root. It says whether node is root-node or not. Using this additional bit i can compose such query:

SELECT c.ancestor, MAX(n.time) AS t FROM closure c
    JOIN nodes n ON (c.descendant = n.node AND c.ancestor <> n.node)
    JOIN nodes n2 ON (c.ancestor = n2.node AND n2.root = 1) 
    GROUP BY c.ancestor ORDER BY t desc LIMIT 3;

Seems to me, it performs pretty well. It scales too. I generated tree with 100000 node and it took around 1 sec to get results (max tree depth was 18).

I attach the perl script (and table schema) for content generation, so maybe some could tune this query to perform better.

#!/usr/bin/perl --

use strict;
use warnings;
use Data::Random qw(:all);
my ($maxnode, $node) = ();

my $dbh = !DATABASE INIT!

foreach ( 1 .. $ARGV[0] ) {
    $node = ($_ == 1) ? 0 : int( rand(4) );

    if (!$node) {
        $maxnode = &RootNode(1);
    }
    else {
        $maxnode = &Node($maxnode);
    }
}


##
## 
sub Node {
my $parent = int( rand($_[0]) ) + 1;

my $id = &RootNode(0, $parent);

my $insert = qq|INSERT INTO test.closure (ancestor, descendant, depth) 
        SELECT ancestor, $id, depth + 1 
        FROM test.closure WHERE descendant = ?|;
$dbh->do($insert, undef, $parent);
return $id;

}
##


##
## 
sub RootNode {
my $min_datetime = $_[0] 
        ? '2008-9-21 4:0:0' 
        :  $dbh->selectrow_array( "SELECT time 
                FROM test.nodes WHERE node = ?", undef, $_[1] );
my $label = join( "", rand_chars( set => 'alpha', min => 5, max => 20 ) );
my $time = rand_datetime( min => $min_datetime, max => 'now' );

my $insert = qq|INSERT INTO test.nodes (label, time, root) VALUES (?, ?, ?)|;
$dbh->do($insert, undef, $label, $time, $_[0]);
my ($id) = $dbh->selectrow_array("SELECT LAST_INSERT_ID()");

$insert = qq|INSERT INTO test.closure (ancestor, descendant, depth) 
        VALUES (?, ?, 0)|;
$dbh->do($insert, undef, $id, $id);

return $id;
}
##

__DATA__
USE test

DROP TABLE IF EXISTS `closure`;
DROP TABLE IF EXISTS `nodes`;

CREATE TABLE `nodes` (
`node` int(11) NOT NULL auto_increment,
`label` varchar(20) NOT NULL,
`time` datetime default NULL,
`root` tinyint(1) unsigned default NULL,
PRIMARY KEY  (`node`)
) ENGINE=InnoDB;

CREATE TABLE `closure` (
`ancestor` int(11) NOT NULL,
`descendant` int(11) NOT NULL,
`depth` tinyint(3) unsigned default NULL,
PRIMARY KEY  (`ancestor`,`descendant`),
KEY `descendant` (`descendant`),
CONSTRAINT `closure_ibfk_1` FOREIGN KEY (`ancestor`) REFERENCES `nodes` (`node`),
CONSTRAINT `closure_ibfk_2` FOREIGN KEY (`descendant`) REFERENCES `nodes` (`node`)
) ENGINE=InnoDB;


You may create one top level element, which is only for reference and all of its descendants will be the root nodes.

  • top
    • root
      • sub1
      • sub2
      • sub3
    • root2
    • root3
    • root4


I tried to simulate it into a database, and I generated this query to find the last 3 root nodes having newest postings. I'm not really sure that I'd undesrstood all your requests, but if I don't, please tell me and as soon as possible I'll try to help you.

My query is the following one:


SELECT TOP 3 QRY_GROUP_ALL_OF_THEM.MínDeancestor, Max(QRY_GROUP_ALL_OF_THEM.descendant) AS MáxDedescendant
FROM (  SELECT Min(closure.ancestor) AS MínDeancestor, [QRY_LAST_INSERTIONS].[descendant]
    FROM closure,   (SELECT DISTINCT closure.descendant 
        FROM closure 
        GROUP BY closure.descendant, closure.depth, closure.ancestor, closure.descendant 
        HAVING  (((closure.descendant>12 And closure.descendant<>[closure].[ancestor]) AND (closure.depth<>0)) 
            OR ((closure.descendant<>[closure].[ancestor]) AND (closure.depth<>0)))
        ) AS QRY_LAST_INSERTIONS
    GROUP BY closure.descendant, [QRY_LAST_INSERTIONS].[descendant]
    HAVING (((closure.descendant)=[QRY_LAST_INSERTIONS].[descendant]))
) AS QRY_GROUP_ALL_OF_THEM
GROUP BY QRY_GROUP_ALL_OF_THEM.MínDeancestor
ORDER BY Max(QRY_GROUP_ALL_OF_THEM.descendant) DESC;

As you can see, there are three queries in the same one. If it works to you, just tell me, and I will explain how it works tomorrow.

Best regards, Jordi Mas


Here you have the same code without the quotes in the alias, check it and tell me if it works with you, please. I tried under Microsoft SQL Server, because I don't have MySQL Server on my laptop, but if it doesn't work, tell me and I will install and try it.

Query:

SELECT TOP 3 QRY_GROUP_ALL_OF_THEM.MinAncestor, Max(QRY_GROUP_ALL_OF_THEM.descendant) AS MaxDescendant
FROM (  
SELECT Min(closure.ancestor) AS MinAncestor, [QRY_LAST_INSERTIONS].[descendant]     
FROM closure, (
    SELECT DISTINCT closure.descendant          
    FROM closure          
    GROUP BY closure.descendant, closure.depth, closure.ancestor, closure.descendant          
    HAVING  (((closure.descendant>12 And closure.descendant<>[closure].[ancestor]) 
        AND (closure.depth<>0))              
        OR ((closure.descendant<>[closure].[ancestor]) AND (closure.depth<>0)))         
) AS QRY_LAST_INSERTIONS     
GROUP BY closure.descendant, [QRY_LAST_INSERTIONS].[descendant]     
HAVING (((closure.descendant)=[QRY_LAST_INSERTIONS].[descendant])) ) AS QRY_GROUP_ALL_OF_THEM 
GROUP BY QRY_GROUP_ALL_OF_THEM.MinAncestor 
ORDER BY Max(QRY_GROUP_ALL_OF_THEM.descendant) DESC; 

The result of this query, with your data is the following one:

MinAncestor: 1, 10, 11

MaxDescendant: 17, 14, 13

I hope it will help you.


After your comment about TOP statement (it doesn't work on MySQL), the final query had to be this one:

SELECT 
    QRY_GROUP_ALL_OF_THEM.MinAncestor, 
    Max(QRY_GROUP_ALL_OF_THEM.descendant) AS MaxDescendant LIMIT 0,3
FROM 
    (  
        SELECT 
            Min(closure.ancestor) AS MinAncestor, 
            [QRY_LAST_INSERTIONS].[descendant]     
        FROM closure, 
            (
                SELECT DISTINCT closure.descendant 
                FROM   closure 
                GROUP  BY closure.descendant, 
                          closure.depth, 
                          closure.ancestor, 
                          closure.descendant 
                HAVING ( ( ( closure.descendant > 12 
                             AND closure.descendant <> [closure].[ancestor] ) 
                           AND ( closure.depth <> 0 ) ) 
                          OR ( ( closure.descendant <> [closure].[ancestor] ) 
                               AND ( closure.depth <> 0 ) ) )        
            ) AS QRY_LAST_INSERTIONS     
        GROUP BY 
            closure.descendant, 
            [QRY_LAST_INSERTIONS].[descendant]     
        HAVING (((closure.descendant)=[QRY_LAST_INSERTIONS].[descendant])) 
    ) AS QRY_GROUP_ALL_OF_THEM 
GROUP BY QRY_GROUP_ALL_OF_THEM.MinAncestor 
ORDER BY Max(QRY_GROUP_ALL_OF_THEM.descendant) DESC;


select x.ancestor
from nodes n
join closure c on (c.descendant = n.node)
join (
-- all root node
   select ancestor 
   from closure
   group by descendant 
   having count(*) = 1
) x ON x.ancestor = c.ancestor
where c.depth = 1
order by n.time desc
limit 3
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号