Someone I know went to an interview and was given the following problem to solve. I've thought about it for a few hours and believe that it's not possible to do without using some database-specific extensions or features from recent standards that don't have wide support yet.
I don't remember the story behind what is being represented, but it's not relevant. In simple terms, you're trying to represent chains of unique numbers:
chain 1: 1 -> 2 -> 3
chain 2: 42 -> 78
chain 3: 4
chain 4: 7 -> 8 -> 9
...
This information is already stored for you in the following table structure:
id | parent
---+-------
1 | NULL
2 | 1
3 | 2
42 | NULL
78 | 42
4 | NULL
7 | NULL
8 | 7
9 | 8
There could be millions of such chains and each chain can have an unlimited number of entries. The goal is to create a second table that would contain the exact same information, but with a third column that contains the starting point of the chain:
id | parent | start
---+--------+------
1 | NULL | 1
2 | 1 | 1
3 | 2 | 1
42 | NULL | 42
78 | 42 | 42
4 | NULL | 4
7 | NULL | 7
8 | 7 | 7
9 | 8 | 7
The claim (made by the interviewers) is that this can be achieved with just 2 SQL queries. The hint they provide is to first populate the destination table (I'll call it dst) with the root elements, like so:
INSERT INTO dst SELECT id, parent, id FROM src WHERE parent IS NULL
This will give you the following content:
id | parent | start
---+--------+------
1 | NULL | 1
42 | NULL | 42
4 | NULL | 4
7 | NULL | 7
They say that you can now execute just one more query to get to the goal shown above.
In my opinion, you can do one of two things. Use recursion in the source table to get to the front of each chain, or continuously execute some version of SELECT start FROM dst WHERE dst.id = src.parent
after each update to dst (i.e. can't cache the results).
I don't think either of these situations is supported by common databases like MySQL, PostgreSQL, SQLite, etc. I do know that in PostgreSQL 8.4 you can achieve recursion using WITH RECURSIVE
query, and in Oracle you have START WITH
and CONNECT BY
clauses. The point is that these things are specific to database type and version.
Is there any way to achieve the desired result using regular SQL92 in just one query? The best I could do is fill-in the start column for the first child with the following (can also use a LEFT JOIN to achieve the same result):
INSERT INTO dst
SELECT s.id, s.parent,
(SELECT start FROM d开发者_运维百科st AS d WHERE d.id = s.parent) AS start
FROM src AS s
WHERE s.parent IS NOT NULL
If there was some way to re-execute the inner select statement after each insert into dst, then the problem would be solved.
It can not be implemented in any static SQL that follows ANSI SQL 92.
But as you said it can be easy implemented with oracle's CONNECT BY
SELECT id,
parent,
CONNECT_BY_ROOT id
FROM table
START WITH parent IS NULL
CONNECT BY PRIOR id = parent
In SQL Server you would use a Common Table Expression (CTE).
To replicate the stored data I've created a temporary table
-- Create a temporary table
CREATE TABLE #SourceData
(
ID INT
, Parent INT
)
-- Setup data (ID, Parent, KeyField)
INSERT INTO #SourceData VALUES (1, NULL);
INSERT INTO #SourceData VALUES (2, 1);
INSERT INTO #SourceData VALUES (3, 2);
INSERT INTO #SourceData VALUES (42, NULL);
INSERT INTO #SourceData VALUES (78, 42);
INSERT INTO #SourceData VALUES (4, NULL);
INSERT INTO #SourceData VALUES (7, NULL);
INSERT INTO #SourceData VALUES (8, 7);
INSERT INTO #SourceData VALUES (9, 8);
Then I create the CTE to compile the data result:
-- Perform CTE
WITH RecursiveData (ID, Parent, Start) AS
(
-- Base query
SELECT ID, Parent, ID AS Start
FROM #SourceData
WHERE Parent IS NULL
UNION ALL
-- Recursive query
SELECT s.ID, s.Parent, rd.Start
FROM #SourceData AS s
INNER JOIN RecursiveData AS rd ON s.Parent = rd.ID
)
SELECT * FROM RecursiveData WHERE Parent IS NULL
Which will output the following:
id | parent | start
---+--------+------
1 | NULL | 1
42 | NULL | 42
4 | NULL | 4
7 | NULL | 7
Then I clean up :)
-- Clean up
DROP TABLE #SourceData
There is no recursive query support in ANSI-92, because it was added in ANSI-99. Oracle has had it's own recursive query syntax (CONNECT BY
) since v2. While Oracle supported the WITH clause since 9i, SQL Server is the first I knew of to support the recursive WITH/CTE syntax -- Oracle didn't start until 11gR2. PostgreSQL added support in 8.4+. MySQL has had a request in for WITH support since 2006, and I highly doubt you'll see it in SQLite.
The example you gave is only two levels deep, so you could use:
INSERT INTO dst
SELECT a.id,
a.parent,
COALESCE(c.id, b.id) AS start
FROM SRC a
LEFT JOIN SRC b ON b.id = a.parent
LEFT JOIN SRC c ON c.id = b.parent
WHERE a.parent IS NOT NULL
You'd have to add a LEFT JOIN for the number of levels deep, and add them in proper sequence to the COALESCE function.
精彩评论