I'm trying to optimize SQL queries in Akonadi and came across the following problem that is apparently not easy to solve with SQL, at least for me:
Assume the following table structure (should work in SQLite, PostgreSQL, MySQL):
CREATE TABLE a (
a_id INT PRIMARY KEY
);
INSERT INTO a (a_id) VALUES (1), (2), (3), (4);
CREATE TABLE b (
b_id INT PRIMARY KEY,
a_id INT,
name VARCHAR(255) NOT NULL
);
INSERT INTO b (b_id, a_id, name)
VALUES (1, 1, 'foo'), (2, 1, 'bar'), (3, 1, 'asdf'),
(4, 2, 'foo'), (5, 2, 'bar'), (6, 3, 'foo');
Now my problem is to find entries in a that are missing name entries in table b. E.g. I need to make sure each entry in a has at least the name entries "foo" and "bar" in table b. Hence the query should return something similar to:
a_id = 3 is missing name "bar"
a_id = 4 is missing name "foo" and "bar"
Since both tables are potentially huge in Akonadi, performance is of utmost importance.
One solution in MySQL would be:
SELECT a.a_id,
CONCAT('|', GROUP_CONCAT(name ORDER BY NAME ASC SEPARATOR '|'), '|') as names
FROM a
LEFT JOIN b 开发者_开发知识库USING( a_id )
GROUP BY a.a_id
HAVING names IS NULL OR names NOT LIKE '%|bar|foo|%';
I have yet to measure the performance tomorrow, but severly doubt it's any fast for tens of thousand of entries in a and thrice as many in b. Furthermore we want to support SQLite and PostgreSQL where to my knowledge the GROUP_CONCAT function is not available.
Thanks, good night.
This should work with any SQL standard RDBMS:
SELECT
a.a_id,
Foo.b_id as Foo_Id,
Bar.b_id as Bar_Id
FROM a
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'foo') as Foo ON
a.a_id = Foo.a_id
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'bar') as Bar ON
a.a_id = Bar.a_id
WHERE
Foo.a_id IS NULL
OR Bar.a_id IS NULL
Well, you could do with some definition in the database of which are the required elements. So I'll create one:
CREATE TABLE required(name varchar(255) primary key);
INSERT INTO required VALUES('foo'), ('bar');
(this could be a temporary table or just an inline union of constants if it's dynamic)
Now the set of rows we expect to find in b is given by:
SELECT a.a_id, required.name FROM a CROSS JOIN required;
So we outer join this set against b to determine what's present and what's not:
SELECT a.a_id, required.name, b.b_id
FROM a
CROSS JOIN required
LEFT JOIN b ON b.a_id = a.a_id AND b.name = required.name;
or alternatively:
SELECT a.a_id, required.name
FROM a CROSS JOIN required
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.a_id AND b.name = required.name);
Assuming there's an index (and it seems likely from your description to be a uniqueness constraint) on b(a_id,name) that should work nicely. To some extent or another, it will scan a and cross check against b using the index.
I got a nice tip in #sql on freenode by Ari-Ugwu and Xgc: Using the CrossTab pattern:
SELECT a.a_id, SUM(name = "foo") as hasFoo, SUM(name = "bar") as hasBar, ...
FROM a
LEFT JOIN b USING (a_id)
GROUP BY a.a_id
HAVING hasFoo < 1 OR hasFoo IS NULL OR hasBar < 1 OR hasBar IS NULL...;
turns out that none of these are any faster than simply doing that stuff in the program itself... and the latter is much easier to do, hence I opted for that after all.
加载中,请稍侯......
精彩评论