SQL: find entries in 1:n relation that don't comply with condition spanning multiple rows_问答_开发者

SQL: find entries in 1:n relation that don't comply with condition spanning multiple rows

开发者 https://www.devze.com 2022-12-30 17:43 出处：网络

I\'m trying to optimize SQL queries in Akonadi and came across the following problem that is apparently not easy to solve with SQL, at least for me:

相关专题：postgresql sql

I'm trying to optimize SQL queries in Akonadi and came across the following problem that is apparently not easy to solve with SQL, at least for me:

Assume the following table structure (should work in SQLite, PostgreSQL, MySQL):

CREATE TABLE a (
  a_id INT PRIMARY KEY
);

INSERT INTO a (a_id) VALUES (1), (2), (3), (4);

CREATE TABLE b (
  b_id INT PRIMARY KEY,
  a_id INT,
  name VARCHAR(255) NOT NULL
);

INSERT INTO b (b_id, a_id, name)
       VALUES (1, 1, 'foo'), (2, 1, 'bar'), (3, 1, 'asdf'),
              (4, 2, 'foo'), (5, 2, 'bar'), (6, 3, 'foo');

Now my problem is to find entries in a that are missing name entries in table b. E.g. I need to make sure each entry in a has at least the name entries "foo" and "bar" in table b. Hence the query should return something similar to:

a_id = 3 is missing name "bar"
a_id = 4 is missing name "foo" and "bar"

Since both tables are potentially huge in Akonadi, performance is of utmost importance.

One solution in MySQL would be:

SELECT a.a_id,
       CONCAT('|', GROUP_CONCAT(name ORDER BY NAME ASC SEPARATOR '|'), '|') as names
  FROM a
  LEFT JOIN b 开发者_开发知识库USING( a_id )
  GROUP BY a.a_id
  HAVING names IS NULL OR names NOT LIKE '%|bar|foo|%';

I have yet to measure the performance tomorrow, but severly doubt it's any fast for tens of thousand of entries in a and thrice as many in b. Furthermore we want to support SQLite and PostgreSQL where to my knowledge the GROUP_CONCAT function is not available.

Thanks, good night.

This should work with any SQL standard RDBMS:

SELECT 
   a.a_id, 
   Foo.b_id as Foo_Id,
   Bar.b_id as Bar_Id
FROM a
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'foo') as Foo ON
   a.a_id = Foo.a_id
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'bar') as Bar ON
   a.a_id = Bar.a_id
WHERE
   Foo.a_id IS NULL
   OR Bar.a_id IS NULL

Well, you could do with some definition in the database of which are the required elements. So I'll create one:

CREATE TABLE required(name varchar(255) primary key);
INSERT INTO required VALUES('foo'), ('bar');

(this could be a temporary table or just an inline union of constants if it's dynamic)

Now the set of rows we expect to find in b is given by:

SELECT a.a_id, required.name FROM a CROSS JOIN required;

So we outer join this set against b to determine what's present and what's not:

SELECT a.a_id, required.name, b.b_id
FROM a
     CROSS JOIN required
     LEFT JOIN b ON b.a_id = a.a_id AND b.name = required.name;

or alternatively:

SELECT a.a_id, required.name
FROM a CROSS JOIN required
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.a_id AND b.name = required.name);

Assuming there's an index (and it seems likely from your description to be a uniqueness constraint) on b(a_id,name) that should work nicely. To some extent or another, it will scan a and cross check against b using the index.

I got a nice tip in #sql on freenode by Ari-Ugwu and Xgc: Using the CrossTab pattern:

SELECT a.a_id, SUM(name = "foo") as hasFoo, SUM(name = "bar") as hasBar, ...
  FROM a
  LEFT JOIN b USING (a_id)
  GROUP BY a.a_id
  HAVING hasFoo < 1 OR hasFoo IS NULL OR hasBar < 1 OR hasBar IS NULL...;

turns out that none of these are any faster than simply doing that stuff in the program itself... and the latter is much easier to do, hence I opted for that after all.