SQL - query inside NOT IN takes longer than the complete query?

I'm using NOT IN inside my SQL query.

For example:

select columnA 
from table1
where columnA not in (
select columnB
from table2)

How is it possible that this part of the query

select columnB
from table2

takes 30 seconds to complete, but the whole query above takes 0.1 seconds? Shouldn't the complete query take at least 30 seconds?

BTW, both queries return valid results.

Thanks!

Answers to Comments

Is it because the second query hasn't actually completed but has only returned back the first 'x' rows (out of a very large table?)

No, the query is completed after 30 seconds, and not too many rows are returned (e.g. 50).

But @Aleksandar wondered why the query containing the performance killer was so fast.

my point exactly

Also how long does select distinct columnB from table2 take to execute?

actually, the original query is "select distinct...

It seems you are thinking that your main query implies the following steps:

(1)  Run the subquery
(2)  Check each row in table1 against the result set from the subquery.

Therefore, you think that running the subquery separately must take less time than running the whole query.

But SQL is not a procedural language, and the structure of the query does not necessarily imply the steps that will be followed to execute the query.

As Guffa answered, the optimizer will come up with (what it believes is) an optimal plan to execute each query. These execution plans are not always obvious from looking at the query, and in some cases can indeed be quite counter-intuitive.

I think that it is most likely, in this case, that the optimizer has come up with a quicker method for checking whether a value exists in table2 than simply querying all of table2 at once. It could be the transformation Guffa showed (although that still does not tell you the exact execution plan being used).

I would guess that table1 has significantly fewer rows than table2, and an index exists on table2.columnB. So all it has to do is fetch the rows from table1, then probe the index for each of those values to check for existence. But this is only one possibility.

Also, as Michael Buen pointed out, differences in the size of the result set returned can also impact your perceived performance. My intuition is that this is secondary to the execution plan differences, but it can be significant.

It's because the query optimiser turns the query into something that looks completely different. What actually gets executed should be the same as what's produced by a query like this:

select columnA 
from table1
left join table2 on ColumnA = ColumnB
where ColumnB is null

If the database can use indexes to join the tables, perhaps it doesn't have to query the entire table2, or even touch the table itself.
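To check the claimed equivalence on a small scale, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the index name idx_table2_columnB are made up for illustration, and SQLite is only a stand-in for whatever database the question is about. Both columns are declared NOT NULL here, because the two forms only agree when no NULLs are involved.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (columnA INTEGER NOT NULL);
    CREATE TABLE table2 (columnB INTEGER NOT NULL);
    CREATE INDEX idx_table2_columnB ON table2 (columnB);
    INSERT INTO table1 VALUES (1), (2), (3), (4);
    INSERT INTO table2 VALUES (2), (4), (4), (6);
""")

# The NOT IN form from the question.
not_in_rows = conn.execute("""
    SELECT columnA
    FROM table1
    WHERE columnA NOT IN (SELECT columnB FROM table2)
    ORDER BY columnA
""").fetchall()

# The LEFT JOIN ... IS NULL (anti-join) form from this answer.
anti_join_rows = conn.execute("""
    SELECT columnA
    FROM table1
    LEFT JOIN table2 ON columnA = columnB
    WHERE columnB IS NULL
    ORDER BY columnA
""").fetchall()

print(not_in_rows)
print(anti_join_rows)
```

Both queries return the values of columnA that have no match in table2, even though table2 contains a duplicate.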

A dramatic comparison, let's say this...

select columnB
from table2

...returns a billion rows (30 seconds); a lot of data travels over the wire and is presented to the user.

And this...

select columnA 
from table1

...has only one row.

An RDBMS won't do the dumb job of pulling table2's data from server to client if you don't intend to display it. So very little network bandwidth or client I/O is involved in testing for the presence of data: it all happens on the server, and the only thing pulled from server to client is the single row of table1.

select columnA 
from table1
where columnA not in (
select columnB
from table2)

And things will be especially fast if columnA and columnB happen to be indexed.

Two things make a database operation slow: first, pulling too much data from the server to the client; second, not having an index on the pertinent fields.
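The difference in how much data reaches the client can be seen in a rough sketch like the one below. It uses Python's sqlite3 module with an in-memory database, so there is no real network involved; the row counts and the index name idx_table2_columnB are assumptions chosen only to illustrate the point.

```python
import sqlite3

# Hypothetical data: one row in table1, many rows in table2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (columnA INTEGER NOT NULL)")
conn.execute("CREATE TABLE table2 (columnB INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_table2_columnB ON table2 (columnB)")
conn.execute("INSERT INTO table1 VALUES (-1)")
conn.executemany(
    "INSERT INTO table2 VALUES (?)",
    ((i,) for i in range(100_000)),
)

# Running the subquery alone hands every row of table2 to the client.
subquery_rows = conn.execute("SELECT columnB FROM table2").fetchall()

# The full query only hands back the rows of table1 that pass the test.
full_query_rows = conn.execute(
    "SELECT columnA FROM table1 "
    "WHERE columnA NOT IN (SELECT columnB FROM table2)"
).fetchall()

print(len(subquery_rows))    # rows returned by the subquery on its own
print(len(full_query_rows))  # rows returned by the whole query
```

The subquery on its own returns every row of table2, while the full query returns only the single row of table1.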

It can be, when the query can make use of indexes and the number of returned results is small. Returning the results itself adds to the execution time.

Just going to chime in: make sure that you are aware of the difference between NOT IN and NOT EXISTS.

If "columnA" is NULL, it won't be returned via the NOT IN solution you are looking at, but the LEFT antijoin examples already presented will behave as NOT EXISTS.
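The NULL behaviour is easy to check with a small sketch. This one uses Python's sqlite3 module with made-up rows; the helper run() and the constant names are hypothetical. It also shows a related case not mentioned above: a NULL in columnB makes NOT IN return nothing at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (columnA INTEGER);
    CREATE TABLE table2 (columnB INTEGER);
    INSERT INTO table1 VALUES (1), (2), (NULL);
    INSERT INTO table2 VALUES (2), (3);
""")

NOT_IN = """
    SELECT columnA FROM table1
    WHERE columnA NOT IN (SELECT columnB FROM table2)
"""
NOT_EXISTS = """
    SELECT columnA FROM table1
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 WHERE table2.columnB = table1.columnA)
"""
ANTI_JOIN = """
    SELECT columnA FROM table1
    LEFT JOIN table2 ON columnA = columnB
    WHERE columnB IS NULL
"""

def run(sql):
    """Return the result as a set so row order does not matter."""
    return set(conn.execute(sql).fetchall())

# Case 1: columnA contains a NULL, columnB does not.
not_in_1 = run(NOT_IN)          # the NULL row of table1 is dropped
not_exists_1 = run(NOT_EXISTS)  # the NULL row of table1 is kept
anti_join_1 = run(ANTI_JOIN)    # behaves like NOT EXISTS

# Case 2: columnB now contains a NULL as well.
conn.execute("INSERT INTO table2 VALUES (NULL)")
not_in_2 = run(NOT_IN)          # every comparison is unknown: no rows
not_exists_2 = run(NOT_EXISTS)  # unchanged
anti_join_2 = run(ANTI_JOIN)    # unchanged

print(not_in_1, not_exists_1, anti_join_1)
print(not_in_2, not_exists_2, anti_join_2)
```

If the columns can hold NULLs, the three forms are not interchangeable, so the choice between them affects the result and not just the performance.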

Also, make sure that TOAD/SQL Developer are not 'just showing the top 50' which they like to do (do a select count(*) from table1 to see if 50 is indeed the query result).

Do an EXPLAIN PLAN on the query and see if it highlights anything that looks egregious. Check your indexes and also see whether the columns allow NULLs: lack of indexes may be the culprit, but a full table scan caused by NULLs may be causing the nightmare.
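EXPLAIN PLAN is Oracle syntax; other engines expose the same idea under different names. As a rough illustration only, the sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The schema and the index name idx_table2_columnB are assumptions, and the wording of the plan differs between SQLite versions, so it is printed rather than interpreted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (columnA INTEGER);
    CREATE TABLE table2 (columnB INTEGER);
    CREATE INDEX idx_table2_columnB ON table2 (columnB);
""")

QUERY = """
    SELECT columnA
    FROM table1
    WHERE columnA NOT IN (SELECT columnB FROM table2)
"""

# Each plan row is (id, parent, notused, detail); 'detail' is the
# human-readable description of one step of the plan.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
details = [row[3] for row in plan]

for detail in details:
    print(detail)
```

Look for steps that scan a whole table where you expected an index to be used.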

NOT is a performance killer.

Some SQL engines first materialize the in(...) subquery into a temporary table, then re-run the query applying the NOT to the data in that temporary table.

If you can, you should use IN only!