The first form uses the IN operator:
... where column in(select * from ... where ...);
The second form uses the EXISTS operator:
... where exists (select 'X' from ...where ...);
I believe most people use the first form because it is easier to write, although in practice the second form is often far more efficient. In Oracle, almost every IN subquery can be rewritten as an EXISTS subquery.
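As a concrete sketch of such a rewrite, assume two hypothetical tables, customers(customer_id, name) and orders(order_id, customer_id, status). Neither table comes from the original text; they are only for illustration. Both queries below are meant to return the customers that have at least one shipped order.

```sql
-- Hypothetical tables (assumed for illustration):
--   customers(customer_id, name)
--   orders(order_id, customer_id, status)

-- Form 1: IN with an uncorrelated subquery.
-- The subquery must return exactly one column to compare against.
SELECT c.customer_id, c.name
FROM customers c
WHERE c.customer_id IN (SELECT o.customer_id
                        FROM orders o
                        WHERE o.status = 'SHIPPED');

-- Form 2: EXISTS with a correlated subquery.
-- The select list ('X') is irrelevant; only the WHERE clause matters.
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (SELECT 'X'
              FROM orders o
              WHERE o.customer_id = c.customer_id
                AND o.status = 'SHIPPED');
```

The rewrite moves the compared column out of the select list and into the subquery's where clause as a join condition on the outer row.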
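In the second form, the subquery begins with select 'X'. With an EXISTS clause, it does not matter what data the subquery selects from the table; only the where clause is examined. The optimizer therefore does not have to scan the whole table and can do the work from an index alone (assuming the columns used in the where clause are indexed). Compared with an IN clause, EXISTS uses a correlated subquery, which is somewhat harder to construct than an IN subquery.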
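With EXISTS, Oracle first examines the main query and then runs the subquery until it finds the first match, which saves time. When executing an IN subquery, Oracle first runs the subquery and stores the resulting list in an indexed temporary table. The main query is suspended before the subquery runs and resumes only after the subquery has finished and its results are stored in the temporary table. This is why a query using EXISTS is usually faster than one using IN.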
Likewise, use NOT EXISTS in place of NOT IN wherever possible. Although both use NOT (which prevents index use and slows the query down), NOT EXISTS is more efficient than NOT IN.
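A minimal sketch of this substitution, again assuming hypothetical tables customers(customer_id, name) and orders(order_id, customer_id, status) that are not part of the original text. Both queries aim to list customers with no orders.

```sql
-- Hypothetical tables (assumed for illustration):
--   customers(customer_id, name)
--   orders(order_id, customer_id, status)

-- NOT IN form
SELECT c.customer_id
FROM customers c
WHERE c.customer_id NOT IN (SELECT o.customer_id
                            FROM orders o);

-- NOT EXISTS form
SELECT c.customer_id
FROM customers c
WHERE NOT EXISTS (SELECT 'X'
                  FROM orders o
                  WHERE o.customer_id = c.customer_id);
```

The two forms are interchangeable only when the subquery column cannot be NULL. If orders.customer_id contains even one NULL, the NOT IN query returns no rows at all, while the NOT EXISTS query still returns the customers without orders, so the rewrite can change results as well as speed.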
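EXISTS checks whether there is any result, that is, whether any row exists, and returns a Boolean (TRUE/FALSE).
IN compares against the result values and tests whether a column's value falls within a set of values, so EXISTS is faster than IN.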
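The main differences are:
EXISTS is mainly for partial checks: a single row that satisfies the condition is enough.
IN is mainly for concrete set operations, where it matters how many values satisfy the condition.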
EXISTS tests whether such a row exists.
IN tests whether a column's value lies within a specified set.
EXISTS is probably somewhat faster.
IN suits cases where both the inner and outer tables are large; EXISTS suits cases where the outer table's result set is small.
The detailed explanation from AskTom:
Well, the two are processed very very differently.
Select * from T1 where x in ( select y from T2 )
is typically processed as:
select *
from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;
The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then
joined to the original table -- typically.
As opposed to
select * from t1 where exists ( select null from t2 where y = x )
That is processed more like:
for x in ( select * from t1 )
loop
    if ( exists ( select null from t2 where y = x.x ) )
    then
        OUTPUT THE RECORD
    end if
end loop
It always results in a full scan of T1 whereas the first query can make use of
an index on T1(x).
So, when is where exists appropriate and in appropriate?
Let's say the result of the subquery
( select y from T2 )
is "huge" and takes a long time. But the table T1 is relatively small and
executing ( select null from t2 where y = x.x ) is very very fast (nice index on
t2(y)). Then the exists will be faster as the time to full scan T1 and do the
index probe into T2 could be less than the time to simply full scan T2 to build
the subquery we need to distinct on.
Let's say the result of the subquery is small -- then IN is typically more
appropriate.
If both the subquery and the outer table are huge -- either might work as well
as the other -- depends on the indexes and other factors.
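Since the outcome depends on indexes and other factors, one way to check a specific case in Oracle is to compare the execution plans of both forms. The sketch below assumes the tables T1(x) and T2(y) from the quoted explanation; the index name t2_y_idx is hypothetical.

```sql
-- Assumed index supporting the correlated probe in the EXISTS form
-- (the name t2_y_idx is made up for this sketch).
CREATE INDEX t2_y_idx ON t2 (y);

-- Plan for the IN form
EXPLAIN PLAN FOR
SELECT * FROM t1 WHERE x IN (SELECT y FROM t2);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the EXISTS form
EXPLAIN PLAN FOR
SELECT * FROM t1 WHERE EXISTS (SELECT NULL FROM t2 WHERE t2.y = t1.x);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the optimizer produces the same plan for both statements, the choice between IN and EXISTS makes no performance difference for that query and data.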
EXISTS and NOT EXISTS
If a subquery returns any rows at all, then EXISTS <subquery> is TRUE, and NOT EXISTS <subquery> is FALSE. For example:
SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);
Traditionally an EXISTS subquery starts with SELECT * but it could begin with SELECT 5 or SELECT column1 or anything at all -- MySQL ignores the SELECT list in such a subquery, so it doesn't matter.
For the above example, if t2 contains any rows, even rows with nothing but NULL values, then the EXISTS condition is TRUE. This is actually an unlikely example, since almost always a [NOT] EXISTS subquery will contain correlations. Here are some more realistic examples.
Example: What kind of store is present in one or more cities?
SELECT DISTINCT store_type FROM Stores WHERE EXISTS (SELECT * FROM Cities_Stores WHERE Cities_Stores.store_type = Stores.store_type);
Example: What kind of store is present in no cities?
SELECT DISTINCT store_type FROM Stores WHERE NOT EXISTS (SELECT * FROM Cities_Stores WHERE Cities_Stores.store_type = Stores.store_type);
Example: What kind of store is present in all cities?
SELECT DISTINCT store_type FROM Stores S1 WHERE NOT EXISTS ( SELECT * FROM Cities WHERE NOT EXISTS ( SELECT * FROM Cities_Stores WHERE Cities_Stores.city = Cities.city AND Cities_Stores.store_type = S1.store_type));
The last example is a double-nested NOT EXISTS query -- it has a NOT EXISTS clause within a NOT EXISTS clause. Formally, it answers the question "does a city exist with a store which is not in Stores?". But it's easier to say that a nested NOT EXISTS answers the question "is x TRUE for all y?".
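To see the three store queries at work, here is a small sketch with made-up data. The column types and the sample rows are assumptions for illustration only; the table and column names follow the examples above.

```sql
-- Assumed minimal schema matching the names used in the examples
CREATE TABLE Stores (store_type VARCHAR(20));
CREATE TABLE Cities (city VARCHAR(20));
CREATE TABLE Cities_Stores (city VARCHAR(20), store_type VARCHAR(20));

-- Made-up sample data
INSERT INTO Stores VALUES ('grocery');
INSERT INTO Stores VALUES ('pharmacy');
INSERT INTO Stores VALUES ('bakery');

INSERT INTO Cities VALUES ('Oslo');
INSERT INTO Cities VALUES ('Rome');

INSERT INTO Cities_Stores VALUES ('Oslo', 'grocery');
INSERT INTO Cities_Stores VALUES ('Rome', 'grocery');
INSERT INTO Cities_Stores VALUES ('Oslo', 'pharmacy');

-- Present in one or more cities: grocery and pharmacy
SELECT DISTINCT store_type FROM Stores
WHERE EXISTS (SELECT * FROM Cities_Stores
              WHERE Cities_Stores.store_type = Stores.store_type);

-- Present in no cities: bakery
SELECT DISTINCT store_type FROM Stores
WHERE NOT EXISTS (SELECT * FROM Cities_Stores
                  WHERE Cities_Stores.store_type = Stores.store_type);

-- Present in all cities: grocery
-- (there is no city for which a grocery row is missing)
SELECT DISTINCT store_type FROM Stores S1
WHERE NOT EXISTS (
    SELECT * FROM Cities
    WHERE NOT EXISTS (
        SELECT * FROM Cities_Stores
        WHERE Cities_Stores.city = Cities.city
          AND Cities_Stores.store_type = S1.store_type));
```

In the last query, pharmacy is excluded because Rome has no pharmacy row, so the inner NOT EXISTS is true for Rome and the outer NOT EXISTS fails. The order of rows within each result is not guaranteed without an ORDER BY.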