This will give you a wrong result. This will end up selecting all the rows from EMP table since for each employee there are chances of finding more than one non matching DEPT_CODE from DEPT table. While his actual requirement is to select only those employees from EMP table who doesn't have a matching DEPT_CODE in DEPT table. The sub query is the best method to select it. The above query will result in a cartesian product.
Joined: 22 Nov 2005 Posts: 700 Location: Troy, Michigan USA
What is your objection to using sub queries? They can be very efficient if used correctly.
When you use the ?IN? processing, for every row in the main query DB2 must do a select of every row of the result table from the sub query or until a match is made. For ?NOT IN? DB2 must query every row in the result table. This can be very resource intensive if the result table is very large. Also, if the sub query references values in the main query, this will now be very inefficient because DB2 will have to rebuild the result table for every main query row.
Using the ?EXISTS? or ?NOT EXISTS? can be very efficient if the columns you are comparing are indexed. You target on a single row, and either it EXISTS, or it does NOT EXISTS. Now, if the columns you are comparing are not indexed, the ?EXISTS? and ?NOT EXISTS? will have to query rows as with the ?IN? and ?NOT IN?.
Which is going to be more efficient? Only you will know. If the columns are indexed, the ?EXISTS? will be efficient. If the columns are not indexed and the result table is small, the ?IN? will be efficient, Otherwise both will be inefficient.
One way you can make this more efficient sometimes is to utilize a ?GLOBAL TEMPORARY TABLE? in your processing. if your sub query table is very large and not indexed, and your result table is moderate to large in size, you can use the populate the Temporary table with the result table, index it and then use ?EXISTS? processing.