Portal | Manuals | References | Downloads | Info | Programs | JCLs | Mainframe wiki | Quick Ref
IBM Mainframe Forum Index
 
Register
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Memberlist Profile Log in to check your private messages Log in
 
DB2 Query runs very long
Goto page 1, 2  Next
 
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> DB2
View previous topic :: :: View next topic  
Author Message
John F Dutcher

New User


Joined: 02 Apr 2008
Posts: 10
Location: Harrisburg, PA

PostPosted: Wed Aug 21, 2019 11:44 pm    Post subject: DB2 Query runs very long
Reply with quote

Short query below runs very long. I include the cols in the table and the only index info. I was advised to replace the current approach to obtaining the table row with the 'latest' timestamp with code that uses the index. How would I do that.

The table:
Code:

HP_ID                      CHAR (9)
HP_ADDR_TKN         INTEGER
NWK_ENR_TKN        INTEGER
ENR_PAR_OPT_CD   CHAR (3)
ENR_PAR_OPT_EFF_DATE  DATE
ENR_PAR_OPT_CNL_DATE DATE
ANW_LGN_ID          CHAR(8)
ANW_TSP_GRP        TIMESTAMP

Index:
Code:

     NWK_ENR_TKN               1  A INTEGER         4 
     ENR_PAR_OPT_CD           2  A CHAR             3 
     ENR_PAR_OPT_EFF_DT    3  A DATE              4
     ANW_TSP_GRP                4  A TIMESTMP     10 


Existing query:


Code:

SELECT A.HP_ID                                             
     , A.HP_ADDR_TKN                                       
     , A.ENR_PAR_OPT_CD                                     
FROM   T401ANPO A                                           
WHERE  A.ENR_PAR_OPT_CAN_DT        >  CURRENT DATE         
  AND  A.ENR_PAR_OPT_EFF_DT        <> A.ENR_PAR_OPT_CAN_DT 
  AND  A.ANW_TSP_GRP               =                       
       (SELECT MAX (B.ANW_TSP_GRP)                         
        FROM   T401ANPO B                                   
        WHERE  A.HP_ID             = B.HP_ID               
          AND  A.HP_ADDR_TKN       = B.HP_ADDR_TKN         
          AND  A.ENR_PAR_OPT_CD    = B.ENR_PAR_OPT_CD)     
ORDER BY A.HP_ID                                           
       , A.ENR_PAR_OPT_CD;                         
Back to top
View user's profile Send private message

Rohit Umarjikar

Senior Member


Joined: 21 Sep 2010
Posts: 2244
Location: NY,USA

PostPosted: Thu Aug 22, 2019 12:00 am    Post subject:
Reply with quote

Code:
SELECT
       A.HP_ID                                             
     , A.HP_ADDR_TKN                                       
     , A.ENR_PAR_OPT_CD         
     , MAX (A.ANW_TSP_GRP)                             
FROM   T401ANPO A                                           
WHERE  A.ENR_PAR_OPT_CAN_DT        >  CURRENT DATE         
  AND  A.ENR_PAR_OPT_EFF_DT       <> A.ENR_PAR_OPT_CAN_DT 
Group By
       A.HP_ID                                             
     , A.HP_ADDR_TKN                                       
     , A.ENR_PAR_OPT_CD 
ORDER BY
         A.HP_ID                                           
       , A.ENR_PAR_OPT_CD;   

Also, Do a RUNSTAT and REORG , Check with DBA for that.
You will need additional Index on A.ENR_PAR_OPT_CAN_DT column which will be a combination of A.ENR_PAR_OPT_CAN_DT and A.ENR_PAR_OPT_EFF_DT.
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Thu Aug 22, 2019 12:32 am    Post subject:
Reply with quote

The difference between two queries may be if there are multiple equal values of
MAX (A.ANW_TSP_GRP)
Back to top
View user's profile Send private message
Rohit Umarjikar

Senior Member


Joined: 21 Sep 2010
Posts: 2244
Location: NY,USA

PostPosted: Thu Aug 22, 2019 1:21 am    Post subject:
Reply with quote

Chances are 0.0001% to have TIMESTAMP duplicate and second, MAX gets either one if its duplicates and at that point it doesn't matter which one it picks.
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Thu Aug 22, 2019 2:11 am    Post subject:
Reply with quote

Rohit Umarjikar wrote:
Chances are 0.0001% to have TIMESTAMP duplicate and second, MAX gets either one if its duplicates and at that point it doesn't matter which one it picks.
...unless the initial idea was: to select ALL rows with MAX values...
The source, and the role of this particular TIMESTAMP is unknown.
Back to top
View user's profile Send private message
Rohit Umarjikar

Senior Member


Joined: 21 Sep 2010
Posts: 2244
Location: NY,USA

PostPosted: Thu Aug 22, 2019 2:13 am    Post subject:
Reply with quote

MAX will always gives you one row regardless of any idea.
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Thu Aug 22, 2019 6:35 pm    Post subject:
Reply with quote

Rohit Umarjikar wrote:
MAX will always gives you one row regardless of any idea.

...but database may (or may not) include multiple rows with exactly the same value equal to this MAX.

The original query would extract all those rows if they only existed. Not sure if this was the intention of the author, but the query structure looks like an attempt to do so (in a very inefficient way).

The modified query always returns only one single row for each MAX value.

This is the difference.
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Thu Aug 22, 2019 7:56 pm    Post subject:
Reply with quote

If the TS really didn't want to get multiple rows with all TIMESTAMP values equal to their local MAX, then there would be absolutely no need to create the subquery in the initial sample.

If multiple rows with equal TIMESTAMPs are really needed then I would suggest to re-organize the query itself to a more reasonable process, - before trying to optimize indexes, etc. etc. etc.
Code:
SELECT A.HP_ID                                                 
     , A.HP_ADDR_TKN                                           
     , A.ENR_PAR_OPT_CD                                         
 FROM T401ANPO                   A ,                           
      (SELECT HP_ID                                             
            , HP_ADDR_TKN                                       
            , ENR_PAR_OPT_CD                                   
            , MAX(ANW_TSP_GRP) AS MAX_TSP                       
         FROM T401ANPO                                         
        WHERE ENR_PAR_OPT_CAN_DT >  CURRENT DATE               
          AND ENR_PAR_OPT_EFF_DT <> ENR_PAR_OPT_CAN_DT ) B     
 WHERE A.HP_ID          = B.HP_ID                               
   AND A.HP_ADDR_TKN    = B.HP_ADDR_TKN                         
   AND A.ENR_PAR_OPT_CD = B.ENR_PAR_OPT_CD                     
   AND A.ANW_TSP_GRP    = B.MAX_TSP                             
 ORDER BY A.HP_ID                                               
        , A.ENR_PAR_OPT_CD;


The original example would run its inner subquery many times - again, and again for every row selected by the outer query. This seems to be a sort of suicide for big tables.

The modified example here runs its subquery only once to create so to speak a temporary table B, and then the outer query performs sequential JOIN operation between tables A, and B.
This must be much more efficient approach.
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Thu Aug 22, 2019 8:17 pm    Post subject:
Reply with quote

One more observation.

In the initial sample the field under question is named
ANW_TSP_GRP
I dare to guess, this stands for "(something) Timestamp Group", isn't it?

If so, then it must be almost obvious that groups of rows with equal TIMESTAMPs are expected.
Hence, my concerns about multiple rows to be selected are getting real.

I hope the TS has any idea about the actual intentions of those who created the original query? This understanding must be always the first step before trying any mechanical/automated "optimization" of any query, or any program code at all.
Back to top
View user's profile Send private message
Rohit Umarjikar

Senior Member


Joined: 21 Sep 2010
Posts: 2244
Location: NY,USA

PostPosted: Thu Aug 22, 2019 9:12 pm    Post subject:
Reply with quote

Quote:
This must be much more efficient approach.
Not really. Because temp B has no keys when joined against A, you are missing Group By within B for MAX. You are building B say for 1M Rows and now again you are trying to JOIN that against 1M with possibly no keys since its temp table.
Quote:
...but database may (or may not) include multiple rows with exactly the same value equal to this MAX.
That's exactly what MAX issued to get one if they are same.
Quote:
This is the difference.
The original query and modified query results will always same if only one timestamp row present then you get that as MAX and if none present then no rows comes out from both the queries.

As suggested in my first reply, add the INDEX (A.HP_ID A.HP_ADDR_TKN needs to have indexes too if they are key) and do a RUNSTAT/REORG then problem should go away.

Try this and let us know most of the times <> creates problems too.
Code:
SELECT
       A.HP_ID                                             
     , A.HP_ADDR_TKN                                       
     , A.ENR_PAR_OPT_CD         
     , MAX (A.ANW_TSP_GRP)                             
FROM   T401ANPO A           
WHERE
NOT EXISTS (
SELECT 1 FROM  T401ANPO B
       B.ENR_PAR_OPT_CAN_DT      <=  CURRENT DATE         
  AND  B.ENR_PAR_OPT_EFF_DT       = A.ENR_PAR_OPT_CAN_DT
  AND  B.HP_ID                    = A.HP_ID                                               
  AND  B.HP_ADDR_TKN              = A.HP_ADDR_TKN                                       
  AND  B.ENR_PAR_OPT_CD           = A.ENR_PAR_OPT_CD
           )                                 
Group By
       A.HP_ID                                             
     , A.HP_ADDR_TKN                                       
     , A.ENR_PAR_OPT_CD 
ORDER BY
         A.HP_ID                                           
       , A.ENR_PAR_OPT_CD; 
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Thu Aug 22, 2019 9:27 pm    Post subject:
Reply with quote

Rohit Umarjikar wrote:
Quote:
This must be much more efficient approach.
Not really. Because temp B has no keys when joined against A, you are missing Group By within B for MAX. You are building B say for 1M Rows and now again you are trying to JOIN that against 1M with possibly no keys since its temp table.

I bet the approach with JOIN instead of multiple subqueries on each outer query's row should win the game, even without special indexes optimization.

Rohit Umarjikar wrote:
Quote:
...but database may (or may not) include multiple rows with exactly the same value equal to this MAX.
That's exactly what MAX issued to get one if they are same.

I'm sorry, but this is full misunderstanding of the original issue.

Rohit Umarjikar wrote:
Quote:
This is the difference.
The original query and modified query results will always same if only one timestamp row present then you get that as MAX and if none present then no rows comes out from both the queries.

Again, the long explanation has been given that possibility of multiple rows is the reason for the complex original query.

Rohit Umarjikar wrote:
As suggested in my first reply, add the INDEX (A.HP_ID A.HP_ADDR_TKN needs to have indexes too if they are key) and do a RUNSTAT/REORG then problem should go away.

Again: step #1 should be always verification of the real concept of the code, and only the final step - any mechanical optimization of the process.
Back to top
View user's profile Send private message
Rohit Umarjikar

Senior Member


Joined: 21 Sep 2010
Posts: 2244
Location: NY,USA

PostPosted: Thu Aug 22, 2019 9:32 pm    Post subject:
Reply with quote

Quote:
I bet the approach with JOIN instead of multiple subqueries on each outer query's row should win the game, even without special indexes optimization.

That will give syntax error for not using GROUP BY at first attempt. Second additional self-joins to all the columns of temp B is needed when joined with A to even make it optimized to the imagination.
Quote:
Again, the long explanation has been given that possibility of multiple rows is the reason for the complex original query.
Original query is quite simple and straightforward and let us keep it simple here than complicating and confusing it with any possibilities which don't exists.
Quote:
Again: step #1 should be always verification of the real concept of the code, and only the final step - any mechanical optimization of the process.
No. If the code is working fine and one starts seeing the performance issues then nothing is wrong with the code but wrong with INDEX(s)/REORG/RUNSTAT. DBA should be the first resort and then the code. Without having correct indexes ,RUNSTATS/REORGs done (specially when INSERTS and DELETEs)however you mold the query it will give performance issues at some point for sure since the underlying data is incremental.
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Thu Aug 22, 2019 9:39 pm    Post subject:
Reply with quote

Lewis Carrol wrote:

"Two added to one - if that could but be done,"
It said, "with one's fingers and thumbs!"
Recollecting with tears how, in earlier years,
It had taken no pains with its sums.

"Taking Three as the subject to reason about -
A convenient number to state -
We add Seven, and Ten, and then multiply out
By One Thousand diminished by Eight.

"The result we proceed to divide, as you see,
By Nine Hundred and Ninety Two:
Then subtract Seventeen, and the answer must be
Exactly and perfectly true.

"The method employed I would gladly explain,
While I have it so clear in my head,
If I had but the time and you had but the brain -
But much yet remains to be said.

2 + 1 = (3 + 7 + 10) * (1000 - 8) / 992 - 17 = 3 !!!
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Thu Aug 22, 2019 10:09 pm    Post subject:
Reply with quote

My intention was, to give the idea of the more efficient approach to the same task, but not to present a ready-to-copy-paste code to run.

At the experts forum this should not be a problem to fix a syntax error in the suggested process. I really missed GROUP BY part, but it doesn't change anything in my post. My mistake, it should really look like this
Code:
SELECT A.HP_ID                                             
     , A.HP_ADDR_TKN                                       
     , A.ENR_PAR_OPT_CD                                     
 FROM T401ANPO                   A ,                       
      (SELECT HP_ID                                         
            , HP_ADDR_TKN                                   
            , ENR_PAR_OPT_CD                               
            , MAX(ANW_TSP_GRP) AS MAX_TSP                   
         FROM T401ANPO                                     
        WHERE ENR_PAR_OPT_CAN_DT >  CURRENT DATE           
          AND ENR_PAR_OPT_EFF_DT <> ENR_PAR_OPT_CAN_DT     
       GROUP BY HP_ID                                       
              , HP_ADDR_TKN                                 
              , ENR_PAR_OPT_CD ) B                         
 WHERE A.HP_ID          = B.HP_ID                           
   AND A.HP_ADDR_TKN    = B.HP_ADDR_TKN                     
   AND A.ENR_PAR_OPT_CD = B.ENR_PAR_OPT_CD                 
   AND A.ANW_TSP_GRP    = B.MAX_TSP                         
 ORDER BY A.HP_ID                                           
        , A.ENR_PAR_OPT_CD; 


Again: I bet it should work much, much faster than the one from initial post.

Again: the possible problem with multiple TIMESTAMP in a group of equal values can be easily concluded from all previous detailed explanation, if not to ignore them.
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Fri Aug 23, 2019 7:22 pm    Post subject:
Reply with quote

In order to verify my statements about the difference in performance for two approaches to this type of queries I have chosen a similar table from my existing database, with about 600K+ rows in it; just to run the performance test.

The running of the original query (adapted to my own table names, and field names which I do not display here)
Code:
SELECT A.HP_ID                                             
      , A.HP_ADDR_TKN                                       
      , A.ENR_PAR_OPT_CD                                     
 FROM   T401ANPO A                                           
 WHERE  A.ENR_PAR_OPT_CAN_DT        >  CURRENT DATE         
   AND  A.ENR_PAR_OPT_EFF_DT        <> A.ENR_PAR_OPT_CAN_DT 
   AND  A.ANW_TSP_GRP               =                       
        (SELECT MAX (B.ANW_TSP_GRP)                         
         FROM   T401ANPO B                                   
         WHERE  A.HP_ID             = B.HP_ID               
           AND  A.HP_ADDR_TKN       = B.HP_ADDR_TKN         
           AND  A.ENR_PAR_OPT_CD    = B.ENR_PAR_OPT_CD)     
 ORDER BY A.HP_ID                                           
        , A.ENR_PAR_OPT_CD;   

- it took approx. 1:30 minutes.

The running of the updated query (also adapted to the same table names, and field names)
Code:
SELECT A.HP_ID                                             
      , A.HP_ADDR_TKN                                       
      , A.ENR_PAR_OPT_CD                                     
  FROM T401ANPO                   A ,                       
       (SELECT HP_ID                                         
             , HP_ADDR_TKN                                   
             , ENR_PAR_OPT_CD                               
             , MAX(ANW_TSP_GRP) AS MAX_TSP                   
          FROM T401ANPO                                     
         WHERE ENR_PAR_OPT_CAN_DT >  CURRENT DATE           
           AND ENR_PAR_OPT_EFF_DT <> ENR_PAR_OPT_CAN_DT     
        GROUP BY HP_ID                                       
               , HP_ADDR_TKN                                 
               , ENR_PAR_OPT_CD ) B                         
  WHERE A.HP_ID          = B.HP_ID                           
    AND A.HP_ADDR_TKN    = B.HP_ADDR_TKN                     
    AND A.ENR_PAR_OPT_CD = B.ENR_PAR_OPT_CD                 
    AND A.ANW_TSP_GRP    = B.MAX_TSP                         
  ORDER BY A.HP_ID                                           
         , A.ENR_PAR_OPT_CD; 

- it produced the same results (including multiple rows with equal MAX(TIMESTAMP) values!), and it took only 0:55 mins (e.g. 55 seconds).
The difference in performance of two approaches is almost 2 times without any special optimization, or adjustment of settings.

I expect that for tables of 100M+ rows the difference in performance would be even bigger. I have my own experience with performance issues; and always optimization of used methods of processing gave much better results compared to only optimization of indexes, and other dilettantish methods.
Back to top
View user's profile Send private message
Rohit Umarjikar

Senior Member


Joined: 21 Sep 2010
Posts: 2244
Location: NY,USA

PostPosted: Fri Aug 23, 2019 8:14 pm    Post subject:
Reply with quote

This is what I would do first.
Quote:
I was advised to replace the current approach to obtaining the table row with the 'latest' timestamp with code that uses the index.
This indeed means that only one value of unique max timestamp is expected and in any case of duplicates it shall be an error. let TS confirms the understanding first before assuming the designed query is for the sole purpose of duplicate timestamp values.

Option1:
1. Do the REORG/RUNSTAT and rerun to see the performance change.
2. If proper index can be allowed to added based on EXPLAIN then that's much better.
OPTION2:
Code:
SELECT
       A.HP_ID                                             
     , A.HP_ADDR_TKN                                       
     , A.ENR_PAR_OPT_CD         
     , MAX (A.ANW_TSP_GRP)                             
FROM   T401ANPO A                                           
WHERE  A.ENR_PAR_OPT_CAN_DT        >  CURRENT DATE         
  AND  A.ENR_PAR_OPT_EFF_DT       <> A.ENR_PAR_OPT_CAN_DT 
Group By
       A.HP_ID                                             
     , A.HP_ADDR_TKN                                       
     , A.ENR_PAR_OPT_CD 
ORDER BY
         A.HP_ID                                           
       , A.ENR_PAR_OPT_CD;

OPTION3 :
Code:
SELECT
       A.HP_ID                                             
     , A.HP_ADDR_TKN                                       
     , A.ENR_PAR_OPT_CD         
     , MAX (A.ANW_TSP_GRP)                             
FROM   T401ANPO A           
WHERE
NOT EXISTS (
SELECT 1 FROM  T401ANPO B
       B.ENR_PAR_OPT_CAN_DT      <=  CURRENT DATE         
  AND  B.ENR_PAR_OPT_EFF_DT       = A.ENR_PAR_OPT_CAN_DT
  AND  B.HP_ID                    = A.HP_ID                                               
  AND  B.HP_ADDR_TKN              = A.HP_ADDR_TKN                                       
  AND  B.ENR_PAR_OPT_CD           = A.ENR_PAR_OPT_CD
           )                                 
Group By
       A.HP_ID                                             
     , A.HP_ADDR_TKN                                       
     , A.ENR_PAR_OPT_CD 
ORDER BY
         A.HP_ID                                           
       , A.ENR_PAR_OPT_CD;

Quote:
The difference in performance of two approaches is almost 2 times without any special optimization, or adjustment of settings.
Might be the case that Results from the first query is still present in cashes and second query actually picking it form there than actually going to physical Db. Two sites have different TS/TB attributes , for e.g. Bufferpool value.

TS has many options to try now and come back which one works or none works.
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Fri Aug 23, 2019 11:31 pm    Post subject:
Reply with quote

Rohit Umarjikar wrote:

Option1:

OPTION2:

OPTION3 :



Option 1 - can be considered, but there is a little chance to get serious performance improvement. I'd recommended it to be done after improving the code algorithm itself, not starting with this step.

Option2 - I tried with my similar test table. It really runs about 3-4 seconds with my database, but multiple rows with equal values of detected MAX(TIMESTAMP_GROUP) are missing from output.
This type of query is definitely very fast because it has ignored the essential part of the original query: obvious attempt to extract possible duplicate values equal to each MAX(TIMESTAMP_GROUP)

Option3 - there are syntax errors:
- (obvious one) keyword WHERE is missing
- (not obvious one) need to compare
B.ENR_PAR_OPT_EFF_DT = B.ENR_PAR_OPT_CAN_DT
instead of
B.ENR_PAR_OPT_EFF_DT = A.ENR_PAR_OPT_CAN_DT
When those errors fixed it runs about 2:15 minutes, producing the same results as option2 (e.g. with missing duplicate values).
The reason of bad performance is obvious without any RUNSTAT: all intermediate data produced in the middle of this query are significantly bigger than any other example given in this topic.
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Fri Aug 23, 2019 11:42 pm    Post subject:
Reply with quote

sergeyken wrote:
updated query
Code:
SELECT A.HP_ID                                             
      , A.HP_ADDR_TKN                                       
      , A.ENR_PAR_OPT_CD                                     
  FROM T401ANPO                   A ,                       
       (SELECT HP_ID                                         
             , HP_ADDR_TKN                                   
             , ENR_PAR_OPT_CD                               
             , MAX(ANW_TSP_GRP) AS MAX_TSP                   
          FROM T401ANPO                                     
         WHERE ENR_PAR_OPT_CAN_DT >  CURRENT DATE           
           AND ENR_PAR_OPT_EFF_DT <> ENR_PAR_OPT_CAN_DT     
        GROUP BY HP_ID                                       
               , HP_ADDR_TKN                                 
               , ENR_PAR_OPT_CD ) B                         
  WHERE A.HP_ID          = B.HP_ID                           
    AND A.HP_ADDR_TKN    = B.HP_ADDR_TKN                     
    AND A.ENR_PAR_OPT_CD = B.ENR_PAR_OPT_CD                 
    AND A.ANW_TSP_GRP    = B.MAX_TSP                         
  ORDER BY A.HP_ID                                           
         , A.ENR_PAR_OPT_CD; 

- it produced the same results (including multiple rows with equal MAX(TIMESTAMP) values!), and it took only 0:55 mins (e.g. 55 seconds).

I detected a hidden bug in this SQL code. In a very rare case it can miss a few rows. (So far I found only one such missing row out of my 600K+ test table).

Correct version should be as follows
Code:
SELECT A.HP_ID                                             
      , A.HP_ADDR_TKN                                       
      , A.ENR_PAR_OPT_CD                                     
  FROM T401ANPO                   A ,                       
       (SELECT HP_ID                                         
             , HP_ADDR_TKN                                   
             , ENR_PAR_OPT_CD                               
             , MAX(ANW_TSP_GRP) AS MAX_TSP                   
          FROM T401ANPO                                     
         WHERE ENR_PAR_OPT_CAN_DT >  CURRENT DATE           
           AND ENR_PAR_OPT_EFF_DT <> ENR_PAR_OPT_CAN_DT     
        GROUP BY HP_ID                                       
               , HP_ADDR_TKN                                 
               , ENR_PAR_OPT_CD ) B                         
  WHERE A.HP_ID          = B.HP_ID                           
    AND A.HP_ADDR_TKN    = B.HP_ADDR_TKN                     
    AND A.ENR_PAR_OPT_CD = B.ENR_PAR_OPT_CD                 
    AND A.ANW_TSP_GRP    = B.MAX_TSP                         
    AND A.ENR_PAR_OPT_EFF_DT <> A.ENR_PAR_OPT_CAN_DT     
  ORDER BY A.HP_ID                                           
         , A.ENR_PAR_OPT_CD; 
Back to top
View user's profile Send private message
Rohit Umarjikar

Senior Member


Joined: 21 Sep 2010
Posts: 2244
Location: NY,USA

PostPosted: Fri Aug 23, 2019 11:45 pm    Post subject:
Reply with quote

Quote:
ENR_PAR_OPT_CAN_DT > CURRENT DATE
You need to add condition as well to your outer SELECT, it is missing right now in the correct version.
Quote:
It really runs about 3-4 seconds with my database
That's awesome.
Quote:
Option3 - there are syntax errors:
My bad
Thanks for running them.
Quote:
The reason of bad performance is obvious without any RUNSTAT:
RUNSTAT and REORG is done once a week or twice if table has INSERTs and DELETEs everyday. So the table that you have might have both done on time and TS don't. Also the EXPLAIN in each of the query in your case may not be same in TS case. To know the exact savings TS must run at their site and come back to us with which works.. If TS care about duplicates timestamp then probably I need to twist a bit and post back but will only do if TS really cares about that.
Back to top
View user's profile Send private message
sergeyken

Active Member


Joined: 29 Apr 2008
Posts: 541
Location: Maryland

PostPosted: Mon Aug 26, 2019 8:57 pm    Post subject:
Reply with quote

Rohit Umarjikar wrote:
Quote:
ENR_PAR_OPT_CAN_DT > CURRENT DATE
You need to add condition as well to your outer SELECT, it is missing right now in the correct version.

This is not technical/programming question, but rather a business rule question.

The answer is: rather no than yes, - depending on the assumed business rules implemented by this query.

This extra condition is not used in the initial subquery provided by TS.
E.g. the MAX value is chosen by the inner subquery from a larger subset of rows. So, in some cases adding this extra condition to the inner subquery would change the MAX value found by it to some lesser value.

1) When the condition
ENR_PAR_OPT_CAN_DT > CURRENT DATE
is used only in the outer query, there is a chance that the MAX value returned by the inner query does not belong to the more limited subset of rows found so far by the outer query, and those rows will be excluded from the final selection.

2) When the condition
ENR_PAR_OPT_CAN_DT > CURRENT DATE
is used in both inner, and outer queries, this situation cannot happen: the same subset of rows is used to find the MAX value, and to match rows from outer query.

So the final selection when this condition is used in both queries in general may be larger, compared to the same selection in only outer query.

BTW: the same problem may happen with another extra condition,
ENR_PAR_OPT_EFF_DT <> ENR_PAR_OPT_CAN_DT

Most likely, they both must be used only in the outer query, to perform final selection.
But again: this in fact depends on the business rules to be implemented - this is a question to TS who seems to be not interested in the problem he raised?
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> DB2 All times are GMT + 6 Hours
Goto page 1, 2  Next
Page 1 of 2

 

Search our Forum:

Similar Topics
Topic Author Forum Replies Posted
No new posts Query to Truncate value in table bhaskar_kanteti DB2 2 Mon Aug 12, 2019 3:26 pm
No new posts Verify WITH UR present inside program... priyankakir CLIST & REXX 23 Wed Jun 12, 2019 3:37 pm
No new posts Query on updating the control cards. abdulrafi DFSORT/ICETOOL 2 Tue Feb 26, 2019 2:23 pm
This topic is locked: you cannot edit posts or make replies. Query on updating the control cards. abdulrafi JCL & VSAM 2 Mon Feb 25, 2019 7:43 pm
No new posts Build SQL query using SORT balaji81_k DFSORT/ICETOOL 15 Mon Feb 18, 2019 9:56 am

Facebook
Back to Top
 
Job Vacancies | Forum Rules | Bookmarks | Subscriptions | FAQ | Polls | Contact Us