
2 files comparison, complex match criteria


IBM Mainframe Forums -> COBOL Programming
Deeptha

New User


Joined: 11 May 2010
Posts: 6
Location: Bangalore

Posted: Wed May 12, 2010 8:30 am

I have not been able to find a concrete solution for this one.

I have a file, say FileA, that has a million records, and another file, say FileB, that can have 50,000 records. What happens today is that a table defined with 50,000 occurrences gets loaded from FileB. The matching happens between FileA and the table.

Now, the match criteria are not straightforward equality checks. They are complex, for example like this:

1. A = B AND
2. (C = D OR C = Spaces) AND
3. (E = F OR E = Spaces) AND
4. G > H AND
5. I < J

Let us say A, C, E, G, I are from FileA and B, D, F, H, J are from the table.

Note that both files are sorted on A, B, C, D, E, G, H, I & J.

If the match criteria are satisfied, then a matching report is generated. An additional field prefixed on the table is updated to Y for a match.

If NOT, that's where my problem is. The program checks for the following:

1. A < B AND E < 500
2. C < D AND E < 500
3. E < F AND E < 500
4. G < H AND E < 500
5. I > J AND E < 500

If any record matches one of the above conditions, then an unmatched report is generated, the next record on FileA is read, and the search process begins again, starting from the 1st occurrence of the table.

On the other hand, if records meet one of the following conditions:

1. A < B AND E > 500
2. C < D AND E > 500
3. E < F AND E > 500
4. G < H AND E > 500
5. I > J AND E > 500

500 is deducted from E (which is from FileA), and the search starts all over again from the beginning of the table.

If records satisfy:

1. A > B
2. C > D
3. E > F
4. G > H
5. I < J

Then the next element on the table is accessed for searching, and the process continues.

The icing on the cake is that, at the end of FileA, the entire table is unloaded, and during the table unload the match indicator on the table is checked for value Y. If it is Y, nothing happens; if not, another unmatched report is generated.

The real problem in production is that this program runs for 22 hours, and the CPU consumption is 5 hours. We are trying to tune this program by eliminating the use of an intermediate table and using just 2 sequential files to process.

Please suggest the best way of doing it. My problem is the part where 500 is deducted from a value in FileA and the search process restarts all over again on the table. Is there a way around this?

Thanks
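
The four outcomes described in this post can be sketched for a single FileA record against a single table entry. This is only a rough illustration under stated assumptions, not the production logic: the field names (FA-*, TB-*), the pictures, the numeric redefinition of E and the sample values are all made up; the numbered lines in each list are read as alternatives (any one is enough); anything that is neither a match nor one of the "less than" cases is treated as "try the next table entry"; and E equal to exactly 500, which the post does not cover, also falls into that last bucket.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PAIRDEMO.
      * Sketch only. Names, pictures and sample values are
      * assumptions; the post gives only the letters A to J.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * A, C, E, G, I: fields of the current FileA record.
       01  FA-REC.
           05  FA-A            PIC X(4)  VALUE 'K001'.
           05  FA-C            PIC X(4)  VALUE 'C123'.
           05  FA-E            PIC X(4)  VALUE '0700'.
           05  FA-E-NUM REDEFINES FA-E PIC 9(4).
           05  FA-G            PIC 9(4)  VALUE 20.
           05  FA-I            PIC 9(4)  VALUE 5.
      * B, D, F, H, J: fields of one table entry.
       01  TB-ENTRY.
           05  TB-B            PIC X(4)  VALUE 'K001'.
           05  TB-D            PIC X(4)  VALUE 'C123'.
           05  TB-F            PIC X(4)  VALUE '0900'.
           05  TB-H            PIC 9(4)  VALUE 10.
           05  TB-J            PIC 9(4)  VALUE 9.
       01  WS-OUTCOME          PIC X     VALUE 'N'.
           88  IS-MATCH                  VALUE 'M'.
           88  IS-UNMATCHED              VALUE 'U'.
           88  IS-RETRY                  VALUE 'R'.
           88  IS-NEXT                   VALUE 'N'.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM CLASSIFY-PAIR
           DISPLAY 'E=' FA-E ' OUTCOME=' WS-OUTCOME
           IF IS-RETRY
               SUBTRACT 500 FROM FA-E-NUM
               PERFORM CLASSIFY-PAIR
               DISPLAY 'E=' FA-E ' OUTCOME=' WS-OUTCOME
           END-IF
           STOP RUN.

       CLASSIFY-PAIR.
      * Default: neither match nor "less than", so try next entry.
           SET IS-NEXT TO TRUE
           IF FA-A = TB-B
              AND (FA-C = TB-D OR FA-C = SPACES)
              AND (FA-E = TB-F OR FA-E = SPACES)
              AND FA-G > TB-H
              AND FA-I < TB-J
               SET IS-MATCH TO TRUE
           ELSE
               IF FA-E NUMERIC
                  AND (FA-A < TB-B OR FA-C < TB-D
                       OR FA-E < TB-F OR FA-G < TB-H
                       OR FA-I > TB-J)
                   IF FA-E-NUM > 500
                       SET IS-RETRY TO TRUE
                   END-IF
                   IF FA-E-NUM < 500
                       SET IS-UNMATCHED TO TRUE
                   END-IF
               END-IF
           END-IF.
```

With these sample values the first pass should give outcome R (deduct 500 and restart), and the second pass, with E reduced to 0200, should give outcome U (unmatched), because E is still below F but no longer above 500.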
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

Posted: Wed May 12, 2010 9:04 am

Hello and welcome to the forum,

Suggest you review the code for gross inefficiencies. The 50,000 records are not by chance being read over and over for each record in the main input. . .? This would be a mistake, but could be happening causing most of the lost/wasted time.

Suggest the way the table/array is searched be looked at. Eliminate things as quickly as possible to reduce the number of unneeded compares.

Suggest that maybe multiple tables/arrays be used instead of only one?

It will help someone help you if you get rid of the alphabet soup and post some fieldnames that people can relate to. Also, post some input records, a sample table, and the output wanted when that input data and sample table are processed.

Take heart - some of my current processes read between 10 and 100 million very large records and do considerable array processing and run in only a couple of hours (depending on the system load) :)
Binop B

Active User


Joined: 18 Jun 2009
Posts: 407
Location: Nashville, TN

Posted: Wed May 12, 2010 10:03 am

Hi Deeptha...

First of all, I have to appreciate the way you have told us the requirement/problem... You have taken the effort to put most of the details here in an ordered way... Including Dick's suggestions will make it perfect... :)

Adding on to Dick's suggestions... my suggestion would be to try and understand the business functionality of this program... Probably this code was written by some amateur long back, and once you know the business perspective it might help... ;)
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

Posted: Thu May 13, 2010 4:40 am

Hello,

If you do not follow up, it is nearly impossible for us to help. . .

Have you determined that the file to build the array is only opened once?

How is the array defined? Searched?
Deeptha

New User


Joined: 11 May 2010
Posts: 6
Location: Bangalore

Posted: Thu May 13, 2010 6:56 am

Dick :

You wouldn't believe it if I say I am still trying to get my way around solving this. The table, as I initially mentioned, is built from FileB before the matching routine starts, and it can hold 50,000 entries. How it is searched: it is almost like a PERFORM inside a loop; there is no SEARCH statement as such. It PERFORMs the match routine, and when the criteria are not met, it does a GO TO back to the same match routine. So the routine goes through several iterations.

What's making my work tough is that I have this variable, FILEA-LOC-CODE, from the driver file, which is checked for being greater than 500. If it is, then 500 is deducted from FILEA-LOC-CODE and the searching process begins all over again on the table (from the 1st element). If the value is less than 500 and it does not get through the match, then it is considered unmatched.
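
The restart-on-deduction flow described here can be written with PERFORM ... UNTIL instead of a GO TO back into the same routine. The following is a hedged sketch, not the production code: the real match test is replaced by a single location-code comparison so that the control flow stays visible, and TB-LOC, WS-IDX, the three-entry table, the sample code 700, and the handling of "end of table" and of a code of exactly 500 are all assumptions made for illustration. Only FILEA-LOC-CODE comes from the post.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LOOPDEMO.
      * Sketch only. The full match test is reduced to one
      * location-code comparison so the control flow is visible.
      * TB-LOC, WS-IDX and the sample values are assumptions.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  TB-VALUES.
           05  FILLER          PIC 9(4)  VALUE 100.
           05  FILLER          PIC 9(4)  VALUE 200.
           05  FILLER          PIC 9(4)  VALUE 300.
       01  TB-TABLE REDEFINES TB-VALUES.
           05  TB-LOC          PIC 9(4)  OCCURS 3 TIMES.
       01  TB-COUNT            PIC 9(4)  COMP VALUE 3.
       01  WS-IDX              PIC 9(4)  COMP VALUE 1.
       01  FILEA-LOC-CODE      PIC 9(4)  VALUE 700.
       01  WS-DONE             PIC X     VALUE 'N'.
           88  SEARCH-DONE               VALUE 'Y'.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM CHECK-ENTRY UNTIL SEARCH-DONE
           STOP RUN.

       CHECK-ENTRY.
           IF WS-IDX > TB-COUNT
               PERFORM RETRY-OR-GIVE-UP
           ELSE
               EVALUATE TRUE
                   WHEN FILEA-LOC-CODE = TB-LOC (WS-IDX)
                       DISPLAY 'MATCHED, CODE NOW ' FILEA-LOC-CODE
                       SET SEARCH-DONE TO TRUE
                   WHEN FILEA-LOC-CODE > TB-LOC (WS-IDX)
                       ADD 1 TO WS-IDX
                   WHEN OTHER
                       PERFORM RETRY-OR-GIVE-UP
               END-EVALUATE
           END-IF.

       RETRY-OR-GIVE-UP.
      * Above 500: deduct 500 and rescan from the first entry.
      * Otherwise (including exactly 500, an assumption): give up.
           IF FILEA-LOC-CODE > 500
               SUBTRACT 500 FROM FILEA-LOC-CODE
               MOVE 1 TO WS-IDX
           ELSE
               DISPLAY 'UNMATCHED'
               SET SEARCH-DONE TO TRUE
           END-IF.
```

With the sample code 700, the scan runs off the end of the table, deducts 500, rescans from the first entry and matches on the second entry with the code at 0200. Each restart lowers the code by 500, so the loop always ends, but every restart also costs another pass over the table.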

In one of my wild attempts to get somewhere, I tried using a couple more SORT steps before the program runs (ironically, I noticed the file that loads the table is not sorted). When I tried doing a few date manipulations with SORT, I could achieve a CPU reduction of 11%.

Today's mission is to try singling out the matching process in SORT / ICETOOL and let the program handle only the non-matching process. I am keeping my fingers crossed. I was busy the whole of yesterday trying to get somewhere (which I did, but not to the extent I want), so I did not get a chance to give you the file / table structures. I will do so today.

Any help is MOST welcome.

Thanks
Deeptha

New User


Joined: 11 May 2010
Posts: 6
Location: Bangalore

Posted: Thu May 13, 2010 6:58 am

dick scherrer wrote:
Hello,

If you do not follow up, it is nearly impossible for us to help. . .

Have you determined that the file to build the array is only opened once?

How is the array defined? Searched?



And yes, both files are only opened once. Thanks to small blessings!
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8696
Location: Dubuque, Iowa, USA

Posted: Thu May 13, 2010 7:52 am

If your site has STROBE or another performance analysis tool, use it. STROBE can tell you exactly what line(s) of code the program is hitting the most, and which line(s) of code take the most CPU time -- not always the same lines, either.

If you don't have a performance analysis tool available, you can do it yourself. Get the counts for the various conditions, and organize your code to place the most common conditions first. Rewrite the code to reduce the IF statements as much as possible. For example,
Code:
1. A < B AND E < 500
2. C < D AND E < 500
3. E < F AND E < 500
4. G < H AND E < 500
5. I > J AND E < 500
is equivalent to
Code:
IF E < 500
    IF A < B
    OR C < D
    OR E < F
    OR G < H
    OR I > J
        .
        .
        .
    END-IF
END-IF
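
The equivalence claimed above can be checked with a small standalone program. This is a minimal sketch, assuming the five numbered lines are alternatives (any one is enough), which is how the reply reads them; the WS- prefixes, the pictures and the sample values are invented for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IFDEMO.
      * Sketch only. The WS- names, pictures and sample values
      * are assumptions; the thread gives only the letters.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-A                PIC 9(4)  VALUE 10.
       01  WS-B                PIC 9(4)  VALUE 20.
       01  WS-C                PIC 9(4)  VALUE 30.
       01  WS-D                PIC 9(4)  VALUE 30.
       01  WS-E                PIC 9(4)  VALUE 300.
       01  WS-F                PIC 9(4)  VALUE 300.
       01  WS-G                PIC 9(4)  VALUE 50.
       01  WS-H                PIC 9(4)  VALUE 40.
       01  WS-I                PIC 9(4)  VALUE 5.
       01  WS-J                PIC 9(4)  VALUE 9.
       01  WS-FLAT             PIC X     VALUE 'N'.
       01  WS-NESTED           PIC X     VALUE 'N'.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Flat form: E < 500 is repeated in every alternative.
           IF (WS-A < WS-B AND WS-E < 500)
           OR (WS-C < WS-D AND WS-E < 500)
           OR (WS-E < WS-F AND WS-E < 500)
           OR (WS-G < WS-H AND WS-E < 500)
           OR (WS-I > WS-J AND WS-E < 500)
               MOVE 'Y' TO WS-FLAT
           END-IF
      * Nested form: E < 500 is tested once, up front.
           IF WS-E < 500
               IF WS-A < WS-B
               OR WS-C < WS-D
               OR WS-E < WS-F
               OR WS-G < WS-H
               OR WS-I > WS-J
                   MOVE 'Y' TO WS-NESTED
               END-IF
           END-IF
           DISPLAY 'FLAT=' WS-FLAT ' NESTED=' WS-NESTED
           STOP RUN.
```

For these sample values both flags come out as Y. Changing WS-E to 500 or more should turn both to N, since the nested form then skips the five comparisons entirely, which is where the saving comes from.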
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

Posted: Thu May 13, 2010 8:16 am

Hello,

Quote:
You wouldn't believe it if I say I am still trying to get my way around solving this
Ummm, ahhh, welllll - yes, i would :)

I should have been more clear when i mentioned "search" - i was sure that the search or search all were not being used. . .

It will help if you post the array definition and a business explanation of what the code is intended to do.

A question - once an array entry is marked with a Y is there any reason to use this entry again?

I suspect that there is more "table searching" going on than is necessary to accomplish the actual requirement (but i don't believe i understand the requirement yet).
Deeptha

New User


Joined: 11 May 2010
Posts: 6
Location: Bangalore

Posted: Thu May 13, 2010 3:18 pm

So here is where I got to today, at 3:12 PM IST.

The COBOL program that runs in PROD does a few date manipulations on a few fields on the table. E.g.:

IF TABLE-CENTURY = '0' MOVE '20' TO WORK-DATE-CC, and the match criteria use the WORK-DATE to match against FileA's WORK-DATE.

The small progress I made today was to push all these date manipulations into a SORT card, and make sure the COBOL code was free from this. I ran a test on this, with a test input driver file having 50,000 records, and these are the stats:

PROD version of the program having 50,000 records in the input

CPU Time : 00:00:27.51
CPU Units : 1,045,690
Elapsed Time : 00:00:31.06

Test version of the program having 50,000 records in the input

CPU Time : 00:00:20.20
CPU Units : 767,773
Elapsed Time : 00:00:26.84

Don't know if I should feel good about this because of the 26.5% benefit I get in CPU time. I'd like to do better than this...still trying...I give myself a day's time...if nothing, I am throwing in my towel!
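
The percentage can be checked from the two CPU times quoted above. A small sketch, with made-up data names, that works out the saving as a share of the PROD CPU time:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CPUPCT.
      * Sketch only. Data names are assumptions; the two CPU
      * times are the ones quoted in the post, in seconds.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  PROD-CPU-SECS       PIC 9(3)V99  VALUE 27.51.
       01  TEST-CPU-SECS       PIC 9(3)V99  VALUE 20.20.
       01  WS-SAVING-PCT       PIC 9(3)V99.
       01  WS-SAVING-EDIT      PIC ZZ9.99.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Saving = (PROD - TEST) / PROD, expressed as a percentage.
           COMPUTE WS-SAVING-PCT ROUNDED =
               (PROD-CPU-SECS - TEST-CPU-SECS) * 100
               / PROD-CPU-SECS
           MOVE WS-SAVING-PCT TO WS-SAVING-EDIT
           DISPLAY 'CPU SAVING PCT: ' WS-SAVING-EDIT
           STOP RUN.
```

(27.51 - 20.20) / 27.51 is about 0.2657, so the saving works out to roughly 26.6%, in line with the figure quoted. The CPU units give the same ratio: (1,045,690 - 767,773) / 1,045,690 is also about 0.266.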
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8696
Location: Dubuque, Iowa, USA

Posted: Thu May 13, 2010 4:43 pm

Quote:
Don't know if I should feel good about this because of the 26.5% benefit I get in CPU time. I'd like to do better than this...still trying...I give myself a day's time...if nothing, I am throwing in my towel!
Throw in the towel now and stop wasting our time. Performance improvement is usually an iterative process and typically takes days or weeks to accomplish the desired results. You make one change, test it, then make another change, test it, and proceed until performance is acceptable or there are no changes left that can help. Devoting a single day to this effort is basically useless.
Deeptha

New User


Joined: 11 May 2010
Posts: 6
Location: Bangalore

Posted: Thu May 13, 2010 5:40 pm

Robert :

Wish I had all the time in the world to tune this program. Unfortunately we are caught in a system where deadlines ARE the end of the world. I'm sorry you consider this a waste of your time.

Saying goodbye to this forum.
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

Posted: Thu May 13, 2010 7:24 pm

Deeptha wrote:
Saying goodbye to this forum.


talk about shooting the messenger!!
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

Posted: Fri May 14, 2010 7:56 am

Hello,

Quote:
Wish I had all the time in the world to tune this program. Unfortunately we are caught in a system where deadlines ARE the end of the world. I'm sorry you consider this a waste of your time.
If it takes "all the time in the world" it is being approached wrong. . . I suspect that many of us "old guys" deal with deadlines far more severe than your "deadline" for this problem. . . Will the company cease to function if this is not tuned by tomorrow?

It is not a waste of time to fix things that are broken. It is a waste of time to run around in circles. You still have not posted the definition of the array or explained the business requirement/process (as has been requested). It is most unlikely that you have a requirement that no one else has implemented in the past.

If you want someone to help, you have to humor them and provide exactly what is requested even if you don't believe it will help.
Deeptha

New User


Joined: 11 May 2010
Posts: 6
Location: Bangalore

Posted: Fri May 14, 2010 8:16 am

It is unfortunate to see one of the 'Global Moderators' break the rule of respect that is mentioned in your forum rules - give respect, be patient and help - is that your way of giving respect? "Throw in the towel now and stop wasting our time." I think that comment was premature and definitely by no stretch was it respectful.

I don't see why I should humor any of you.

Thanks for your time.
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

Posted: Fri May 14, 2010 8:46 am

Hello,

Deeptha wrote:
I give myself a day's time...if nothing, I am throwing in my towel!


Robert wrote:
Throw in the towel now and stop wasting our time. I think that comment was premature and definitely by no stretch was it respectful.


Uh, well, Robert didn't post those words first - you did. . . If you really believe that it is proper to "throw in the towel" if you are not successful in a day, what else would it be but a waste of time. Some things take longer than a day. No matter how badly we want them. If you don't show respect for other people's time and effort why would you expect respect in return?

Quote:
I don't see why I should humor any of you.
Obviously, you do not see why. . . Maybe someday you will.

d
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

Posted: Sat May 15, 2010 5:56 pm

Before one can design the process, one must understand the data.
goes without saying,
can't debug or improve the performance of a process,
without understanding the data.

This abstract
A, B, C, D <>= Q, M, J
nonsense
gives no-one a chance to participate fully.

That the TS has taken his ball and gone home is his problem,
but hopefully future TSs will describe their process a little better.
All times are GMT + 6 Hours