IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Count unique values in a column based on primary field.


IBM Mainframe Forums -> JCL & VSAM
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
Deepakgoyal2005

New User


Joined: 22 Mar 2007
Posts: 57
Location: India

PostPosted: Fri Jul 25, 2014 5:49 pm
Reply with quote

I need help in figuring out if JCL could help to achieve following requirement,

Input: (4 columns sorted on field-1)
Code:
111 1 AAA AAA
111 2 BBB 0
111 2 AAA CCC
111 2 AAA CCC
111 2 0   0
111 3 AAA BBB
222 1 AAA AAA
222 2 BBB 0
222 2 AAA BBB
222 2 DDD CCC


Logic:
For unique value of Field-1, if the value of field-2 = 2, count the unique values of field-3 and field-4 (ignoring 0).

Required Output:
Code:
111 2 1
222 3 2


My last call was to create a Cobol code with a logic that, with every read, fill an array for field-3 and field-4 with unique record and when field-1 changes, get the array size. But since my input file has millions of records, doing a search for each record doesn't seems to be a good idea. I just hope if there is something in JCL which could be more suitable.

Let me know if I had missed any important information. Your suggestion could be very helpful. Thanks.
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Fri Jul 25, 2014 6:44 pm
Reply with quote

Well, assuming that by "JCL" you mean by "SORT", I'd consider sticking to the COBOL if the task is to be carried out often (daily, weekly).

I can't see that the COBOL code should be too complex or time-consuming.
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Fri Jul 25, 2014 8:08 pm
Reply with quote

The reason for that is that it will require the file to be sorted, twice, when for your COBOL program the order is already adequate.

You can use this to check the output from the COBOL program during testing. This will only work if your SORT package supports JNFnCTNL files. Give it a whirl with the data you showed.

Code:
  OPTION COPY
  JOINKEYS F1=INA,FIELDS=(1,3,A,4,3,A)
  JOINKEYS F2=INB,FIELDS=(1,3,A,4,3,A)
  JOIN UNPAIRED
  REFORMAT FIELDS=(F1:1,3,4,3,F2:1,3,4,3)
                                                           
  INREC IFTHEN=(WHEN=INIT,
                    OVERLAY=(13:C'00')),
        IFTHEN=(WHEN=(4,3,CH,NE,C'   ',
                     AND,
                      10,3,CH,NE,C'   '),
                    OVERLAY=(13:C'11')),
        IFTHEN=(WHEN=(4,3,CH,NE,C'   '),
                    OVERLAY=(13:C'1')),
        IFTHEN=(WHEN=(10,3,CH,NE,C'   '),
                    OVERLAY=(1:7,3,14:C'1'))
  OUTFIL REMOVECC,
         NODETAIL,
         SECTIONS=(1,3,
                   TRAILER3=(1,3,
                        TOT=(13,1,ZD,TO=ZD,LENGTH=2),
                        TOT=(14,1,ZD,TO=ZD,LENGTH=2)))
                                                           
//JNF1CNTL DD *
 INCLUDE COND=(5,1,CH,EQ,C'2',
              AND,
               7,3,CH,NE,C'0  ')
 INREC BUILD=(1,3,7,3)
 SUM FIELDS=NONE
                                                           
//JNF2CNTL DD *
 INCLUDE COND=(5,1,CH,EQ,C'2',
              AND,
               11,3,CH,NE,C'0  ')
 INREC BUILD=(1,3,11,3)
 SUM FIELDS=NONE



Specify the same input file for INA and INB (can change those names if you want).

The JNF1CNTL drops the non-2s and the zero values, does a bit of formatting and drops duplicates based on the key and the third value.

The JNF2CNTL does the same, but using the fourth value, not the third.

The JOINKEYS matches the records. We're not too worried about how.

The IFTHENs set up a count field, which will have zero or one depending on absence/presence of data.

OUTFIL reporting features are then use to total-up the ones for each key (as defined in the SECTIONS).

Output is:

Code:
1110201
2220302


Which is a little rough, but you are only using it for testing/once-off and you can knock the format about yourself.
Back to top
View user's profile Send private message
Deepakgoyal2005

New User


Joined: 22 Mar 2007
Posts: 57
Location: India

PostPosted: Mon Jul 28, 2014 7:07 pm
Reply with quote

Thank you so much for your valuable time. As suggested by you I would opt to use Cobol.

I would be using 2 arrays for storing unique values of field-3 and field-4 for every particular value of field-1. At every change in field-1, write the array count in output file and initialize the array. For this the code would need to search array for every record to check if it there in array or not and update array if it is new entry. Probably doing a search for millions of input record into array which could have length of around a million would take time. So was just looking for any better options. With your suggestion I am no more confused now. Thanks icon_smile.gif

The test JCL provided by you would be of great help. Thanks again.
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Mon Jul 28, 2014 7:15 pm
Reply with quote

Unless you are badly misrepresenting your data, the task is routine.

With records of type you want and not zero, stick the two values into two separate (COBOL internal) tables.

To do the sticking, you see if the value already exists. If it doesn't you add an entry for it.

On change of key, number of entries in your two tables gives you the answer you want.

I can see nothing complicated or in any way time-consuming (beyond what is needed) in that outline. No sorting needed, no fancy techniques, just plain stuff.
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> JCL & VSAM

 


Similar Topics
Topic Forum Replies
No new posts INCLUDE OMIT COND for Multiple values... DFSORT/ICETOOL 5
No new posts To get the count of rows for every 1 ... DB2 3
No new posts Replace Multiple Field values to Othe... DFSORT/ICETOOL 12
No new posts To find whether record count are true... DFSORT/ICETOOL 6
No new posts How to load to DB2 with column level ... DB2 6
Search our Forums:

Back to Top