Portal | Manuals | References | Downloads | Info | Programs | JCLs | Master the Mainframes
IBM Mainframe Computers Forums Index
 
Register
 
IBM Mainframe Computers Forums Index Mainframe: Search IBM Mainframe Forum: FAQ Memberlist Usergroups Profile Log in to check your private messages Log in
 

 

Count unique values in a column based on primary field.

 
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> JCL & VSAM
View previous topic :: :: View next topic  
Author Message
Deepakgoyal2005

New User


Joined: 22 Mar 2007
Posts: 57
Location: India

PostPosted: Fri Jul 25, 2014 5:49 pm    Post subject: Count unique values in a column based on primary field.
Reply with quote

I need help in figuring out if JCL could help to achieve following requirement,

Input: (4 columns sorted on field-1)
Code:
111 1 AAA AAA
111 2 BBB 0
111 2 AAA CCC
111 2 AAA CCC
111 2 0   0
111 3 AAA BBB
222 1 AAA AAA
222 2 BBB 0
222 2 AAA BBB
222 2 DDD CCC


Logic:
For unique value of Field-1, if the value of field-2 = 2, count the unique values of field-3 and field-4 (ignoring 0).

Required Output:
Code:
111 2 1
222 3 2


My last call was to create a Cobol code with a logic that, with every read, fill an array for field-3 and field-4 with unique record and when field-1 changes, get the array size. But since my input file has millions of records, doing a search for each record doesn't seems to be a good idea. I just hope if there is something in JCL which could be more suitable.

Let me know if I had missed any important information. Your suggestion could be very helpful. Thanks.
Back to top
View user's profile Send private message

Bill Woodger

DFSORT Moderator


Joined: 09 Mar 2011
Posts: 7236

PostPosted: Fri Jul 25, 2014 6:44 pm    Post subject: Reply to: Count unique values in a column based on primary f
Reply with quote

Well, assuming that by "JCL" you mean by "SORT", I'd consider sticking to the COBOL if the task is to be carried out often (daily, weekly).

I can't see that the COBOL code should be too complex or time-consuming.
Back to top
View user's profile Send private message
Bill Woodger

DFSORT Moderator


Joined: 09 Mar 2011
Posts: 7236

PostPosted: Fri Jul 25, 2014 8:08 pm    Post subject: Reply to: Count unique values in a column based on primary f
Reply with quote

The reason for that is that it will require the file to be sorted, twice, when for your COBOL program the order is already adequate.

You can use this to check the output from the COBOL program during testing. This will only work if your SORT package supports JNFnCTNL files. Give it a whirl with the data you showed.

Code:
  OPTION COPY
  JOINKEYS F1=INA,FIELDS=(1,3,A,4,3,A)
  JOINKEYS F2=INB,FIELDS=(1,3,A,4,3,A)
  JOIN UNPAIRED
  REFORMAT FIELDS=(F1:1,3,4,3,F2:1,3,4,3)
                                                           
  INREC IFTHEN=(WHEN=INIT,
                    OVERLAY=(13:C'00')),
        IFTHEN=(WHEN=(4,3,CH,NE,C'   ',
                     AND,
                      10,3,CH,NE,C'   '),
                    OVERLAY=(13:C'11')),
        IFTHEN=(WHEN=(4,3,CH,NE,C'   '),
                    OVERLAY=(13:C'1')),
        IFTHEN=(WHEN=(10,3,CH,NE,C'   '),
                    OVERLAY=(1:7,3,14:C'1'))
  OUTFIL REMOVECC,
         NODETAIL,
         SECTIONS=(1,3,
                   TRAILER3=(1,3,
                        TOT=(13,1,ZD,TO=ZD,LENGTH=2),
                        TOT=(14,1,ZD,TO=ZD,LENGTH=2)))
                                                           
//JNF1CNTL DD *
 INCLUDE COND=(5,1,CH,EQ,C'2',
              AND,
               7,3,CH,NE,C'0  ')
 INREC BUILD=(1,3,7,3)
 SUM FIELDS=NONE
                                                           
//JNF2CNTL DD *
 INCLUDE COND=(5,1,CH,EQ,C'2',
              AND,
               11,3,CH,NE,C'0  ')
 INREC BUILD=(1,3,11,3)
 SUM FIELDS=NONE



Specify the same input file for INA and INB (can change those names if you want).

The JNF1CNTL drops the non-2s and the zero values, does a bit of formatting and drops duplicates based on the key and the third value.

The JNF2CNTL does the same, but using the fourth value, not the third.

The JOINKEYS matches the records. We're not too worried about how.

The IFTHENs set up a count field, which will have zero or one depending on absence/presence of data.

OUTFIL reporting features are then use to total-up the ones for each key (as defined in the SECTIONS).

Output is:

Code:
1110201
2220302


Which is a little rough, but you are only using it for testing/once-off and you can knock the format about yourself.
Back to top
View user's profile Send private message
Deepakgoyal2005

New User


Joined: 22 Mar 2007
Posts: 57
Location: India

PostPosted: Mon Jul 28, 2014 7:07 pm    Post subject:
Reply with quote

Thank you so much for your valuable time. As suggested by you I would opt to use Cobol.

I would be using 2 arrays for storing unique values of field-3 and field-4 for every particular value of field-1. At every change in field-1, write the array count in output file and initialize the array. For this the code would need to search array for every record to check if it there in array or not and update array if it is new entry. Probably doing a search for millions of input record into array which could have length of around a million would take time. So was just looking for any better options. With your suggestion I am no more confused now. Thanks icon_smile.gif

The test JCL provided by you would be of great help. Thanks again.
Back to top
View user's profile Send private message
Bill Woodger

DFSORT Moderator


Joined: 09 Mar 2011
Posts: 7236

PostPosted: Mon Jul 28, 2014 7:15 pm    Post subject: Reply to: Count unique values in a column based on primary f
Reply with quote

Unless you are badly misrepresenting your data, the task is routine.

With records of type you want and not zero, stick the two values into two separate (COBOL internal) tables.

To do the sticking, you see if the value already exists. If it doesn't you add an entry for it.

On change of key, number of entries in your two tables gives you the answer you want.

I can see nothing complicated or in any way time-consuming (beyond what is needed) in that outline. No sorting needed, no fancy techniques, just plain stuff.
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> JCL & VSAM All times are GMT + 6 Hours
Page 1 of 1

 

Search our Forum:

Similar Topics
Topic Author Forum Replies Posted
No new posts Removing Duplicates based on certain ... chandracdac DFSORT/ICETOOL 8 Fri Dec 09, 2016 4:40 am
No new posts Performing arithmetic on input field zh_lad DFSORT/ICETOOL 24 Tue Dec 06, 2016 8:04 pm
No new posts Add PD field from 2nd file to PD in 1st Sushant Garje DFSORT/ICETOOL 6 Thu Dec 01, 2016 4:32 pm
No new posts SQZ pipe delimited file with 1 column... Rick Silvers DFSORT/ICETOOL 2 Sun Nov 06, 2016 8:11 pm
No new posts Overlaying one set of charater values... Kevin Lindsley SYNCSORT 7 Sat Nov 05, 2016 3:21 am


Facebook
Back to Top
 
Mainframe Wiki | Forum Rules | Bookmarks | Subscriptions | FAQ | Tutorials | Contact Us