IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Matching files with different length


IBM Mainframe Forums -> JCL & VSAM
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
danylele74

New User


Joined: 03 Jul 2014
Posts: 28
Location: Italy

PostPosted: Wed Mar 04, 2015 2:34 pm
Reply with quote

Hi all,
i have two input files with different length and number of records.

First file (700.000 records and LRECL=10000)
Second file (25.000 records and LRECL=322)


Example:

File 1:

Keys
111111
222222
333333
444444
555555

File 2:
111111
333333
444444



The output files will be:


File MATCH:
111111
333333
444444


File NOMATCH:
222222
555555


This is my JCL
Code:

//MATCH01 EXEC PGM=SORT                                             
//SYSOUT   DD SYSOUT=*                                             
//SORTWK1  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//SORTWK2  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//SORTWK3  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//SORTWK4  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//SORTWK5  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//SORTWK6  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//IN1      DD DISP=SHR,DSN=TNPH.CREDIPX                             
//IN2      DD DISP=SHR,DSN=&&TNGP2855                               
//MATCH    DD DISP=OLD,DSN=TNPH.MATCH                               
//NOMATCH  DD DISP=OLD,DSN=TNPH.NOMATCH                             
//SYSIN DD *                                                       
  JOINKEYS F1=IN1,FIELDS=(36,48,A)                                 
  JOINKEYS F2=IN2,FIELDS=(01,48,A)                                 
  JOIN UNPAIRED,F1                                                 
  REFORMAT FIELDS=(F1:1,10000,?)                                   
  OPTION COPY                                                       
  OUTFIL FNAMES=MATCH,INCLUDE=(10001,1,CH,EQ,C'B'),BUILD=(1,10000) 
  OUTFIL FNAMES=NOMATCH,INCLUDE=(10001,1,CH,EQ,C'1'),BUILD=(1,10000)





Now, the total of record in both output files is greater then the total of the input files. Why? Where is the mistake?

Can you help me?

Thank you
Back to top
View user's profile Send private message
danylele74

New User


Joined: 03 Jul 2014
Posts: 28
Location: Italy

PostPosted: Wed Mar 04, 2015 2:39 pm
Reply with quote

If in the first file there are 100.000 records, the total of the output files would be 100.000.
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Mar 04, 2015 3:18 pm
Reply with quote

I think you'll find you have duplicate keys on one or both input files.

Also, you may want to consider dynamic allocation of SORTWK files.
Back to top
View user's profile Send private message
danylele74

New User


Joined: 03 Jul 2014
Posts: 28
Location: Italy

PostPosted: Wed Mar 04, 2015 3:28 pm
Reply with quote

Please can u give me an example or the solution to resolve my problem.
I don't understand what you say.
Thank you
Back to top
View user's profile Send private message
rinsio

New User


Joined: 16 Feb 2015
Posts: 13
Location: Madrid, Spain

PostPosted: Wed Mar 04, 2015 3:39 pm
Reply with quote

You have in the second file duplicate keys. when both keys match, the output file (match) reflect all the instances of the key.

The resolve the problem depend what you want in the output file.

Regards
Back to top
View user's profile Send private message
danylele74

New User


Joined: 03 Jul 2014
Posts: 28
Location: Italy

PostPosted: Wed Mar 04, 2015 3:42 pm
Reply with quote

i would like:

File 1:

Keys
111111
222222
333333
444444
555555

File 2:
111111
333333
444444



The output files will be:


File MATCH:
111111
333333
444444


File NOMATCH:
222222
555555
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Mar 04, 2015 4:07 pm
Reply with quote

No. That is what your existing code deals with. You need to show what you want to happen when there are duplicate key values withing either or both of your input files.

You also show your sample data in key order. Is that correct? If so, specify SORTED on the JOINKEYS statements and get rid of the SORTWKn files altogether.
Back to top
View user's profile Send private message
danylele74

New User


Joined: 03 Jul 2014
Posts: 28
Location: Italy

PostPosted: Wed Mar 04, 2015 4:13 pm
Reply with quote

I don't want duplicate key.
If i have 1000 records in the FILE 1 (master file), the output files will be:

File Match: 300
File No Match: 700

Total records both output files: 1000 (like input FILE 1)


p.s.

I don't want to use SORT with SUM FIELDS=NONE
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Mar 04, 2015 4:41 pm
Reply with quote

You've avoided the question about whether your input is in sequence.

If you have to SORT your input data (it happens by default for each JOINKEYS) then what do you have against SUM FIELDS=NONE?

If you don't have to SORT (so you specify SORTED on the JOINKEYS) then you can use SEQNUM with RESTART= for the key, and have INCLUDE= on your first OUTFILs to just get the first record of each key.
Back to top
View user's profile Send private message
danylele74

New User


Joined: 03 Jul 2014
Posts: 28
Location: Italy

PostPosted: Wed Mar 04, 2015 4:57 pm
Reply with quote

I think that both input files are not sorted by key-field
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Mar 04, 2015 5:10 pm
Reply with quote

So your sample data should represent that. Unsorted, both files, duplicates possible (both files?) and then the output you require.

If you need to SORT for JOINKEYS, easiest thing to do to get rid of duplicate keys is SUM FIELDS=NONE. That would be in a JNFnCNTL dataset.

Then the rest of the code does not need to change. Although you could make a change, to use OUTFIL SAVE on the second OUTFIL, so all data not on the first OUTFIL would appear on the second.
Back to top
View user's profile Send private message
danylele74

New User


Joined: 03 Jul 2014
Posts: 28
Location: Italy

PostPosted: Wed Mar 04, 2015 5:11 pm
Reply with quote

Is there another way to do the same thing (another kind of sort) without duplicate records in output files?
Back to top
View user's profile Send private message
danylele74

New User


Joined: 03 Jul 2014
Posts: 28
Location: Italy

PostPosted: Wed Mar 04, 2015 5:26 pm
Reply with quote

Maybe I don't explain the problem or i don't understand what you say (I'm sorry)

With my jcl i have about 725.000 (700.000 in the file-1 (master file) and 25.000 in the second file)

Now, the sum of the records of the output files are about 750.000. I'm expect 725.000.

So, i can sort both file by key-field but I think that the i wuold have 750.000 and not 725.000 records.

I think the this kind of sort-match doesn't need another step of SORT with SUM FIELDS=NONE.
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Mar 04, 2015 7:04 pm
Reply with quote

Code:
 JOIN UNPAIRED,F1


That bit says "give me all the matched records, and all the unmatched records from F1 as well".

If you have no duplicate keys on F1 and no duplicate keys on F2, you will have, combined, on your two OUTFILs, exactly the same number of records as on F1 (they must be either matched, or unmatched).

With the figures of 700,000 + 25,000 you should get 700,000 records.

If you expect 725,000 records, then you need to explain why.

The only way you will get more than 700,000 records with what you have shown is if there are duplicate keys on F1 or F2 and a match between the files on some duplicate keys.

Code:
File 1:

111111
222222
333333
555555
444444
555555
111111


File 2:
333333
111111
444444


If you use that data you will get:

Code:
111111
111111
333333
444444

222222
555555
555555


Seven records, from your input of six, because the 111111 is duplicate and matched.

There is no need for an extra step, you just need a //JNFnCNTL dataset to include the SUM FIELDS=NONE for the data you have to SORT anyway (in the JOINKEYS).

Which SORT product do you use? And which version? It should say in the sysout from a SORT step.
Back to top
View user's profile Send private message
danylele74

New User


Joined: 03 Jul 2014
Posts: 28
Location: Italy

PostPosted: Wed Mar 04, 2015 7:23 pm
Reply with quote

Sorry, i'm expecting the total of the output files 700.000 records not 725.000
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Mar 04, 2015 7:32 pm
Reply with quote

Run your SORT with the sample input I've shown. How many records do you get?

Add this to your step:

Code:
//JNF1CNTL DD *
   SUM FIELDS=NONE


Run it again. How many records?

If you use ICETOOL's SELECT operator on your files you can easily report on duplicate keys. If they are there and you think they shouldn't be, you need to find out why.
Back to top
View user's profile Send private message
danylele74

New User


Joined: 03 Jul 2014
Posts: 28
Location: Italy

PostPosted: Wed Mar 04, 2015 9:30 pm
Reply with quote

it's doesn't work


File1 (master) 2.396.705
File2 24.844


total records output (matched and no-matched): 106802

This is the jcl:

Code:

//MATCHX01 EXEC PGM=SORT                                             
//SYSOUT   DD SYSOUT=*                                               
//SORTWK1  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//SORTWK2  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//SORTWK3  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//SORTWK4  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//SORTWK5  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//SORTWK6  DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)               
//IN1      DD DISP=SHR,DSN=TNPH.CREDIP1X                             
//IN2      DD DISP=SHR,DSN=&&TNGP2855                               
//MATCHX   DD DISP=OLD,DSN=TNPH.MATCHX                               
//NOMATCHX DD DISP=OLD,DSN=TNPH.NOMATCHX                             
//SYSIN DD *                                                         
  JOINKEYS F1=IN1,FIELDS=(36,48,A)                                   
  JOINKEYS F2=IN2,FIELDS=(01,48,A)                                   
  JOIN UNPAIRED,F1                                                   
  REFORMAT FIELDS=(F1:1,10000,?)                                     
  OPTION COPY                                                       
  OUTFIL FNAMES=MATCHX,INCLUDE=(10001,1,CH,EQ,C'B'),BUILD=(1,10000) 
  OUTFIL FNAMES=NOMATCHX,INCLUDE=(10001,1,CH,EQ,C'1'),BUILD=(1,10000)
//JNF1CNTL DD *                                                     
   SUM FIELDS=NONE
Back to top
View user's profile Send private message
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Mar 04, 2015 9:33 pm
Reply with quote

Please paste the sysout from the step. It worked.
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> JCL & VSAM

 


Similar Topics
Topic Forum Replies
No new posts Store the data for fixed length COBOL Programming 1
No new posts How to split large record length file... DFSORT/ICETOOL 10
No new posts PARSE Syntax for not fix length word ... JCL & VSAM 7
No new posts Write line by line from two files DFSORT/ICETOOL 7
No new posts Compare only first records of the fil... SYNCSORT 7
Search our Forums:

Back to Top