|
View previous topic :: View next topic
|
| Author |
Message |
danylele74
New User
Joined: 03 Jul 2014 Posts: 28 Location: Italy
|
|
|
|
Hi all,
i have two input files with different length and number of records.
First file (700.000 records and LRECL=10000)
Second file (25.000 records and LRECL=322)
Example:
File 1:
Keys
111111
222222
333333
444444
555555
File 2:
111111
333333
444444
The output files will be:
File MATCH:
111111
333333
444444
File NOMATCH:
222222
555555
This is my JCL
| Code: |
//MATCH01 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTWK1 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//SORTWK2 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//SORTWK3 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//SORTWK4 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//SORTWK5 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//SORTWK6 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//IN1 DD DISP=SHR,DSN=TNPH.CREDIPX
//IN2 DD DISP=SHR,DSN=&&TNGP2855
//MATCH DD DISP=OLD,DSN=TNPH.MATCH
//NOMATCH DD DISP=OLD,DSN=TNPH.NOMATCH
//SYSIN DD *
JOINKEYS F1=IN1,FIELDS=(36,48,A)
JOINKEYS F2=IN2,FIELDS=(01,48,A)
JOIN UNPAIRED,F1
REFORMAT FIELDS=(F1:1,10000,?)
OPTION COPY
OUTFIL FNAMES=MATCH,INCLUDE=(10001,1,CH,EQ,C'B'),BUILD=(1,10000)
OUTFIL FNAMES=NOMATCH,INCLUDE=(10001,1,CH,EQ,C'1'),BUILD=(1,10000)
|
Now, the total of record in both output files is greater then the total of the input files. Why? Where is the mistake?
Can you help me?
Thank you |
|
| Back to top |
|
 |
danylele74
New User
Joined: 03 Jul 2014 Posts: 28 Location: Italy
|
|
|
|
| If in the first file there are 100.000 records, the total of the output files would be 100.000. |
|
| Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
I think you'll find you have duplicate keys on one or both input files.
Also, you may want to consider dynamic allocation of SORTWK files. |
|
| Back to top |
|
 |
danylele74
New User
Joined: 03 Jul 2014 Posts: 28 Location: Italy
|
|
|
|
Please can u give me an example or the solution to resolve my problem.
I don't understand what you say.
Thank you |
|
| Back to top |
|
 |
rinsio
New User
Joined: 16 Feb 2015 Posts: 13 Location: Madrid, Spain
|
|
|
|
You have in the second file duplicate keys. when both keys match, the output file (match) reflect all the instances of the key.
The resolve the problem depend what you want in the output file.
Regards |
|
| Back to top |
|
 |
danylele74
New User
Joined: 03 Jul 2014 Posts: 28 Location: Italy
|
|
|
|
i would like:
File 1:
Keys
111111
222222
333333
444444
555555
File 2:
111111
333333
444444
The output files will be:
File MATCH:
111111
333333
444444
File NOMATCH:
222222
555555 |
|
| Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
No. That is what your existing code deals with. You need to show what you want to happen when there are duplicate key values withing either or both of your input files.
You also show your sample data in key order. Is that correct? If so, specify SORTED on the JOINKEYS statements and get rid of the SORTWKn files altogether. |
|
| Back to top |
|
 |
danylele74
New User
Joined: 03 Jul 2014 Posts: 28 Location: Italy
|
|
|
|
I don't want duplicate key.
If i have 1000 records in the FILE 1 (master file), the output files will be:
File Match: 300
File No Match: 700
Total records both output files: 1000 (like input FILE 1)
p.s.
I don't want to use SORT with SUM FIELDS=NONE |
|
| Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
You've avoided the question about whether your input is in sequence.
If you have to SORT your input data (it happens by default for each JOINKEYS) then what do you have against SUM FIELDS=NONE?
If you don't have to SORT (so you specify SORTED on the JOINKEYS) then you can use SEQNUM with RESTART= for the key, and have INCLUDE= on your first OUTFILs to just get the first record of each key. |
|
| Back to top |
|
 |
danylele74
New User
Joined: 03 Jul 2014 Posts: 28 Location: Italy
|
|
|
|
| I think that both input files are not sorted by key-field |
|
| Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
So your sample data should represent that. Unsorted, both files, duplicates possible (both files?) and then the output you require.
If you need to SORT for JOINKEYS, easiest thing to do to get rid of duplicate keys is SUM FIELDS=NONE. That would be in a JNFnCNTL dataset.
Then the rest of the code does not need to change. Although you could make a change, to use OUTFIL SAVE on the second OUTFIL, so all data not on the first OUTFIL would appear on the second. |
|
| Back to top |
|
 |
danylele74
New User
Joined: 03 Jul 2014 Posts: 28 Location: Italy
|
|
|
|
| Is there another way to do the same thing (another kind of sort) without duplicate records in output files? |
|
| Back to top |
|
 |
danylele74
New User
Joined: 03 Jul 2014 Posts: 28 Location: Italy
|
|
|
|
Maybe I don't explain the problem or i don't understand what you say (I'm sorry)
With my jcl i have about 725.000 (700.000 in the file-1 (master file) and 25.000 in the second file)
Now, the sum of the records of the output files are about 750.000. I'm expect 725.000.
So, i can sort both file by key-field but I think that the i wuold have 750.000 and not 725.000 records.
I think the this kind of sort-match doesn't need another step of SORT with SUM FIELDS=NONE. |
|
| Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
That bit says "give me all the matched records, and all the unmatched records from F1 as well".
If you have no duplicate keys on F1 and no duplicate keys on F2, you will have, combined, on your two OUTFILs, exactly the same number of records as on F1 (they must be either matched, or unmatched).
With the figures of 700,000 + 25,000 you should get 700,000 records.
If you expect 725,000 records, then you need to explain why.
The only way you will get more than 700,000 records with what you have shown is if there are duplicate keys on F1 or F2 and a match between the files on some duplicate keys.
| Code: |
File 1:
111111
222222
333333
555555
444444
555555
111111
File 2:
333333
111111
444444 |
If you use that data you will get:
| Code: |
111111
111111
333333
444444
222222
555555
555555 |
Seven records, from your input of six, because the 111111 is duplicate and matched.
There is no need for an extra step, you just need a //JNFnCNTL dataset to include the SUM FIELDS=NONE for the data you have to SORT anyway (in the JOINKEYS).
Which SORT product do you use? And which version? It should say in the sysout from a SORT step. |
|
| Back to top |
|
 |
danylele74
New User
Joined: 03 Jul 2014 Posts: 28 Location: Italy
|
|
|
|
| Sorry, i'm expecting the total of the output files 700.000 records not 725.000 |
|
| Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
Run your SORT with the sample input I've shown. How many records do you get?
Add this to your step:
| Code: |
//JNF1CNTL DD *
SUM FIELDS=NONE |
Run it again. How many records?
If you use ICETOOL's SELECT operator on your files you can easily report on duplicate keys. If they are there and you think they shouldn't be, you need to find out why. |
|
| Back to top |
|
 |
danylele74
New User
Joined: 03 Jul 2014 Posts: 28 Location: Italy
|
|
|
|
it's doesn't work
File1 (master) 2.396.705
File2 24.844
total records output (matched and no-matched): 106802
This is the jcl:
| Code: |
//MATCHX01 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTWK1 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//SORTWK2 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//SORTWK3 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//SORTWK4 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//SORTWK5 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//SORTWK6 DD UNIT=SYSDA,SPACE=(CYL,(1000,1000),RLSE)
//IN1 DD DISP=SHR,DSN=TNPH.CREDIP1X
//IN2 DD DISP=SHR,DSN=&&TNGP2855
//MATCHX DD DISP=OLD,DSN=TNPH.MATCHX
//NOMATCHX DD DISP=OLD,DSN=TNPH.NOMATCHX
//SYSIN DD *
JOINKEYS F1=IN1,FIELDS=(36,48,A)
JOINKEYS F2=IN2,FIELDS=(01,48,A)
JOIN UNPAIRED,F1
REFORMAT FIELDS=(F1:1,10000,?)
OPTION COPY
OUTFIL FNAMES=MATCHX,INCLUDE=(10001,1,CH,EQ,C'B'),BUILD=(1,10000)
OUTFIL FNAMES=NOMATCHX,INCLUDE=(10001,1,CH,EQ,C'1'),BUILD=(1,10000)
//JNF1CNTL DD *
SUM FIELDS=NONE
|
|
|
| Back to top |
|
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
|
|
|
|
| Please paste the sysout from the step. It worked. |
|
| Back to top |
|
 |
|
|
 |
All times are GMT + 6 Hours |
|