Hi!
This is my first post in the forum. I am working on a SORT card in which I read records from 2 input files and join them on a 15-byte key. The issue is that in both files a few records have duplicate keys, and these duplicate keys produce a Cartesian product of records in the output file. My requirement is to write one-to-one matching records. I am using DFSORT, and I found in the forum that SORTED,NOSEQCK should be used to accomplish this, but even after using that option in my SORT I am not getting the expected output. I could really use some help here, as I need to deliver this tomorrow. Please let me know what I am missing. Many thanks!
In case you have more than one record for a key in file1 and only one matching key/record in file2, then only one record would be extracted from the input, and the other records would be skipped.
The sort card you provided worked, and I am getting exactly the results I was expecting. I will post the results shortly. Thank you very much for this.
I want to understand the meaning of what you coded:
JOINKEYS FILES=F1,FIELDS=(01,15,A,174,4,A),SORTED,NOSEQCK
JOINKEYS FILES=F2,FIELDS=(01,15,A,068,4,A),SORTED,NOSEQCK
Que - In the above card, for files F1 and F2, is it sorting on the 4 bytes starting at 174 and 068? I am confused about this part. Please guide me here.
Que - What does WHEN=GROUP mean in the above card, and is the PUSH keyword inserting a 4-byte sequence number here? If yes, then I will have more than 6 lakh (600,000) records in my files, so should I increase the length of SEQ from 4 to 6?
It is not obvious that this approach will work, because the topic starter did not provide enough information on his requirements.
What is the role of each of the joined files in his example?
For instance, suppose both datasets each have two records with the same key field
Code:
2307914558..023
(assuming non-printed bytes '..' are the same? Or not?)
In terms of the defined JOIN operation, that means the output file should include (2 * 2) = 4 output records with the same key: every record of F1 matches every record of F2. The given example demonstrates exactly the result expected by the JOIN definition.
In the suggested solution with sequentially numbered records, for the given example the first matching F1 record will match the first F2 record, and so on; but it is not clear whether this is what the TS wanted from the beginning.
Let's say there are 100 same-key records in F1 and 2 same-key records in F2; in that case the SEQ= solution will produce the 2 first matching records, and the remaining 98 records of F1 will be silently ignored.
When using the original TS solution, this case would produce (100 * 2) = 200 output records.
Again: from the TS description it is not clear what the actual task is, and/or what the role of each of F1/F2 is in his job.
Quote:
I am using DFSORT and for this i have searched in the forum that SORTED,NOSEQCK should be used to accomplish my requirement.
I would recommend not using the NOSEQCK option until you have some basic knowledge of SORT/JOIN/MERGE processing and the methods used. Please postpone even the use of the SORTED option, at least until you are able to create for yourself (not via copy-paste!) primitive but really working DFSORT/SYNCSORT jobs.
Quote:
Que - In above card for File F1 and File F2, is it sorting on 4 bytes from 174 and 068 ? I am confused about this part. Please guide here.
Poha Eater wrote:
what does it means WHEN=GROUP in above card
Refer DFSORT Application programming guide.
Quote:
does the PUSH keyword inserting a sequence no. of 4 bytes here ? If yes, then i will be having more than 6 lakhs records in my files so is it correct if i need to increase the length of SEQ from 4 to 6 ?
No, not required, unless you have more than 10K records with the same key value.
I believe even 2 bytes would be sufficient; I don't think you will have more than 99 records with the same key.
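For reference, the 4-byte fields at positions 174 (F1) and 068 (F2) in the posted JOINKEYS cards are group sequence numbers appended by WHEN=GROUP/PUSH before the join. Since SEQ restarts at 1 for every new key value, its length only has to cover the maximum number of duplicates per key, not the total record count. A minimal sketch of that pre-processing, with the PUSH positions assumed from the cards posted earlier (JNF1CNTL/JNF2CNTL are the DD names DFSORT reads for per-input processing):
Code:
//JNF1CNTL DD *
  INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,15),PUSH=(174:SEQ=4))
/*
//JNF2CNTL DD *
  INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,15),PUSH=(068:SEQ=4))
/*
With SEQ=4, up to 9999 records per key can be numbered, which is why the total of 6 lakh records does not matter here.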
I have to repeat again: it is not clear what the role of each of F1/F2 is in the original task. Why can multiple duplicate keys appear in both F1 and F2? What should be done with the duplicated keys in each of them?
Eliminating duplicate keys via SUM FIELDS=NONE will result in a single matching record being produced, but (in the general case) created from an unpredictable pair of matching keys.
Matching the pairs of records by renumbering the groups of records (via SEQ=) will create matching pairs from sequential duplicate-key records, up to the minimum count in F1 or F2, while the rest of the same-key records will be ignored.
The TS must clearly explain what his real requirements are before trying to use (unknown to him) options of SORT control statements. Without a clear understanding of the task, no advice from the forum and no control statement option will help. Period.
Quote:
Eliminating duplicate keys via SUM FIELDS=NONE will create a single matching record produced, but (in general case) created from an unpredicted pair of matching keys.
He could use SUM FIELDS on one ds to get what he wants.
Poha Eater, please respond to clear the air of doubts about the requirements so this can move forward; otherwise the topic can be locked.
Thanks for all your responses. They were really helpful and informative. Let me reiterate my requirement; before that, I accept that I didn't mention in my first post that in the 2nd file, when the key is the same for 2 records, the data after the key is not the same in those 2 records. I did attach a screenshot of my File 2 showing different data in each row where there is a duplicate key.
My requirement is - I have 2 files. Each file has a key in the first 15 bytes (columns 1 to 15). The first file has a record length of 173 and the second file has a record length of 251.
Both files have exactly the same number of records. For example, if File 1 has 1000 records, then File 2 will also have 1000 records. In both files, only the data in the first 15 bytes will be the same, and this is also the key of my JOIN condition.
In the output file, I have to write all 173 bytes of data from File 1, and from File 2 I have to write 39 bytes of data starting at the 28th byte.
In File 1, when any records have a duplicate key, the rest of the data in each of those records is also the same. In File 2, however, whenever there is a duplicate key, the data after the key (after the 15th byte) is not the same in each record. As per my requirement, both files have exactly the same number of records, so I have to join each row of both files. That is why I could not use SUM FIELDS=NONE here: in File 2 it would remove the second record with the same key, even though that record has different data after the first 15 bytes. I attached pictures of the File 2 data earlier, and I am pasting the data of File 2 below as well to show that it has different data in each row, because I don't know how to paste an image of the file here. I tried using the Img button, but nothing happened; I am sorry.
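Based on the layout described above (15-byte key, all 173 bytes of File 1, plus 39 bytes of File 2 starting at byte 28), the join statements would look something like this sketch; the sequence-number positions in the keys are assumed from the cards posted earlier:
Code:
  JOINKEYS FILES=F1,FIELDS=(1,15,A,174,4,A)
  JOINKEYS FILES=F2,FIELDS=(1,15,A,068,4,A)
  REFORMAT FIELDS=(F1:1,173,F2:28,39)
  SORT FIELDS=COPY
The REFORMAT statement builds each output record from the F1 and F2 halves of a matched pair, so the output LRECL here would be 173 + 39 = 212.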
Thank you magesh23586 for providing the resolution. The results come out exactly as per the attached image from my first post, named "expected output".
Quote:
In case you have more than one record for a key in file1 and only one matching key/record in file2, then only one record would be extracted from the input, the other records would be skipped.
Hi Magesh,
Due to an enhancement in my requirement, I now have more than 1 record for a key in File 1 but only 1 matching record in File 2. So is it possible by any means to still join the records one-to-one based on the same key and also get all the records from File 1 for the same key in the output file?
Now I don't have an equal number of records in File 1 and File 2.
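If File 2 now really has at most one record per key, the one-to-one pairing machinery is not needed on that side: joining on the 15-byte key alone will pair every File 1 record (including the duplicates) with the single matching File 2 record. A sketch under that assumption, reusing the output layout stated earlier in the thread:
Code:
  JOINKEYS FILES=F1,FIELDS=(1,15,A)
  JOINKEYS FILES=F2,FIELDS=(1,15,A)
  REFORMAT FIELDS=(F1:1,173,F2:28,39)
  SORT FIELDS=COPY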
Quote:
Due to enhancement in my requirement, now i have more than 1 record for a key in File 1 but only 1 matching record in File 2. So is it possible by any means to still join the one to one record based on same key and also get all the records form file 1 for the same key in output file ?
Now i don't have equal number of records in File 1 and File 2.
2.1 create your own code to perform the task
2.2 perform test run
2.3 verify the results, and/or error codes
2.4 fix obvious errors
2.5 present to the forum what is really not clear to yourself
Please, do not try to do the above said in opposite order!
Thanks for answering. The requirement is to join the first record for any key from File 1 to the first record for that key from File 2 (the 2nd file will have only 1 matching key/record). The 1st file can have more than 1 record per key, and those records should also be written to the output file along with the joined records, in the same order as they appeared in the 1st file.
I believe I also need to add the code below to the SORT as well, since I still have multiple records for 1 key in both files for which I need to do 1-to-1 matching (the first record for a key from File 1 should be joined with the first record from File 2, and the 2nd record for the same key should only join with the 2nd record from File 2; there should be no Cartesian product of records while joining, as I posted in my first requirement).
Thanks a lot for your generous efforts in resolving my query. The line below in my output was just for representation purposes, to show which values appear in which column.
The control card Magesh provided is working fine for my requirement. The only issue I am facing is that both files are in unsorted order, so when I use the code below, it only works while it finds the keys in sorted order. When it encounters a record whose key is out of sort order, it does not join the records.
I wanted to ask how I can modify the sort card below so that it joins the records even when both files are unsorted.
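SORTED is a promise that each JOINKEYS input already arrives in key order, and NOSEQCK additionally suppresses the check of that promise, so out-of-order keys silently break the match. If the files are unsorted, one approach is simply to drop SORTED,NOSEQCK and let JOINKEYS sort each input on its key fields, e.g. (key positions from the earlier cards):
Code:
  JOINKEYS FILES=F1,FIELDS=(1,15,A,174,4,A)
  JOINKEYS FILES=F2,FIELDS=(1,15,A,068,4,A)
One caveat: any WHEN=GROUP sequence numbers are pushed in input order, before the JOINKEYS sort, and group numbering relies on records with the same key being adjacent. So if duplicates are scattered through an unsorted file, a separate pre-sort step for each file may still be needed before the sequence numbers are assigned.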