Hi!
This is my first post in the forum. I am working on a SORT card in which I read records from 2 input files and join them on a 15-byte key. The issue is that in both files a few records have duplicate keys, and these duplicate keys produce a Cartesian product of records in the output file. My requirement is to write one-to-one matching records. I am using DFSORT, and I found in the forum that SORTED,NOSEQCK should be used to accomplish this, but even after using that option in my SORT I am not getting the expected output. I could really use some help here, as I need to deliver this tomorrow. Please let me know what I am missing. Many thanks!
In case you have more than one record for a key in file1 and only one matching key/record in file2, then only one record would be extracted from the input, and the other records would be skipped.
The sort card you provided worked, and I am getting exactly the results I was expecting. I will post the results shortly. Thank you very much for this.
I want to understand the meaning of what you coded:
JOINKEYS FILES=F1,FIELDS=(01,15,A,174,4,A),SORTED,NOSEQCK
JOINKEYS FILES=F2,FIELDS=(01,15,A,068,4,A),SORTED,NOSEQCK
Que - In the above card, for files F1 and F2, is it sorting on the 4 bytes starting at 174 and 068? I am confused about this part. Please guide me here.
Que - What does WHEN=GROUP mean in the above card, and is the PUSH keyword inserting a 4-byte sequence number here? If yes, then I will have more than 6 lakh (600,000) records in my files, so should I increase the length of SEQ from 4 to 6?
It is not obvious that this approach will work, because the topic starter did not provide enough information on his requirements.
What is the role of each of the joined files in his example?
For instance, suppose both datasets each have two records with the same key field
Code:
2307914558..023
(assuming non-printed bytes '..' are the same? Or not?)
In terms of the defined JOIN operation, that means the output file should include (2 * 2) = 4 output records with the same key: every record of F1 matches every record of F2. The given example demonstrates exactly the result expected by the JOIN definition.
In the suggested solution with sequentially numbered records, for the given example the first matching F1 record will match the first F2 record, and so on; but it is not clear whether this is what the TS wanted from the beginning.
Let's say there are 100 same-key records in F1 and 2 same-key records in F2; in that case the SEQ= solution will produce the 2 first matching records, and the remaining 98 records of F1 will be silently ignored.
When using the original TS solution, this case would produce (100 * 2) = 200 output records.
Again: from the TS description it is not clear what the actual task is, and/or what the role of each of F1/F2 is in his job.
Quote:
I am using DFSORT and for this i have searched in the forum that SORTED,NOSEQCK should be used to accomplish my requirement.
I would recommend not using the NOSEQCK option until you have some basic knowledge of SORT/JOIN/MERGE processing and the methods used. Please postpone even the use of the SORTED option, at least until you are able to create for yourself (not via copy-paste!) primitive but really working DFSORT/SYNCSORT jobs.
Quote:
Que - In above card for File F1 and File F2, is it sorting on 4 bytes from 174 and 068 ? I am confused about this part. Please guide here.
Poha Eater wrote:
what does it means WHEN=GROUP in above card
Refer DFSORT Application programming guide.
Quote:
does the PUSH keyword inserting a sequence no. of 4 bytes here ? If yes, then i will be having more than 6 lakhs records in my files so is it correct if i need to increase the length of SEQ from 4 to 6 ?
No, not required, unless you have more than 10K records with the same key value.
I believe even 2 bytes would be sufficient; I don't think you will have more than 99 records with the same key.
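For reference, the 4-byte fields at positions 174 (F1) and 068 (F2) in the posted JOINKEYS cards are group sequence numbers appended by WHEN=GROUP/PUSH before the join. Since SEQ restarts at 1 for every new key value, its length only has to cover the maximum number of duplicates per key, not the total record count. A minimal sketch of that pre-processing, with the PUSH positions assumed from the cards posted earlier (JNF1CNTL/JNF2CNTL are the DD names DFSORT reads for per-input processing):
Code:
//JNF1CNTL DD *
  INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,15),PUSH=(174:SEQ=4))
/*
//JNF2CNTL DD *
  INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,15),PUSH=(068:SEQ=4))
/*
With SEQ=4, up to 9999 records per key can be numbered, which is why the total of 6 lakh records does not matter here.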
I have to repeat again: it is not clear what the role of each of F1/F2 is in the original task. Why can multiple duplicate keys appear in both F1 and F2? What should be done with the duplicated keys in each of them?
Eliminating duplicate keys via SUM FIELDS=NONE will result in a single matching record being produced, but (in the general case) created from an unpredictable pair of matching keys.
Matching the pairs of records by renumbering the groups of records (via SEQ=) will create matching pairs from sequential duplicate-key records, up to the minimum count in F1 or F2, while the rest of the same-key records will be ignored.
The TS must clearly explain what his real requirements are before trying to use (unknown to him) options of SORT control statements. Without a clear understanding of the task, no advice from the forum and no control statement option will help. Period.
Quote:
Eliminating duplicate keys via SUM FIELDS=NONE will create a single matching record produced, but (in general case) created from an unpredicted pair of matching keys.
He could use SUM FIELDS on one ds to get what he wants.
Poha Eater, please respond to clear the air of doubts about the requirements so this can move forward; otherwise the topic can be locked.
Thanks for all your responses. They were really helpful and informative. Let me reiterate my requirement; before that, I accept that I didn't mention in my first post that in the 2nd file, when the key is the same for 2 records, the data after the key is not the same in those 2 records. I did attach a screenshot of my File 2 showing different data in each row where there is a duplicate key.
My requirement is - I have 2 files. Each file has a key in the first 15 bytes (columns 1 to 15). The first file has a record length of 173 and the second file has a record length of 251.
Both files have exactly the same number of records. For example, if File 1 has 1000 records, then File 2 will also have 1000 records. In both files, only the data in the first 15 bytes will be the same, and this is also the key of my JOIN condition.
In the output file, I have to write all 173 bytes of data from File 1, and from File 2 I have to write 39 bytes of data starting at the 28th byte.
In File 1, when any records have a duplicate key, the rest of the data in each of those records is also the same. In File 2, however, whenever there is a duplicate key, the data after the key (after the 15th byte) is not the same in each record. As per my requirement, both files have exactly the same number of records, so I have to join each row of both files. That is why I could not use SUM FIELDS=NONE here: in File 2 it would remove the second record with the same key, even though that record has different data after the first 15 bytes. I attached pictures of the File 2 data earlier, and I am pasting the data of File 2 below as well to show that it has different data in each row, because I don't know how to paste an image of the file here. I tried using the Img button, but nothing happened; I am sorry.
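Based on the layout described above (15-byte key, all 173 bytes of File 1, plus 39 bytes of File 2 starting at byte 28), the join statements would look something like this sketch; the sequence-number positions in the keys are assumed from the cards posted earlier:
Code:
  JOINKEYS FILES=F1,FIELDS=(1,15,A,174,4,A)
  JOINKEYS FILES=F2,FIELDS=(1,15,A,068,4,A)
  REFORMAT FIELDS=(F1:1,173,F2:28,39)
  SORT FIELDS=COPY
The REFORMAT statement builds each output record from the F1 and F2 halves of a matched pair, so the output LRECL here would be 173 + 39 = 212.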
Thank you magesh23586 for providing the resolution. The results come out exactly as per the attached image from my first post, named "expected output".
Quote:
In case you have more than one record for a key in file1 and only one matching key/record in file2, then only one record would be extracted from the input, the other records would be skipped.
Hi Magesh,
Due to an enhancement in my requirement, I now have more than 1 record for a key in File 1 but only 1 matching record in File 2. So is it possible by any means to still join the records one-to-one based on the same key and also get all the records from File 1 for the same key in the output file?
Now I don't have an equal number of records in File 1 and File 2.
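If File 2 now really has at most one record per key, the one-to-one pairing machinery is not needed on that side: joining on the 15-byte key alone will pair every File 1 record (including the duplicates) with the single matching File 2 record. A sketch under that assumption, reusing the output layout stated earlier in the thread:
Code:
  JOINKEYS FILES=F1,FIELDS=(1,15,A)
  JOINKEYS FILES=F2,FIELDS=(1,15,A)
  REFORMAT FIELDS=(F1:1,173,F2:28,39)
  SORT FIELDS=COPY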
Quote:
Due to enhancement in my requirement, now i have more than 1 record for a key in File 1 but only 1 matching record in File 2. So is it possible by any means to still join the one to one record based on same key and also get all the records form file 1 for the same key in output file ?
Now i don't have equal number of records in File 1 and File 2.
2.1 create your own code to perform the task
2.2 perform test run
2.3 verify the results, and/or error codes
2.4 fix obvious errors
2.5 present to the forum what is really not clear to yourself
Please, do not try to do the above said in opposite order!
Thanks for answering. The requirement is to join the first record for any key from File 1 to the first record for that key from File 2 (the 2nd file will have only 1 matching key/record). The 1st file can have more than 1 record per key, and those records should also be written to the output file along with the joined records, in the same order as they appeared in the 1st file.
I believe I also need to add the code below to the SORT as well, since I still have multiple records for 1 key in both files for which I need to do 1-to-1 matching (the first record for a key from File 1 should be joined with the first record from File 2, and the 2nd record for the same key should only join with the 2nd record from File 2; there should be no Cartesian product of records while joining, as I posted in my first requirement).
Thanks a lot for your generous efforts in resolving my query. The line below in my output was just for representation purposes, to show which values appear in which column.
The control card Magesh provided is working fine for my requirement. The only issue I am facing is that both files are in unsorted order, so when I use the code below, it only works while it finds the keys in sorted order. When it encounters a record whose key is out of sort order, it does not join the records.
I wanted to ask how I can modify the sort card below so that it joins the records even when both files are unsorted.
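SORTED is a promise that each JOINKEYS input already arrives in key order, and NOSEQCK additionally suppresses the check of that promise, so out-of-order keys silently break the match. If the files are unsorted, one approach is simply to drop SORTED,NOSEQCK and let JOINKEYS sort each input on its key fields, e.g. (key positions from the earlier cards):
Code:
  JOINKEYS FILES=F1,FIELDS=(1,15,A,174,4,A)
  JOINKEYS FILES=F2,FIELDS=(1,15,A,068,4,A)
One caveat: any WHEN=GROUP sequence numbers are pushed in input order, before the JOINKEYS sort, and group numbering relies on records with the same key being adjacent. So if duplicates are scattered through an unsorted file, a separate pre-sort step for each file may still be needed before the sequence numbers are assigned.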