Author |
Message |
jebbin
New User
Joined: 13 Nov 2009 Posts: 6 Location: Bangalore
I need to remove records that share the same value in one field, based on the value in another field. For example, if two records have "AAAAAAAA" in the first 8 bytes, I need to remove the record with the greater value in the 20th byte. I also need to update the trailer count to reflect how many records are left in the output file. Would this be possible without sorting the file, i.e. when the records with "AAAAAAAA" in the first 8 bytes are not grouped together? |
|
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10873 Location: italy
If the existing order is so important that it must not be disturbed by a sort,
how will you define which one of the duplicates should be kept? |
|
expat
Global Moderator
Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
Because the solution for sort related questions may vary from product to product, please ensure that you state clearly which sort product you are using.
If you are not sure, then by running a simple sort step shown below, you will be able to find out for yourself.
If the messages start with ICE, your product is DFSORT. Please also post the complete line containing message code ICE201I, as this will enable our DFSORT experts to determine which release of DFSORT you have installed. This may also affect the solution offered.
If the messages start with WER then the product is SYNCSORT and should be posted in the JCL forum. Please also post the information telling which version of SYNCSORT is installed, as this may also affect the solution offered.
Thank you for taking the time to ensure that the valuable time of others is not wasted on solutions that are inappropriate for the sort product in use and/or the release installed at your site.
Code: |
//SORTSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD *
ABC
/*
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
  SORT FIELDS=COPY
/*
|
jebbin
New User
Joined: 13 Nov 2009 Posts: 6 Location: Bangalore
The file is already sorted on other criteria, and that order must not be disturbed. However, if we do find duplicates, I want to keep the first occurrence of the record and delete the rest. |
|
jebbin
New User
Joined: 13 Nov 2009 Posts: 6 Location: Bangalore
The messages start with ICE. The line you requested is:
Code: |
ICE201I F RECORD TYPE IS F - DATA STARTS IN POSITION 1
|
Frank Yaeger
DFSORT Developer
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
jebbin,
Please show an example of the records in your input file (relevant fields only) and what you expect for output. Explain the "rules" for getting from input to output. Give the starting position, length and format of each relevant field. Give the RECFM and LRECL of the input file. |
|
vasanthz
Global Moderator
Joined: 28 Aug 2007 Posts: 1742 Location: Tirupur, India
Hi,
If you could show what the file looks like, say a sample input and output test file, then I think this can be achieved.
Quote: |
The file is sorted based on another criteria |
Not a problem.
First, add sequence numbers to all records;
then sort on the fields that need duplicate elimination and drop the duplicates;
finally, sort the records back into the original order using the sequence number added earlier. |
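The three steps described above can be sketched outside of any sort product. The following Python is only an illustration of the logic, not a DFSORT or SYNCSORT job; the function name `dedupe_keep_first`, its parameters, and the 12-byte sample records are all hypothetical, and header and trailer records are ignored here.

```python
def dedupe_keep_first(records, key_start, key_len):
    """Drop later duplicates of a key while preserving the input order.

    key_start is 1-based, matching the way sort fields are usually described.
    """
    lo = key_start - 1
    hi = lo + key_len

    # Step 1: tag every record with its original sequence number.
    numbered = list(enumerate(records))

    # Step 2: sort on the duplicate key, then on the sequence number, so the
    # earliest occurrence of each key comes first within its group.
    numbered.sort(key=lambda pair: (pair[1][lo:hi], pair[0]))

    kept = []
    previous_key = None
    for seq, rec in numbered:
        key = rec[lo:hi]
        if key != previous_key:
            kept.append((seq, rec))
            previous_key = key

    # Step 3: sort the survivors back into their original order.
    kept.sort(key=lambda pair: pair[0])
    return [rec for _, rec in kept]


# Hypothetical 12-byte records; the duplicate key is bytes 1-8.
sample = ["BBBBBBBB0001", "AAAAAAAA0002", "BBBBBBBB0003", "AAAAAAAA0004"]
result = dedupe_keep_first(sample, key_start=1, key_len=8)
```

Sorting on the sequence number as a secondary key is what guarantees that the record kept for each key is the one that appeared first in the input.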
|
vasanthz
Global Moderator
Joined: 28 Aug 2007 Posts: 1742 Location: Tirupur, India
He he he... you beat me to the reply, with the same comments.
Although Posts: 169 is nowhere near Posts: 5667,
I can't resist saying "Wise men think alike". |
|
Frank Yaeger
DFSORT Developer
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
I always preferred:
Great minds "sink" in the same channels. |
|
jebbin
New User
Joined: 13 Nov 2009 Posts: 6 Location: Bangalore
Hi, the following is sample data from my file:
Code: |
00000UHL1091109JBHCF
1000167578932000003896161000000000000200704041101103
1000167578932000001424032000000000000200612011101001
1000567578932000001628561000020061203200612011101103
1000567578932000001722282000020061202200612011101112
1000567578932000001424032000020060406200604061101103
1000567578932000001430882000020060406200604061101005
99999UTL100000006 |
The relevant fields that make a record a duplicate are in positions 6 to 25, so the sort key would be SORT FIELDS=(6,20,CH,A). The first 19 characters of this key are alphanumeric and the last character is numeric, but for identifying duplicates they can be treated together as CH. The first occurrence of a record that has duplicates should be kept and the rest deleted. The header can be identified by either the first 5 characters = 00000 or the next 4 = UHL1, and the trailer by either the first 5 characters = 99999 or the next 4 = UTL1. RECFM=FB and LRECL=60. Please let me know if you need any further information. |
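To make the stated rules concrete, here is a minimal Python sketch of them. It is not a DFSORT job, and it uses a single pass with a set of keys already seen rather than a sort. The function name `dedupe_with_trailer` is hypothetical, and the code assumes that positions 10-17 of the trailer hold an 8-digit count of data records, which is inferred from the sample and not confirmed in the thread.

```python
def dedupe_with_trailer(lines):
    """Keep the first record per key (positions 6-25) and recount the trailer."""
    seen = set()
    out = []
    data_count = 0
    for line in lines:
        rec_type = line[5:9]              # positions 6-9
        if rec_type == "UHL1":
            out.append(line)              # header passes through unchanged
        elif rec_type == "UTL1":
            # Assumption: positions 10-17 are an 8-digit data-record count.
            out.append(line[:9] + f"{data_count:08d}" + line[17:])
        else:
            key = line[5:25]              # positions 6-25, compared as characters
            if key not in seen:
                seen.add(key)
                out.append(line)
                data_count += 1
    return out


sample = [
    "00000UHL1091109JBHCF",
    "1000167578932000003896161000000000000200704041101103",
    "1000167578932000001424032000000000000200612011101001",
    "1000567578932000001628561000020061203200612011101103",
    "1000567578932000001722282000020061202200612011101112",
    "1000567578932000001424032000020060406200604061101103",
    "1000567578932000001430882000020060406200604061101005",
    "99999UTL100000006",
]
result = dedupe_with_trailer(sample)
```

On this sample the sketch drops the fifth data record, whose key in positions 6-25 repeats that of the second, and writes a trailer count of 5. Whether that matches the intended output is for the poster to confirm.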
|
Frank Yaeger
DFSORT Developer
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
Please show what you expect for the output records. |
|