Jay Villaverde
New User
Joined: 08 Mar 2014 Posts: 27 Location: USA
Hi. I have a file that has duplicate records only, and the requestor wants to remove only the first occurrence and leave all the other duplicates. So if there are 3 duplicate records, remove 1 and leave 2.
Can this be done using SYNCSORT? If so, what commands would I need for that?
By the way, my file only has dupes, so I really just need to drop one duplicate record and leave the others. It doesn't necessarily have to be the first one.
Thanks
Rohit Umarjikar
Global Moderator
Joined: 21 Sep 2010 Posts: 3049 Location: NYC,USA
Maybe you can use the approach below:
1) Using SyncSort, append a sequence number (1, 2, 3, and so on) to the end of every record, restarting at 1 each time the key changes.
2) Then add a condition to omit every record whose appended sequence number equals 1.
3) This removes the first entry of every duplicate group.
E.g.
1) As per #1:
Code:
AAAA1234.340001
AAAA1234.340002
AAAA1234.340003
BBBB1234.340001
BBBB1234.340002
BBBB1234.340003
BBBB1234.340004
Output as per #2 above:
Code:
AAAA1234.34
AAAA1234.34
BBBB1234.34
BBBB1234.34
BBBB1234.34
However, for the other experts to be able to help you, you need to provide all the necessary details and sample input data; otherwise no one can help.
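For what it's worth, the two steps above could be coded with control statements along these lines. This is only a sketch: it assumes your SyncSort level supports SEQNUM with RESTART (as DFSORT does), fixed-length 11-byte records as in the example, and input where the duplicates are already grouped together so no SORT is needed.
Code:
  SORT FIELDS=COPY
* step 1: append a 4-digit sequence number at position 12 that
* restarts at 1 each time the 11-byte key changes
  INREC OVERLAY=(12:SEQNUM,4,ZD,RESTART=(1,11))
* step 2: drop the first record of each key (sequence number 0001)
* and strip the sequence number from the records that are kept
  OUTFIL OMIT=(12,4,ZD,EQ,1),BUILD=(1,11)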
Jay Villaverde
New User
That would work well. How do I add such a counter at the end, as in your first example?
My data looks like this. Some keys have just 2 dupes, some have more than 2:
Code:
0000000024343090800074433902
0000000024343090800074433902
0000000024351661261958120101
0000000024351661261958120101
0000000024352050300074377903
0000000024352050300074377903
0000000024352050300074377903
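A possible sketch for this data, assuming fixed-length 28-byte records where the whole record is the key, duplicates already adjacent in the input, and a SyncSort level that supports SEQNUM with RESTART:
Code:
  SORT FIELDS=COPY
* add an 8-digit sequence number at position 29, restarting at 1
* whenever the 28-byte key changes
  INREC OVERLAY=(29:SEQNUM,8,ZD,RESTART=(1,28))
* omit the first record of each group and strip the counter again
  OUTFIL OMIT=(29,8,ZD,EQ,1),BUILD=(1,28)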
Jay Villaverde
New User
Got it to work using SUM FIELDS=NONE,XSUM
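For anyone finding this later, a sketch of that solution. The dataset name is elided and the 28-byte key length is an assumption based on the sample data above; with SyncSort, the records discarded by SUM are written to the SORTXSUM DD.
Code:
//SORTXSUM DD DSN=...
//SYSIN    DD *
  SORT FIELDS=(1,28,CH,A)
* keep one record per key; the other duplicates go to SORTXSUM
  SUM FIELDS=NONE,XSUM
/*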
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Good to hear it is working. Thank you for letting us know and for posting your solution.
d
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
What is "working" is not what is described in the question.
This will retain one record per duplicate key. Which record is retained depends on EQUALS (the first) or NOEQUALS (can't predict which), and the discarded records will be written to the XSUM DD.
hailashwin has the correct approach: a sequence number with RESTART= on the key in question, then OUTFIL OMIT=(sequence number is one).
There is your manual, and there are examples here.
Jay Villaverde
New User
Sorry, not following, because XSUM did give me what I needed. As I stated in my question, it didn't necessarily need to be the first record that was dropped with the others kept. I just needed to drop one dupe, keep the rest, and have the dropped records in a separate file. XSUM achieved that for me.
Yes, the other approach would have worked as well, but XSUM was quicker and easier for me.
Regards
Bill Woodger
Moderator Emeritus
I see now that your question says both, and doesn't say anything about needing to keep the records which have been dropped.
However, you show one instance where you have three duplicates, so two will be dropped. That doesn't fit the "only one" from any interpretation of your question.
Using SUM with XSUM, are you SORTing the records? Were you SORTing them anyway?
I suppose it may take up to a minute to code it differently, but you'll save many, many minutes by not having to SORT the file.
To collect the dropped records together with the other method suggested, you'd just need a second OUTFIL with SAVE. There would be no duplicate keys on that file, unlike your XSUM file.
Looking at the sample data you have shown, it is irrelevant which record is dropped, because the duplicates are identical to each other.
If you are happy with XSUM, fine, just don't pretend it satisfied what you asked.
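A sketch of that variant, under the same assumptions as the earlier sequence-number example (28-byte records, whole record as key, duplicates already adjacent so no SORT is needed; the FNAMES DD names KEEP and DROPPED are made up):
Code:
  SORT FIELDS=COPY
  INREC OVERLAY=(29:SEQNUM,8,ZD,RESTART=(1,28))
* kept records: everything except the first of each key
  OUTFIL FNAMES=KEEP,OMIT=(29,8,ZD,EQ,1),BUILD=(1,28)
* SAVE picks up whatever no other OUTFIL wrote, i.e. exactly the
* one dropped record per key, so this file has no duplicate keys
  OUTFIL FNAMES=DROPPED,SAVE,BUILD=(1,28)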