Remove Duplicate When position is not known

 
IBMMAINFRAMES.com Support Forums -> JCL & VSAM
amitava
Warnings : 1

Active User


Joined: 30 Oct 2005
Posts: 186
Location: India

Posted: Fri Jun 29, 2007 6:48 pm    Post subject: Remove Duplicate When position is not known

Hi,
I have an input file whose layout is not fully known to us. All we know is that the records will look like this:

AAAAAAAAA,BBBBBBBBBBBBBBBBBBBBBBBB
CCC,DDDDDDDDDDDDD
MMMM,SSSSSSSSSSSSSSSSSSSSS
AAAAAAAAA,BBBBBBBBBBBBB

The file is fixed block (FB), though. The first field of a record can occur more than once in the file (records 1 and 4 both begin with AAAAAAAAA). I want to remove such duplicates and write the result to a new output file, but I don't know the position of the ','.

Can you please give me an idea of how to do this? Is it possible through SYNCSORT?

dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

Posted: Fri Jun 29, 2007 8:24 pm    Post subject:

Hello,

You will need to provide a much more thorough description of your input data and of what you want to occur when your process executes.

Does your input only have 2 fields, and are those 2 always separated by a comma?

If you had one record with AAAAA,BBBBBBBB and another with AAAA,BBBBBBBB (note the different number of A's) would those be duplicates?

If you had one record with AAAAA,BBDDBBDD and another with AAAAA,XXDDXXDD (same number of A's) would those be duplicates?

Which of the duplicates should be written to the new output file?
Devzee

Active Member


Joined: 20 Jan 2007
Posts: 684
Location: Hollywood

Posted: Sat Jun 30, 2007 10:09 am    Post subject:

Code:
AAAAAAAAA,BBBBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAA,BBBBBBBBBBBBB
CCC,DDDDDDDDDDDDD
CCC,DDDDDDDDDDDDD
MMMM,SSSSSSSSSSSSSSSSSSSSS
MMMM,SSSSSSSSSSSSSSSSSSSSS


In this example, according to you, are there 3 sets of duplicate records?
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

Posted: Mon Jul 02, 2007 2:14 am    Post subject:

Hello,

In addition to answering devzee's question, please answer the more inclusive previous questions.

Also, in your original post, the "key" match on the A's is obvious, but which record should be selected/discarded? The one with more B's or the one with fewer B's?

Please create better sample input data and the output(s) you want when this input is processed.
amitava
Warnings : 1

Active User


Joined: 30 Oct 2005
Posts: 186
Location: India

Posted: Mon Jul 02, 2007 10:20 am    Post subject:

Hi Dick & Devzee,
Thanks a lot for your responses. I am sorry for not stating my question clearly. Let me try to explain the input properly.
My input looks like this:
Code:

AAAAAAAAA,BBBBBBBBBBBBBBBBBBBBBBBB
CCC,DDDDDDDDDDDDD
MMMM,SSSSSSSSSSSSSSSSSSSSS
AAAAAAAAA,BBBBBBBBBBBBB


Obviously, the first word (the one before the first ',') is the main key field. If there are duplicates on this field, we have to remove the duplicates whose next word is shorter. E.g., in my example cited before, my process should pick the first record (not the fourth one), as it has more B's than the fourth record.
Now, coming to your query, Dick: if there are records like AAAAA,BBDDBBDD and AAAAA,XXDDXXDD, my process can select either of them. There is no restriction on which record to choose when the second words (here BBDDBBDD and XXDDXXDD) have the same length.
Another point to be noted is that there can be more than one occurrence of ',' in a statement. So
Code:

CCC,DDDDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXXXXX

is also possible.

Please let me know what would be a suitable process to solve my problem!
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

Posted: Mon Jul 02, 2007 6:08 pm    Post subject:

Hello,

I'm still not clear on this part of your requirement
Quote:
Another point to be noted is that there can be more than one occurrence of ',' in a statement. So
Code:
CCC,DDDDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXXXXX
is also possible.


If the input had
Code:
CCC,DDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXXXXXXX
and
Code:
CCC,DDDDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXX

which should be kept - the one with more D's or the one with the greater length overall?

The more the requirement is clarified, the more I'd lean towards using program code rather than trying to meet the need with sort control statements.

How many records will there be in the input file?
amitava
Warnings : 1

Active User


Joined: 30 Oct 2005
Posts: 186
Location: India

Posted: Mon Jul 02, 2007 7:11 pm    Post subject:

Hi Dick,
According to your query and options provided,
Quote:

Code:
CCC,DDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXXXXXXX

and
Code:
CCC,DDDDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXX


I would like to make one thing clear: the first word, i.e. CCC, is the key, and the other fields are just a simple record associated with that key.
Code:

CCCCCCC,DDDDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXX
<-Key-> <-Simple Record                   ->

Now, as I said in my last post:
Quote:

In my example cited before, my process should pick the first record (not the fourth one), as it has more B's than the fourth record.

By this I meant that my process will choose the record whose simple record is longer; so, in your example, the first record will be chosen.
And if the simple records have the same length, choose either one; no issue with that.
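
The rule as stated so far (key is the text before the first comma; among records with the same key, keep the one whose remaining data is longest; ties may go either way) can be sketched in program code. This is a minimal Python sketch, not a SYNCSORT solution. The function names are hypothetical, and it assumes the records are already available as text strings, possibly with trailing blank padding.

```python
def split_key(record):
    """Split a record at its FIRST comma into (key, rest)."""
    key, _, rest = record.partition(",")
    return key, rest


def keep_longest_per_key(records):
    """Keep one record per key: the one with the longest data after the key.

    On a tie the record seen first is kept (the thread allows either).
    Output follows the order in which each key first appeared.
    """
    best = {}  # key -> (length of rest, full record)
    for record in records:
        record = record.rstrip()  # assumption: trailing blanks are padding
        if not record:
            continue  # skip empty records
        key, rest = split_key(record)
        if key not in best or len(rest) > best[key][0]:
            best[key] = (len(rest), record)
    return [record for _, record in best.values()]
```

For the four sample records from the first post, this keeps records 1, 2 and 3 and drops record 4. Only one record per key is held in memory, so the input does not need to be sorted first.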
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

Posted: Mon Jul 02, 2007 7:45 pm    Post subject:

Hello,

Is there a way to know the maximum number of "sets" there can be for any particular "key"? For example, might there be 4 sets with CCC as the key - might there be 30?

I believe it is time to switch gears and implement this using code rather than sort control statements. If it were my requirement, I would be concerned that a newly discovered "rule" would cause the sort control statements to no longer work (if they existed in the first place) and then coding would still be needed.

Also, what is the lrecl/recfm of the input?
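
If the maximum number of "sets" per key is not known up front, it can be measured from the data. The following is a small Python sketch under the same assumptions as the requirement (the key is everything before the first comma); the function name is hypothetical.

```python
from collections import Counter


def count_sets_per_key(records):
    """Count how many records share each key (text before the first comma)."""
    counts = Counter()
    for record in records:
        record = record.rstrip()  # assumption: trailing blanks are padding
        if record:
            counts[record.partition(",")[0]] += 1
    return counts
```

Calling `count_sets_per_key(records).most_common(1)` then gives the key with the most records together with its count.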
amitava
Warnings : 1

Active User


Joined: 30 Oct 2005
Posts: 186
Location: India

Posted: Mon Jul 02, 2007 8:10 pm    Post subject:

Hi Dick,
I know, man, it is quite tough to implement through SORT, and it may not be a stable system then! I have already taken the code-based approach. So you could say I am trying to be over-smart, but it is nothing like that. I JUST WANT TO KNOW: is there any way to do this kind of processing through SORT, and if yes, how? You could say I am trying to explore SORT for handling this kind of scenario (if any comes to me in future). Ha ha ha!!! :)

Hey Dick, FYI: LRECL is 400, RECFM is FB.

Waiting to hear from all of you! Give me some approach to handle this kind of situation!

Dick, please don't mind! I just want to know... Don't take it the wrong way.
You know I am a bit mad :D
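
With RECFM=FB and LRECL=400, every record occupies exactly 400 bytes, so a program working on a binary copy of the file can slice it at fixed offsets. The Python sketch below illustrates this. It assumes the data is EBCDIC in code page 037 (Python's `cp037` codec) and that short records are padded with blanks; neither is stated in the thread, and the function name is hypothetical.

```python
def split_fixed_records(data, lrecl=400, encoding="cp037"):
    """Split a bytes buffer into LRECL-sized records and decode each one.

    Assumptions: fixed-length records with no delimiters between them,
    EBCDIC code page 037, and blank padding on the right.
    """
    if len(data) % lrecl != 0:
        raise ValueError("data length is not a multiple of LRECL")
    records = []
    for offset in range(0, len(data), lrecl):
        text = data[offset:offset + lrecl].decode(encoding)
        records.append(text.rstrip(" "))  # drop the blank padding
    return records
```

The resulting list of strings can be passed to whatever routine applies the duplicate-removal rule.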
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

Posted: Mon Jul 02, 2007 8:19 pm    Post subject:

Hello Amitava,

Not to worry - the more alternatives you have, the better choice you may be able to make.

As far as doing this with Syncsort, you may want to look into what they are going to release (early?) next year - doesn't help just now, but may later when "any comes to me in future". The next major release of the sort and a new version of Synctool with documentation are planned, but I've seen no actual release date so far.

Good luck, and we're here when there are questions.
All times are GMT + 6 Hours