IBM Mainframe Forum Index
 
 

Remove Duplicate When position is not known


IBM Mainframe Forums -> JCL & VSAM
amitava
Warnings : 1

Active User


Joined: 30 Oct 2005
Posts: 186
Location: India

PostPosted: Fri Jun 29, 2007 6:48 pm

Hi,
I have an input file whose layout is not fully known to us. All we know is that the records will look like this:

AAAAAAAAA,BBBBBBBBBBBBBBBBBBBBBBBB
CCC,DDDDDDDDDDDDD
MMMM,SSSSSSSSSSSSSSSSSSSSS
AAAAAAAAA,BBBBBBBBBBBBB

The file is fixed block (FB). The first field of a record can occur more than once in the file [records 1 and 4 both begin with AAAAAAAAA]. I want to remove such duplicates and write the result to a new output file, but I don't know the position of the ','.

Can you please give me an idea of how to do this? Is it possible through SYNCSORT?
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Fri Jun 29, 2007 8:24 pm

Hello,

You will need to provide a much more thorough description of your input data and of what you want to occur when your process executes.

Does your input only have 2 fields, and are those 2 always separated by a comma?

If you had one record with AAAAA,BBBBBBBB and another with AAAA,BBBBBBBB (note the different number of A's) would those be duplicates?

If you had one record with AAAAA,BBDDBBDD and another with AAAAA,XXDDXXDD (same number of A's) would those be duplicates?

Which of the duplicates should be written to the new output file?
Devzee

Active Member


Joined: 20 Jan 2007
Posts: 684
Location: Hollywood

PostPosted: Sat Jun 30, 2007 10:09 am

Code:
AAAAAAAAA,BBBBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAA,BBBBBBBBBBBBB
CCC,DDDDDDDDDDDDD
CCC,DDDDDDDDDDDDD
MMMM,SSSSSSSSSSSSSSSSSSSSS
MMMM,SSSSSSSSSSSSSSSSSSSSS


In this example, according to you, are there 3 sets of duplicate records?
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Mon Jul 02, 2007 2:14 am

Hello,

In addition to answering Devzee's question, please answer the more inclusive previous questions.

Also, in your original post, the "key" match on the A's is obvious, but which record should be selected/discarded? The one with more B's or the one with fewer B's?

Please create better sample input data and the output(s) you want when this input is processed.
amitava
Warnings : 1

Active User


Joined: 30 Oct 2005
Posts: 186
Location: India

PostPosted: Mon Jul 02, 2007 10:20 am

Hi Dick & Devzee,
Thanks a lot for your responses. I am sorry for not stating my query clearly. Let me try to explain the input properly.
My input looks like this:
Code:

AAAAAAAAA,BBBBBBBBBBBBBBBBBBBBBBBB
CCC,DDDDDDDDDDDDD
MMMM,SSSSSSSSSSSSSSSSSSSSS
AAAAAAAAA,BBBBBBBBBBBBB


Obviously the first word (appearing before the first ',') is the main key field. If there are duplicates in this field, we have to remove the duplicates whose next word is shorter. E.g., in my example cited before, my process should pick the first record (not the fourth one), as it has more B's than the fourth record.
Now coming to your query, Dick: if there are records like AAAAA,BBDDBBDD and AAAAA,XXDDXXDD, my process can select either of them. There is no restriction on which record to choose when the lengths of the second word (here BBDDBBDD and XXDDXXDD) are the same.
Another point to be noted is that there can be more than one occurrence of ',' in a record. So
Code:

CCC,DDDDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXXXXX

is also possible.

Please let me know what would be a suitable process to solve my problem!
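
For readers following along, the key rule described in this post (everything before the first ',' is the key, and any later commas belong to the rest of the record) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name split_record is hypothetical, the record is assumed to be already decoded to text, and trailing blanks are assumed to be fixed-block padding.

```python
from typing import Tuple


def split_record(record: str) -> Tuple[str, str]:
    """Split a record into (key, rest) at the FIRST comma only.

    Assumption: trailing blanks are padding from the fixed-length
    record and carry no meaning, so they are stripped first.
    """
    # str.partition splits at the first occurrence of the separator,
    # so any further commas stay inside the second element.
    key, _, rest = record.rstrip().partition(",")
    return key, rest


# A record with several commas still yields a single key.
print(split_record("CCC,DDDDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXXXXX"))
```

Because the split happens at the first comma wherever it falls, the position of the ',' does not need to be known in advance.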
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Mon Jul 02, 2007 6:08 pm

Hello,

I'm still not clear on this part of your requirement
Quote:
Another point to be noted is that there can be more than one occurrence of ',' in a record. So
Code:
CCC,DDDDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXXXXX
is also possible.


If the input had
Code:
CCC,DDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXXXXXXX
and
Code:
CCC,DDDDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXX

which should be kept - the one with more D's or the one with the greater length overall?

The more the requirement is clarified, the more I'd lean towards using program code rather than trying to meet the need with sort control statements.

How many records will there be in the input file?
amitava
Warnings : 1

Active User


Joined: 30 Oct 2005
Posts: 186
Location: India

PostPosted: Mon Jul 02, 2007 7:11 pm

Hi Dick,
According to your query and options provided,
Quote:

Code:
CCC,DDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXXXXXXX

and
Code:
CCC,DDDDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXX


I would like to clarify one thing: the first word, i.e. CCC, is the key, and the other fields together are just a simple record associated with that key.
Code:

CCCCCCC,DDDDDDDDDDDDD,XXXXXXXXXXXXXX,XXXXXXX
<-Key-> <-Simple Record                   ->

Now as I said in my last post -
Quote:

In my example cited before, my process should pick the first record (not the fourth one), as it has more B's than the fourth record.

By this I meant that my process will choose the record whose simple record is longer, which means that in your example the first record will be chosen.
And if the simple records have the same length, choose either one; no issue with that.
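
The rule as clarified here (key = text before the first ',', keep the record whose remaining part is longest, either one on a tie) could be sketched in program code as follows. This is a minimal illustration, not the poster's actual program: the function name dedupe_keep_longest is hypothetical, records are assumed to be decoded text with trailing padding blanks, and the whole input is assumed to fit in memory.

```python
from typing import Dict, Iterable, List


def dedupe_keep_longest(records: Iterable[str]) -> List[str]:
    """Keep one record per key: the one with the longest 'simple record'.

    The key is everything before the first comma. On a tie the record
    seen first is kept, which satisfies "choose either one".
    """
    best: Dict[str, str] = {}
    for raw in records:
        record = raw.rstrip()  # assumption: trailing blanks are padding
        key, _, rest = record.partition(",")
        kept = best.get(key)
        # Replace the kept record only when the new remainder is strictly longer.
        if kept is None or len(rest) > len(kept.partition(",")[2]):
            best[key] = record
    # dicts preserve insertion order, so keys come out in first-seen order.
    return list(best.values())


sample = [
    "AAAAAAAAA,BBBBBBBBBBBBBBBBBBBBBBBB",
    "CCC,DDDDDDDDDDDDD",
    "MMMM,SSSSSSSSSSSSSSSSSSSSS",
    "AAAAAAAAA,BBBBBBBBBBBBB",
]
for line in dedupe_keep_longest(sample):
    print(line)
```

With the four sample records from the thread, the fourth record is dropped because its key repeats the first record's key and its remainder is shorter. A single pass with a dictionary avoids sorting, but it holds one record per distinct key in memory; for a very large file a sort on the key followed by a sequential pass would be the usual alternative.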
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Mon Jul 02, 2007 7:45 pm

Hello,

Is there a way to know the maximum number of "sets" there can be for any particular "key"? For example, might there be 4 sets with CCC as the key, or might there be 30?

I believe it is time to switch gears and implement this using code rather than sort control statements. If it were my requirement, I would be concerned that a newly discovered "rule" would cause the sort control statements to no longer work (if they existed in the first place) and then coding would still be needed.

Also, what is the lrecl/recfm of the input?
amitava
Warnings : 1

Active User


Joined: 30 Oct 2005
Posts: 186
Location: India

PostPosted: Mon Jul 02, 2007 8:10 pm

Hi Dick,
I know it is quite tough to implement this through SORT, and the result may not be a stable system! I have already taken the code-based approach. You could say I am trying to be over-smart, but it is nothing like that. I just want to know: is there any way to do this kind of processing through SORT, and if yes, how? You could say I am trying to explore SORT to handle this kind of scenario (if any comes to me in future). Ha ha ha!

Hey Dick, FYI: LRECL = 400, RECFM = FB.

Waiting to hear from all of you! Please suggest an approach for handling this kind of situation.

Dick, please don't mind! I just want to know; don't take it the wrong way.
You know I am a bit mad!
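
As an aside on the LRECL = 400, RECFM = FB detail given in this post: a fixed-block file has no line delimiters, so a program working on a binary copy of it has to cut the byte stream into 400-byte records itself. The sketch below shows one way to do that. It assumes the file was transferred in binary and that the data is EBCDIC in code page 037; both are assumptions, as the thread does not state the code page, and the function name read_fixed_records is hypothetical.

```python
import io
from typing import BinaryIO, Iterator

LRECL = 400  # record length stated in the thread


def read_fixed_records(
    stream: BinaryIO, lrecl: int = LRECL, encoding: str = "cp037"
) -> Iterator[str]:
    """Yield decoded records from a fixed-length (RECFM=FB) byte stream.

    Assumption: the data is EBCDIC code page 037; pass a different
    encoding if the file uses another code page.
    """
    while True:
        chunk = stream.read(lrecl)
        if not chunk:
            break  # clean end of file
        if len(chunk) != lrecl:
            raise ValueError("file length is not a multiple of the record length")
        # Decode, then drop the blank padding at the end of the record.
        yield chunk.decode(encoding).rstrip()


# Build two padded records in memory to stand in for a real file.
data = b"".join(
    text.ljust(LRECL).encode("cp037")
    for text in ("AAAAAAAAA,BBBBBBBBBBBBB", "CCC,DDDDDDDDDDDDD")
)
for record in read_fixed_records(io.BytesIO(data)):
    print(record)
```

For a real file, the stream would come from open(path, "rb"), where path is whatever binary copy of the data set is available.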
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Mon Jul 02, 2007 8:19 pm

Hello Amitava,

Not to worry - the more alternatives you have, the better choice you may be able to make.

As far as doing this with Syncsort, you may want to look into what they are going to release (early?) next year. That doesn't help just now, but may later when "any comes to me in future". The next major release of the sort and a new version of Synctool with documentation are planned, but I've seen no actual release date so far.

Good luck, and we're here when there are questions.
All times are GMT + 6 Hours