Finding repeated surnames in different columns

blackjack202 · New User Joined: 24 May 2016 Posts: 4 Location: Spain

Hi there,

I have the following problem, which I will try to explain. The idea is to delete the records of people who have relatives in the same office. We know that if a surname is repeated, we have to remove those records. The big problem, at least for me, is how to handle the case where SURNAME1 of one record also appears as SURNAME2 of another, or the other way around.

For example:

Format INPUT ==> OFIC (1 CHAR) - ACC. NUM (1 CHAR) - SURNAME1 (50 CHAR) - SURNAME2 (50 CHAR) - NAME (50 CHAR)
INPUT:

Rohit Umarjikar · Posted: Tue Jun 07, 2016 8:36 pm

Welcome!
Maybe with PUSH and GROUP, using SS and IFTHEN conditions.

blackjack202 · New User Joined: 24 May 2016 Posts: 4 Location: Spain

Hi Rohit,

That record has SURNAME1 = AMARAL, but its second surname is ZAPATA, the same as the record with ACC. NUM = 2, whose SURNAME1 = ZAPATA. Both belong to the same OFIC = 2, which is why they are deleted.

BTW, thanks for the welcome ;)

Regards!
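
Editor's sketch: the rule described in this post can be written out in a few lines of Python, which may help when checking the output of whatever sort step is eventually used. It is only an illustration of the requirement as I read it, not a DFSORT solution. The `Record` type, the function names and the sample rows are all hypothetical (the original input data is not shown in the thread), and treating blank surnames as "no surname" is my assumption.

```python
from collections import Counter
from typing import NamedTuple


class Record(NamedTuple):
    # Field names follow the layout in the first post; the type itself is hypothetical.
    ofic: str
    acc_num: str
    surname1: str
    surname2: str
    name: str


def surnames(rec):
    """Distinct non-blank surnames of one record (blank = no surname, an assumption)."""
    return {s.strip() for s in (rec.surname1, rec.surname2) if s.strip()}


def drop_shared_surnames(records):
    """Keep only records whose surnames appear in no other record of the same office."""
    counts = Counter()
    for rec in records:
        # Count each surname once per record, whichever column it is in.
        for s in surnames(rec):
            counts[(rec.ofic, s)] += 1
    return [
        rec for rec in records
        if all(counts[(rec.ofic, s)] == 1 for s in surnames(rec))
    ]


# Invented sample rows, loosely modelled on the AMARAL / ZAPATA case above.
sample = [
    Record("2", "1", "AMARAL", "ZAPATA", "ANA"),
    Record("2", "2", "ZAPATA", "RUIZ", "LUIS"),
    Record("2", "3", "GOMEZ", "PEREZ", "EVA"),
    Record("3", "1", "ZAPATA", "LOPEZ", "JUAN"),
]

kept = drop_shared_surnames(sample)
print([(r.ofic, r.acc_num) for r in kept])
```

In this invented sample the two office-2 records that share ZAPATA are both dropped, while the ZAPATA in office 3 is kept because the comparison is made per office.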

Nic Clouston · Posted: Tue Jun 07, 2016 8:41 pm

Because it is Amaral Zapata, and there is a Zapata in surname 1 further down?

blackjack202 · New User Joined: 24 May 2016 Posts: 4 Location: Spain

Hi,

@Nic Clouston: Exactly, it is repeated in SURNAME1 further down ;)

@Rohit: Would you mind shedding more light on this? As far as I know, SS means substring (I imagine), but how can it be handled in a GROUP? I mean, with SS I can take SURNAME1 of the first record and read the second column, SURNAME2, but when I get to SURNAME1 of the second record, how can I do the same?

I'm not sure I have explained it very well. Sorry if my question is silly.

I appreciate the help and the idea ;)

Regards

Bill Woodger · Posted: Tue Jun 07, 2016 10:53 pm

Are you sure you just want to get rid of things dependent on a match of one out of two surnames? I thought many Spanish surnames were common even when not related?

Anyway, to do it, use JOINKEYS specifying the same file for the input, with the two different key-positions (remembering that office must be first).

That still leaves you needing to identify internal duplicates within each of the two "files".

Before going into that, can you confirm about the requirement. It seems a bit silly. It seems far more likely that a common surname in an office is not of a relative, except for distant ones. Why would you want to do this, what's the business reason?

blackjack202 · New User Joined: 24 May 2016 Posts: 4 Location: Spain

Hi Bill,

That is 100% correct. In Spain, having the same surname doesn't make us relatives. The only way is to confirm, when they go to the bank, that this person is a relative of another (even so, it doesn't make any sense). But well, the client (the bank) wants this report, knowing that people with the same surname are not necessarily relatives. Here in Spain we say "Donde manda Capitan, no manda Marinero", something like "Where a captain rules, a sailor has no sway".

The requirement exists because (you will laugh) they don't want us to use internal tables in COBOL programs, just in case "we receive a migration of another bank, and the table collapses" (yes, I made the same face as you).

I think I get the idea of the JOINKEYS. I will try it in the next few days and post again if I run into any logic problems!

Thank you so much!

& regards

Bill Woodger · Posted: Wed Jun 08, 2016 1:33 am

Great fun, then.

Since you have to SORT the data, into two sequences, extend the records to include a "count" field (at the beginning for variable-length records, at the end for fixed-length records) which is large enough for the maximum possible people dealing with the same office. Make it a factor of 10 larger, just in case there's a bank merger :-)

SUM on the count.

For the JOIN use UNPAIRED,F1,ONLY.

In the Maintask of the JOINKEYS, use INCLUDE= for the count being one, and have a BUILD on INREC to drop off the count you added earlier.

If you then need some other order than first-surname, you can SORT in the Maintask as well.
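
Editor's sketch: to make the steps above easier to follow, here is a small Python emulation of how I read them. It is not DFSORT syntax and not necessarily exactly what Bill had in mind. The function name and the sample rows are hypothetical, and I am assuming that "file 1" is keyed on office plus first surname and "file 2" on office plus second surname.

```python
from collections import Counter


def unpaired_f1_only(records):
    """records: tuples of (ofic, acc_num, surname1, surname2, name)."""
    # "File 1": key is office + surname1. Counting per key stands in for
    # adding a count of 1 to each record and then doing SUM on the count.
    f1_count = Counter((r[0], r[2]) for r in records)

    # "File 2": key is office + surname2. For an unpaired-F1 join only the
    # set of keys present in file 2 matters.
    f2_keys = {(r[0], r[3]) for r in records}

    kept = [
        r for r in records
        if f1_count[(r[0], r[2])] == 1      # the "count being one" filter
        and (r[0], r[2]) not in f2_keys     # unpaired, F1 only
    ]
    # Output in office + first-surname order, as the sorted join would give.
    return sorted(kept, key=lambda r: (r[0], r[2]))


# Invented sample rows for office 2.
rows = [
    ("2", "1", "AMARAL", "ZAPATA", "ANA"),
    ("2", "2", "ZAPATA", "RUIZ", "LUIS"),
    ("2", "3", "GOMEZ", "PEREZ", "EVA"),
    ("2", "4", "GOMEZ", "DIAZ", "PAU"),
    ("2", "5", "LOPEZ", "MORA", "SOL"),
]

result = unpaired_f1_only(rows)
print([r[1] for r in result])
```

Two things show up in this sketch that may be worth checking against the real job. The AMARAL ZAPATA row survives, because only first-surname keys are tested against second-surname keys, so a second pass with the key roles swapped might be needed to drop it too. Also, a record whose two surnames are identical would pair with itself and be dropped. Both points are my reading of the approach, not something stated in the thread.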