Translating characters with accent marks to regular letters

spainj125 · Posted: Mon Feb 23, 2009 10:37 pm

Having an issue where we are receiving input data with accent marks above the vowels (circumflex, umlaut, grave, acute, etc.), giving letters that look like ( â ä à á ã å ç ). In discussing with a couple of coworkers how to translate these to regular characters, the only thing we could think of was inspecting each non-alphabetic character individually for one of the hex values of these symbols. This would use a STRING statement and a loop. But it would also require 72 different checks, as there are 6 different letters involved (all vowels and 'y').

Was checking to see if anyone knew of an easier and less tedious way to do this in COBOL?

Thanks..

Bill O'Boyle · Posted: Mon Feb 23, 2009 11:00 pm

You could use an INSPECT CONVERTING specifying the FROM characters as LITERALS as well as the TO characters as LITERALS.

This format of INSPECT would generate a single Assembler TR (Translate) instruction and its efficiency would rival that of native Assembler.

INSPECT REPLACING (regardless), as well as INSPECT CONVERTING using WS fields as opposed to LITERALS, would cause a call (BALR) to a COBOL run-time routine.
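
As a rough sketch of the literal form of INSPECT CONVERTING, something like the following could work. The program name, field name and sample data are made up for illustration, and the hex values assume the input is in EBCDIC code page 037, where X'42' through X'48' are â ä à á ã å ç. Check the actual hex values in your data before relying on them. Hex literals are used so that the source does not depend on how the accented characters are stored in the editor.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DEACCENT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical input field; name and length are illustrative.
       01  WS-INPUT-TEXT           PIC X(80).
       PROCEDURE DIVISION.
      * Sample data: X'42' followed by lower-case b and c,
      * assuming an EBCDIC (code page 037) environment.
           MOVE X'428283' TO WS-INPUT-TEXT
      * Assumed code page 037: X'42' to X'47' are accented
      * forms of lower-case a, X'48' is c-cedilla.
      * Both literals must be the same length (7 bytes here).
           INSPECT WS-INPUT-TEXT
               CONVERTING X'42434445464748'
                       TO 'aaaaaac'
           DISPLAY WS-INPUT-TEXT
           GOBACK.
```

The other vowels, 'y' and the upper-case letters would be handled the same way, by extending both literals with one byte per character, always keeping the two the same length.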

Regards,

CICS Guy · Posted: Mon Feb 23, 2009 11:03 pm

A quick pass by sort could clean up the data quickly...
* CODE=(0040) translates X'00' to X'40' (blank); code one fftt pair per character to change
  OPTION COPY
  ALTSEQ CODE=(0040)
  OUTREC FIELDS=(1,80,TRAN=ALTSEQ)

William Thompson · Posted: Mon Feb 23, 2009 11:07 pm

enrico-sorichetti · Posted: Mon Feb 23, 2009 11:08 pm

You are discussing this with the wrong people.
Are You in a multilingual environment?
In that case it would be bad to lose significance in the strings.

Rather than a quick and dirty translation of apparently wrong characters,
it would be wiser to understand the application environment better.

Maybe You are outsourcing for a German/French/Spanish customer,
and he certainly would not like to lose perfectly legal German/French/Spanish chars.

Bill O'Boyle · Posted: Mon Feb 23, 2009 11:18 pm

Enrico raises a legitimate issue regarding the replacement of these letters in a given language.

Do you know the hex-values of these characters? Because (for example) a German letter "ä" might be a X'81' (a lower-case "a") in an English collating sequence.

So, I believe you need to compare the other letters (from different languages and collating sequences) with those of an English collating sequence.

You may find that their English counterparts are the same hex-values.

Regards,

spainj125 · Posted: Mon Feb 23, 2009 11:31 pm

Thanks for the ideas everyone. This will help me greatly.

Bill O'Boyle · Posted: Tue Feb 24, 2009 12:09 am

I just Googled "Ebcdic Collating Sequence in German" and found a translate table, which indicates that a German "ä" is a X'C0' in their collating sequence, whereas a X'C0' in the English collating sequence is a left brace ("{").

Regards,