UTF-8 encoded file

jasveer singh · New User Joined: 06 Mar 2006 Posts: 16

Hi,

I have a requirement to send an XML file in UTF-8 encoding to an interfacing Unix server. My file contains mixed code page data (IBM-037 English characters and IBM-930 Katakana DBCS characters), and I need to use an NDM transmission process.

I coded a program to create the file on the mainframe in UTF-8 format, using the NATIONAL-OF and DISPLAY-OF functions as shown below:

MOVE FUNCTION NATIONAL-OF(WS-GRPT-DETAILA , 037-ENGCP )
TO WS-GRPN-DETAILA
MOVE FUNCTION DISPLAY-OF(WS-GRPN-DETAILA , UTF-8 )
TO WS-GRPO-DETAILA

and for the local language data, I am using:

MOVE FUNCTION NATIONAL-OF( WS-XMLT-ADDRLINE1TXT
, 930-LCLCP )
TO WS-XMLN-ADDRLINE1TXT
MOVE FUNCTION DISPLAY-OF(WS-XMLN-ADDRLINE1TXT
, UTF-8 )
TO WS-XMLO-ADDRLINE1TXT
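The two-step conversion above (EBCDIC to national, then national to UTF-8) can be mirrored off-host to check which bytes to expect. This Python sketch assumes `037-ENGCP` holds CCSID 37 and `UTF-8` holds CCSID 1208, and relies on Enterprise COBOL storing national data as UTF-16 big-endian; the helper names are hypothetical. Python's standard library has no codec for CCSID 930, so only the English (037) path is shown.

```python
def national_of_037(ebcdic: bytes) -> bytes:
    """Mimic FUNCTION NATIONAL-OF(x, 37): EBCDIC CCSID 037 -> UTF-16BE."""
    return ebcdic.decode("cp037").encode("utf-16-be")


def display_of_utf8(national: bytes) -> bytes:
    """Mimic FUNCTION DISPLAY-OF(x, 1208): UTF-16BE -> UTF-8."""
    return national.decode("utf-16-be").encode("utf-8")


# "Hello" as stored in a PIC X field under CCSID 037
ebcdic_hello = b"\xc8\x85\x93\x93\x96"

national = national_of_037(ebcdic_hello)
utf8 = display_of_utf8(national)
print(national.hex(" "))  # 00 48 00 65 00 6c 00 6c 00 6f
print(utf8)               # b'Hello'
```

English text doubles in size in the national item and goes back to one byte per character in UTF-8.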

The file created on the mainframe is in an unreadable format, so I cannot validate the output at my end. When I sent the file to the interfacing team, they received the English data properly, but the local language data did not come through correctly. Here are my questions:

1. Is creating the file on the mainframe in UTF-8 encoding the correct way to achieve this?
2. Do the DISPLAY-OF and NATIONAL-OF functions support DBCS characters?
3. How many bytes of UTF-8 output will I get when I convert DBCS data to UTF-8?
4. I heard that UTF-16 is the mainframe's Unicode code page. Should I create the file in UTF-16 encoding and specify SYSOPTS in the NDM JCL to convert the data from UTF-16 to UTF-8?
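For question 3, the byte counts can be worked out with a short Python sketch. It assumes, for illustration only, that every ASCII character is single-byte on the host and every other character is a full-width DBCS character, with one SO and one SI byte around each DBCS run; `mixed_ebcdic_len` is a hypothetical helper, not a COBOL or IBM function.

```python
def mixed_ebcdic_len(text: str) -> int:
    """Estimated length of text in a mixed SBCS/DBCS EBCDIC field.

    Assumption: ASCII characters are SBCS (1 byte), all others are DBCS
    (2 bytes), and each DBCS run is wrapped in SO (X'0E') and SI (X'0F').
    """
    total = 0
    in_dbcs = False
    for ch in text:
        is_dbcs = ord(ch) > 0x7F
        if is_dbcs and not in_dbcs:
            total += 1  # SO opens a DBCS run
        elif not is_dbcs and in_dbcs:
            total += 1  # SI closes the DBCS run
        total += 2 if is_dbcs else 1
        in_dbcs = is_dbcs
    if in_dbcs:
        total += 1  # SI after a trailing DBCS run
    return total


samples = {
    "English": "ABC",
    "Katakana": "\u30ab\u30bf\u30ab\u30ca",  # full-width KA TA KA NA
    "Mixed": "ABC \u30ab\u30bf\u30ab\u30ca",
}

for label, text in samples.items():
    print(label,
          "host:", mixed_ebcdic_len(text),
          "UTF-16:", len(text.encode("utf-16-be")),
          "UTF-8:", len(text.encode("utf-8")))
# English host: 3 UTF-16: 6 UTF-8: 3
# Katakana host: 10 UTF-16: 8 UTF-8: 12
# Mixed host: 14 UTF-16: 16 UTF-8: 16
```

Under these assumptions, each full-width Katakana character that takes 2 bytes on the host takes 2 bytes in UTF-16 and 3 bytes in UTF-8, and the SO/SI bytes disappear. Sizing a UTF-8 receiving field at 3 bytes per national (UTF-16) character is a safe upper bound.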

Please help me with this.

(Apologies for writing such a long story, but no one internally seems to know about code page techniques, so I am seeking your support.)

Robert Sample · Posted: Mon Apr 06, 2009 5:25 pm

1. Since you're trying it and it's not working, probably not.
2. NATIONAL-OF definitely; the manual isn't clear on DISPLAY-OF.
3 and 4: From the COBOL Programming Guide (link at the top of the page):

jasveer singh

It finally worked. I just included shift-out and shift-in characters in my input string for the NATIONAL-OF function, and then passed the output of that function to the DISPLAY-OF function to get UTF-8 encoded data.

I could see a significant difference in the output of NATIONAL-OF with and without the SO/SI bytes. (I'm not sure why simply including the SO/SI bytes changed the output string altogether.)
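To illustrate why adding SO/SI changes the output so much, here is a minimal Python sketch of how a converter for mixed SBCS/DBCS EBCDIC reads its input: it starts in single-byte mode, switches to two-byte mode on SO (X'0E') and back on SI (X'0F'). Without the SO byte, the DBCS bytes are read one at a time as single-byte characters. Python's standard library has no CCSID 930 codec, so the two-byte table below is hypothetical (illustrative values, not looked up in the real CCSID 930 DBCS table), and cp037 stands in for the single-byte half, which in CCSID 930 is actually a different code page.

```python
SO, SI = 0x0E, 0x0F

# Hypothetical two-byte table for illustration only: these codes were not
# looked up in the real CCSID 930 DBCS table.
HYPOTHETICAL_DBCS = {
    b"\x43\x81": "\u30ab",  # KATAKANA LETTER KA
    b"\x43\x82": "\u30bf",  # KATAKANA LETTER TA
}


def decode_mixed(data: bytes) -> str:
    """Decode mixed SBCS/DBCS EBCDIC: SO enters two-byte mode, SI leaves it."""
    out = []
    in_dbcs = False
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == SO:
            in_dbcs = True
            i += 1
        elif byte == SI:
            in_dbcs = False
            i += 1
        elif in_dbcs:
            pair = data[i:i + 2]
            # Unknown pairs become U+FFFD, the Unicode replacement character
            out.append(HYPOTHETICAL_DBCS.get(pair, "\ufffd"))
            i += 2
        else:
            # cp037 stands in for the single-byte half of the mixed code page
            out.append(bytes([byte]).decode("cp037"))
            i += 1
    return "".join(out)


dbcs_bytes = b"\x43\x81\x43\x82"
print(ascii(decode_mixed(b"\x0e" + dbcs_bytes + b"\x0f")))  # '\u30ab\u30bf'
print(ascii(decode_mixed(dbcs_bytes)))  # four single-byte characters instead
```

The same four data bytes yield two Katakana characters inside the SO/SI envelope and four unrelated single-byte characters without it.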

I was also making one more mistake: I was trying to transmit the file in TEXT mode. While going through various sites, I read that DBCS text has to be transmitted in BINARY mode (no reason was given).

I tried extracting the file in BINARY mode to my PC, and I was able to read the DBCS characters. Transmitting the file through NDM in BINARY mode also worked.

Honestly, I am not sure if I should conclude here. It worked for around 5000 random records (much more than production volume).

Can you please tell me why it worked? This would surely help me understand things better. Thank you.

dbzTHEdinosauer · Posted: Mon Apr 13, 2009 5:37 pm

This is a layman's explanation. If you want the really technical details, either wait for someone better versed (Rob Sample/SuperK/etc.) or start reading the documentation:

SO and SI are 'escape characters', which simply means they tell whatever process is translating the characters to treat the bytes they enclose differently. It is 'enveloping'. Often there are many layers to a process, each layer having its own responsibility; to bypass the 'normal responsibility' of a 'partial process', escape characters are used, and SO and SI are one type of escape character. Specifically, SO (shift-out, X'0E') marks the start of a run of double-byte characters, and SI (shift-in, X'0F') marks the return to single-byte characters.

DBCS characters are double bytes (two bytes that are to be treated as one character). Character (text) transmission treats each byte as a character and translates it as such. Binary transmission involves no translation, which maintains the integrity of packed-decimal fields and, in this case, DBCS characters.
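To see why text mode damages this data, here is a Python sketch that models text-mode transfer as a per-byte EBCDIC-to-Latin-1 translation and binary mode as a plain copy. The cp037 table stands in for whatever translation table NDM actually uses (an assumption), and the packed-decimal value and DBCS pair bytes are illustrative.

```python
def text_mode_transfer(data: bytes) -> bytes:
    """Model of a text-mode transfer: every byte is translated on its own
    (EBCDIC cp037 -> Latin-1), with no notion of DBCS pairs or packed
    decimal. The real NDM translation table may differ (assumption)."""
    return data.decode("cp037").encode("latin-1")


def binary_mode_transfer(data: bytes) -> bytes:
    """Binary transfer: the bytes arrive exactly as they were sent."""
    return bytes(data)


fields = {
    "packed +123": b"\x12\x3c",  # PIC S9(3) COMP-3
    "DBCS run": b"\x0e\x43\x81\x43\x82\x0f",  # SO, two illustrative pairs, SI
}

for name, raw in fields.items():
    text = text_mode_transfer(raw)
    binary = binary_mode_transfer(raw)
    print(f"{name}: sent {raw.hex(' ')}, "
          f"text mode {text.hex(' ')}, binary mode {binary.hex(' ')}")
```

In text mode the bytes are rewritten one at a time, so the packed digits and sign, and the DBCS pair values, no longer mean what the sender intended; in binary mode they arrive unchanged and can be interpreted correctly on the receiving side.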

jasveer singh

Thanks, I understand the reason for transferring DBCS in BINARY now. Awaiting more input from the experts on the NATIONAL-OF function.

Robert Sample · Posted: Mon Apr 13, 2009 6:15 pm

DBCS has to be transferred in binary as it represents an extended character set (2 bytes per character) where every bit has significance. National characters are 2 bytes per character as well, according to the manual, so the SO/SI is probably related to that.

We've not done much with XML documents yet (I'm sure it's coming), so I don't really know anywhere near enough about the process. I did read in the COBOL Programming Guide that UTF-8 is most easily handled as UTF-16, and UTF-16 probably needs the SO/SI as part of the double byte character set manipulation. If I find out more, I'll pass it on.

But I'm glad to hear you got it working.

rakesh1155 · New User Joined: 21 Jan 2009 Posts: 84 Location: India

Hi,

I know this is a very old post.

But I have a similar requirement that I am gathering information on.

Jasveer mentioned: "It finally worked. I just included Shiftout & Shiftin characters in my input string for NATIONAL-OF function & then passed the o/p of this function to DISPLAY-OF function to get the UTF-8 encoded data. "

Does this mean that, in the WORKING-STORAGE declaration of the NATIONAL-OF input string, you included SHIFT-OUT and SHIFT-IN fields (as in the manuals), like below?

For example, the input string is INPUT-STRING:
WORKING-STORAGE SECTION.
01 INP-STRING.
   05 SO               PIC X.
   05 INPUT-STRING     PIC X(500).
   05 SI               PIC X.
.
.
01 NATIONAL-STRING     PIC N(500) USAGE NATIONAL.
.
.

PROCEDURE DIVISION.
    MOVE SHIFT-OUT TO SO
    MOVE FUNCTION NATIONAL-OF(INPUT-STRING) TO NATIONAL-STRING
    MOVE SHIFT-IN TO SI

    DISPLAY NATIONAL-STRING.

Is this the correct way to use SHIFT-IN and SHIFT-OUT?

-Thanks in advance.

jasveer singh

Hi,

Not sure if you still need this info, but below is what you need to do:

PROCEDURE DIVISION.
    MOVE SHIFT-OUT TO SO
    MOVE SHIFT-IN TO SI
    MOVE FUNCTION NATIONAL-OF(INP-STRING, DBCS-CCSID) TO NATIONAL-STRING

That means that before you pass the data to the NATIONAL-OF function, the SO/SI bytes must already be in place, and the argument must be the whole INP-STRING group, because the SO and SI fields sit outside INPUT-STRING. You also need to supply the code page of the input data: DBCS-CCSID above is a placeholder for a data item holding that CCSID (930 in my case). As I understand it, the SO/SI bytes are what the conversion functions use to identify the DBCS data in the field. In your case, I am assuming the entire input data is DBCS; if it's not, the conversion would fail.
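The preparation step described above, putting SO/SI around pure DBCS data before the conversion, can be sketched in Python to show the byte layout. `build_inp_string` is a hypothetical helper mirroring the INP-STRING group from the question; the even-length check reflects that pure DBCS data is two bytes per character, so an odd byte count means some single-byte data slipped in. This is an off-host illustration only, not a replacement for NATIONAL-OF.

```python
SO = b"\x0e"  # value of the SHIFT-OUT special register, X'0E'
SI = b"\x0f"  # value of the SHIFT-IN special register, X'0F'


def build_inp_string(input_string: bytes) -> bytes:
    """Assemble the INP-STRING group: SO, the pure-DBCS data, SI.

    The SO/SI bytes are put in place first; the whole group is then
    what gets converted.
    """
    if len(input_string) % 2 != 0:
        # An odd byte count cannot be pure DBCS: some byte is SBCS data
        raise ValueError("pure DBCS data must have an even number of bytes")
    return SO + input_string + SI


field = bytes(500)  # placeholder content for a PIC X(500) field
group = build_inp_string(field)
print(len(group))        # 502 bytes handed to the conversion
print(len(field) // 2)   # 250 DBCS characters in the field
```

For the 500-byte INPUT-STRING in the question, the group is 502 bytes and, if it is all DBCS, would be expected to yield 250 national characters, so a PIC N(250) receiving field should be large enough.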

Thanks,
Jasveer Singh