IBM Mainframe Forum Index
 
UTF-8 encoded file


IBM Mainframe Forums -> COBOL Programming
jasveer singh

New User


Joined: 06 Mar 2006
Posts: 16

Posted: Mon Apr 06, 2009 3:27 pm

Hi ,

I have a requirement to send an XML file in UTF-8 encoding to an interfacing Unix server. My file contains mixed code page data (IBM-037 English characters and IBM-930 Katakana DBCS characters), and I need to use an NDM transmission process.

I coded a program to create the file on the mainframe in UTF-8 format using the DISPLAY-OF and NATIONAL-OF functions, as written below:

    MOVE FUNCTION NATIONAL-OF(WS-GRPT-DETAILA, 037-ENGCP)
      TO WS-GRPN-DETAILA
    MOVE FUNCTION DISPLAY-OF(WS-GRPN-DETAILA, UTF-8)
      TO WS-GRPO-DETAILA

For the local-language data, I am using:

    MOVE FUNCTION NATIONAL-OF(WS-XMLT-ADDRLINE1TXT, 930-LCLCP)
      TO WS-XMLN-ADDRLINE1TXT
    MOVE FUNCTION DISPLAY-OF(WS-XMLN-ADDRLINE1TXT, UTF-8)
      TO WS-XMLO-ADDRLINE1TXT

The file created on the mainframe is in an unreadable format, and I cannot validate the output at my end. When I sent the file to the interfacing team, they received the English data properly, but the local-language data did not come through properly. Here are my queries on this:

1: Is creating the file on the mainframe in UTF-8 encoding the correct way to achieve this?
2: Do the DISPLAY-OF and NATIONAL-OF functions support DBCS characters?
3: How many bytes of UTF-8 encoded output should I expect when I convert DBCS data to UTF-8?
4: I heard that UTF-16 is the mainframe's Unicode code page. Should I create the file in UTF-16 encoding and specify SYSOPTS in the NDM JCL to convert the data from UTF-16 to UTF-8?

Please help me with this.

(Apologies for the long story, but no one internally seems to know about code page techniques, so I am seeking your support.)
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8696
Location: Dubuque, Iowa, USA

Posted: Mon Apr 06, 2009 5:25 pm

1. Since you're trying it & it's not working, probably not.
2. NATIONAL-OF definitely; the manual isn't clear on DISPLAY-OF
3&4: From the COBOL Programming Guide (link at the top of the page):
Quote:
1.7.6 Processing UTF-8 data

When you need to process UTF-8 data, first convert the data to UTF-16 in a national data item. After processing the national data, convert it back to UTF-8 for output. For the conversions, use the intrinsic functions NATIONAL-OF and DISPLAY-OF, respectively. Use code page 1208 for UTF-8 data.

You need to do two steps to convert ASCII or EBCDIC data to UTF-8:

1. Use the function NATIONAL-OF to convert the ASCII or EBCDIC string to a national (UTF-16) string.

2. Use the function DISPLAY-OF to convert the national string to UTF-8.
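
The two-step conversion quoted above, and the byte-count question (3), can be illustrated off the mainframe. This is a minimal Python sketch, not the COBOL method itself: Python's `cp037` codec stands in for code page 037, a Python `str` plays the role of the national (UTF-16) item, and the Katakana sample string is an assumption chosen for illustration. Python's standard library has no IBM-930 codec, so the DBCS input side is not shown.

```python
# Step 1 (analogous to NATIONAL-OF with code page 037): EBCDIC bytes -> Unicode text.
ebcdic_english = "HELLO".encode("cp037")   # bytes as they would sit on the host
text = ebcdic_english.decode("cp037")

# Step 2 (analogous to DISPLAY-OF with code page 1208): Unicode text -> UTF-8 bytes.
utf8_english = text.encode("utf-8")

# Assumed sample: four full-width Katakana characters (KA, TA, KA, NA).
katakana = "\u30ab\u30bf\u30ab\u30ca"
utf16_len = len(katakana.encode("utf-16-be"))  # 2 bytes per character
utf8_len = len(katakana.encode("utf-8"))       # 3 bytes per character

print(utf8_english, utf16_len, utf8_len)
```

For characters in this range, each 2-byte UTF-16 unit becomes 3 bytes of UTF-8, while plain English letters take 1 byte each. Sizing the UTF-8 output field at 1.5 times the DBCS byte length is therefore a reasonable starting assumption, not a rule taken from the manual.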
jasveer singh

New User


Joined: 06 Mar 2006
Posts: 16

Posted: Mon Apr 13, 2009 5:23 pm

It finally worked. I just included Shift-Out and Shift-In characters in my input string for the NATIONAL-OF function, and then passed the output of this function to the DISPLAY-OF function to get the UTF-8 encoded data.

I could see a significant difference in the output of the NATIONAL-OF function with and without the SO/SI bytes. (I am not sure why simply including the SO/SI bytes changed the output string altogether.)

I was making one more mistake: I was trying to transmit the file in TEXT mode. While going through various sites, I read that DBCS text has to be transmitted in BINARY mode (the reason was not given).

I tried extracting the file in BINARY mode to my PC, and I was able to read the DBCS characters. Transmitting the file through NDM in BINARY mode worked as well.

Honestly, I am not sure if I should conclude here. It worked for around 5000 random records (much more than the Production volume).

Can you please tell me the reason why it worked? This would surely help me understand things better. Thank you.
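
One way to validate the transferred file on the receiving PC or Unix side is to check that its bytes decode cleanly as UTF-8. The sketch below assumes Python is available there; `is_valid_utf8` and the sample byte strings are hypothetical names and data introduced only for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Assumed samples: well-formed UTF-8 Katakana, and a byte string that ends in
# a stray continuation byte (0xBF), which is not valid UTF-8.
good = "\u30ab\u30bf\u30ab\u30ca".encode("utf-8")
bad = b"Tb\xbf"

print(is_valid_utf8(good), is_valid_utf8(bad))
```

For a real file, the bytes would be read in binary mode, for example with `open(path, "rb").read()`, where `path` is whatever name the file was received under.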
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

Posted: Mon Apr 13, 2009 5:37 pm

This is a layman's explanation. If you want the really technical stuff, either wait for someone better versed (Rob Sample/SuperK/etc.) or start reading the documentation:

SO and SI are 'escape characters', which simply means that they tell whatever process is translating the characters to treat the enclosed bytes differently, or to skip translation altogether. It is 'enveloping'. Often there are many layers to a process, each layer having its own responsibility. In order to bypass the 'normal responsibility' of a 'partial-process', escape characters are used, and SO and SI are one type of escape character.

DBCS characters are double bytes (two bytes that are to be treated as one). Character (text) transmission treats each byte as a character and translates it as such. Binary transmission involves no translation, which maintains the integrity of packed-decimal fields and, in this case, DBCS characters.
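
The effect of byte-by-byte translation on multi-byte data can be simulated off-host. In this minimal Python sketch, the `cp037` to ISO 8859-1 translation is only an assumed stand-in for a text-mode transfer; the actual NDM translation table is site-specific and is not given in this thread.

```python
# One Katakana character (KA, U+30AB) encoded as UTF-8: bytes E3 82 AB.
utf8_bytes = "\u30ab".encode("utf-8")

# Simulated TEXT mode: every byte is treated as a separate EBCDIC character
# and translated to its ISO 8859-1 equivalent.
text_mode = utf8_bytes.decode("cp037").encode("latin-1")

# BINARY mode: the bytes are copied unchanged.
binary_mode = bytes(utf8_bytes)

print(utf8_bytes.hex(), text_mode.hex(), binary_mode.hex())
```

The text-mode result has the same length but different byte values, so the receiver no longer sees the original 3-byte UTF-8 sequence, while the binary copy still decodes to the same character.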
jasveer singh

New User


Joined: 06 Mar 2006
Posts: 16

Posted: Mon Apr 13, 2009 6:11 pm

Thanks. Understood the reason for sending DBCS in BINARY mode. Awaiting more input from the experts on the NATIONAL-OF function.
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8696
Location: Dubuque, Iowa, USA

Posted: Mon Apr 13, 2009 6:15 pm

DBCS has to be transferred in binary as it represents an extended character set (2 bytes per character) where every bit has significance. National characters are 2 bytes per character as well, according to the manual, so the SO/SI is probably related to that.

We've not done much with XML documents (yet -- I'm sure it's coming) so I don't really know anywhere near enough about the process. I did read in the COBOL Programming Guide that UTF-8 is easiest handled as UTF-16, and UTF-16 probably needs the SO/SI as part of the double byte character set manipulation. If I find out more, I'll pass it on.

But I'm glad to hear you got it working.
rakesh1155

New User


Joined: 21 Jan 2009
Posts: 84
Location: India

Posted: Thu Jan 06, 2011 5:33 pm

Hi,

I know this is a very old post.

But I have a similar requirement, for which I am gathering information.

Jasveer mentioned: "It finally worked. I just included Shiftout & Shiftin characters in my input string for NATIONAL-OF function & then passed the o/p of this function to DISPLAY-OF function to get the UTF-8 encoded data. "

Does this mean that, in the WORKING-STORAGE declaration of the input string for NATIONAL-OF, you included the SHIFT-OUT and SHIFT-IN variable declarations (like in the manuals), as below?

For example, the input string is INPUT-STRING:

    WORKING-STORAGE.
    01 INP-STRING.
       05 SO           PIC X.
       05 INPUT-STRING PIC X(500).
       05 SI           PIC X.
    .
    .
    01 NATIONAL-STRING PIC X(500) USAGE NATIONAL.
    .
    .

    PROCEDURE DIVISION.
        MOVE SHIFT-OUT TO SO
        MOVE NATIONAL-OF(INPUT-STRING) TO NATIONAL-STRING
        MOVE SHIFT-IN TO SI

        DISPLAY NATIONAL-STRING.


Is this the correct way of using the SHIFT-IN and SHIFT-OUT ?


-Thanks in advance.
jasveer singh

New User


Joined: 06 Mar 2006
Posts: 16

Posted: Wed Feb 23, 2011 7:35 pm

Hi,

Not sure if you still need this info, but below is what you need to do:

    PROCEDURE DIVISION.
        MOVE SHIFT-OUT TO SO
        MOVE SHIFT-IN TO SI
        MOVE FUNCTION NATIONAL-OF(INPUT-STRING, DBCS code page) TO NATIONAL-STRING

That means you need to include the SO/SI bytes before you pass the data to the NATIONAL-OF function. You also need to feed in the code page of the input data. As I understood it, the SO/SI bytes are used by all functions to identify the DBCS data in the field. In your case, I am assuming the entire input data is DBCS. If it is not, the conversion will fail.

Thanks,
Jasveer Singh
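
How SO/SI bytes mark out DBCS data can be sketched off-host. In mixed EBCDIC data, SO is X'0E' and SI is X'0F', and the bytes between them are read in pairs. The Python below is only an illustration of that bracketing, not of NATIONAL-OF itself; `wrap_dbcs`, `split_mixed`, and the sample byte values are hypothetical.

```python
SO = 0x0E  # EBCDIC shift-out: a DBCS run starts after this byte
SI = 0x0F  # EBCDIC shift-in: back to single-byte characters


def wrap_dbcs(dbcs: bytes) -> bytes:
    """Bracket a pure-DBCS byte string with SO/SI (hypothetical helper)."""
    if len(dbcs) % 2:
        raise ValueError("DBCS data must have an even number of bytes")
    return bytes([SO]) + dbcs + bytes([SI])


def split_mixed(data: bytes):
    """Split mixed data into ("SBCS", bytes) and ("DBCS", bytes) runs."""
    runs = []
    i, n = 0, len(data)
    while i < n:
        if data[i] == SO:
            end = data.index(SI, i + 1)  # ValueError if the closing SI is missing
            runs.append(("DBCS", data[i + 1:end]))
            i = end + 1
        else:
            j = i
            while j < n and data[j] != SO:
                j += 1
            runs.append(("SBCS", data[i:j]))
            i = j
    return runs


dbcs_sample = b"\x43\x81\x43\x82"  # placeholder byte pairs, not verified characters
with_shifts = split_mixed(b"\xc1\xc2" + wrap_dbcs(dbcs_sample) + b"\xc3")
without_shifts = split_mixed(dbcs_sample)
print(with_shifts)
print(without_shifts)
```

Without the shift bytes, the same four bytes are classified as four single-byte characters, which is consistent with the very different NATIONAL-OF output reported earlier in the thread.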
All times are GMT + 6 Hours