jasveer singh
New User
Joined: 06 Mar 2006 Posts: 16
Hi,
I have a requirement to send an XML file in UTF-8 encoding to an interfacing Unix server. My file contains mixed code page data (IBM-037 English characters and IBM-930 Katakana DBCS characters), and I need to use the NDM transmission process.
I coded a program to create the file on the mainframe in UTF-8 format using the DISPLAY-OF and NATIONAL-OF functions, as shown below:
MOVE FUNCTION NATIONAL-OF(WS-GRPT-DETAILA, 037-ENGCP)
    TO WS-GRPN-DETAILA
MOVE FUNCTION DISPLAY-OF(WS-GRPN-DETAILA, UTF-8)
    TO WS-GRPO-DETAILA
and for local-language data I am using:
MOVE FUNCTION NATIONAL-OF(WS-XMLT-ADDRLINE1TXT, 930-LCLCP)
    TO WS-XMLN-ADDRLINE1TXT
MOVE FUNCTION DISPLAY-OF(WS-XMLN-ADDRLINE1TXT, UTF-8)
    TO WS-XMLO-ADDRLINE1TXT
The file created on the mainframe is in an unreadable format, so I cannot validate the output at my end. When I sent the file to the interfacing team, they received the English data correctly, but the local-language data did not come through properly. Here are my questions:
1: Is creating the file on the mainframe in UTF-8 encoding the correct way to achieve this?
2: Do the DISPLAY-OF and NATIONAL-OF functions support DBCS characters?
3: How many bytes of UTF-8 output are produced when I convert DBCS data to UTF-8?
4: I heard that UTF-16 is the mainframe's Unicode code page. Should I create the file in UTF-16 encoding and specify SYSOPTS in the NDM JCL to convert the data from UTF-16 to UTF-8?
Please help me with this.
(Apologies for the long story, but no one internally seems to know about code page techniques, so I am seeking your support.)
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8696 Location: Dubuque, Iowa, USA
1. Since you're trying it & it's not working, probably not.
2. NATIONAL-OF definitely; the manual isn't clear on DISPLAY-OF
3&4: From the COBOL Programming Guide (link at the top of the page):
Quote:
1.7.6 Processing UTF-8 data
When you need to process UTF-8 data, first convert the data to UTF-16 in a national data item. After processing the national data, convert it back to UTF-8 for output. For the conversions, use the intrinsic functions NATIONAL-OF and DISPLAY-OF, respectively. Use code page 1208 for UTF-8 data.
You need to do two steps to convert ASCII or EBCDIC data to UTF-8:
1. Use the function NATIONAL-OF to convert the ASCII or EBCDIC string to a national (UTF-16) string.
2. Use the function DISPLAY-OF to convert the national string to UTF-8.
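The two-step conversion quoted above can be tried off the mainframe. The sketch below uses Python rather than COBOL, purely as an illustration: `decode` plays the role of NATIONAL-OF and `encode("utf-8")` plays the role of DISPLAY-OF with code page 1208. Python's standard library has a `cp037` codec but none for IBM-930, so the Katakana sample starts out as a Unicode string; that substitution is an assumption of this sketch. The variable names are made up for the example. It also gives a feel for question 3, the size of the UTF-8 output.

```python
# Step 1 (NATIONAL-OF analogue): EBCDIC cp037 bytes -> Unicode text.
ebcdic_english = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])  # "HELLO" in cp037
text = ebcdic_english.decode("cp037")

# Step 2 (DISPLAY-OF analogue, code page 1208): Unicode text -> UTF-8.
utf8_english = text.encode("utf-8")

# Katakana sample. Assumption: we start from Unicode because the standard
# library has no IBM-930 codec to decode real host DBCS bytes.
katakana = "\u30ab\u30bf\u30ab\u30ca"  # four full-width Katakana characters
utf16_katakana = katakana.encode("utf-16-be")  # what a national item holds
utf8_katakana = katakana.encode("utf-8")

print("English :", len(ebcdic_english), "EBCDIC bytes ->",
      len(utf8_english), "UTF-8 bytes")
print("Katakana:", len(katakana), "characters ->",
      len(utf16_katakana), "UTF-16 bytes ->",
      len(utf8_katakana), "UTF-8 bytes")
```

Here each Katakana character takes two bytes in UTF-16 and three bytes in UTF-8, while the English characters stay at one byte each. Characters in the Basic Multilingual Plane never need more than three bytes in UTF-8, so sizing the output field at three bytes per national character is a reasonable starting assumption.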
jasveer singh
New User
Joined: 06 Mar 2006 Posts: 16
It finally worked. I just included the shift-out and shift-in characters in my input string for the NATIONAL-OF function and then passed the output of that function to the DISPLAY-OF function to get the UTF-8 encoded data.
I could see a significant difference in the output of the NATIONAL-OF function with and without the SO/SI bytes. (I am not sure why simply including the SO/SI bytes changed the output string altogether.)
I was making one more mistake: I was trying to transmit the file in TEXT mode. While going through various sites, I read that DBCS text has to be transmitted in BINARY mode (no reason was given).
I tried extracting the file to my PC in BINARY mode and was able to read the DBCS characters. Transmitting the file through NDM in BINARY mode worked as well.
Honestly, I am not sure whether I should conclude here. It worked for around 5000 random records (much more than the production volume).
Can you please tell me why it worked? That would surely help me understand things better. Thank you.
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
This is a layman's explanation. If you want the really technical stuff, either wait for someone better versed (Rob Sample, SuperK, etc.) or start reading the documentation.
SO and SI are 'escape characters', which simply means that they tell whatever process is translating the characters to treat the enclosed data differently or to skip translation altogether. It is 'enveloping'. Often a process has many layers, each with its own responsibility. To bypass the 'normal responsibility' of a 'partial process', escape characters are used, and SO and SI are one type of escape character.
DBCS characters are double bytes (two bytes that are to be treated as one character). Character (text) transmission treats each byte as a character and translates it as such. Binary transmission involves no translation, which maintains the integrity of packed-decimal fields and, in this case, DBCS characters.
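The difference between text and binary transfer described above can be simulated off-host. The Python sketch below is only a model: `text_mode_transfer` and `binary_mode_transfer` are hypothetical helper names, and the byte-by-byte cp037-to-Latin-1 mapping is an assumed stand-in for whatever translation table a real text-mode transfer applies. The payload is UTF-8, as in the file discussed in this thread.

```python
def text_mode_transfer(data: bytes) -> bytes:
    """Model of a text-mode transfer: every byte is treated as one EBCDIC
    (cp037) character and translated on its own to Latin-1."""
    return data.decode("cp037").encode("latin-1")


def binary_mode_transfer(data: bytes) -> bytes:
    """Model of a binary transfer: the bytes are copied unchanged."""
    return bytes(data)


# One ASCII letter followed by one Katakana character, already in UTF-8.
payload = "A\u30ab".encode("utf-8")

as_text = text_mode_transfer(payload)
as_binary = binary_mode_transfer(payload)

print("original :", payload.hex(" "))
print("text mode:", as_text.hex(" "))
print("binary   :", as_binary.hex(" "))
print("binary copy intact:", as_binary == payload)
print("text copy intact  :", as_text == payload)
```

The per-byte translation changes every byte of the multi-byte sequence independently, so the receiver can no longer reassemble the original characters. The binary copy arrives byte for byte identical.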
jasveer singh
New User
Joined: 06 Mar 2006 Posts: 16
Thanks. I understand the reason for transmitting DBCS in binary. Awaiting more input from the experts on the NATIONAL-OF function.
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8696 Location: Dubuque, Iowa, USA
DBCS has to be transferred in binary as it represents an extended character set (2 bytes per character) where every bit has significance. National characters are 2 bytes per character as well, according to the manual, so the SO/SI is probably related to that.
We've not done much with XML documents (yet -- I'm sure it's coming), so I don't know anywhere near enough about the process. I did read in the COBOL Programming Guide that UTF-8 is most easily handled as UTF-16, and UTF-16 probably needs the SO/SI as part of the double-byte character set manipulation. If I find out more, I'll pass it on.
But I'm glad to hear you got it working.
rakesh1155
New User
Joined: 21 Jan 2009 Posts: 84 Location: India
Hi,
I know this is a very old post, but I have a similar requirement and am gathering information on it.
Jasveer mentioned: "It finally worked. I just included Shiftout & Shiftin characters in my input string for NATIONAL-OF function & then passed the o/p of this function to DISPLAY-OF function to get the UTF-8 encoded data."
Does this mean that, in the WORKING-STORAGE declaration of the input string for NATIONAL-OF, you included SHIFT-OUT and SHIFT-IN fields (as in the manuals), like below?
For example, if the input string is INPUT-STRING:
WORKING-STORAGE SECTION.
01 INP-STRING.
   05 SO PIC X.
   05 INPUT-STRING PIC X(500).
   05 SI PIC X.
.
.
01 NATIONAL-STRING PIC N(500) USAGE NATIONAL.
.
.
PROCEDURE DIVISION.
    MOVE SHIFT-OUT TO SO
    MOVE FUNCTION NATIONAL-OF(INPUT-STRING) TO NATIONAL-STRING
    MOVE SHIFT-IN TO SI
    DISPLAY NATIONAL-STRING.
Is this the correct way of using SHIFT-IN and SHIFT-OUT?
Thanks in advance.
jasveer singh
New User
Joined: 06 Mar 2006 Posts: 16
Hi,
I am not sure whether you still need this information, but here is what you need to do:
PROCEDURE DIVISION.
    MOVE SHIFT-OUT TO SO
    MOVE SHIFT-IN TO SI
    MOVE FUNCTION NATIONAL-OF(INP-STRING, DBCS code page) TO NATIONAL-STRING
That means you need to put the SO and SI bytes in place before you pass the data to the NATIONAL-OF function. You also need to supply the code page of the input data. As I understand it, the conversion functions use the SO/SI bytes to identify the DBCS data in the field. In your case, I am assuming the entire input data is DBCS. If it is not, the conversion would fail.
Thanks,
Jasveer Singh
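To picture how SO/SI bytes mark the DBCS portion of a field, here is a small Python sketch of the convention. It is a model of the idea only, not how the COBOL runtime or any converter is implemented; `split_mixed` is a hypothetical helper, and the sample byte values are arbitrary placeholders rather than real IBM-930 code points.

```python
SO = 0x0E  # shift-out: double-byte data starts after this byte
SI = 0x0F  # shift-in: back to single-byte data after this byte


def split_mixed(data):
    """Split a mixed byte string into ("SBCS" | "DBCS", bytes) segments,
    using SO/SI as the only clue to where double-byte data sits."""
    segments = []
    mode = "SBCS"
    current = bytearray()
    for byte in data:
        if byte == SO and mode == "SBCS":
            if current:
                segments.append((mode, bytes(current)))
            mode, current = "DBCS", bytearray()
        elif byte == SI and mode == "DBCS":
            if current:
                segments.append((mode, bytes(current)))
            mode, current = "SBCS", bytearray()
        else:
            current.append(byte)
    if current:
        segments.append((mode, bytes(current)))
    return segments


double_byte_pairs = bytes([0x43, 0x81, 0x43, 0x82])  # placeholder values

with_shifts = bytes([0xC1, 0xC2, SO]) + double_byte_pairs + bytes([SI, 0xC3])
without_shifts = double_byte_pairs

print(split_mixed(with_shifts))
print(split_mixed(without_shifts))
```

With the shift bytes in place, the middle run is labelled DBCS and would be read two bytes at a time. Without them, the same four bytes are labelled SBCS and would be read as four separate single-byte characters, which is one plausible reading of why the NATIONAL-OF output changed so much once SO/SI were added.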