IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Upload and replace newline characters from data file


IBM Mainframe Forums -> JCL & VSAM
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Tue Aug 21, 2007 6:57 pm
Reply with quote

Can you provide any guidance on how would I do the conversions if I transfer the file as binary?

I think this is an interesting solution, because the record breaking works fine with binary, since the dataset's LRECL is exactly the size of the records (which are fixed-length). I just don't know how to convert to "text" (ascii or ebcdic?) afterwards...
Back to top
View user's profile Send private message
stodolas

Active Member


Joined: 13 Jun 2007
Posts: 632
Location: Wisconsin

PostPosted: Tue Aug 21, 2007 7:02 pm
Reply with quote

You need to load up a translation table of all the hex values in ASCII and the hex value that you want it to match to in EBCDIC. Then run through each binary records doing the conversion. This could be very bad then because you basically need to inspect each character in the binary string and replace it. The processing time and CPU on this probably isn't very pretty.

example
Code:

ASCII  EBCDIC
65     C1
66     C2
67     C3
32     40
etc
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Tue Aug 21, 2007 7:08 pm
Reply with quote

Yeah, and my monthly file size is approximately 2.5 GB...

Even though, let me ask: How would you do this translation script on your environmet? Would you use a cobol program, or are you able to do it using only JCL?

If I am not asking too much, could you provide a code sample for this? (Don't need to work or anything... just a draft for me to use as guidance to start researching)
Back to top
View user's profile Send private message
stodolas

Active Member


Joined: 13 Jun 2007
Posts: 632
Location: Wisconsin

PostPosted: Tue Aug 21, 2007 7:49 pm
Reply with quote

Ummm.. you may be able to do it through a DFSORT with a translation table, but I am not sure. I'm not sure if Frank watches this part of the forum.
Back to top
View user's profile Send private message
superk

Global Moderator


Joined: 26 Apr 2004
Posts: 4652
Location: Raleigh, NC, USA

PostPosted: Tue Aug 21, 2007 7:51 pm
Reply with quote

I don't know about anyone else, but I'd like to see an actual real sample of a few lines of the raw ASCII data in a hex dump format. Something doesn't add up.
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Tue Aug 21, 2007 7:52 pm
Reply with quote

I'll get it for you, superk... Just a sec...
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Tue Aug 21, 2007 8:41 pm
Reply with quote

Hello Rafael,

As we talked about before, i believe removing the cr/lf on the SQLServer side would be a good way to go.

Here is some SQLServer info that will work for the 2000 and 2005 versions

Code:
CREATE TABLE #T
(
[ID]         INT
,[TestValue]   VARCHAR(MAX)
);

DECLARE @value VARCHAR(MAX);
SET @value = '1' + CHAR(10) + CHAR(8);

INSERT #T (id, testValue) VALUES (1, @value);

SELECT
   *
   --SQL 2005 specific, cannot use text datatype in REPLACE function must be varchar(max)
   ,REPLACE(TestValue, '', '') AS TestValue_REPLACE
   --SQL 2000 eligible, charindex finds the location of the carriage return that could then be used
   --in a combination with substring function
   ,CHARINDEX('', TestValue) AS FirstCRLF
FROM #T;

DROP TABLE #T;


The INSERT and DROP are just for testing - you might use them for your testing, but that is up to you. REPLACE is a feature in 2005 and hopefully, you are running 2005.

Sorry i've been "away" much of the morning and haven't been in the dialog.
Back to top
View user's profile Send private message
stodolas

Active Member


Joined: 13 Jun 2007
Posts: 632
Location: Wisconsin

PostPosted: Tue Aug 21, 2007 8:45 pm
Reply with quote

I like Dick's solution, but that changes the data in the underlying table, and might not be feasible.

How about using his solution but defining a temp table just before extraction with a SELECT ..... FROM sourceTable INTO TEMPTABLE then do the extract from the temp table. Or maybe define a view that does the replace for you ahead of time. This has the advantage of offloading the replacement to the MSSQL server....
Back to top
View user's profile Send private message
superk

Global Moderator


Joined: 26 Apr 2004
Posts: 4652
Location: Raleigh, NC, USA

PostPosted: Tue Aug 21, 2007 8:50 pm
Reply with quote

Let's see if I have a solution:

Presuming that the original data in ASCII is:

Code:

-d

1792:0100  52 61 66 61 65 6C 20 4C-6F 6E 67 6F 50 72 6F 63   Rafael LongoProc
1792:0110  65 73 73 20 4D 69 73 73-69 6E 67 20 72 65 71 75   ess Missing requ
1792:0120  69 72 65 6D 65 6E 74 73-2E 0D 0A 54 68 65 20 72   irements...The r
1792:0130  65 71 75 69 72 65 6D 65-6E 74 73 20 77 65 72 65   equirements were
1792:0140  20 69 6E 61 63 63 75 72-61 74 65 2E 20 20 20 20    inaccurate.   
1792:0150  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:0160  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:0170  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
-d

1792:0180  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:0190  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:01A0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:01B0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:01C0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:01D0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:01E0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:01F0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
-d

1792:0200  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:0210  20 20 20 20 20 20 20 20-52 61 66 61 65 6C 20 4C           Rafael L
1792:0220  6F 6E 67 6F 4E 6F 6E 2D-50 72 6F 63 65 73 73 53   ongoNon-ProcessS
1792:0230  65 74 75 70 20 69 73 73-75 65 20 20 20 20 20 20   etup issue     
1792:0240  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:0250  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:0260  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:0270  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
-d

1792:0280  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:0290  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:02A0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:02B0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:02C0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:02D0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:02E0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:02F0  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
-d

1792:0300  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:0310  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:0320  20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20                   
1792:0330  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
1792:0340  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
1792:0350  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
1792:0360  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
1792:0370  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................
-q


and the file is FTP'd binary to a dataset with RECFM=FB LRECL=280, then the data would look like this:

Code:

ê/Ã/Á%.<?>Å?&Ê?ÄÁËË.(ÑËËÑ>Å.ÊÁÉÍÑÊÁ_Á>ÈË...èÇÁ.ÊÁÉÍÑÊÁ_Á>ÈË.ÏÁÊÁ.Ñ>/ÄÄÍÊ/ÈÁ.....
56666624666657666772467766627677676666772005662767767666677276762666667767622222
21615C0CFE7F02F35330D9339E702515925D5E43EDA48502515925D5E430752509E13352145E0000
 ------------------------------------------------------------------------------
ê/Ã/Á%.<?>Å?+?>.&Ê?ÄÁËËëÁÈÍø.ÑËËÍÁ..............................................
56666624666646625766677567772677762222222222222222222222222222222222222222222222
21615C0CFE7FEFED02F3533354500933550000000000000000000000000000000000000000000000


then a simple run through OCOPY:

Code:

//A2E      EXEC PGM=IKJEFT01                                     
//SYSTERM  DD   DUMMY                                           
//SYSTSPRT DD   SYSOUT=*                                         
//INPUT    DD   DSN=... <UPLOADED BINARY ASCII DATA>             
//OUTPUT   DD   DSN=... <CONVERTED EBCDIC DATA>,                 
//         DISP=(,CATLG,DELETE),RECFM=FB,LRECL=280,             
//         UNIT=SYSDA,SPACE=(CYL,(10,10))                       
//SYSTSIN  DD   DATA                                             
OCOPY INDD(INPUT) OUTDD(OUTPUT) TEXT CONVERT((BPXFX311)) TO1047 
/*                                                               


would result in this:

Code:

Rafael LongoProcess Missing requirements...The requirements were inaccurate.   
D888894D9989D9988AA4D8AA8984989A898989AA401E884989A898989AA4A898489888A98A844444
9161530365767963522049229570958499545532BD538509584995455320659509513349135B0000
 ------------------------------------------------------------------------------
Rafael LongoNon-ProcessSetup issue                                             
D888894D9989D996D9988AAE8AA948AAA84444444444444444444444444444444444444444444444
91615303657656507963522253470922450000000000000000000000000000000000000000000000
 ------------------------------------------------------------------------------


at which point I imagine a simple SORT step could easily translate the x'0D15' into blanks or some other desired characters.
Back to top
View user's profile Send private message
stodolas

Active Member


Joined: 13 Jun 2007
Posts: 632
Location: Wisconsin

PostPosted: Tue Aug 21, 2007 8:55 pm
Reply with quote

I see OCOPY converted the x'0D0A' (CRLF) to x'0D15'. Can you force a conversion to x'4040'. I didn't look back, but I think the OP said he wants it converted to spaces.
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Tue Aug 21, 2007 8:59 pm
Reply with quote

Guys, you will find attached a ascii sample file and the ascii dump of the same file.

Please, take a look...

Only the lines that begin with 'HD000'... are actual new records. So on this sample we should have only 4 records. But the dataset ends up with 17 records.
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Tue Aug 21, 2007 9:14 pm
Reply with quote

Guys,

You have helped me a lot... both suggestions (From Dick and Kevin) have brought light to my path... icon_biggrin.gif

I will work with them to see which fits best on the situation...

Dick, don't worry about it.. You helped a lot!

Steve, that's true, I said that I wanted to switch the cr/lfs to spaces but probably with these solutions I can keep them, AND avoid the record breaking... This would be ideal to me...

I'll let you know, guys...

Thanks a lot!!!
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Tue Aug 21, 2007 11:50 pm
Reply with quote

Hello,

The REPLACE that modified the temporary table was only for example.

The REPLACE could be used directly in the EXPORT if the system is using SQLServer 2005.

I agree - changing the data in the "real" table might be more of a problem than a solution.
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Tue Aug 28, 2007 8:29 pm
Reply with quote

Hi, guys,

I finally got to work on this issue again today and...

It worked!!

As Kevin suggested I uploaded the file as 'binary' and converted to EBCDIC right before I fastload the data into Teradata.

The record-breaking issue is solved with this, as the records are broken exclusively by the length of the data... I mean, as the lrecl is set to 2660 bytes, the dataset is created breaking the data each 2660 bytes. And that's exactly what I needed.

One lasting issue though. After the conversion, I have one "extra" character (D5) after each line-breaking. Like, in the example below, where I have the hex codes for two line-breaks:

Original data: 0D 0A 0D 0A
Converted data: 0D 0A D5 0D 0A D5

I didn't have the time to investigate in detail why this is happening (I will do it now) but I wanted to share it with you, guys, to see if you have any thoughts on the reason...

Any ideas?
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Tue Aug 28, 2007 8:43 pm
Reply with quote

Hello,

The extra character is a D5 (dee-five)? If it were a zero-five, it might make a bit of sense (x'05' is a tab character).

If the records are already 2660 and this extra byte is inserted, is the lrecl changed or is the last byte truncated?

I'm slow today . . . icon_smile.gif
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Tue Aug 28, 2007 9:06 pm
Reply with quote

Yeah, it's a D5 (dee-five) in Ascii it represents 'Õ'. In EBCDIC, if Im not mistaken, it's 'N'.

Can you guys explain to me in detail the ocopy command, like:

OCOPY INDD(INPUT) OUTDD(OUTPUT) TEXT CONVERT((BPXFX311)) TO1047

I understood the input/output part... But I am not so familiar with the TEXT CONVERT part. What is BPXFX311? I suppose it's a translation table or something... But is there a way to see it's contents? Do I have options on this (to use another table or create my own)? What is TO1047?
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Tue Aug 28, 2007 9:12 pm
Reply with quote

The reason why I am asking all this is that I think that the blame for the extra character is of the translation table...
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Tue Aug 28, 2007 9:47 pm
Reply with quote

I was wrong. The conversion is perfect. There is no 'D5' extra character in the output dataset.

The extra character is only appearing in the final data fastloaded into the teradata database.

Probably teradata's fastload is not handling the dataset's EBCDIC data correctly or something...

I am looking into it... Meanwhile, any comments/ideas are welcome...
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Tue Aug 28, 2007 11:55 pm
Reply with quote

Hello,

Quote:
it's a D5 (dee-five) in Ascii it represents 'Õ'. In EBCDIC, if Im not mistaken, it's 'N'.
Yes, a D5 is an EBCDIC "N".

How do you get from the "output dataset" to the "final data fastloaded into the teradata database"?
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Wed Aug 29, 2007 12:19 am
Reply with quote

I run the following job that invokes Teradata's fastload:

Code:
//P54BAF2 JOB (LOADRMDY),'LOADRMDY',CLASS=3,                           
//         MSGCLASS=X,REGION=4096K,TIME=1339,NOTIFY=P54BAF             
/*JOBPARM S=SYS1                                                       
//JOBLIB   DD DSN=<LIBRARY>,DISP=SHR                         
//PS20     EXEC PGM=FASTLOAD,PARM='NOSPIE/BUFSIZE=31'                   
//SYSPRINT DD SYSOUT=*                                                 
//SYSABEND DD SYSOUT=*                                                 
//SYSOUT   DD SYSOUT=*                                                 
//INFILE   DD DSN=<OUTPUT DATASET>,DISP=SHR                     
//SYSIN    DD *                                                         
                                                                       
SESSIONS 32;                                                           
                                                                       
RECORD 2;                                                               
                                               
.LOGON TDP9/<USER>,<PASS>;         
                                               
DELETE FROM CMMI_WORKDB.CDB_REMEDY_DATA;       
                                               
BEGIN LOADING CMMI_WORKDB.CDB_REMEDY_DATA       
   ERRORFILES CMMI_UTILITYDB.CDB_REMEDY_ERR1,   
              CMMI_UTILITYDB.CDB_REMEDY_ERR2;   
                                               
DEFINE                                         
<VALUES_LIST>
DDNAME=INFILE;                               
                                             
SHOW;                                         
                                             
INSERT INTO  CMMI_WORKDB.CDB_REMEDY_DATA     
(  <COLUMNS_LIST> )                                                     
VALUES                                                 
(  <VALUES_LIST> );               
END LOADING;     


Note that the input is the "OUTPUT DATASET" which is perfect. But when I look at the data in the database (returned by a simple select * from clause), every time that I have a line-break (0D 0A), I have a D5 following...

As I am querying the database from a windows based system, I see this "extra-character" as the ascii symbol 'Õ', at the beginning of each new line. Below is the output of the query 'SELECT column_x FROM', where column_x is the column that contains the line_breaks:

Regression completed.
Õ
ÕHere is the report after today's STM regression for future date 08/01/2007.
Õ
ÕQREGRSTM JOB 29652 07/31/2007 X STD **** LOCAL 10,198
Õ
Õ
Õ*------------------------------------*
ÕTOTAL MATCHED CLAIMS 3,916
ÕTOTAL PAID VARIANCES 0
ÕTOTAL ERRORED VARIANCES 0
ÕEXTRA CLAIMS IN TESTLINE 0
ÕEXTRA CLAIMS IN BASELINE 0
Õ*------------------------------------*

The 'Õ's are the 'D5' that I am talking about. They always follow after a x'0D0A'.
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Wed Aug 29, 2007 12:22 am
Reply with quote

I know that the output dataset (after the conversion on the mainframe) is perfect because I used windows FTP to "get" the file from mainframe and the sequence is ok (0D 0A 0D 0A).
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Wed Aug 29, 2007 1:11 am
Reply with quote

Hello,

Can you run the SELECT * query from a green-screen (3270) rather than a windows client?

Can you "unload" a few selected rows (ones with the embedded cr/lf) from teradata and see if the problem character is present?
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Wed Aug 29, 2007 1:19 am
Reply with quote

Hi, Dick...

Sorry, but I did not understand...

What do you mean by green-screen(3270)?

And what do you mean by "unload"? If you mean, exporting the data, I already did and, yes, the problem character is present...

It's driving me crazy...
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Wed Aug 29, 2007 1:38 am
Reply with quote

Hi Rafael,

By green-screen, i mean a terminal that is used for TSO or CICS rather than a custom windows gui font-end application (in the "old days", the only device most people saw was a physical 3270 terminal which typically was a dark background with green characters). These days, a 3270 terminal emulator is how most people connect to the mainframe.

Whatever you use to connect to TSO or CICS should work as long as you can enter a query from that "terminal". Is there a way to run the SELECT query in batch and view the result in the output queue?

I suspect that my "unload" and your "export" are similar. Does teradata have the ability to export individual fields rather than whole rows/records?

Is there a way to hex-dump a row/record from teradata?

I'm trying to isloate when the "bad" character is introduced - during the load into teradata or just on the retrieval via the windows process.
Back to top
View user's profile Send private message
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, S?o Paulo, Brazil

PostPosted: Wed Aug 29, 2007 1:46 am
Reply with quote

Ok.. understood.

Yes... I can run the query via Rumba/TSO/ISPF which I use to access mainframe. I will do that and let you know...

About the column exporting, yes, I did export it and the D5s are really there.
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> JCL & VSAM Goto page 1, 2, 3  Next

 


Similar Topics
Topic Forum Replies
No new posts How to split large record length file... DFSORT/ICETOOL 10
No new posts Replace each space in cobol string wi... COBOL Programming 3
No new posts Extracting Variable decimal numbers f... DFSORT/ICETOOL 17
No new posts SFTP Issue - destination file record ... All Other Mainframe Topics 2
No new posts Access to non cataloged VSAM file JCL & VSAM 18
Search our Forums:

Back to Top