View previous topic :: View next topic
|
Author |
Message |
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
Can you provide any guidance on how would I do the conversions if I transfer the file as binary?
I think this is an interesting solution, because the record breaking works fine with binary, since the dataset's LRECL is exactly the size of the records (which are fixed-length). I just don't know how to convert to "text" (ascii or ebcdic?) afterwards... |
|
Back to top |
|
|
stodolas
Active Member
Joined: 13 Jun 2007 Posts: 632 Location: Wisconsin
|
|
|
|
You need to load up a translation table of all the hex values in ASCII and the hex value that you want it to match to in EBCDIC. Then run through each binary records doing the conversion. This could be very bad then because you basically need to inspect each character in the binary string and replace it. The processing time and CPU on this probably isn't very pretty.
example
Code: |
ASCII EBCDIC
65 C1
66 C2
67 C3
32 40
etc
|
|
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
Yeah, and my monthly file size is approximately 2.5 GB...
Even though, let me ask: How would you do this translation script on your environmet? Would you use a cobol program, or are you able to do it using only JCL?
If I am not asking too much, could you provide a code sample for this? (Don't need to work or anything... just a draft for me to use as guidance to start researching) |
|
Back to top |
|
|
stodolas
Active Member
Joined: 13 Jun 2007 Posts: 632 Location: Wisconsin
|
|
|
|
Ummm.. you may be able to do it through a DFSORT with a translation table, but I am not sure. I'm not sure if Frank watches this part of the forum. |
|
Back to top |
|
|
superk
Global Moderator
Joined: 26 Apr 2004 Posts: 4652 Location: Raleigh, NC, USA
|
|
|
|
I don't know about anyone else, but I'd like to see an actual real sample of a few lines of the raw ASCII data in a hex dump format. Something doesn't add up. |
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
I'll get it for you, superk... Just a sec... |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello Rafael,
As we talked about before, i believe removing the cr/lf on the SQLServer side would be a good way to go.
Here is some SQLServer info that will work for the 2000 and 2005 versions
Code: |
CREATE TABLE #T
(
[ID] INT
,[TestValue] VARCHAR(MAX)
);
DECLARE @value VARCHAR(MAX);
SET @value = '1' + CHAR(10) + CHAR(8);
INSERT #T (id, testValue) VALUES (1, @value);
SELECT
*
--SQL 2005 specific, cannot use text datatype in REPLACE function must be varchar(max)
,REPLACE(TestValue, '', '') AS TestValue_REPLACE
--SQL 2000 eligible, charindex finds the location of the carriage return that could then be used
--in a combination with substring function
,CHARINDEX('', TestValue) AS FirstCRLF
FROM #T;
DROP TABLE #T; |
The INSERT and DROP are just for testing - you might use them for your testing, but that is up to you. REPLACE is a feature in 2005 and hopefully, you are running 2005.
Sorry i've been "away" much of the morning and haven't been in the dialog. |
|
Back to top |
|
|
stodolas
Active Member
Joined: 13 Jun 2007 Posts: 632 Location: Wisconsin
|
|
|
|
I like Dick's solution, but that changes the data in the underlying table, and might not be feasible.
How about using his solution but defining a temp table just before extraction with a SELECT ..... FROM sourceTable INTO TEMPTABLE then do the extract from the temp table. Or maybe define a view that does the replace for you ahead of time. This has the advantage of offloading the replacement to the MSSQL server.... |
|
Back to top |
|
|
superk
Global Moderator
Joined: 26 Apr 2004 Posts: 4652 Location: Raleigh, NC, USA
|
|
|
|
Let's see if I have a solution:
Presuming that the original data in ASCII is:
Code: |
-d
1792:0100 52 61 66 61 65 6C 20 4C-6F 6E 67 6F 50 72 6F 63 Rafael LongoProc
1792:0110 65 73 73 20 4D 69 73 73-69 6E 67 20 72 65 71 75 ess Missing requ
1792:0120 69 72 65 6D 65 6E 74 73-2E 0D 0A 54 68 65 20 72 irements...The r
1792:0130 65 71 75 69 72 65 6D 65-6E 74 73 20 77 65 72 65 equirements were
1792:0140 20 69 6E 61 63 63 75 72-61 74 65 2E 20 20 20 20 inaccurate.
1792:0150 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:0160 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:0170 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
-d
1792:0180 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:0190 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:01A0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:01B0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:01C0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:01D0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:01E0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:01F0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
-d
1792:0200 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:0210 20 20 20 20 20 20 20 20-52 61 66 61 65 6C 20 4C Rafael L
1792:0220 6F 6E 67 6F 4E 6F 6E 2D-50 72 6F 63 65 73 73 53 ongoNon-ProcessS
1792:0230 65 74 75 70 20 69 73 73-75 65 20 20 20 20 20 20 etup issue
1792:0240 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:0250 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:0260 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:0270 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
-d
1792:0280 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:0290 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:02A0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:02B0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:02C0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:02D0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:02E0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:02F0 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
-d
1792:0300 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:0310 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:0320 20 20 20 20 20 20 20 20-20 20 20 20 20 20 20 20
1792:0330 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
1792:0340 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
1792:0350 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
1792:0360 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
1792:0370 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-q
|
and the file is FTP'd binary to a dataset with RECFM=FB LRECL=280, then the data would look like this:
Code: |
ê/Ã/Ã%.<?>Ã…?&Ê?ÄÃËË.(ÑËËÑ>Ã….ÊÃÉÃÑÊÃ_Ã>ÈË...èÇÃ.ÊÃÉÃÑÊÃ_Ã>ÈË.ÃÃÊÃ.Ñ>/ÄÄÃÊ/ÈÃ.....
56666624666657666772467766627677676666772005662767767666677276762666667767622222
21615C0CFE7F02F35330D9339E702515925D5E43EDA48502515925D5E430752509E13352145E0000
------------------------------------------------------------------------------
ê/Ã/Ã%.<?>Ã…?+?>.&Ê?ÄÃËËëÃÈÃø.ÑËËÃÃ..............................................
56666624666646625766677567772677762222222222222222222222222222222222222222222222
21615C0CFE7FEFED02F3533354500933550000000000000000000000000000000000000000000000
|
then a simple run through OCOPY:
Code: |
//A2E EXEC PGM=IKJEFT01
//SYSTERM DD DUMMY
//SYSTSPRT DD SYSOUT=*
//INPUT DD DSN=... <UPLOADED BINARY ASCII DATA>
//OUTPUT DD DSN=... <CONVERTED EBCDIC DATA>,
// DISP=(,CATLG,DELETE),RECFM=FB,LRECL=280,
// UNIT=SYSDA,SPACE=(CYL,(10,10))
//SYSTSIN DD DATA
OCOPY INDD(INPUT) OUTDD(OUTPUT) TEXT CONVERT((BPXFX311)) TO1047
/*
|
would result in this:
Code: |
Rafael LongoProcess Missing requirements...The requirements were inaccurate.
D888894D9989D9988AA4D8AA8984989A898989AA401E884989A898989AA4A898489888A98A844444
9161530365767963522049229570958499545532BD538509584995455320659509513349135B0000
------------------------------------------------------------------------------
Rafael LongoNon-ProcessSetup issue
D888894D9989D996D9988AAE8AA948AAA84444444444444444444444444444444444444444444444
91615303657656507963522253470922450000000000000000000000000000000000000000000000
------------------------------------------------------------------------------
|
at which point I imagine a simple SORT step could easily translate the x'0D15' into blanks or some other desired characters. |
|
Back to top |
|
|
stodolas
Active Member
Joined: 13 Jun 2007 Posts: 632 Location: Wisconsin
|
|
|
|
I see OCOPY converted the x'0D0A' (CRLF) to x'0D15'. Can you force a conversion to x'4040'. I didn't look back, but I think the OP said he wants it converted to spaces. |
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
Guys, you will find attached a ascii sample file and the ascii dump of the same file.
Please, take a look...
Only the lines that begin with 'HD000'... are actual new records. So on this sample we should have only 4 records. But the dataset ends up with 17 records. |
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
Guys,
You have helped me a lot... both suggestions (From Dick and Kevin) have brought light to my path...
I will work with them to see which fits best on the situation...
Dick, don't worry about it.. You helped a lot!
Steve, that's true, I said that I wanted to switch the cr/lfs to spaces but probably with these solutions I can keep them, AND avoid the record breaking... This would be ideal to me...
I'll let you know, guys...
Thanks a lot!!! |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
The REPLACE that modified the temporary table was only for example.
The REPLACE could be used directly in the EXPORT if the system is using SQLServer 2005.
I agree - changing the data in the "real" table might be more of a problem than a solution. |
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
Hi, guys,
I finally got to work on this issue again today and...
It worked!!
As Kevin suggested I uploaded the file as 'binary' and converted to EBCDIC right before I fastload the data into Teradata.
The record-breaking issue is solved with this, as the records are broken exclusively by the length of the data... I mean, as the lrecl is set to 2660 bytes, the dataset is created breaking the data each 2660 bytes. And that's exactly what I needed.
One lasting issue though. After the conversion, I have one "extra" character (D5) after each line-breaking. Like, in the example below, where I have the hex codes for two line-breaks:
Original data: 0D 0A 0D 0A
Converted data: 0D 0A D5 0D 0A D5
I didn't have the time to investigate in detail why this is happening (I will do it now) but I wanted to share it with you, guys, to see if you have any thoughts on the reason...
Any ideas? |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
The extra character is a D5 (dee-five)? If it were a zero-five, it might make a bit of sense (x'05' is a tab character).
If the records are already 2660 and this extra byte is inserted, is the lrecl changed or is the last byte truncated?
I'm slow today . . . |
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
Yeah, it's a D5 (dee-five) in Ascii it represents 'Õ'. In EBCDIC, if Im not mistaken, it's 'N'.
Can you guys explain to me in detail the ocopy command, like:
OCOPY INDD(INPUT) OUTDD(OUTPUT) TEXT CONVERT((BPXFX311)) TO1047
I understood the input/output part... But I am not so familiar with the TEXT CONVERT part. What is BPXFX311? I suppose it's a translation table or something... But is there a way to see it's contents? Do I have options on this (to use another table or create my own)? What is TO1047? |
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
The reason why I am asking all this is that I think that the blame for the extra character is of the translation table... |
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
I was wrong. The conversion is perfect. There is no 'D5' extra character in the output dataset.
The extra character is only appearing in the final data fastloaded into the teradata database.
Probably teradata's fastload is not handling the dataset's EBCDIC data correctly or something...
I am looking into it... Meanwhile, any comments/ideas are welcome... |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
Quote: |
it's a D5 (dee-five) in Ascii it represents 'Õ'. In EBCDIC, if Im not mistaken, it's 'N'. |
Yes, a D5 is an EBCDIC "N".
How do you get from the "output dataset" to the "final data fastloaded into the teradata database"? |
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
I run the following job that invokes Teradata's fastload:
Code: |
//P54BAF2 JOB (LOADRMDY),'LOADRMDY',CLASS=3,
// MSGCLASS=X,REGION=4096K,TIME=1339,NOTIFY=P54BAF
/*JOBPARM S=SYS1
//JOBLIB DD DSN=<LIBRARY>,DISP=SHR
//PS20 EXEC PGM=FASTLOAD,PARM='NOSPIE/BUFSIZE=31'
//SYSPRINT DD SYSOUT=*
//SYSABEND DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//INFILE DD DSN=<OUTPUT DATASET>,DISP=SHR
//SYSIN DD *
SESSIONS 32;
RECORD 2;
.LOGON TDP9/<USER>,<PASS>;
DELETE FROM CMMI_WORKDB.CDB_REMEDY_DATA;
BEGIN LOADING CMMI_WORKDB.CDB_REMEDY_DATA
ERRORFILES CMMI_UTILITYDB.CDB_REMEDY_ERR1,
CMMI_UTILITYDB.CDB_REMEDY_ERR2;
DEFINE
<VALUES_LIST>
DDNAME=INFILE;
SHOW;
INSERT INTO CMMI_WORKDB.CDB_REMEDY_DATA
( <COLUMNS_LIST> )
VALUES
( <VALUES_LIST> );
END LOADING; |
Note that the input is the "OUTPUT DATASET" which is perfect. But when I look at the data in the database (returned by a simple select * from clause), every time that I have a line-break (0D 0A), I have a D5 following...
As I am querying the database from a windows based system, I see this "extra-character" as the ascii symbol 'Õ', at the beginning of each new line. Below is the output of the query 'SELECT column_x FROM', where column_x is the column that contains the line_breaks:
Regression completed.
Õ
ÕHere is the report after today's STM regression for future date 08/01/2007.
Õ
ÕQREGRSTM JOB 29652 07/31/2007 X STD **** LOCAL 10,198
Õ
Õ
Õ*------------------------------------*
ÕTOTAL MATCHED CLAIMS 3,916
ÕTOTAL PAID VARIANCES 0
ÕTOTAL ERRORED VARIANCES 0
ÕEXTRA CLAIMS IN TESTLINE 0
ÕEXTRA CLAIMS IN BASELINE 0
Õ*------------------------------------*
The 'Õ's are the 'D5' that I am talking about. They always follow after a x'0D0A'. |
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
I know that the output dataset (after the conversion on the mainframe) is perfect because I used windows FTP to "get" the file from mainframe and the sequence is ok (0D 0A 0D 0A). |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
Can you run the SELECT * query from a green-screen (3270) rather than a windows client?
Can you "unload" a few selected rows (ones with the embedded cr/lf) from teradata and see if the problem character is present? |
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
Hi, Dick...
Sorry, but I did not understand...
What do you mean by green-screen(3270)?
And what do you mean by "unload"? If you mean, exporting the data, I already did and, yes, the problem character is present...
It's driving me crazy... |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hi Rafael,
By green-screen, i mean a terminal that is used for TSO or CICS rather than a custom windows gui font-end application (in the "old days", the only device most people saw was a physical 3270 terminal which typically was a dark background with green characters). These days, a 3270 terminal emulator is how most people connect to the mainframe.
Whatever you use to connect to TSO or CICS should work as long as you can enter a query from that "terminal". Is there a way to run the SELECT query in batch and view the result in the output queue?
I suspect that my "unload" and your "export" are similar. Does teradata have the ability to export individual fields rather than whole rows/records?
Is there a way to hex-dump a row/record from teradata?
I'm trying to isloate when the "bad" character is introduced - during the load into teradata or just on the retrieval via the windows process. |
|
Back to top |
|
|
Rafael Longo
New User
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, S?o Paulo, Brazil
|
|
|
|
Ok.. understood.
Yes... I can run the query via Rumba/TSO/ISPF which I use to access mainframe. I will do that and let you know...
About the column exporting, yes, I did export it and the D5s are really there. |
|
Back to top |
|
|
|