Upload and replace newline characters from data file


IBM Mainframe Forums -> JCL & VSAM
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, São Paulo, Brazil

PostPosted: Wed Jul 25, 2007 2:07 am

I need to upload a data file from Windows to MVS and then use the Teradata FastLoad utility to load the data into a database.

Currently, I am uploading the file with the Windows FTP utility into a dataset created by a job.

The problem arises when I FastLoad it into Teradata. All the data loads correctly into the table as long as a record does not contain any text fields with embedded newline characters (CR/LF). When a record does contain CR/LF, it is broken into multiple records, each one starting at the first character after the CR/LF pair.

What I need is a way to avoid this record breaking. Maybe I could replace the CR/LFs with spaces or something like that.
William Thompson

Global Moderator


Joined: 18 Nov 2006
Posts: 3156
Location: Tucson AZ

PostPosted: Wed Jul 25, 2007 3:18 am

How about using a modified version of the codepage used in the FTP?
superk

Global Moderator


Joined: 26 Apr 2004
Posts: 4652
Location: Raleigh, NC, USA

PostPosted: Wed Jul 25, 2007 7:15 am

I'm confused. An EBCDIC MVS dataset doesn't contain, to use your terminology, "new line characters (CR/LF)".

Is this data being uploaded in binary transfer mode, and is it being stored as ASCII within the dataset?
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Wed Jul 25, 2007 8:08 am

Hello,

How is the file being created on the Windows system?
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, São Paulo, Brazil

PostPosted: Wed Jul 25, 2007 6:32 pm

Thanks for the replies.

It's a file exported from a SQL Server database using the DTS Export utility.
It's an ASCII file, not EBCDIC.

Here are the FTP commands:

open mvs
user username password
put file.txt DSNAME
quit


William, can you elaborate on the codepage solution? I did not get your idea.

Please, tell me if you need more information...
superk

Global Moderator


Joined: 26 Apr 2004
Posts: 4652
Location: Raleigh, NC, USA

PostPosted: Wed Jul 25, 2007 6:44 pm

I'd use a SORT with the ALTSEQ parameter to convert the undesired characters into the desired characters. There are plenty of examples in these forums.
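
For reference, a minimal sketch of the kind of SORT step being suggested here, assuming DFSORT at a level that supports TRAN=ALTSEQ (SyncSort syntax differs). The dataset names and the 2660-byte record length are placeholders, and the sketch assumes the unwanted bytes really show up as X'0D' and X'25' after the transfer - browse the uploaded dataset in hex first, since a text-mode FTP normally strips the CR/LF rather than storing it:
Code:

//FIXCRLF  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.UPLOADED.FILE,DISP=SHR
//SORTOUT  DD DSN=YOUR.CLEANED.FILE,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE),
//            DCB=(RECFM=FB,LRECL=2660,BLKSIZE=0)
//SYSIN    DD *
  OPTION COPY
* CHANGE X'0D' AND X'25' TO BLANKS (X'40') - ADJUST THE CODE POINTS
* TO WHATEVER THE HEX BROWSE ACTUALLY SHOWS
  ALTSEQ CODE=(0D40,2540)
  OUTREC FIELDS=(1,2660,TRAN=ALTSEQ)
/*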
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Wed Jul 25, 2007 7:18 pm

Hello,

I believe you want to determine why only some of the records have the CR/LF and eliminate it in the SQL Server export - you may have to make some slight modification. Does the CR/LF always come at the same displacement? If you look at the file on the Windows system, does the export file have the CR/LF?

After you have the file on the mainframe and before you attempt the load, what does the file look like? Where are the CR/LFs? It may be that some records are longer than the record length of the file you created, and instead of an abend you get "wrapped" records.

If they are not causing a wrap, are only "junk", and do not take an extra byte, replacing them as suggested above may be the ticket. Personally, when I send a file, I make sure I know where all of the content came from - it saves surprises later.
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, São Paulo, Brazil

PostPosted: Wed Jul 25, 2007 8:52 pm

Thanks for the help, guys...

Dick--

Yes, when I look at the file on the Windows system, the CR/LFs are there.

No, they are not always at the same place. A few of the exported fields, on some records but not all, contain CR/LFs.

When I look at the file in MVS, the records are wrapped where the CR/LFs were.

The record length of the dataset is correct and matches the length of all the records in the uploaded file. (It's a fixed-length file.) So we can suppose that's not the problem.

Superk--

Thanks for the tip. I will take a look at the SORT command. Someone here also suggested I take a look at SYNCSORT... What do you think?
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, São Paulo, Brazil

PostPosted: Tue Aug 21, 2007 12:30 am

Ok, guys,

I have created a Visual Basic program that removes the unwanted CR/LFs before uploading the file. It's working fine.

But now I have been asked to do all the processing in MVS, so I am looking for the best way to write a program that does this in the mainframe environment, just like I did with VB on Windows.

I tried to find a way to do it with SORT and ALTSEQ, as superk suggested, but I couldn't. If someone could provide a piece of code or a link to illustrate it, I would appreciate it very much.

I was thinking that maybe I could write the program in a language like COBOL, Assembler, or REXX. What do you guys think? Which language would be most suitable for that?

Any suggestions/comments would be greatly appreciated...

Thanks a lot!
Rafael Longo
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Tue Aug 21, 2007 1:10 am

Hello,

Were you able to determine why these CR/LFs are in the output from the DTS export? I believe you will be way ahead if you can determine why this is happening and remove the cause in the export.

It will be much more tedious to deal with on the mainframe because the FTP is already "breaking" the data and creating a new record.

When you have uploaded "broken" records to the mainframe, what are the attributes of the mainframe file? If an uploaded record is broken into 2 mainframe records, is the CR/LF the last 2 bytes of the "first" record?

Can you post one or two of these broken mainframe records so we can see the "break"? If they contain sensitive data, the sensitive content could be replaced before posting.
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, São Paulo, Brazil

PostPosted: Tue Aug 21, 2007 1:24 am

Thanks for the reply, Dick.

About the samples: right now I don't have any, since I am uploading the corrected version of the file. I will upload the uncorrected version and, if you still want to see it, I can post it later...

But the origin of the CR/LFs is a free-text field stored in SQL Server. There is nothing we can do about it: users are free to break lines in this field, and we have to deal with it.

The last 2 bytes of the "first" record are not CR/LFs. You are right when you say that the FTP has already broken the records.

Is there a way (however tedious ;-) ) to write a program to re-compose these records? Which programming language would you suggest in this case?
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Tue Aug 21, 2007 1:52 am

You're welcome.

A few more questions - they don't necessarily need answers, but they may help clarify things.

Does a split produce at most 2 records, or is an unpredictable number of records created from what should be one "broken" record?

Do the cr and the lf always stay in the same record, or might they also split?

Is there something in the data that would identify "start of record", "end of record", or might be put in as literals on the export?

If you can easily post a sample, that may help, but may not be necessary.

If I were going to do this, I would most likely use COBOL, as more folks would be able to maintain it easily - I work at making sure that just because I was the author of something, I am not tied to it "forever" :-)

Something that will also help is the DCB info for the file after the upload and the DCB info for a "corrected" file.

A simple thing you might try is to make sure the LRECL of the mainframe file is longer (a lot longer :-) ) and see if the "break" stops. As the CR/LF is not at the end of a "broken" record, it may be the LRECL that is causing the break - I have seen that before.
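
For anyone following along, a minimal sketch of pre-allocating such an oversized test target before the FTP. The dataset name, space, and the larger LRECL are placeholders, and the DCB=(...) keywords (record format, logical record length, block size) are the "DCB info" mentioned above:
Code:

//ALLOC    EXEC PGM=IEFBR14
//* ALLOCATE AN EMPTY TEST TARGET WITH A DELIBERATELY LARGE LRECL
//BIGGER   DD DSN=YOUR.TEST.UPLOAD,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE),
//            DCB=(RECFM=FB,LRECL=8000,BLKSIZE=0)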

What should happen to the cr/lf controls that the user placed in the data originally?
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, São Paulo, Brazil

PostPosted: Tue Aug 21, 2007 2:32 am

Quote:
Does the maximum split cause 2 records or are there an unpredictable number of records created by what should be one "broken" record?

There is an unpredictable number of records. It's a free-text field, so the user may have entered any number of line breaks, and each one results in a new record.

Quote:
Do the cr and the lf always stay in the same record, or might they also split?

I don't think the CR/LFs are in the split dataset.

Quote:
Is there something in the data that would identify "start of record", "end of record", or might be put in as literals on the export?

I suppose I don't need that, since the file has a fixed record length of 2660 bytes.

Quote:
If you can easily post a sample, that may help, but may not be necessary.

I will do it tomorrow morning, Dick. First thing.

Quote:
A simple something you might try is to make sure the lrecl of the mainframe file is longer (a lot longer ) and see if the "break" stops. As the cr/lf are not at the end of a "broken" record, it may be lrecl that is causing the break - seen that before.

The LRECL is right: exactly 2660 bytes. When it breaks, it breaks before that because of a CR/LF.

Quote:
Something that will also help are the dcb info for the file after the upload and the dcb info for a "corrected" file.

What is DCB? How do I use it? (I am using TSO/ISPF)

Quote:
What should happen to the cr/lf controls that the user placed in the data originally?

They should be replaced with a space or just disappear.

Quote:
If i was going to do this, i would most likely use COBOL as more folks would be able to maintain it easily

OK, but ease of maintenance aside, is COBOL the most suitable language for this kind of task (manipulating records in a dataset)?

Could you provide a code sample that does something a little like this? (I am not exactly an expert in any mainframe language...)
stodolas

Active Member


Joined: 13 Jun 2007
Posts: 632
Location: Wisconsin

PostPosted: Tue Aug 21, 2007 7:30 am

The ASCII -> EBCDIC conversion is happening in the FTP server. If you don't issue the BINARY (BI) command in your FTP stream, the data is converted to EBCDIC, with the CR/LFs forcing breaks to new records on the mainframe. I think you may be able to fix this by issuing the correct "SITE" commands to the MVS FTP server so it doesn't break on CR/LF. I'll look into that.

What happens if you zip the file, transfer it binary, and extract it on the mainframe? Do the records still break on the CR/LF?

Sometimes the requirement to run on the mainframe is not a sound business decision.

<RANT - not to hijack>
Case in point: we only have COBOL (REXX too, but only the sysprogs know anything about that) on our mainframe. How I wish they would enable the C++ and Java compilers. My director, who has been there 25+ years, says "This process will run on the mainframe." So I had to query an LDAP server from COBOL. To accomplish this I create a TSO environment in the program, then read an input file and call a TCP/IP gldsearch app from the COBOL program. Because it's on the mainframe and paired with COBOL, the run time on 10,000 records has stretched to 2 hours. If we had Java enabled on z/OS, which knows how to talk LDAP and can talk to the destination MS SQL Server (I hope), this would be phenomenally faster.

Current process:
(Daily on MF) Generate input file
(Nightly on MF) Sort input to de-dupe, read de-duped file, do LDAP queries 1 by 1 and write output file, FTP to LAN [2 hours for 10,000 recs, mostly in the query step]
(N on LAN) Read file from MF, load SQL Server, trigger mainframe job
(N on MF) Send email to server to trigger notice of new web invoice to customers

Or, better yet in my opinion (shared by too few):
(Daily on MF) Generate input file
(Nightly on MF) Sort to de-dupe, FTP to LAN
(N on LAN) Read file from MF, de-dupe, query LDAP and do a direct insert to SQL Server, trigger mainframe job
(N on MF) Send email to server to trigger notice of new web invoice to customers

</RANT>

I'm open to suggestions on improving that process.


I'll go look at the FTP SITE commands now. :-)
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Tue Aug 21, 2007 7:33 am

Hello Rafael,

Quote:
is COBOL the most suitable language for this kind of task?

From my perspective, the most suitable language for code that is not actually driven by business rules is whichever language the author is comfortable with. Time was that I would only write such code in assembler, but times have changed. Your "business requirement" may be to remove the unwanted CR/LF, but to me this is more of a utility function than a business-rule implementation.

Quote:
(I am not exactly an expert on any mainframe language... )
I suspect that you will want to borrow one - if not a complete expert, at least somewhat fluent.

It will be good to see samples of the "broken" records. I am still not clear on when the breaks occur (if the end of a short, broken record is not the CR/LF). If you can force a "start of record" value and an "end of record" value onto the export, it may make the code to repair the records much easier.

I'll look in the morning. . . .
stodolas

Active Member


Joined: 13 Jun 2007
Posts: 632
Location: Wisconsin

PostPosted: Tue Aug 21, 2007 7:45 am

Dick,

Here is what happens on the CR/LF stuff. Hope this makes sense...

File from the PC side; I'll assume comma-separated.
Code:

F1, F2, F3a F3b F3c F3d F3e, F4
F1, F2, F3a CR/LF F3b F3c F3d CR/LF F3e, F4
F1, F2, F3a F3b CR/LF F3c F3d CR/LF F3e, F4
F1, F2, F3a F3b F3c CR/LF F3d CR/LF F3e, F4


Result after FTP to the mainframe. "--" indicates comments, not records.
Code:

--rec 1 from pc
F1, F2, F3, F4
--rec 2 from pc
F1, F2, F3a
F3b F3c F3d
F3e, F4
--rec 3 from pc
F1, F2, F3a F3b
F3c F3d
F3e, F4
--rec 4 from pc
F1, F2, F3a F3b F3c
F3d
F3e, F4
stodolas

Active Member


Joined: 13 Jun 2007
Posts: 632
Location: Wisconsin

PostPosted: Tue Aug 21, 2007 7:45 am

Whoops rec 2 from the PC should be this
Code:

F1, F2, F3a F3b F3c F3d F3e, F4
stodolas

Active Member


Joined: 13 Jun 2007
Posts: 632
Location: Wisconsin

PostPosted: Tue Aug 21, 2007 7:47 am

And shoot, I need to go to bed... REC 1, not REC 2. Wow.

Please correct the first of these posts and delete the second two.

Thanks
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, São Paulo, Brazil

PostPosted: Tue Aug 21, 2007 6:30 pm

:-D

Thanks for the illustration stodolas. This is exactly what happens.

Let me try to illustrate too, Dick:

Record structure
-----------
Position 1 to 12: Employee name
Position 13 to 23: Defect Type
Position 24 to 280: Root cause
LRECL = 280 bytes

Text file from windows:
---------------------------
--Rec 1
Rafael LongoProcess Missing requirements. CR/LF The requirements were inaccurate.
--Rec 2
Rafael LongoNon-ProcessSetup issue
---------------------------------

After we FTP this file to Mainframe, we have the following
Dataset on MVS:
-------------------
--Rec1
Rafael LongoProcess Missing requirements.
--Rec2
The requirements were inaccurate.
--Rec3
Rafael LongoNon-ProcessSetup issue
----------------------

But here is what we should have:
-------------------
--Rec1
Rafael LongoProcess Missing requirements. The requirements were inaccurate.
--Rec2
Rafael LongoNon-ProcessSetup issue
----------------------

Did that make things clearer, Dick? Let me know if you need any more info...

Any new thoughts?

About the programming language: as you said, we should use the one that is most suitable, considering not only the language's capabilities but also our abilities with it. Even though I am not fluent in languages like Assembler or COBOL, I can learn and get it done. But since I am going to do some researching (including getting help from folks who are more fluent than me), I first need to choose which language is the best way to achieve this goal. That's why I am trying to find out whether COBOL is really the best way (some people here who work with COBOL told me it is not the most appropriate language for such a task).

Stodolas mentioned something about C++ on the mainframe. That, for example, would be perfect, since I am more fluent in C++. Java would also be great. Is it possible to use these languages on the mainframe? If it is, what do I need to set them up? And how do REXX and Assembler compare to COBOL for this task?

Stodolas, did you find out anything on the SITE command that could help avoid the record breaking on FTP?

Thanks for all the help, guys. I hope you don't get tired of me... ;-)
stodolas

Active Member


Joined: 13 Jun 2007
Posts: 632
Location: Wisconsin

PostPosted: Tue Aug 21, 2007 6:39 pm

I looked at the SITE command and didn't find anything.

A couple of other notes:
If you transfer binary, you won't have any record breaks, but you will have to do the ASCII-to-EBCDIC conversion yourself in your language of choice and then break the data into records on your own.
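
As a rough sketch of that binary approach from the Windows side, extending the ftp script posted earlier - the SITE values assume the z/OS FTP server is allocating the target dataset, and DSNAME is a placeholder. Note that in binary mode every CR/LF byte pair (the record terminators as well as the embedded ones) stays in the data and occupies space in the fixed-length slices, so the later translation and cleanup step still has to account for them:
Code:

open mvs
user username password
quote site recfm=fb lrecl=2660 blksize=26600
binary
put file.txt DSNAME
quit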

For Java or C++ you need to ensure the compiler is available. Beyond that I don't know; people don't talk much about anything other than COBOL on the mainframe.

Not that my opinion matters to the people asking, but your pre-processing on the little box server and keeping the records together is probably the best solution. Since you are outputting ASCII there, it only makes sense to work with it on the system where it is the native encoding.
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, São Paulo, Brazil

PostPosted: Tue Aug 21, 2007 6:57 pm

Can you provide any guidance on how I would do the conversion if I transfer the file as binary?

I think this is an interesting solution, because the record breaking works fine with binary, since the dataset's LRECL is exactly the size of the records (which are fixed length). I just don't know how to convert to "text" (ASCII or EBCDIC?) afterwards...
stodolas

Active Member


Joined: 13 Jun 2007
Posts: 632
Location: Wisconsin

PostPosted: Tue Aug 21, 2007 7:02 pm

You need to load up a translation table mapping each ASCII hex value to the EBCDIC hex value you want it to become. Then run through each binary record doing the conversion. This could be expensive, because you basically need to inspect each character in the binary string and replace it. The processing time and CPU for this probably aren't very pretty.

example
Code:

ASCII (dec)  EBCDIC (hex)
65     C1
66     C2
67     C3
32     40
etc
Rafael Longo

New User


Joined: 21 Jul 2007
Posts: 33
Location: Campinas, São Paulo, Brazil

PostPosted: Tue Aug 21, 2007 7:08 pm

Yeah, and my monthly file size is approximately 2.5 GB...

Even so, let me ask: how would you do this translation in your environment? Would you use a COBOL program, or can it be done using only JCL?

If I am not asking too much, could you provide a code sample for this? (It doesn't need to work or anything... just a draft for me to use as guidance to start researching.)
stodolas

Active Member


Joined: 13 Jun 2007
Posts: 632
Location: Wisconsin

PostPosted: Tue Aug 21, 2007 7:49 pm

Ummm... you may be able to do it with DFSORT and a translation table, but I am not sure. I don't know whether Frank watches this part of the forum.
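
For what it's worth, a hedged sketch of that DFSORT idea, assuming a reasonably current DFSORT level where the TRAN=ATOE operand is available (it applies a full ASCII-to-EBCDIC translate table in one pass; on older levels, or with SyncSort, a hand-built ALTSEQ table along the lines of the earlier example would be needed instead). Dataset names and space are placeholders, and it assumes the binary upload already landed in a fixed 2660-byte dataset:
Code:

//ATOE     EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.BINARY.UPLOAD,DISP=SHR
//SORTOUT  DD DSN=YOUR.EBCDIC.FILE,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,100),RLSE),
//            DCB=(RECFM=FB,LRECL=2660,BLKSIZE=0)
//SYSIN    DD *
  OPTION COPY
* TRANSLATE EVERY BYTE OF EACH 2660-BYTE RECORD FROM ASCII TO EBCDIC
  OUTREC FIELDS=(1,2660,TRAN=ATOE)
/*

Any CR/LF bytes carried along by the binary transfer get translated too (typically to X'0D' and X'25'), so a pass like the ALTSEQ step sketched earlier would still be needed to blank them out.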
superk

Global Moderator


Joined: 26 Apr 2004
Posts: 4652
Location: Raleigh, NC, USA

PostPosted: Tue Aug 21, 2007 7:51 pm

I don't know about anyone else, but I'd like to see an actual sample of a few lines of the raw ASCII data in hex dump format. Something doesn't add up.