Joined: 21 Jul 2007 Posts: 33 Location: Campinas, São Paulo, Brazil
I need to upload a data file from windows into mvs and, after this, use teradata fastload utility to load the data into a database.
Currently, I am uploading the file using windows ftp utility into a dataset created by a job.
The problem arises when I fastload it into Teradata. All the data is loaded correctly into the table as long as the record does not contain any text fields with embedded new-line characters (CR/LF). When a record does contain a CR/LF, it is broken into multiple records, each one starting at the first character after the CR/LF pair.
What I need is a way to avoid this record breaking. Maybe I could replace the CR/LFs with spaces or something like that.
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello,
I believe you want to determine why only some of the records have the cr/lf and eliminate it in the SQL Server export - you may have to make some slight modification. Does the cr/lf always come at the same displacement? If you look at the file on the win-based system, does the export file have the cr/lf?
After you have the file on the mainframe and before you attempt the load, what does the file look like? Where are the cr/lfs? It may be that you have some records longer than the lrecl of the file you created and, instead of an abend, you get "wrapped" records.
If they are not causing a wrap and are only "junk" that does not take an extra byte, replacing them as suggested before may be the ticket. Personally, when I send a file, I make sure I know where all of the content came from - it saves surprises later.
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, São Paulo, Brazil
Thanks for the help, guys...
Dick--
Yes, when I look at the file on the Windows system, the cr/lfs are there.
No, they are not always at the same place. A few of the exported fields have cr/lfs on some records, but not all.
When I look at the file on MVS, the records are wrapped where the cr/lfs are.
The record length of the dataset is correct and matches the length of all the records in the uploaded file (it's a fixed-length file), so we can assume that's not the problem.
Superk--
Thanks for the tip. I will take a look at the SORT command. Someone here also suggested that I look at SYNCSORT... What do you think?
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, São Paulo, Brazil
Ok, guys,
I have created a Visual Basic program that removes the unwanted cr/lfs before uploading the file. It's working fine.
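For reference, the cleanup logic amounts to something like this (sketched here in Python rather than VB; it assumes each 2660-byte record ends with its own CR/LF terminator and that embedded CR/LF bytes count toward the record length - both are assumptions about the export, not confirmed facts):

```python
RECLEN = 2660  # fixed record length from this thread

def clean_records(data: bytes, reclen: int = RECLEN) -> bytes:
    """Blank out embedded CR/LFs before the file is uploaded.

    Assumes each record occupies exactly `reclen` bytes (embedded
    CR/LF bytes count toward that length) and is followed by its
    own CR/LF terminator in the Windows file.
    """
    out = []
    pos = 0
    while pos < len(data):
        rec = data[pos:pos + reclen]
        # replace embedded CR/LF bytes with blanks so text-mode FTP
        # no longer splits the record
        out.append(rec.replace(b"\r", b" ").replace(b"\n", b" "))
        pos += reclen
        if data[pos:pos + 2] == b"\r\n":  # skip the record terminator
            pos += 2
    return b"\r\n".join(out) + b"\r\n"
```

The record length stays at 2660 because each CR or LF byte is replaced one-for-one with a blank.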
But now I have been asked to do all the processing on MVS, so I am looking for the best way to write a program that does this in the mainframe environment, just like I did with VB in the Windows environment.
I tried to find a way to do it with SORT and ALTSEQ, like SUPERK suggested, but I couldn't. If someone could provide a piece of code or a link to illustrate it, I would appreciate it very much.
I was thinking that maybe I could write a program in a language such as COBOL, assembler, or REXX. What do you guys think? Which language would be more suitable for this?
Any suggestions/comments would be greatly appreciated...
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello,
Were you able to determine why these cr/lfs are in the output from the DTS export? I believe you will be way ahead if you can determine why this is happening and remove the cause in the export.
It will be much more tedious to deal with on the mainframe because the ftp is already "breaking" the data and creating new records.
When you have uploaded "broken" records to the mainframe, what are the attributes of the mainframe file? If an uploaded record is broken into 2 mainframe records, is the cr/lf the last 2 bytes of the "first" record?
Can you post one or 2 of these broken mainframe records so we can see the "break"? If they contain sensitive data, the sensitive content could be replaced before posting.
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, São Paulo, Brazil
Thanks for the reply, Dick.
About the samples: I don't have any right now, since I am uploading the corrected version of the file. I will upload the uncorrected version and, if you still want to see it, I can post it later...
The origin of the cr/lfs is a free-text field stored in SQL Server. There is nothing we can do about it: the user has the freedom to break lines in this field, and we have to deal with it.
The last 2 bytes of the "first" record are not CR/LFs. You are right when you say that the FTP has already broken the records.
Is there a way (however tedious...) to write a program to re-compose these records? Which programming language would you suggest in this case?
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
You're welcome.
A few more questions - not necessarily needing answers, but may help clarify.
Does the maximum split cause 2 records, or is there an unpredictable number of records created from what should be one "broken" record?
Do the cr and the lf always stay in the same record, or might they also split?
Is there something in the data that would identify "start of record", "end of record", or might be put in as literals on the export?
If you can easily post a sample, that may help, but may not be necessary.
If I was going to do this, I would most likely use COBOL, as more folks would be able to maintain it easily - I work at making sure that, just because I was the author of something, I am not tied to it "forever".
Something that will also help is the DCB info for the file after the upload and the DCB info for a "corrected" file.
A simple thing you might try is to make sure the lrecl of the mainframe file is longer (a lot longer) and see if the "break" stops. As the cr/lf is not at the end of a "broken" record, it may be the lrecl that is causing the break - I've seen that before.
What should happen to the cr/lf controls that the user placed in the data originally?
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, São Paulo, Brazil
Quote:
Does the maximum split cause 2 records, or is there an unpredictable number of records created from what should be one "broken" record?
The number of records is unpredictable. It's a free-text field, so the user may have entered any number of line breaks, and each one results in a new record.
Quote:
Do the cr and the lf always stay in the same record, or might they also split?
I don't think the cr/lfs are in the split dataset.
Quote:
Is there something in the data that would identify "start of record", "end of record", or might be put in as literals on the export?
I suppose I don't need it, since the file has a fixed record length of 2660 bytes.
Quote:
If you can easily post a sample, that may help, but may not be necessary.
I will do it tomorrow morning, Dick. First thing.
Quote:
A simple thing you might try is to make sure the lrecl of the mainframe file is longer (a lot longer) and see if the "break" stops. As the cr/lf is not at the end of a "broken" record, it may be the lrecl that is causing the break - I've seen that before.
The lrecl is right - exactly 2660 bytes. When it breaks, it breaks before that, because of a CR/LF.
Quote:
Something that will also help is the DCB info for the file after the upload and the DCB info for a "corrected" file.
What is DCB? How do I use it? (I am using TSO/ISPF)
Quote:
What should happen to the cr/lf controls that the user placed in the data originally?
They should be replaced with a space or just disappear.
Quote:
If I was going to do this, I would most likely use COBOL, as more folks would be able to maintain it easily
Ok, but, ease of maintenance aside, is COBOL the most suitable language for this kind of task (manipulating records in a dataset)?
Could you provide a code sample that does something a little like this? (I am not exactly an expert in any mainframe language...)
Joined: 13 Jun 2007 Posts: 632 Location: Wisconsin
The ASCII -> EBCDIC conversion happens in the FTP server. If you don't issue the BI (binary) command in your FTP stream, the file will be converted to EBCDIC, with the CR/LFs forcing breaks to new records on the mainframe. I think you may be able to fix this if you issue the correct "SITE" commands to the MVS FTP server so it doesn't break on CR/LF. I'll look into that.
What happens if you zip the file, transfer it binary, and extract it on the mainframe - do the records still break on the CR/LF?
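To make that concrete, here is roughly what a text-mode transfer into a fixed-block dataset does to a record with an embedded CR/LF (a Python simulation of the effect, not the FTP server's actual logic):

```python
LRECL = 2660  # record length of the target dataset

def simulate_text_mode_ftp(stream: bytes, lrecl: int = LRECL):
    """Text-mode FTP treats every CR/LF as end-of-record, and the
    receiving FB dataset blank-pads each piece to the LRECL."""
    return [piece.ljust(lrecl, b" ")[:lrecl]
            for piece in stream.split(b"\r\n") if piece]
```

One logical record containing an embedded CR/LF therefore arrives as two blank-padded mainframe records - exactly the symptom described above.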
Sometimes the requirement to run on the mainframe is not a sound business decision.
<RANT - not to hijack>
Case in point: we only have COBOL (REXX too, but only the sysprogs know anything about that) on our mainframe. How I wish they would enable the C++ and Java compilers. My director, who has been there 25+ years, says "This process will run on the mainframe." And so I had to query an LDAP server from COBOL. To accomplish this, I create a TSO environment in the program, then read an input file and call a TCP/IP gldsearch app from the COBOL program. Because it's on the mainframe and paired with COBOL, the run time on 10,000 records has stretched to 2 hours. If we had Java enabled in z/OS, which knows how to talk LDAP and can talk to the destination MS SQL Server (I hope), this would be phenomenally faster.
Current process
(Daily on MF) Generate input file
(Nightly on MF) Sort input to de-dupe, read the de-duped file, do LDAP queries one by one, write the output file, ftp to LAN [2 hours for 10,000 recs, mostly in the query step]
(N on LAN) Read file from MF, load SQL Server, trigger mainframe job
(N on MF) Send email to server to trigger notice of new web invoice to customers
Or, better yet, in my opinion (shared by too few):
(Daily on MF) Generate input file
(Nightly on MF) Sort to de-dupe, ftp to LAN
(N on LAN) Read file from MF, de-dupe, query LDAP, and do a direct insert to SQL Server; trigger mainframe job
(N on MF) Send email to server to trigger notice of new web invoice to customers
</RANT>
I'm open to suggestions on improving that process.
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello Rafael,
Quote:
is COBOL the most suitable language for this kind of task?
From my perspective, the most suitable language for code that is not actually business-rules driven is whichever language the author is comfortable with. Time was that I would only write such code in assembler, but times have changed. Your "business requirement" may be to remove the unwanted cr/lfs, but to me this is more of a utility function than a business-rule implementation.
Quote:
(I am not exactly an expert on any mainframe language... )
I suspect that you will want to borrow one - if not a complete expert, at least someone reasonably fluent.
It will be good to see samples of the "broken" records. I am still not clear on when the breaks occur (given that the end of a short, broken record is not the cr/lf). If you can force a "start of record" value and an "end of record" value on the export, it may make the code to repair the records much easier.
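To illustrate that last idea: if the export could append a sentinel literal to every true record, re-joining the broken pieces becomes straightforward. A Python sketch - the #EOR# sentinel is hypothetical, and it assumes trailing blanks on each broken piece are FB padding rather than data:

```python
def rejoin(pieces, sentinel=b"#EOR#", lrecl=2660):
    """Re-join records that text-mode FTP split at embedded CR/LFs.

    `sentinel` is a hypothetical end-of-record literal added on the
    export; trailing blanks on each piece are assumed to be padding.
    """
    records, buf = [], b""
    for piece in pieces:
        part = piece.rstrip(b" ")
        # put a blank where the CR/LF used to be, per the requirement
        # that the user's line breaks become spaces
        buf = buf + b" " + part if buf else part
        if buf.endswith(sentinel):
            rec = buf[:-len(sentinel)]
            records.append(rec.ljust(lrecl, b" ")[:lrecl])
            buf = b""
    return records
```

The same accumulate-until-sentinel loop would port directly to COBOL once the sentinel is in place.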
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, São Paulo, Brazil
Thanks for the illustration, stodolas. This is exactly what happens.
Let me try to illustrate too, Dick:
Record structure
-----------
Position 1 to 12: Employee name
Position 13 to 23: Defect Type
Position 24 to 280: Root cause
LRECL = 280 bytes
Text file from windows:
---------------------------
--Rec 1
Rafael LongoProcess Missing requirements. CR/LF The requirements were inaccurate.
--Rec 2
Rafael LongoNon-ProcessSetup issue
---------------------------------
After we FTP this file to Mainframe, we have the following
Dataset on MVS:
-------------------
--Rec1
Rafael LongoProcess Missing requirements.
--Rec2
The requirements were inaccurate.
--Rec3
Rafael LongoNon-ProcessSetup issue
----------------------
But here is what we should have:
-------------------
--Rec1
Rafael LongoProcess Missing requirements. The requirements were inaccurate.
--Rec2
Rafael LongoNon-ProcessSetup issue
----------------------
This is exactly what happens. Did that make things clearer, Dick? Let me know if you need any more info...
Any new thoughts?
About the programming language: like you said, we should use the one that is most suitable, considering not only the language's capabilities but also our abilities with it. Even though I am not fluent in languages like assembler or COBOL, I can learn and get it done. But since I am going to do some researching (including getting help from folks who are more fluent than me), I first need to choose which language is the best way to achieve this goal. That's why I am trying to find out whether COBOL is really the best way (some people here who work with COBOL told me that it is not the most suitable language for such a task).
Stodolas mentioned something about C++ on the mainframe. This, for example, would be perfect, since I am more fluent in C++. Java would also be great. Is it possible to use these languages on the mainframe? If so, what do I need to set them up? And what about REXX and assembler compared to COBOL for this task?
Stodolas, did you find out anything on the SITE command that could help avoid the record breaking on FTP?
Thanks for all the help guys, hope I don't get you tired of me...
Joined: 13 Jun 2007 Posts: 632 Location: Wisconsin
I looked at the SITE command and didn't find anything.
A couple of other notes:
If you transfer binary, you won't have any record breaks, but you will have to do the ASCII to EBCDIC conversion yourself in your language of choice and then break the data into records on your own.
For Java or C++, you need to ensure the compiler is available. Beyond that I don't know. People don't talk much about anything other than COBOL on the mainframe.
Not that my opinion matters to the people asking, but your pre-processing on the little box server, keeping the records together, is probably the best solution. Since you are outputting ASCII there, it only makes sense to work with it on the system where it is the native encoding.
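For what it's worth, the "break into records on your own" part of the binary route is just slicing the byte stream at fixed offsets. A Python sketch, assuming the stream holds back-to-back 2660-byte records with no terminators between them (which seems to match how the binary upload lines up with the LRECL):

```python
RECLEN = 2660  # fixed record length from this thread

def split_records(stream: bytes, reclen: int = RECLEN):
    """Slice a binary-transferred stream into fixed-length records,
    blanking embedded CR/LF bytes along the way. Assumes back-to-back
    records with no terminators between them."""
    for pos in range(0, len(stream), reclen):
        rec = stream[pos:pos + reclen]
        yield rec.replace(b"\r", b" ").replace(b"\n", b" ")
```

The EBCDIC conversion would then run over each sliced record.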
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, São Paulo, Brazil
Can you provide any guidance on how I would do the conversion if I transfer the file as binary?
I think this is an interesting solution, because the record breaking works fine with binary, since the dataset's LRECL is exactly the size of the (fixed-length) records. I just don't know how to convert to "text" (ASCII or EBCDIC?) afterwards...
Joined: 13 Jun 2007 Posts: 632 Location: Wisconsin
You need to load up a translation table mapping each hex value in ASCII to the hex value you want it to become in EBCDIC, then run through each binary record doing the conversion. This could be slow, because you basically need to inspect each character in the binary string and replace it. The processing time and CPU on this probably aren't very pretty.
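For illustration only: the table lookup doesn't have to be a byte-at-a-time loop. With a 256-entry translate table, the whole buffer converts in one pass. A Python sketch - cp037 (EBCDIC US/Canada) is an assumption here; the right code page depends on the site:

```python
# Entry i of the table is the cp037 (EBCDIC) byte for Latin-1
# character i; bytes.translate then remaps a whole buffer at once.
ASCII_TO_EBCDIC = "".join(chr(i) for i in range(256)).encode("cp037")

def to_ebcdic(data: bytes) -> bytes:
    return data.translate(ASCII_TO_EBCDIC)
```

Because the translation is a single table-driven pass per buffer rather than a per-character loop, the CPU cost is lower than it first appears, even on a large monthly file.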
Joined: 21 Jul 2007 Posts: 33 Location: Campinas, São Paulo, Brazil
Yeah, and my monthly file size is approximately 2.5 GB...
Even so, let me ask: how would you do this translation in your environment? Would you use a COBOL program, or can it be done using only JCL?
If I am not asking too much, could you provide a code sample for this? (It doesn't need to work or anything... just a draft for me to use as guidance to start researching.)
Joined: 26 Apr 2004 Posts: 4652 Location: Raleigh, NC, USA
I don't know about anyone else, but I'd like to see an actual real sample of a few lines of the raw ASCII data in a hex dump format. Something doesn't add up.