I’m working on an application design and have a tricky design issue to handle.
I have 2 input files, say File A & File B. Both files are huge in terms of volume (millions of records) and record length (each record consists of about 2000 fields).
The application design is for a migration project. File A is the data from the old system, File B is the data from the new system, and File C is the output, which will contain a combination of both.
The COBOL program has to read File A and File B and produce File C. This is simple enough. But the tricky part is:
- The program has to dynamically determine where it should get the data for each field on the output file.
- Say A1 through A9999 are my File A fields, B1 through B9999 are my File B fields, and C1 through C9999 are my output File C fields. At runtime the program has to determine:
1. Whether the data for C1 should come from File A or File B
2. If File A, which field?
3. If File B, which field?
Point 1 above can be controlled via a DB2 table, so the program can read the table and determine which file to use.
But I don’t have a clear idea of how points 2 & 3 can be achieved. To me it sounds like a dynamic COBOL MOVE statement, which I believe COBOL does not support.
Joined: 18 Jul 2007 Posts: 2146 Location: At my coffee table
vasan2 wrote:
2. If File A, which field?
3. If File B, which field?
...
But I don’t have a clear idea of how points 2 & 3 can be achieved. To me it sounds like a dynamic COBOL MOVE statement, which I believe COBOL does not support.
If you are designing this program, there should be some specification and/or requirement as to which (if not all) fields are needed in the output file.
Dynamic move? What would that be? Why do you think you would need something like that?
Well, File C will be a replacement for File A, with some formatting, at the end of the migration project.
But the business is looking at rolling out the new system in phases. So we know what data will go into the output file for some fields, but we don't know the data source for the rest at this stage.
The requirement is to have a flexible design that dynamically determines which field's data should map to a particular field on the output file (the mapping could perhaps be maintained as a DB2 table?). Given that situation, I'm wondering whether it's possible to have such a flexible design in a COBOL program.
Joined: 06 Jun 2008 Posts: 8697 Location: Dubuque, Iowa, USA
Reference modification allows you to specify a starting location and length for a move, and both location and length can be variables, so what you want to do can be done. However, if you're not sure at this point which fields you want to move from which file, then using reference modification would be the height of silliness. The design must be complete before coding. How do you know when the design is complete? When the source -- which file and which bytes of that file -- of every field in the output file is known and documented.
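For readers who have not used it, here is a minimal sketch of a reference-modified MOVE in which the start position and the length are both variables. The record contents and every data name (WS-IN-REC, WS-OUT-REC, WS-IN-START, WS-OUT-START, WS-LEN) are made up for illustration and are not taken from the poster's files.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REFMOD1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical 20-byte input and output records.
       01  WS-IN-REC           PIC X(20)
                               VALUE 'ABCDEFGHIJKLMNOPQRST'.
       01  WS-OUT-REC          PIC X(20) VALUE SPACES.
      * Start positions are relative to 1; the length is in bytes.
       01  WS-IN-START         PIC 9(4) COMP VALUE 5.
       01  WS-OUT-START        PIC 9(4) COMP VALUE 11.
       01  WS-LEN              PIC 9(4) COMP VALUE 3.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Copy WS-LEN bytes from input position 5 to output 11.
           MOVE WS-IN-REC(WS-IN-START:WS-LEN)
             TO WS-OUT-REC(WS-OUT-START:WS-LEN)
           DISPLAY '[' WS-OUT-REC ']'
           GOBACK.
```

With these values the three bytes EFG land in positions 11 through 13 of WS-OUT-REC. Nothing checks that the start and length stay inside the record unless the compiler's range checking is switched on, so bad values in the variables can silently copy the wrong bytes.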
Hopefully, neither File-A nor File-B has fields that are subject to OCCURS DEPENDING ON clauses. If not, here is what I would suggest.
1) In the program, construct an internal table for each file (File-A, File-B, and File-C) which will contain (max-no-of-fields) entries of 3 elements each: an argument of Field-Number (1 to max), and results of Field-Offset in the file (relative to 1) and Field-Length. These tables ( Table-A, Table-B, and Table-C ) can be defined either in the program source, or constructed at run-time from external definition files.
2) In the program, allocate a fourth table ( Table-D ) to be built at run-time, which will consist of 4 elements for each entry: File-A-Field-Offset, File-B-Field-Offset, File-C-Field-Offset, and File-C-Field-Length (it is assumed that the field length for File-A, File-B, and File-C are the same).
3) At run time, read in your "dynamic" requirements, which will contain records defining, for each output field, the File-C-Field-Number, the File-A-Field-Number (0 if the field is to come from File-B), and the File-B-Field-Number (0 if the field is to come from File-A).
4) As you read in each "dynamic" requirement record, use the File-C-Field-Number as a subscript into Table-C to retrieve the Field-Offset and Field-Length for File-C. Store these values into Table-D using the File-C-Field-Number as a subscript. Likewise, use the File-A-Field-Number (if not zero) as a subscript into Table-A to retrieve the Field-Offset for File-A and the File-B-Field-Number (if not zero) as a subscript into Table-B to retrieve the Field-Offset for File-B. Store the resulting values (zero for the offset of the unused file) into Table-D as appropriate.
5) Now, as you populate the input file records and prepare to construct File-C, loop thru Table-D from beginning to end for each output record, like the following.
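In working storage, WS-ENTRY is defined with the same four elements as a Table-D entry (WS-FILE-A-OFFSET, WS-FILE-B-OFFSET, WS-FILE-C-OFFSET, WS-FILE-C-LENGTH). In the procedure division:
* Visit every output field once per output record.
    PERFORM VARYING SUBSCRIPT-C FROM 1 BY 1
            UNTIL SUBSCRIPT-C > MAX-FIELDS
        MOVE TABLE-D-ENTRY(SUBSCRIPT-C) TO WS-ENTRY
* A non-zero File-A offset means the field comes from File-A.
        IF WS-FILE-A-OFFSET NOT = ZERO
            MOVE FILE-A-RECORD(WS-FILE-A-OFFSET:WS-FILE-C-LENGTH)
              TO FILE-C-RECORD(WS-FILE-C-OFFSET:WS-FILE-C-LENGTH)
        ELSE
            MOVE FILE-B-RECORD(WS-FILE-B-OFFSET:WS-FILE-C-LENGTH)
              TO FILE-C-RECORD(WS-FILE-C-OFFSET:WS-FILE-C-LENGTH)
        END-IF
    END-PERFORM
    WRITE FILE-C-RECORD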
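Steps 1 through 5 can be put together into a small self-contained sketch. It is only an illustration under stated assumptions: three fields per file instead of thousands, layouts and requirements hard-coded in working storage rather than read from external files or DB2, one sample record per input, and a DISPLAY in place of the WRITE. The TA-, TB-, TC-, TD- and REQ- names, WS-FIELD-NO, and all the data values are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MAPDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical layouts, 3 fields per file. Each entry is a
      * 2-digit offset (relative to 1) and a 2-digit length.
       01  TABLE-A-DATA        PIC X(12) VALUE '010405030805'.
       01  TABLE-A REDEFINES TABLE-A-DATA.
           05  TA-ENTRY OCCURS 3 TIMES.
               10  TA-OFFSET   PIC 9(2).
               10  TA-LENGTH   PIC 9(2).
       01  TABLE-B-DATA        PIC X(12) VALUE '010304050904'.
       01  TABLE-B REDEFINES TABLE-B-DATA.
           05  TB-ENTRY OCCURS 3 TIMES.
               10  TB-OFFSET   PIC 9(2).
               10  TB-LENGTH   PIC 9(2).
       01  TABLE-C-DATA        PIC X(12) VALUE '010405030805'.
       01  TABLE-C REDEFINES TABLE-C-DATA.
           05  TC-ENTRY OCCURS 3 TIMES.
               10  TC-OFFSET   PIC 9(2).
               10  TC-LENGTH   PIC 9(2).
      * "Dynamic" requirements, one entry per File-C field:
      * File-A field number, then File-B field number (0 = unused).
       01  REQ-DATA            PIC X(6) VALUE '100102'.
       01  REQ-TABLE REDEFINES REQ-DATA.
           05  REQ-ENTRY OCCURS 3 TIMES.
               10  REQ-A-FIELD PIC 9.
               10  REQ-B-FIELD PIC 9.
       01  TABLE-D.
           05  TABLE-D-ENTRY OCCURS 3 TIMES.
               10  TD-FILE-A-OFFSET PIC 9(4) COMP.
               10  TD-FILE-B-OFFSET PIC 9(4) COMP.
               10  TD-FILE-C-OFFSET PIC 9(4) COMP.
               10  TD-FILE-C-LENGTH PIC 9(4) COMP.
       01  WS-ENTRY.
           05  WS-FILE-A-OFFSET PIC 9(4) COMP.
           05  WS-FILE-B-OFFSET PIC 9(4) COMP.
           05  WS-FILE-C-OFFSET PIC 9(4) COMP.
           05  WS-FILE-C-LENGTH PIC 9(4) COMP.
       01  MAX-FIELDS          PIC 9(4) COMP VALUE 3.
       01  SUBSCRIPT-C         PIC 9(4) COMP.
       01  WS-FIELD-NO         PIC 9(4) COMP.
      * Sample records standing in for the real input files.
       01  FILE-A-RECORD       PIC X(12) VALUE 'AAAAaaa11111'.
       01  FILE-B-RECORD       PIC X(12) VALUE 'BBBbbbbb2222'.
       01  FILE-C-RECORD       PIC X(12) VALUE SPACES.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Step 4: resolve field numbers to offsets once, up front.
           PERFORM VARYING SUBSCRIPT-C FROM 1 BY 1
                   UNTIL SUBSCRIPT-C > MAX-FIELDS
               INITIALIZE WS-ENTRY
               MOVE TC-OFFSET(SUBSCRIPT-C) TO WS-FILE-C-OFFSET
               MOVE TC-LENGTH(SUBSCRIPT-C) TO WS-FILE-C-LENGTH
               IF REQ-A-FIELD(SUBSCRIPT-C) NOT = ZERO
                 MOVE REQ-A-FIELD(SUBSCRIPT-C) TO WS-FIELD-NO
                 MOVE TA-OFFSET(WS-FIELD-NO) TO WS-FILE-A-OFFSET
               ELSE
                 MOVE REQ-B-FIELD(SUBSCRIPT-C) TO WS-FIELD-NO
                 MOVE TB-OFFSET(WS-FIELD-NO) TO WS-FILE-B-OFFSET
               END-IF
               MOVE WS-ENTRY TO TABLE-D-ENTRY(SUBSCRIPT-C)
           END-PERFORM
      * Step 5: build one output record by looping thru Table-D.
           PERFORM VARYING SUBSCRIPT-C FROM 1 BY 1
                   UNTIL SUBSCRIPT-C > MAX-FIELDS
               MOVE TABLE-D-ENTRY(SUBSCRIPT-C) TO WS-ENTRY
               IF WS-FILE-A-OFFSET NOT = ZERO
                 MOVE FILE-A-RECORD(WS-FILE-A-OFFSET:WS-FILE-C-LENGTH)
                   TO FILE-C-RECORD(WS-FILE-C-OFFSET:WS-FILE-C-LENGTH)
               ELSE
                 MOVE FILE-B-RECORD(WS-FILE-B-OFFSET:WS-FILE-C-LENGTH)
                   TO FILE-C-RECORD(WS-FILE-C-OFFSET:WS-FILE-C-LENGTH)
               END-IF
           END-PERFORM
           DISPLAY FILE-C-RECORD
           GOBACK.
```

With the sample data, C1 is taken from A1, C2 from B1 and C3 from B2, so FILE-C-RECORD ends up as AAAABBBbbbbb. The sketch relies on the stated assumption that source and target lengths are equal; a real program would probably compare TA-LENGTH or TB-LENGTH with TC-LENGTH while building Table-D and reject any mismatch.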
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
One of the goals of a migration is to normalize entities (fields representing balance, number of..., last date of ....) - usually to convert from one file representation to another.
reference modification does not allow for the numeric conversion of fields (display to comp, comp-3, etc...).
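A small sketch of the difference, assuming a hypothetical balance held as zoned decimal in the old system and as packed decimal in the new one. The names A-BALANCE, C-BALANCE, C-BALANCE-BYTES and WS-CHECK are invented for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CONVDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical old-system balance: 7 digits, zoned decimal.
       01  A-BALANCE           PIC 9(5)V99 VALUE 12345.67.
      * Hypothetical new-system balance: packed decimal, 4 bytes.
       01  C-BALANCE           PIC S9(5)V99 COMP-3 VALUE ZERO.
       01  C-BALANCE-BYTES REDEFINES C-BALANCE PIC X(4).
       01  WS-CHECK            PIC 9(5).99.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * An elementary MOVE converts zoned decimal to packed.
           MOVE A-BALANCE TO C-BALANCE
           MOVE C-BALANCE TO WS-CHECK
           DISPLAY 'Elementary MOVE: ' WS-CHECK
      * A reference-modified MOVE is alphanumeric. It copies the
      * first 4 display bytes ('1234') unchanged, so C-BALANCE
      * no longer holds valid packed-decimal data.
           MOVE A-BALANCE(1:4) TO C-BALANCE-BYTES
           IF C-BALANCE IS NUMERIC
               DISPLAY 'Byte copy: valid packed data'
           ELSE
               DISPLAY 'Byte copy: not valid packed data'
           END-IF
           GOBACK.
```

The elementary MOVE keeps the value 12345.67, while the byte copy leaves four character bytes in a field that is supposed to be packed, which is the kind of damage that only surfaces later as a data exception.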
I would prefer to drive from db2 tables, but external files read in to internal cobol tables will do also.
I would CODE every File A field to File C MOVE - without ref mod, each move within its own paragraph/section. Also, every File B to File C.
in addition, I would have routines to make adjustments (rounding, adding, subtracting factors), selected by a code stored in the table that indicates what needs to be done to migrate from the File A/B field to File C.
then have two rather large EVALUATEs (one each for File A and File B) to identify which field (or a GO TO ... DEPENDING ON) that would perform a routine to generate the required File C field.
you could drive it the other way and use File C fields as the determining factor for the EVALUATE, with the subroutines based on File A or B.
either way, I would stay away from reference modification.
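A cut-down sketch of that idea, assuming only two File A fields and a mapping table hard-coded in working storage instead of loaded from DB2. The field names, the field numbering and the adjustment codes (N for none, R for rounding to whole units) are all invented for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EVALDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical fields; the real records have about 2000.
       01  FILE-A-RECORD.
           05  A-CUST-NAME     PIC X(10) VALUE 'SMITH'.
           05  A-BALANCE       PIC 9(5)V99 VALUE 100.50.
       01  FILE-C-RECORD.
           05  C-CUST-NAME     PIC X(10).
           05  C-BALANCE       PIC S9(7)V99 COMP-3.
      * Mapping rows as they might be loaded from a DB2 table:
      * File A field number plus an adjustment code
      * (N = none, R = round to whole units).
       01  MAP-DATA            PIC X(4) VALUE '1N2R'.
       01  MAP-TABLE REDEFINES MAP-DATA.
           05  MAP-ENTRY OCCURS 2 TIMES.
               10  MAP-A-FIELD PIC 9.
               10  MAP-ADJUST  PIC X.
       01  MAP-IDX             PIC 9(4) COMP.
       01  WS-WHOLE            PIC 9(7).
       01  WS-SHOW             PIC Z(6)9.99.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM VARYING MAP-IDX FROM 1 BY 1 UNTIL MAP-IDX > 2
               EVALUATE MAP-A-FIELD(MAP-IDX)
                   WHEN 1
                       PERFORM MOVE-A-CUST-NAME
                   WHEN 2
                       PERFORM MOVE-A-BALANCE
                   WHEN OTHER
                       DISPLAY 'Unmapped field number'
               END-EVALUATE
           END-PERFORM
           MOVE C-BALANCE TO WS-SHOW
           DISPLAY C-CUST-NAME ' ' WS-SHOW
           GOBACK.
      * One paragraph per field, each an ordinary MOVE, so the
      * compiler generates the zoned-to-packed conversion.
       MOVE-A-CUST-NAME.
           MOVE A-CUST-NAME TO C-CUST-NAME.
       MOVE-A-BALANCE.
           IF MAP-ADJUST(MAP-IDX) = 'R'
               COMPUTE WS-WHOLE ROUNDED = A-BALANCE
               MOVE WS-WHOLE TO C-BALANCE
           ELSE
               MOVE A-BALANCE TO C-BALANCE
           END-IF.
```

The table only selects which hand-written paragraph runs, so each MOVE is still checked by the compiler against the real field definitions. The cost is one WHEN and one paragraph per field, which for 2000 fields is a good candidate for generated code.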
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello,
Both files need to be in order by the same "key".
Then use a match/merge to position the process within the files. If there is an entry in fileA and not B, then all of the data in C would come from A. The same with fileB.
When there is a match, then the decision whether to use fileA or fileB data comes into play.
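The match/merge positioning can be sketched as follows. To keep the example self-contained, two small sorted key tables stand in for File A and File B, with HIGH-VALUES as an end-of-file marker; the keys and the names A-KEY, B-KEY, A-IDX and B-IDX are made up. A real program would READ the two sorted files instead and move HIGH-VALUES to the key at end of file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MATCHDEM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sorted keys standing in for File A and B.
      * HIGH-VALUES in the last slot acts as the end-of-file key.
       01  A-KEYS.
           05  FILLER          PIC X(3) VALUE '001'.
           05  FILLER          PIC X(3) VALUE '002'.
           05  FILLER          PIC X(3) VALUE '004'.
           05  FILLER          PIC X(3) VALUE HIGH-VALUES.
       01  A-TABLE REDEFINES A-KEYS.
           05  A-KEY           PIC X(3) OCCURS 4 TIMES.
       01  B-KEYS.
           05  FILLER          PIC X(3) VALUE '002'.
           05  FILLER          PIC X(3) VALUE '003'.
           05  FILLER          PIC X(3) VALUE '004'.
           05  FILLER          PIC X(3) VALUE HIGH-VALUES.
       01  B-TABLE REDEFINES B-KEYS.
           05  B-KEY           PIC X(3) OCCURS 4 TIMES.
       01  A-IDX               PIC 9(4) COMP VALUE 1.
       01  B-IDX               PIC 9(4) COMP VALUE 1.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM UNTIL A-KEY(A-IDX) = HIGH-VALUES
                     AND B-KEY(B-IDX) = HIGH-VALUES
               EVALUATE TRUE
      * Key only in A: every File C field comes from File A.
                   WHEN A-KEY(A-IDX) < B-KEY(B-IDX)
                       DISPLAY A-KEY(A-IDX) ' only in A'
                       ADD 1 TO A-IDX
      * Key only in B: every File C field comes from File B.
                   WHEN A-KEY(A-IDX) > B-KEY(B-IDX)
                       DISPLAY B-KEY(B-IDX) ' only in B'
                       ADD 1 TO B-IDX
      * Match: the per-field A/B decision applies here.
                   WHEN OTHER
                       DISPLAY A-KEY(A-IDX) ' matched'
                       ADD 1 TO A-IDX
                       ADD 1 TO B-IDX
               END-EVALUATE
           END-PERFORM
           GOBACK.
```

With the sample keys, 001 is found only in A, 003 only in B, and 002 and 004 are matches. The loop ends when both sides have reached their HIGH-VALUES sentinel.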
Suggest that a member be made in a PDS that has an entry for each field. These entries would contain the field name and an A/B indicator. This member would be read into the program at the beginning and used to determine the "source" for each field of the matched records.
As the phases progress, this member would be changed to reflect the current "rules". The code would simply be:
Code:
IF a-or-b-ind = 'A'
    MOVE A-FLDx TO C-FLDx
ELSE
    MOVE B-FLDx TO C-FLDx
END-IF
This would be repeated in the code for each field.
I would not get "cute" in an effort to save some lines of code. . . This would need to be done once and could be generated if one did not want to repeat the lines manually. . . If there are any "special" circumstances for a field, they can easily be accommodated if each field has a bit of code rather than something generic.
Written once correctly, this can be used for most if not all of the phases with no additional program changes. Only the external data need be changed.
Many thanks for taking time to read my post and posting valuable suggestions.
Dick, I did think about this before. But the issue is, I wouldn't know at run time which field on File B will hold the data that I want to move to File C. Though I can specify the mapping in the data member or a file/table, I can't figure out how to use that physical field name in the MOVE statement. Well, I could possibly use EVALUATE to check all the fields in the files, which is a massive task considering the volume is in millions/billions and each record contains about 2000 fields.
Ronald, thanks for the well-explained solution. I'm slightly apprehensive about using reference modification. Considering it's dynamic, any slight format variation will produce unpredictable results.
Thanks again for your valuable suggestions. It has definitely given me something to think about.
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello,
Quote:
But the issue is, I wouldn't know at run time which field on File B will hold the data that I want to move to File C.
If this is not known at run time, there will be no run. . .
My suggestion is to store the information outside the code rather than within the code. This external control can be established moments before the actual run. The method is not difficult - it merely takes one IF construct for each field. . .
And, as I mentioned earlier, once properly written this should work for the entire conversion. . .