Joined: 09 Mar 2016 Posts: 9 Location: United States
Hello,
I have been working on converting a variable length, tab delimited file from (VB 512) to fixed block (FB 80) for a couple days now with limited success. I'm struggling with date conversion, among other things, and I haven't been able to find many examples of what I'm trying to do in the Syncsort manual or online and time is of the essence.
I'd like to do this in one sort step. I need to skip the first record in the input file because it contains headers that I don't want. The fields are separated by hexadecimal tab (X'05'). The input file has data in some rows that I want to process, but hexadecimal tabs (X'05') for the first 31 bytes in other rows that I don't want in my output.
The date must be converted from format 16-APR-18 to 20180416.
The double quotes and comma must be removed from the name, there must be a space between the last and first names, and the last and first names combined must not exceed 30 bytes, padded on the right with spaces or truncated on the right, if the name exceeds 30 bytes.
The dashes must be removed from the 9-digit number.
The letter "C" must be appended immediately after the last significant value in the record. This should be in position 63 of the output record. The remaining 17 bytes at the end of the record (positions 64 through 80) should be padded with spaces.
Below is the input file with 10 records. The last five records appear to be contain spaces, but they actually contain hexadecimal "05" (Tab) in the first 31 bytes. After the last byte of significant data on each input record, there are 25 hexadecimal tabs (X'05') and the remainder of the 512 VB record is padded with spaces.
Code:
field_1 field_2 field_3 field_4 field_5 field_6 field_7
CPG75145 4275 16-Apr-18 "DOE, JOHN" 123-45-6789 E A
CPG75147 3495 20-May-18 "DOE, JANE" 234-56-7891 E A
CPG75148 1010 21-Jun-18 "JAMES, LEBRON" 345-67-8912 E A
CPG75150 3500 21-Jul-18 "IRVING, KYRIE" 456-78-9123 E A
CPG75152 4860 16-Aug-18 "MAYFIELD, BAKER" 567-89-1234 O R
The records do not need to be sorted in any particular sequence. This is what the output file should look like:
Code:
123456789DOE JOHN 2018041604275CPG75145ERC
234567891DOE JANE 2018052003495CPG75147ERC
345678912JAMES LEBRON 2018062101010CPG75148ERC
456789123IRVING KYRIE 2018072103500CPG75150ERC
567891234MAYFIELD BAKER 2018081604860CPG75152ORC
Output record layout:
Positions 01 - 09: Fixed length, 9 bytes
Positions 10 - 39: Fixed length, 30 bytes, left justified and padded on the right with spaces
Positions 40 - 47: Fixed length, 8 bytes, YYYYMMDD
Positions 48 - 52: Fixed length, 5 bytes, right justified and padded with zeroes on the left
Positions 53 - 60: Fixed length, 8 bytes
Positions 61 - 61: Fixed length, 1 byte
Positions 62 - 62: Fixed length, 1 byte
Positions 63 - 80: Fixed length, 18 bytes, value "C "
I don't get to use Syncsort as often as I'd like to. Any assistance with reformatting this file would be appreciated.
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
cz016m,
I don't have Syncsort, so cannot test this, but this should get you started. Good luck.
Also note that I have hard-coded the century as 20, you could change it to use the appropriate Syncsort date function to do the actual date conversion instead (Use CENTWIN parm to determine the century window while converting 2 digit years to 4 digit years).
Important fix needs to be added to make the code more robust: handling person name field more flexible, and do not change final field size, neither with, nor without comma in input data.
ABSPOS=5 is needed to ignore possible X'05' as part of RDW.
Code:
* OPTION SKIPREC=1
OMIT COND=(5,1,CH,EQ,X'05')
INREC PARSE=(%01=(ABSPOS=5,ENDBEFR=X'05',FIXLEN=08),
%02=(ENDBEFR=X'05',FIXLEN=05),
%03=(ENDBEFR=X'05',FIXLEN=09),
%04=(STARTAFT=C'"',ENDBEFR=C'"',FIXLEN=30),
%=(ENDBEFR=X'05'), (length not needed to skip field)
%05=(ENDBEFR=X'05',FIXLEN=11),
%06=(ENDBEFR=X'05',FIXLEN=01),
%07=(ENDBEFR=X'05',FIXLEN=01)),
BUILD=(1,4, RDW
%05,UFF,ZD,LENGTH=9, compressed phone number
%04,SQZ=(SHIFT=LEFT,LENGTH=30,PREBLANK=C',',MID=C' '),
* <-to provide optional comma with stable length->
C'20', fixed century (unflexible...)
%03, date as C'dd-mon-yy'
%02,UFF,ZD,LENGTH=5, five digits code
%01, long code
%06,%07, two code characters
C'C', final constant 'C'
20X) filler, to satisfy OUTFIL
SORT FIELDS=COPY
OUTREC FINDREP=(INOUT=(C'-Jan-',C'01',
C'-Feb-',C'02',
C'-Mar-',C'03',
C'-Apr-',C'04',
C'-May-',C'05',
C'-Jun-',C'06',
C'-Jul-',C'07',
C'-Aug-',C'08',
C'-Sep-',C'09',
C'-Oct-',C'10',
C'-Nov-',C'11',
C'-Dec-',C'12'))
OUTFIL VTOF,
BUILD=(5,41, initial part
50,2,48,2,46,2, exchange DD <-> YY
52,33) final part
END
Code:
********************************* TOP OF DATA *********************************
123456789DOE JOHN 2018041604275CPG75145EAC
234567891DOE JANE 2018052003495CPG75147EAC
345678912JAMES LEBRON 2018062101010CPG75148EAC
456789123IRVING KYRIE 2018072103500CPG75150EAC
567891234MAYFIELD BAKER 2018081604860CPG75152ORC
******************************** BOTTOM OF DATA *******************************
Joined: 09 Mar 2016 Posts: 9 Location: United States
Yes, there could be middle initials or multiple values in the name field, even though I didn't show them in the sample input data.
Updated input:
Code:
field_1 field_2 field_3 field_4 field_5 field_6 field_7
CPG75145 4275 16-Apr-18 "DOE, JOHN" 123-45-6789 E A
CPG75147 3495 20-May-18 "DOE, JANE" 234-56-7891 E A
CPG75148 1010 21-Jun-18 "JAMES JR, LEBRON R" 345-67-8912 E A
CPG75150 3500 21-Jul-18 "IRVING, KYRIE" 456-78-9123 E A
CPG75152 4860 16-Aug-18 "MAYFIELD, BAKER" 567-89-1234 O R
Updated output:
Code:
123456789DOE JOHN 2018041604275CPG75145ERC
234567891DOE JANE 2018052003495CPG75147ERC
345678912JAMES JR LEBRON R 2018062101010CPG75148ERC
456789123IRVING KYRIE 2018072103500CPG75150ERC
567891234MAYFIELD BAKER 2018081604860CPG75152ORC
sergeyken - you might want to modify your approach to handle embedded blanks in the firstname or lastname (though not shown in the OPs sample data).
I always do such modifications when creating code for my own job
Here I just wanted to give the idea how to handle this in appropriate way. Anybody may feel free to add his own modifications, to fulfil requirements of his own job.
This forum is not supposed to fully complete a task assigned to someone else.
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
cz016m wrote:
Yes, there could be middle initials or multiple values in the name field, even though I didn't show them in the sample input data.
cz016m,
The solution I had posted earlier would handle this scenario.
If you're running into syntax issues, post the complete error message. Someone would be able to help with that.
Joined: 09 Mar 2016 Posts: 9 Location: United States
Hello,
I wanted to say thank you, Arun Raj & sergeyken, for the help. There will always be a comma in the name, so sergeyken's solution works perfectly, even when there are middle initials.
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
Quote:
There is no need for any modification; this situation is handled in my approach in normal way.
My bad. I missed you have MID in your control statements. Thanks!
cz016m - Good to hear it is working for you. I don't have Syncsort, but DFSORT and the DFSORT solution gave me the expected output for both the sample inputs.