I have a situation where we unload data from table (tablespace is UNICODE)
in a flat file using IBM high performance unload in "ASIS" format.
The unloaded file is passed as input to SORT JCL.
Following steps are executed on the unload file.
1. STEP1 - Convert Unicode data to Ebcdic using SORT ALTSEQ.
2. STEP2 - Output of step1 is used to create a output file - This step performs some aggregations/sorting.
I want to know, how can i eliminate STEP1, that is I want to directly process Unicode data.
NOTE: I tried to run the same SORT JCL by eliminating STEP1, i get zero records in output file. So i think SORT only understands EBCDIC data.
Joined: 06 Jun 2008 Posts: 8697 Location: Dubuque, Iowa, USA
Quote:
So i think SORT only understands EBCDIC data.
Not likely, but depending upon what the SORT statements are doing it is entirely possible that you need to convert them from EBCDIC to UNICODE for your sort to work properly.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Subject to Prino's point, you can put together your two separate steps into one. On the INCLUDE you will need to specify the CH field as AQ. The LT X'C1' you'd have to decide what that means on the source data. X'C1' is "A" in EBCDIC, but the order of numbers and letters are different between ASCII and EBCDIC, so, what is that field you are testing, and what does the test tell you?
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Yes, but A is in a different position in the ASCII collating sequence than in the EBCDIC one, so if you use an ASCII A or X'C1' you'll get different results from your INCLUDE. So you'd need to work out what value to put there which will give you your desired results, if you want it all in one step.
Oh. And check your SyncSORT documentation. I am assuming in this case that the INCLUDE works with AQ the same way that DFSORT does.
Yes, but A is in a different position in the ASCII collating sequence than in the EBCDIC one, so if you use an ASCII A or X'C1' you'll get different results from your INCLUDE. So you'd need to work out what value to put there which will give you your desired results, if you want it all in one step.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Sorry, just realised I'm probably chasing my tail.
If you instead change
Code:
118,1,BI,LT,X'C1')
To
Code:
118,1,AQ,LT,C'A')
You'll get the correct results (or the expected results from the code, anyway) without having to be concerned about what the X'C1' means in business terms, as the INCLUDE values being tested, for AQ fields, will have the ALTSEQ-translated values.
So, make all the fields in the INCLUDE AQ and you should be OK to merge the two steps into one.
INREC FIELDS=(1,118,TRAN=ALTSEQ)
SORT FIELDS=COPY
WER276B SYSDIAG= 3352052, 5944027, 5944027, 4574690
WER164B 8,872K BYTES OF VIRTUAL STORAGE AVAILABLE, MAX REQUESTED,
WER164B 156K BYTES RESERVE REQUESTED, 2,424K BYTES USED
WER146B 24K BYTES OF EMERGENCY SPACE ALLOCATED
WER108I SORTIN : RECFM=FB ; LRECL= 118; BLKSIZE= 27966
WER073I SORTIN : DSNAME=XXXXXX.XXXXXX.XXXXXX.SYSREC00
WER257I INREC RECORD LENGTH = 118
WER110I SORTOUT : RECFM=FB ; LRECL= 118; BLKSIZE= 27966
WER074I SORTOUT : DSNAME=XXXXXX.XXXXXX.XXXXXX.EBCDIC
WER410B 7,844K BYTES OF VIRTUAL STORAGE AVAILABLE ABOVE THE 16MEG LINE,
WER410B 0 BYTES RESERVE REQUESTED, 2,280K BYTES USED
WER211B SYNCSMF CALLED BY SYNCSORT; RC=0000
WER416B SORTIN : EXCP'S=546,UNIT=3390,DEV=C930,CHP=(B0B1B2B3B4B59697,1),VOL=
WER416B SORTOUT : EXCP'S=552,UNIT=3390,DEV=C72D,CHP=(767778797A7B5859,1),VOL=
WER416B TOTAL OF 1,098 EXCP'S ISSUED FOR COPYING
WER054I RCD IN 2321815, OUT 2321815
WER169I RELEASE 1.4 BATCH 0525 TPF LEVEL 1.0
WER052I END SYNCSORT - XXXXXXXX,STEP010,,DIAG=EC00,734A,C026,004C,A4CA,49E2,06
SORTOUT:
In Plain browse format:
Code:
BROWSE XXXXXX.XXXXXX.XXXXXX.EBCDIC Line 00000000 Col 001 080
Command ===> Scroll ===> CSR
----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0----+----1----+---
********************************* Top of Data ***********************************************************************
15847803W10920132013-12-182013-12-17.2014-08-25-07.37.55.68252801071291702 2014-08-25-10.32.06.765533BRXTIB10NR
15814753973820132013-08-05..........?2014-08-25-07.37.56.16109301072100079 1900-01-01-01.01.01.000000BRXTIB03N
15814753973820132013-02-01..........?2014-08-25-07.37.56.16134201072100082 1900-01-01-01.01.01.000000BRXTIB03N
Same SORTOUT in HEX format:
Code:
BROWSE XXXXXX.XXXXXX.XXXXXX.EBCDIC Line 00000000 Col 001 080
Command ===> Scroll ===> CSR
----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0----+----1----+---
----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+---
----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0----+----1----+---
------------------------------------------------------------------------------ -------------------------------------
********************************* Top of Data ***********************************************************************
INCLUDE COND=(17,10,CH,GT,C'1900-01-01',AND,
118,1,BI,LT,X'C1')
OUTFIL FILES=OUT,NODETAIL,REMOVECC,
SECTIONS=(01,03,
04,06,
10,03,
13,04,
17,10,
27,10,
117,01,
TRAILER3=(001:001,03,X,
005:004,06,X,
012:010,03,X,
016:013,04,X,
021:017,10,X,
032:027,10,
042:037,01,X,
044:038,10,X,
055:COUNT=(M11,LENGTH=10),X,
066:064,19,X,
086:083,10,X,
097:109,08,X,
106:117,01))
WER276B SYSDIAG= 3349193, 5941165, 5941165, 4574690
WER164B 65,540K BYTES OF VIRTUAL STORAGE AVAILABLE, MAX REQUESTED,
WER164B 156K BYTES RESERVE REQUESTED, 65,512K BYTES USED
WER146B 24K BYTES OF EMERGENCY SPACE ALLOCATED
WER108I SORTIN : RECFM=FB ; LRECL= 118; BLKSIZE= 27966
WER073I SORTIN : DSNAME=XXXXXX.XXXXXX.XXXXXX.EBCDIC
WER110I SORTOUT : RECFM=FB ; LRECL= 106; BLKSIZE= 27984
WER074I SORTOUT : DSNAME=XXXXXX.XXXXXX.XXXXXX.OUT
WER410B 63M BYTES OF VIRTUAL STORAGE AVAILABLE ABOVE THE 16MEG LINE,
WER410B 0 BYTES RESERVE REQUESTED, 63M BYTES USED
WER036B G=193629,B=480,BIAS=00
WER162B 0 PREALLOCATED SORTWORK TRACKS, 5,580 DYNAMICALLY ALLOCATED,
WER162B 0 ACQUIRED IN SECONDARY EXTENTS, 3,240 RELEASED, TOTAL OF 2,140 TRA
WER045C END SORT PHASE
WER055I INSERT 0, DELETE 1300390
WER405I SORTOUT : DATA RECORDS OUT 0; TOTAL RECORDS OUT 13428
WER493I ZIIP PROCESSOR USED
WER211B SYNCSMF CALLED BY SYNCSORT; RC=0000
WER416B SORTIN : EXCP'S=13,UNIT=3390,DEV=C72D,CHP=(767778797A7B5859,1),VOL=T
WER416B OUTFIL WAS USED FOR SORTOUT
WER416B SORTWK01 : EXCP'S=19,UNIT=3390,DEV=BC08,CHP=A6A7A8A9AAAB8889,VOL=WORKD
WER416B SORTWK02 : EXCP'S=7,UNIT=3390,DEV=C91F,CHP=B0B1B2B3B4B59697,VOL=WORKDE
WER416B TOTAL OF 26 EXCP'S ISSUED FOR SORTWORKS
WER246I FILESIZE 120,528,150 BYTES
WER054I RCD IN 2321815, OUT 1021425
WER072I NOEQUALS, ELAP IN EFFECT
WER169I RELEASE 1.4 BATCH 0525 TPF LEVEL 1.0
WER052I END SYNCSORT - XXXXXXXX,STEP020,,DIAG=E600,F958,CA3B,A457,BFC6,49E2,06
SORTOUT:
Plain Browse mode:
Code:
BROWSE XXXXXX.XXXXXX.XXXXXX.OUT Line 00000000 Col 001 080
Command ===> Scroll ===> CSR
----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0----+-
********************************* Top of Data ************************************************************
158 N74108 106 2013 2013-10-30 ..........? 2014-08-25 0000000001 0105PD58807 1900-01-01 BRXTIB03 N
158 U1362Q AA7 2013 2013-01-31 ..........? 2014-08-25 0000000003 01063975012 1900-01-01 BRXTIB03 N
158 U1362Q AA7 2013 2013-05-01 ..........? 2014-08-25 0000000003 01063975012 1900-01-01 BRXTIB03 N
Same SORTOUT in hex mode:
Code:
BROWSE XXXXXX.XXXXXX.XXXXXX.OUT Line 00000000 Col 001 080
Command ===> Scroll ===> CSR
----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0----+-
----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+-
----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0----+-
------------------------------------------------------------------------------ -------------------------
********************************* Top of Data ************************************************************
INCLUDE COND=(17,10,AQ,GT,C'1900-01-01',AND,
118,1,AQ,LT,C'A')
OUTFIL FILES=OUT,NODETAIL,REMOVECC,
SECTIONS=(01,03,
04,06,
10,03,
13,04,
17,10,
27,10,
117,01,
TRAILER3=(001:001,03,X,
005:004,06,X,
012:010,03,X,
016:013,04,X,
021:017,10,X,
032:027,10,
042:037,01,X,
044:038,10,X,
055:COUNT=(M11,LENGTH=10),X,
066:064,19,X,
086:083,10,X,
097:109,08,X,
106:117,01))
WER276B SYSDIAG= 11668784, 13785134, 13785134, 2807635
WER164B 65,540K BYTES OF VIRTUAL STORAGE AVAILABLE, MAX REQUESTED,
WER164B 156K BYTES RESERVE REQUESTED, 65,512K BYTES USED
WER146B 24K BYTES OF EMERGENCY SPACE ALLOCATED
WER108I SORTIN : RECFM=FB ; LRECL= 118; BLKSIZE= 27966
WER073I SORTIN : DSNAME=XXXXXX.XXXXXX.XXXXXX.SYSREC00
WER257I INREC RECORD LENGTH = 118
WER110I SORTOUT : RECFM=FB ; LRECL= 106; BLKSIZE= 27984
WER074I SORTOUT : DSNAME=XXXXXX.XXXXXX.XXXXXX.SYSREC00.OUT
WER410B 63M BYTES OF VIRTUAL STORAGE AVAILABLE ABOVE THE 16MEG LINE,
WER410B 0 BYTES RESERVE REQUESTED, 63M BYTES USED
WER036B G=27417
WER045C END SORT PHASE
WER055I INSERT 0, DELETE 2321815
WER418I DATASPACE(S) AND/OR ZSPACE USED
WER405I SORTOUT : DATA RECORDS OUT 0; TOTAL RECORDS OUT 1
WER493I ZIIP PROCESSOR USED
WER211B SYNCSMF CALLED BY SYNCSORT; RC=0000
WER416B SORTIN : EXCP'S=24,UNIT=3390,DEV=C930,CHP=(B0B1B2B3B4B59697,1),VOL=T
WER416B OUTFIL WAS USED FOR SORTOUT
WER246I FILESIZE 0 BYTES
WER054I RCD IN 2321815, OUT 0
WER072I NOEQUALS, BALANCE IN EFFECT
WER169I RELEASE 1.4 BATCH 0525 TPF LEVEL 1.0
WER052I END SYNCSORT - XXXXXXX,STEP010,,DIAG=8100,D258,A81E,A475,C9E3,6CC3,22
SORTOUT:
In Hex browse mode - Output has only 1 record instead of 13428 records.
Code:
----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0----+-
----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+----F----+-
----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0----+-
------------------------------------------------------------------------------ -------------------------
********************************* Top of Data ************************************************************
Joined: 07 Feb 2009 Posts: 1306 Location: Vilnius, Lithuania
gylbharat wrote:
prino wrote:
gylbharat wrote:
Prino, I dont think we have double byte characters in table... tablespace is defined in unicode format.
You are obviously totally clueless about the format of Unicode.
What i know - unicode supports character set > 255. It includes UTF-8 and UTF-16
UTF-8 is single byte character set and UTF-16 is double byte character set.
You amaze me by being even more clueless than I initially thought.
UTF-8 encodes each of the 1,112,064 valid code points in the Unicode code space (1,114,112 code points minus 2,048 surrogate code points) using one to four 8-bit bytes (a group of 8 bits is known as an octet in the Unicode Standard).
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
You have situations where you change the collating sequence of XX to YY and then change the collating sequence of YY to ZZ.
This means that an input of XX is treated in the INCLUDE as ZZ.
You can test that this is the case by removing all the changes to what would be EBCDIC numbers (F0-F9), the letter A (C1) and the - (2D).
Your INCLUDE will then "work".
If you genuinely need to change "overlapping" values from one encoding to another I don't think you can do it in one step if you also have the INCLUDE and the SORT. Not with ALTSEQ.
The date you are including on is in ascending order whether in ASCII or in EBCDIC. Without the ALTSEQ you can use a field-type of AC which will automatically convert the literal comparator to ASCII.
That leaves the A. So we're back to "what does that actually mean?".
If you don't know, I'd suggest identifying values with counts for that field, and seeing from the data what you are dealing with. If the A can be represented in the INCLUDE, either as AC or CH, then you can look to using a big FINDREP to do the encoding conversion.
If the amount of data is large doing this would save on passing the file twice. If small, probably not worth putting the two steps together. If in the middle, try both solutions (two steps and one) and see which fits the data better.
SELECT PKEY, MIN(CHARCOL), MAX(DATE) FROM TABLE GROUP BY PKEY
can same be done through sort? Like calculating MIN and MAX on CHAR columns in SORT? I know sections and trailer3 can find min, max, count but only on numeric columns.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
If MIN and MAX can only be numeric, make them numeric. CH takes account of all 256 bit-values per byte. BI takes account of all 256 bit-values per byte. Define your "CHAR" fields as BI.