Sugin
New User
Joined: 10 Nov 2014 Posts: 4 Location: United States
Hi,
I have a very tricky question. It looked easy at first, but I have not been able to solve it. I have a PS file with millions of records. The file has two keys and is sorted in ascending order on one of them. I need to group the records on the second key without otherwise changing their order.
For Example. The input file has data like
Code: |
2AAA
2BBB
1CCC
2DDD
4EEE
6FFF
5GGG
4HHH |
The file is sorted on bytes 2 to 4.
I need to group the data on the 1st byte without sorting on it, so that the groups stay in the order in which they first appear. I want the output data to look like this.
Code: |
2AAA
2BBB
2DDD
1CCC
4EEE
4HHH
6FFF
5GGG |
Is it possible to get this output using JCL by any means? |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Unless the "distance" between records is always only that close you are going to have to SORT the file.
This does not mean the order that you want can't be produced, it just means that it will require a SORT (at least one) as part of the process.
So, can you provide realistic, representative sample input and the expected output? I also need to know the RECFM and LRECL. |
Sugin
Hi Bill,
The distance between the records is not always the same, or close. The file I need to do this for is huge (about 1 million records). It's a VB file with a record length of 28237. The input data looks like this (sample showing the first 30 bytes):
Code: |
XXXXXXXX00000 99999 0001AAAA
XXXXXXXX00000 99999 0001BBBB
XXXXXXXX00000 99999 0001CCCC
XXXXXXXX00000 99999 0001DDDD
XXXXXXXXA0000 Z9999 0001EEEE
XXXXXXXXA0000 Z9999 0001FFFF
XXXXXXXXA0000 Z9999 0001GGGG
XXXXXXXX00000 99999 0001HHHH
XXXXXXXX00000 99999 0001IIII
XXXXXXXXA0000 Z9999 0001JJJJ
XXXXXXXX49925 49925 0001KKKK
XXXXXXXXA0000 Z9999 0001LLLL
XXXXXXXXA0000 Z9999 0001MMMM
XXXXXXXX49925 49925 0001NNNN
XXXXXXXXZ30011 Z30019 0001OOOO
XXXXXXXXZ30011 Z30019 0001PPPP
XXXXXXXXZ30011 Z30019 0001QQQQ |
And I am expecting the below output
Code: |
XXXXXXXX00000 99999 0001AAAA
XXXXXXXX00000 99999 0001BBBB
XXXXXXXX00000 99999 0001CCCC
XXXXXXXX00000 99999 0001DDDD
XXXXXXXX00000 99999 0001HHHH
XXXXXXXX00000 99999 0001IIII
XXXXXXXXA0000 Z9999 0001EEEE
XXXXXXXXA0000 Z9999 0001FFFF
XXXXXXXXA0000 Z9999 0001GGGG
XXXXXXXXA0000 Z9999 0001JJJJ
XXXXXXXXA0000 Z9999 0001LLLL
XXXXXXXXA0000 Z9999 0001MMMM
XXXXXXXX49925 49925 0001KKKK
XXXXXXXX49925 49925 0001NNNN
XXXXXXXXZ30011 Z30019 0001OOOO
XXXXXXXXZ30011 Z30019 0001PPPP
XXXXXXXXZ30011 Z30019 0001QQQQ |
The first 8 bytes are always the same. I need to group the data on the 18 bytes starting at position 9, without sorting the data.
I tried taking the FIRSTDUP records using ICETOOL and then joining them with the actual file using SORT. Even then, the data gets sorted on the key specified in the join keys. (And I learnt that we cannot join without sorting the data.) |
Arunkumar Chandrasekaran
New User
Joined: 01 Jun 2010 Posts: 63 Location: India
Hi,
You can achieve it by exploiting SEQNUM (record number).
So,
(1) First using FIRSTDUP take first record based on first byte.
(2) Then take the above output file and add a SEQNUM using INREC.
(3) JOIN the above file (with the SEQNUM added) with the actual file. Append the SEQNUM from F1 to each record in F2 (the actual file). Here, the KEY is all 4 bytes.
(4) SORT the above file on the SEQNUM.
I believe this will work. It is untested since I am at home. Let me know if you face any issues.
Thanks,
Arun |
Arunkumar Chandrasekaran
Sorry. I guess
Quote: |
(1) First using FIRSTDUP take first record based on first byte. |
will not give the desired result, since it does a SORT before extracting the first record. |
Sugin
Hi Arun,
It worked (credit to you for shedding some light).
I should have thought of this before. My bad. I had been trying to add a sequence number to the huge file and work it out from there. Now I have the output I needed. Here is what I did.
1. Added sequence number to the huge file and created a temp file with just the keys.
2. Used ICETOOL to extract the first entry from the key file.
3. Joined the huge file with the Key file and built my desired output.
Below is the JCL I used.
Code: |
//STEP001 EXEC PGM=SORT
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//SORTIN DD *
XXXXXXXX00000 99999 0001AAAA
XXXXXXXX00000 99999 0001BBBB
XXXXXXXX00000 99999 0001CCCC
XXXXXXXX00000 99999 0001DDDD
XXXXXXXXA0000 Z9999 0001EEEE
XXXXXXXXA0000 Z9999 0001FFFF
XXXXXXXXA0000 Z9999 0001GGGG
XXXXXXXX00000 99999 0001HHHH
XXXXXXXX00000 99999 0001IIII
XXXXXXXXA0000 Z9999 0001JJJJ
XXXXXXXX49925 49925 0001KKKK
XXXXXXXXA0000 Z9999 0001LLLL
XXXXXXXXA0000 Z9999 0001MMMM
XXXXXXXX49925 49925 0001NNNN
XXXXXXXXZ30011 Z30019 0001OOOO
XXXXXXXXZ30011 Z30019 0001PPPP
XXXXXXXXZ30011 Z30019 0001QQQQ
//SORTOUT DD DSN=&&T1,UNIT=SYSDA,SPACE=(TRK,(1,1),RLSE),DISP=(MOD,PASS)
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SYSIN DD *
  INREC OVERLAY=(31:SEQNUM,8,ZD)
  SORT FIELDS=COPY
  OUTREC FIELDS=(1:9,18,19:31,8)
//*
//STEP002 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=&&T1,DISP=(OLD,DELETE)
//OUT DD DSN=&&T2,UNIT=SYSDA,SPACE=(TRK,(1,1),RLSE),DISP=(MOD,PASS)
//TOOLIN DD *
  SELECT FROM(IN) TO(OUT) ON(1,18,CH) FIRST
/*
//STEP003 EXEC PGM=SORT
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//SORTJNF1 DD *
XXXXXXXX00000 99999 0001AAAA
XXXXXXXX00000 99999 0001BBBB
XXXXXXXX00000 99999 0001CCCC
XXXXXXXX00000 99999 0001DDDD
XXXXXXXXA0000 Z9999 0001EEEE
XXXXXXXXA0000 Z9999 0001FFFF
XXXXXXXXA0000 Z9999 0001GGGG
XXXXXXXX00000 99999 0001HHHH
XXXXXXXX00000 99999 0001IIII
XXXXXXXXA0000 Z9999 0001JJJJ
XXXXXXXX49925 49925 0001KKKK
XXXXXXXXA0000 Z9999 0001LLLL
XXXXXXXXA0000 Z9999 0001MMMM
XXXXXXXX49925 49925 0001NNNN
XXXXXXXXZ30011 Z30019 0001OOOO
XXXXXXXXZ30011 Z30019 0001PPPP
XXXXXXXXZ30011 Z30019 0001QQQQ
//SORTJNF2 DD DSN=&&T2,DISP=(OLD,DELETE)
//SORTOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SYSIN DD *
  JOINKEYS FILES=F1,FIELDS=(9,18,A)
  JOINKEYS FILES=F2,FIELDS=(1,18,A)
  REFORMAT FIELDS=(F1:1,30,F2:19,8)
  SORT FIELDS=(31,8,CH,A),EQUALS
  OUTREC FIELDS=(1,30)
//* |
Thanks, Arun and Bill, for your time and suggestions. |
Arunkumar Chandrasekaran
Happy to hear that it worked!
Thanks for sharing the final code. Meanwhile, I believe the 18 bytes are your second key (according to your initial post).
Quote: |
I need to group the records based on the second key without changing the order. |
Sugin
That is right, Arun. The 18 bytes are the second key. The file is already sorted on bytes 27 to 30. |
Bill Woodger
You haven't yet addressed the fact that your big file is VB. You'll have to use OUTFIL VTOF to get your fixed-length key file.
You're sorting the big boy twice, and reading it all another time.
I think I can get rid of the two SORTs and save on a lot of data-movement as you add sequence numbers, but it looks like it might be a once-off task, so if you have a working solution, unless you have to sit and watch it for 20 hours, you'd be good already.
A simple way to do it, with two SORTs, is to SORT on the key (with EQUALS) and then use IFTHEN=(WHEN=GROUP to propagate the first sequence number of a key across all records of that key, then SORT on that sequence number and strip them off.
Extracting the keys becomes useful if you want to avoid the SORTs. Basically, you remove the later records of a key from where you don't want them (which doesn't change the original order of the first reference to each key) and insert the removed data after the last record of the first group of that key. |
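The "simple way with two SORTs" described above might look roughly like the following. This is an untested sketch, not the poster's own job: it assumes fixed-length 80-byte records with the grouping key in positions 9 to 26, as in the sample job earlier in the thread. The data set name YOUR.INPUT.FILE, the step names, the SPACE values and the work-field positions 81 to 96 are placeholders chosen for the illustration.

```jcl
//* Untested sketch. YOUR.INPUT.FILE is a placeholder name.
//* Assumes FB, LRECL=80, grouping key in positions 9-26.
//STEPA    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.INPUT.FILE,DISP=SHR
//SORTOUT  DD DSN=&&KEYED,UNIT=SYSDA,SPACE=(CYL,(10,10),RLSE),
//            DISP=(,PASS)
//SYSIN    DD *
* Tag each record with its original position, then sort on the key.
* EQUALS keeps records with equal keys in their original order.
  INREC OVERLAY=(81:SEQNUM,8,ZD)
  SORT FIELDS=(9,18,CH,A),EQUALS
* Copy the first sequence number of each key to all of its records.
  OUTREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(9,18),PUSH=(89:81,8))
/*
//STEPB    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=&&KEYED,DISP=(OLD,DELETE)
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
* Order the groups by first appearance, then drop the work fields.
  SORT FIELDS=(89,8,CH,A),EQUALS
  OUTREC BUILD=(1,80)
/*
```

For the real VB file the positions would differ (the RDW occupies the first 4 bytes, and appending work fields at the end of variable-length records is awkward), so the sketch would need adapting before use.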
Arunkumar Chandrasekaran
Hi Bill,
Quote: |
A simple way to do it, with two SORTs, is to SORT on the key (with EQUALS) and then use IFTHEN=(WHEN=GROUP to propagate the first sequence number of a key across all records of that key, then SORT on that sequence number and strip them off. |
Can you please explain it in more detail? I am not familiar with IFTHEN=(WHEN=GROUP.
Thanks,
Arun |
Bill Woodger
IFTHEN=(WHEN=GROUP is a way to mark a group of records. It comes with PUSH, which acts much like OVERLAY but can only take its data from fields of the record that starts the group, or from the special ID and SEQ items (ID is a sequence number per group, SEQ a sequence number within the group).
It is documented in the SyncSORT manual, and you will find examples here and in the DFSORT part of the forum, and through your favourite internet search engine.
DFSORT has KEYBEGIN for WHEN=GROUP. SyncSORT does not/may not, but it can be emulated by a SEQNUM with RESTART= and then BEGIN= for zero in that position.
Find the documentation, find some examples, experiment. If you have problems, ask a new question rather than continuing this one.
It is a very powerful and useful function. |
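As a rough illustration of the WHEN=GROUP description above, here are two untested sketches in DFSORT control-statement syntax. They assume the 80-byte fixed-length sample records from earlier in the thread, with the key in positions 9 to 26; the work positions from 81 onwards are arbitrary choices for the illustration. The first uses KEYBEGIN to start a new group whenever the key changes and pushes ID and SEQ onto each record:

```jcl
* Untested sketch: a new group starts whenever the key in 9-26 changes.
* ID=5 is a 5-digit group number, SEQ=5 a 5-digit count in the group.
  OPTION COPY
  INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(9,18),
                PUSH=(81:ID=5,86:SEQ=5))
```

The second sketches the emulation mentioned for products without KEYBEGIN: a SEQNUM that restarts at zero on each change of key, followed by BEGIN= testing for zero in that position.

```jcl
* Untested sketch: emulate KEYBEGIN with a restarting sequence number.
* START=0 makes the first record of each run of a key carry zero.
  OPTION COPY
  INREC IFTHEN=(WHEN=INIT,
                OVERLAY=(91:SEQNUM,8,ZD,START=0,RESTART=(9,18))),
        IFTHEN=(WHEN=GROUP,BEGIN=(91,8,ZD,EQ,0),
                PUSH=(81:ID=5,86:SEQ=5))
```

In both cases a group is a run of consecutive records with the same key, so on unsorted input the same key can start more than one group.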
Arunkumar Chandrasekaran
Sure, Bill. I will let you know once I have experimented. Thank you! |
JAYACHANDRAN THAMPY
New User
Joined: 06 Jun 2006 Posts: 8
SyncSORT V1.4.2 supports KEYBEGIN for WHEN=GROUP. |
Bill Woodger
Thanks. I think you mentioned it before, here or elsewhere.
It would be great to know all the things which are now in 1.4.x which weren't there previously, and which of those are documented, or just work.
If you have something of a list, we can make it a "sticky" on this forum and extend it as more information becomes available. JNFnCNTL on JOINKEYS is supported but, I think, not documented, for instance. |