Merge Data in a PS file without sorting


Sugin

New User


Joined: 10 Nov 2014
Posts: 4
Location: United States

Posted: Tue Nov 11, 2014 2:38 am

Hi,

I have a tricky question. It looked easy at first, but I have not been able to work it out. I have a PS file with millions of records in it. Consider that the file has two keys, and that it is sorted ascending on one of them. I need to group the records on the second key without otherwise changing their order.

For example, the input file has data like this:
Code:
2AAA
2BBB
1CCC
2DDD
4EEE
6FFF
5GGG
4HHH


The file is sorted on bytes 2 to 4.

I need to group the data on the 1st byte, but without sorting on it. I want the output data to look like this:
Code:
2AAA
2BBB
2DDD
1CCC
4EEE
4HHH
6FFF
5GGG


Is it possible to get this output using JCL by any means?
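
To pin down the requested order, here is a minimal sketch in Python rather than JCL. It only illustrates the rule implied by the sample: groups appear in the order their key first occurs, and records inside a group stay in input order. The function name `group_by_first_seen` is made up for this example.

```python
def group_by_first_seen(records, key):
    """Group records by key, ordering groups by first appearance."""
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for rec in records:
        groups.setdefault(key(rec), []).append(rec)
    # Flatten: groups in first-seen order, records in input order
    return [rec for group in groups.values() for rec in group]


sample = ["2AAA", "2BBB", "1CCC", "2DDD", "4EEE", "6FFF", "5GGG", "4HHH"]
result = group_by_first_seen(sample, key=lambda r: r[0])
print(result)
```

This holds every record in memory, so it describes the target order rather than a way to process millions of long records.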

Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Tue Nov 11, 2014 3:27 am

Unless the "distance" between records is always only that close you are going to have to SORT the file.

This does not mean the order that you want can't be produced, it just means that it will require a SORT (at least one) as part of the process.

So, can you post realistic, representative sample input and expected output? We also need to know the RECFM and LRECL.

Sugin

Posted: Tue Nov 11, 2014 8:28 pm

Hi Bill,

The distance between the records is not always the same or close. The file I want to do this on is huge (about 1 million records). It's a VB file with record length 28237. The input data looks like this (sample showing the first 30 bytes):


Code:
XXXXXXXX00000  99999  0001AAAA     
XXXXXXXX00000  99999  0001BBBB     
XXXXXXXX00000  99999  0001CCCC
XXXXXXXX00000  99999  0001DDDD     
XXXXXXXXA0000  Z9999  0001EEEE     
XXXXXXXXA0000  Z9999  0001FFFF     
XXXXXXXXA0000  Z9999  0001GGGG   
XXXXXXXX00000  99999  0001HHHH       
XXXXXXXX00000  99999  0001IIII   
XXXXXXXXA0000  Z9999  0001JJJJ   
XXXXXXXX49925  49925  0001KKKK     
XXXXXXXXA0000  Z9999  0001LLLL   
XXXXXXXXA0000  Z9999  0001MMMM     
XXXXXXXX49925  49925  0001NNNN
XXXXXXXXZ30011 Z30019 0001OOOO
XXXXXXXXZ30011 Z30019 0001PPPP
XXXXXXXXZ30011 Z30019 0001QQQQ



And I am expecting the output below:

Code:
XXXXXXXX00000  99999  0001AAAA     
XXXXXXXX00000  99999  0001BBBB     
XXXXXXXX00000  99999  0001CCCC
XXXXXXXX00000  99999  0001DDDD 
XXXXXXXX00000  99999  0001HHHH       
XXXXXXXX00000  99999  0001IIII     
XXXXXXXXA0000  Z9999  0001EEEE     
XXXXXXXXA0000  Z9999  0001FFFF     
XXXXXXXXA0000  Z9999  0001GGGG
XXXXXXXXA0000  Z9999  0001JJJJ   
XXXXXXXXA0000  Z9999  0001LLLL   
XXXXXXXXA0000  Z9999  0001MMMM
XXXXXXXX49925  49925  0001KKKK     
XXXXXXXX49925  49925  0001NNNN
XXXXXXXXZ30011 Z30019 0001OOOO
XXXXXXXXZ30011 Z30019 0001PPPP
XXXXXXXXZ30011 Z30019 0001QQQQ


The first 8 bytes are always the same. I need to group the data on the 18 bytes starting at position 9, without sorting the data.

I tried taking the first duplicates using ICETOOL and then joining them with the actual file using SORT. Even then, the data gets sorted on the key specified in the JOINKEYS. (And I learnt that we cannot join without sorting the data.)

Arunkumar Chandrasekaran

New User


Joined: 01 Jun 2010
Posts: 63
Location: India

Posted: Wed Nov 12, 2014 1:12 am

Hi,

You can achieve it by exploiting SEQNUM (record number).

So,

(1) First, using FIRSTDUP, take the first record based on the first byte.

(2) Then take the above output file and add a SEQNUM using INREC.

(3) JOIN the above file (SEQNUM added) with the actual file. Append the SEQNUM from F1 to each record in F2 (the actual file). Here, the KEY is all 4 bytes.

(4) SORT the above file on the SEQNUM.

I believe this will work. It is not tested, since I am at home. Let me know if you face any issues.


Thanks,
Arun

Arunkumar Chandrasekaran

Posted: Wed Nov 12, 2014 1:30 am

Sorry. I guess

Quote:
(1) First using FIRSTDUP take first record based on first byte.



will not give the desired result, since it will do a SORT before extracting the first record.
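
The ordering problem can be seen in a small Python sketch. It is an illustration only and says nothing about ICETOOL internals; it simply assumes the selected keys come out in key order, and the variable names are made up. Numbering the keys after they have been sorted ranks them by value, while numbering every record before the selection remembers where each key first appeared.

```python
first_bytes = ["2", "2", "1", "2", "4", "6", "5", "4"]  # sample key column

# Numbering the distinct keys after a sort just ranks them by value
rank_after_sort = {k: n for n, k in enumerate(sorted(set(first_bytes)), start=1)}

# Numbering every record first, then keeping the lowest number per key,
# remembers where each key first appeared in the input
first_seen = {}
for seq, k in enumerate(first_bytes, start=1):
    first_seen.setdefault(k, seq)

print(sorted(rank_after_sort, key=rank_after_sort.get))  # keys in value order
print(sorted(first_seen, key=first_seen.get))            # keys in first-seen order
```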

Sugin

Posted: Wed Nov 12, 2014 10:22 pm

Hi Arun,


It worked! (Credit to you for shedding some light.)

I should have thought of this before. My bad. I had been trying to add a sequence number to the huge file and work it out from there. Now I have got the output I needed. Here is what I did.

1. Added a sequence number to the huge file and created a temp file with just the keys.
2. Used ICETOOL to extract the first entry for each key from the key file.
3. Joined the huge file with the key file and sorted on the sequence number to build my desired output.
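
The logic of these three steps can be sketched in Python. This is a rough model under stated assumptions, not a translation of the JCL: the function name `regroup` is invented, the key is taken as the 18 bytes at positions 9 to 26, and the join is done in input order, whereas the SORT join works on key-ordered data.

```python
def regroup(records, key_of):
    # Step 1: pair each key with the record's original position (like SEQNUM)
    key_file = [(key_of(rec), seq) for seq, rec in enumerate(records, start=1)]

    # Step 2: keep only the first entry per key (like SELECT ... FIRST)
    first_seq = {}
    for key, seq in key_file:
        first_seq.setdefault(key, seq)

    # Step 3: attach each key's first sequence number to its records, then
    # do a stable sort on that number so ties keep their input order
    joined = [(first_seq[key_of(rec)], rec) for rec in records]
    joined.sort(key=lambda pair: pair[0])
    return [rec for _, rec in joined]


records = [
    "XXXXXXXX00000  99999  0001AAAA",
    "XXXXXXXXA0000  Z9999  0001EEEE",
    "XXXXXXXX00000  99999  0001HHHH",
    "XXXXXXXX49925  49925  0001KKKK",
    "XXXXXXXXA0000  Z9999  0001LLLL",
    "XXXXXXXX49925  49925  0001NNNN",
]
out = regroup(records, key_of=lambda r: r[8:26])
print([r[26:30] for r in out])
```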

Below is the JCL I used.
Code:
//STEP001  EXEC PGM=SORT                                               
//TOOLMSG  DD SYSOUT=*                                                 
//DFSMSG   DD SYSOUT=*                                                 
//SORTIN   DD *                                                         
XXXXXXXX00000  99999  0001AAAA                                         
XXXXXXXX00000  99999  0001BBBB                                         
XXXXXXXX00000  99999  0001CCCC                                         
XXXXXXXX00000  99999  0001DDDD                                         
XXXXXXXXA0000  Z9999  0001EEEE                                         
XXXXXXXXA0000  Z9999  0001FFFF                                         
XXXXXXXXA0000  Z9999  0001GGGG                                         
XXXXXXXX00000  99999  0001HHHH                                         
XXXXXXXX00000  99999  0001IIII                                         
XXXXXXXXA0000  Z9999  0001JJJJ                                         
XXXXXXXX49925  49925  0001KKKK                                         
XXXXXXXXA0000  Z9999  0001LLLL                                         
XXXXXXXXA0000  Z9999  0001MMMM                                         
XXXXXXXX49925  49925  0001NNNN                                         
XXXXXXXXZ30011 Z30019 0001OOOO                                         
XXXXXXXXZ30011 Z30019 0001PPPP                                         
XXXXXXXXZ30011 Z30019 0001QQQQ                                         
//SORTOUT  DD DSN=&&T1,UNIT=SYSDA,SPACE=(TRK,(1,1),RLSE),DISP=(MOD,PASS)
//SYSPRINT DD SYSOUT=*                                                 
//SYSOUT   DD SYSOUT=*                                                 
//SYSIN    DD *                                                         
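* Stamp each record with its original position in columns 31-38
  INREC OVERLAY=(31:SEQNUM,8,ZD)
  SORT FIELDS=COPY
* Keep only the 18-byte key and the 8-byte sequence number
  OUTREC FIELDS=(1:9,18,19:31,8)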
//*                                                                     
//STEP002  EXEC PGM=ICETOOL                                             
//TOOLMSG  DD SYSOUT=*                                                 
//DFSMSG   DD SYSOUT=*                                                 
//IN       DD DSN=&&T1,DISP=(OLD,DELETE)                               
//OUT      DD DSN=&&T2,UNIT=SYSDA,SPACE=(TRK,(1,1),RLSE),DISP=(MOD,PASS)
//TOOLIN   DD *                                                         
* Keep the first entry for each 18-byte key
 SELECT FROM(IN) TO(OUT) ON(1,18,CH) FIRST
/*                                                                     
//STEP003  EXEC PGM=SORT                                               
//TOOLMSG  DD SYSOUT=*                                                 
//DFSMSG   DD SYSOUT=*                                                 
//SORTJNF1 DD *                                                         
XXXXXXXX00000  99999  0001AAAA                                         
XXXXXXXX00000  99999  0001BBBB                                         
XXXXXXXX00000  99999  0001CCCC                                         
XXXXXXXX00000  99999  0001DDDD                                         
XXXXXXXXA0000  Z9999  0001EEEE                                         
XXXXXXXXA0000  Z9999  0001FFFF                                         
XXXXXXXXA0000  Z9999  0001GGGG                                         
XXXXXXXX00000  99999  0001HHHH                                         
XXXXXXXX00000  99999  0001IIII                                         
XXXXXXXXA0000  Z9999  0001JJJJ                                         
XXXXXXXX49925  49925  0001KKKK                                         
XXXXXXXXA0000  Z9999  0001LLLL                                         
XXXXXXXXA0000  Z9999  0001MMMM                                         
XXXXXXXX49925  49925  0001NNNN                                         
XXXXXXXXZ30011 Z30019 0001OOOO                                         
XXXXXXXXZ30011 Z30019 0001PPPP                                         
XXXXXXXXZ30011 Z30019 0001QQQQ                                         
//SORTJNF2 DD DSN=&&T2,DISP=(OLD,DELETE)                               
//SORTOUT  DD  SYSOUT=*                                                 
//SYSPRINT DD SYSOUT=*                                                 
//SYSOUT   DD SYSOUT=*                                                 
//SYSIN    DD *                                                         
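* Match every record of the big file (F1) to its key entry (F2)
  JOINKEYS FILES=F1,FIELDS=(9,18,A)
  JOINKEYS FILES=F2,FIELDS=(1,18,A)
* Carry the key's first sequence number along in columns 31-38
  REFORMAT FIELDS=(F1:1,30,F2:19,8)
* Order the groups by first occurrence, then drop the sequence number
  SORT FIELDS=(31,8,CH,A),EQUALS
  OUTREC FIELDS=(1,30)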
//*                                                                     



Thanks, Arun and Bill, for your time and suggestions.

Arunkumar Chandrasekaran

Posted: Wed Nov 12, 2014 11:42 pm

Happy to hear that it worked!


Thanks for sharing the final code. Meanwhile, I believe the 18 bytes are your second key (according to your initial post).

Quote:
I need to group the records based on the second key without changing the order.

Sugin

Posted: Thu Nov 13, 2014 1:34 am

That is right, Arun. The 18 bytes are the second key. The file is already sorted on bytes 27 to 30.

Bill Woodger

Posted: Thu Nov 13, 2014 5:29 am

You haven't yet addressed the fact that your big file is VB. You'll have to use OUTFIL VTOF to get your fixed-length key file.
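
One consequence of VB, sketched in Python purely as an illustration: each variable-length record is preceded by a 4-byte record descriptor word (RDW), so if "position 9" counts data bytes only, it is position 13 of the record as SORT sees it, and a short record may not reach the end of the key. The sketch builds fixed-length keys from RDW-prefixed byte strings. The helper names, the ASCII bytes, and the blank padding are assumptions for the example, not a description of how OUTFIL VTOF works internally.

```python
import struct

RDW_LEN = 4  # length of the record descriptor word in front of each VB record


def make_vb(data: bytes) -> bytes:
    """Prefix data with an RDW: 2-byte big-endian total length, 2 zero bytes."""
    return struct.pack(">HH", len(data) + RDW_LEN, 0) + data


def fixed_key(vb_record: bytes, data_pos: int = 9, length: int = 18) -> bytes:
    """Cut a fixed-length key out of a VB record, blank-padding short records."""
    total_len, _ = struct.unpack(">HH", vb_record[:RDW_LEN])
    data = vb_record[RDW_LEN:total_len]
    start = data_pos - 1  # 1-based data position -> 0-based index
    return data[start:start + length].ljust(length, b" ")


rec = make_vb(b"XXXXXXXX00000  99999  0001AAAA")
short = make_vb(b"XXXXXXXXA0000")
print(fixed_key(rec))
print(fixed_key(short))
```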

You're sorting the big boy twice, and reading it all another time.

I think I can get rid of the two SORTs and save on a lot of data-movement as you add sequence numbers, but it looks like it might be a once-off task, so if you have a working solution, unless you have to sit and watch it for 20 hours, you'd be good already.

A simple way to do it, with two SORTs, is to SORT on the key (with EQUALS) and then use IFTHEN=(WHEN=GROUP to propagate the first sequence number of a key across all records of that key, then SORT on that sequence number and strip them off.
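
The two-SORT idea can be modelled in a few lines of Python. This is a sketch of the logic only, with an invented function name: Python's stable sort stands in for SORT with EQUALS, and the loop stands in for the WHEN=GROUP pass that pushes the first sequence number of each key onto all records of that key.

```python
def regroup_two_sorts(records, key_of):
    # Number every record, like adding a SEQNUM before the first SORT
    numbered = [(seq, rec) for seq, rec in enumerate(records, start=1)]

    # First SORT: on the key; Python's sort is stable, like SORT with EQUALS
    numbered.sort(key=lambda pair: key_of(pair[1]))

    # WHEN=GROUP-style pass: at each change of key, remember the sequence
    # number of the group's first record and push it onto every record
    pushed = []
    current_key, group_seq = None, None
    for seq, rec in numbered:
        if key_of(rec) != current_key:
            current_key, group_seq = key_of(rec), seq
        pushed.append((group_seq, rec))

    # Second SORT: on the pushed number (stable again), then strip it off
    pushed.sort(key=lambda pair: pair[0])
    return [rec for _, rec in pushed]


sample = ["2AAA", "2BBB", "1CCC", "2DDD", "4EEE", "6FFF", "5GGG", "4HHH"]
regrouped = regroup_two_sorts(sample, key_of=lambda r: r[0])
print(regrouped)
```

Because the first sort is stable, the first record of each key group carries the lowest sequence number for that key, which is what makes the pushed value the key's first-occurrence position.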

Extracting the keys becomes useful if you want to avoid the SORTs. Basically, you remove the data from where you don't want it (which doesn't change the original order of the first reference to each key) and insert the removed data after the last record of the first group for that key.

Arunkumar Chandrasekaran

Posted: Thu Nov 13, 2014 1:54 pm

Hi Bill,

Quote:
A simple way to do it, with two SORTs, is to SORT on the key (with EQUALS) and then use IFTHEN=(WHEN=GROUP to propagate the first sequence number of a key across all records of that key, then SORT on that sequence number and strip them off.


Can you please explain it in more detail? I am not familiar with IFTHEN=(WHEN=GROUP.


Thanks,
Arun

Bill Woodger

Posted: Thu Nov 13, 2014 2:06 pm

IFTHEN=(WHEN=GROUP is a way to mark a group of records. It comes with PUSH which is similar in action to OVERLAY but which can only use data from the current record or use the specialised ID and SEQ (ID is a sequence number per group, SEQ a sequence number within the group).

It is documented in the SyncSORT manual, and you will find examples here and in the DFSORT part of the forum, and through your favourite internet search engine.

DFSORT has KEYBEGIN for WHEN=GROUP. SyncSORT does not/may not, but it can be emulated by a SEQNUM with RESTART= and then BEGIN= for zero in that position.
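
A rough Python model of that emulation, assuming the records are already in key order. The function names are invented, and the starting value of the restarted number is left as a parameter, since the value to test for (zero or one) depends on where the numbering starts. The second function also shows what an ID pushed across a group looks like.

```python
def restart_seqnum(keys, start=1):
    """Number records, restarting whenever the key changes."""
    numbers, previous, n = [], object(), start
    for key in keys:
        n = start if key != previous else n + 1
        numbers.append(n)
        previous = key
    return numbers


def push_group_id(keys, start=1):
    """Begin a group where the restarted number equals `start`; push an ID."""
    ids, group_id = [], 0
    for n in restart_seqnum(keys, start):
        if n == start:      # BEGIN-style test on the restarted number
            group_id += 1   # like ID: one number per group
        ids.append(group_id)
    return ids


keys = ["1", "2", "2", "2", "4", "4", "5", "6"]  # already in key order
print(restart_seqnum(keys))
print(push_group_id(keys))
```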

Find the documentation, find some examples, experiment. If you have problems, ask a new question rather than continuing this one.

It is a very powerful and useful function.

Arunkumar Chandrasekaran

Posted: Thu Nov 13, 2014 3:42 pm

Sure, Bill. I will let you know once I have experimented. Thank you!

JAYACHANDRAN THAMPY

New User


Joined: 06 Jun 2006
Posts: 8

Posted: Thu Nov 13, 2014 5:47 pm

Syncsort V1.4.2 supports KEYBEGIN for WHEN=GROUP.

Bill Woodger

Posted: Thu Nov 13, 2014 7:36 pm

Thanks. I think you mentioned it before, here or elsewhere.

It would be great to know all the things which are now in 1.4.x which weren't there previously, and which of those are documented, or just work.

If you have something of a list, we can make it a "sticky" on this forum and extend it as more information becomes available. JNFnCNTL on JOINKEYS is supported but, I think, not documented, for instance.

All times are GMT + 6 Hours