Merge Data in a PS file without sorting

Sugin · Posted: Tue Nov 11, 2014 2:38 am

Hi,

I have a very tricky question. It looked easy to me at first, but I have not been able to achieve it. I have a PS file with millions of records in it. Consider that the file has two keys. The file is sorted ascending on one of the keys. I need to group the records on the second key without changing the order.

For example, the input file has data like this:

Bill Woodger · Posted: Tue Nov 11, 2014 3:27 am

Unless the "distance" between records is always that close, you are going to have to SORT the file.

This does not mean the order that you want can't be produced, it just means that it will require a SORT (at least one) as part of the process.

So, can you provide realistic, representative sample input and the expected output? I also need to know the RECFM and LRECL.

Sugin · Posted: Tue Nov 11, 2014 8:28 pm

Hi Bill,

The distance between the records is not always the same, or close. The file I need to do this on is huge (about 1 million records). It's a VB file with a record length of 28237. The input data would look like this (sample showing the first 30 bytes):

Arunkumar Chandrasekaran · New User Joined: 01 Jun 2010 Posts: 63 Location: India

Hi,

You can achieve it by exploiting SEQNUM (record number).

So,

(1) First, using FIRSTDUP, take the first record based on the first byte.

(2) Then, take the above output file and add a SEQNUM using INREC.

(3) JOIN the above file (with the SEQNUM added) with the actual file. Append the SEQNUM from F1 to each record in F2 (the actual file). Here, the KEY is all 4 bytes.

(4) SORT the above file on the SEQNUM.

I believe this will work. It is not tested, since I am at home. Let me know if you face any issues.

Thanks,
Arun

Arunkumar Chandrasekaran

Sorry. I guess

Sugin · Posted: Wed Nov 12, 2014 10:22 pm

Hi Arun,

It worked

(Credits to you for throwing some light)

I should have thought of this before. My bad. I was trying to add sequence numbers to the huge file and work it out from there. Now I have the output I needed. Here is what I did.

1. Added sequence numbers to the huge file and created a temp file with just the keys.
2. Used ICETOOL to extract the first entry for each key from the key file.
3. Joined the huge file with the Key file and built my desired output.
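The three steps above can be modelled in memory to check the logic. This is only a sketch under stated assumptions: the thread's real sample data and JCL are not available, so `group_by_first_seqnum`, `key_of`, and the two-character sample keys are hypothetical stand-ins for the 18-byte second key. Each record is joined to the sequence number of the first record with its key, and a sort on (first seqnum, own seqnum) places every group where its key first appeared while keeping the original order within the group.

```python
from typing import Callable


def group_by_first_seqnum(records: list[str], key_of: Callable[[str], str]) -> list[str]:
    # Step 1: number every record in its original order and build a
    # "key file" of (key, seqnum) pairs.
    numbered = list(enumerate(records))
    key_file = [(key_of(rec), seq) for seq, rec in numbered]

    # Step 2: keep only the first entry for each key (the FIRSTDUP idea).
    first_seq: dict[str, int] = {}
    for key, seq in key_file:
        first_seq.setdefault(key, seq)

    # Step 3: "join" each record to its key's first seqnum, then sort on
    # (first seqnum, own seqnum) and strip both numbers off again.
    joined = [(first_seq[key_of(rec)], seq, rec) for seq, rec in numbered]
    joined.sort(key=lambda t: (t[0], t[1]))
    return [rec for _, _, rec in joined]


# Hypothetical sample: the key is the first two characters.
print(group_by_first_seqnum(["A1 x", "B1 y", "A1 z", "C1 w", "B1 v"], lambda r: r[:2]))
```

With those sample records the result is `['A1 x', 'A1 z', 'B1 y', 'B1 v', 'C1 w']`.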

Below is the JCL I used.

Arunkumar Chandrasekaran

Happy to hear that it worked!

Thanks for sharing the final code. Meanwhile, I believe the 18 bytes are your second key (according to your initial post).

Sugin · Posted: Thu Nov 13, 2014 1:34 am

That is right, Arun. The 18 bytes are the second key. The file is already sorted on bytes 27 to 30.

Bill Woodger · Posted: Thu Nov 13, 2014 5:29 am

You haven't yet addressed the fact that your big file is VB. You'll have to use OUTFIL VTOF to get your fixed-length key file.

You're sorting the big boy twice, and reading it all another time.

I think I can get rid of the two SORTs and save a lot of data movement when you add the sequence numbers. But it looks like this might be a one-off task, so if you have a working solution you're already good, unless you have to sit and watch it run for 20 hours.

A simple way to do it, with two SORTs, is to add a sequence number in the original order, SORT on the key (with EQUALS), and then use IFTHEN=(WHEN=GROUP to propagate the first sequence number of each key across all records with that key. Then SORT on that propagated sequence number and strip the numbers off.
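The two-SORT approach just described can be simulated to see why it restores the original order of first appearance. This is a minimal in-memory sketch, not SyncSORT code: `two_sort_regroup`, `key_of`, and the sample records are hypothetical. Python's `sorted` and `list.sort` are stable, which stands in for EQUALS; the loop that copies the first sequence number of each key run onto the rest of the run plays the role of WHEN=GROUP with PUSH. The second sort also has to be stable (or use the original sequence number as a minor key), otherwise the order within a group is not guaranteed.

```python
from typing import Callable


def two_sort_regroup(records: list[str], key_of: Callable[[str], str]) -> list[str]:
    # Add a sequence number reflecting the original record order.
    numbered = list(enumerate(records))

    # SORT 1: on the key only. Stable, like EQUALS, so records with the
    # same key stay in their original relative order.
    by_key = sorted(numbered, key=lambda t: key_of(t[1]))

    # WHEN=GROUP emulation: a group begins whenever the key changes, and
    # the first record's sequence number is pushed onto the whole group.
    pushed: list[tuple[int, str]] = []
    prev_key = None
    group_seq = -1
    for seq, rec in by_key:
        k = key_of(rec)
        if not pushed or k != prev_key:
            group_seq, prev_key = seq, k
        pushed.append((group_seq, rec))

    # SORT 2: on the pushed sequence number (stable again), then strip it.
    pushed.sort(key=lambda t: t[0])
    return [rec for _, rec in pushed]


# Hypothetical sample where key order (k1 < k2) differs from first appearance.
print(two_sort_regroup(["k2 a", "k1 b", "k2 c", "k1 d"], lambda r: r[:2]))
```

Here the output is `['k2 a', 'k2 c', 'k1 b', 'k1 d']`: the k2 group comes first because a k2 record appeared first in the input, even though k1 sorts lower.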

Extracting the keys becomes useful if you want to avoid the SORTs. Basically, you remove records from where you don't want them (which doesn't change the original order of the first occurrence of each key) and insert the removed records after the last record of the first group for that key.
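The move-later-groups idea from the paragraph above can be illustrated as follows. This is a hypothetical in-memory sketch (`move_later_runs` and `key_of` are illustrative names): it splits the input into runs of consecutive records with the same key, leaves the first run of each key in place, and appends every later run for that key after it. It only demonstrates the resulting order. It holds everything in memory, so it does not address the real goal of avoiding SORTs over a large VB file, which would rely on the extracted key file to drive the moves.

```python
from typing import Callable


def move_later_runs(records: list[str], key_of: Callable[[str], str]) -> list[str]:
    # Split the input into runs of consecutive records sharing a key.
    runs: list[tuple[str, list[str]]] = []
    for rec in records:
        k = key_of(rec)
        if runs and runs[-1][0] == k:
            runs[-1][1].append(rec)
        else:
            runs.append((k, [rec]))

    # The first run of each key stays where it is; later runs for the same
    # key are removed and attached after the last record of that first run.
    first_run: dict[str, list[str]] = {}
    kept: list[list[str]] = []
    for k, recs in runs:
        if k in first_run:
            first_run[k].extend(recs)  # same list object as the one in `kept`
        else:
            first_run[k] = recs
            kept.append(recs)
    return [rec for recs in kept for rec in recs]


print(move_later_runs(["A1 x", "A1 q", "B1 y", "A1 z", "C1 w", "B1 v"], lambda r: r[:2]))
```

For that hypothetical input the result is `['A1 x', 'A1 q', 'A1 z', 'B1 y', 'B1 v', 'C1 w']`.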

Arunkumar Chandrasekaran

Hi Bill,

Bill Woodger · Posted: Thu Nov 13, 2014 2:06 pm

IFTHEN=(WHEN=GROUP is a way to mark a group of records. It comes with PUSH, which is similar in action to OVERLAY but can only use data from the first record of the group, or the specialised ID and SEQ (ID is a sequence number per group, SEQ a sequence number within the group).
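For readers new to WHEN=GROUP, the behaviour of PUSH with ID and SEQ can be mimicked in a few lines. This is a simplified sketch, not the product's implementation: `when_group`, `begins`, and `push_field` are hypothetical names, only a BEGIN= condition is modelled (no END= or RECORDS=), and records before the first BEGIN are treated as belonging to no group, so nothing is pushed onto them. Each output tuple is (record, pushed field, group ID, sequence within group).

```python
from typing import Callable, Optional

Row = tuple[str, Optional[str], Optional[int], Optional[int]]


def when_group(records: list[str],
               begins: Callable[[str], bool],
               push_field: Callable[[str], str]) -> list[Row]:
    out: list[Row] = []
    group_id = 0
    seq = 0
    pushed: Optional[str] = None
    for rec in records:
        if begins(rec):
            # A new group starts: bump ID, restart SEQ, and capture the
            # field from this first record so it can be pushed onward.
            group_id += 1
            seq = 0
            pushed = push_field(rec)
        if group_id == 0:
            out.append((rec, None, None, None))  # not in any group yet
            continue
        seq += 1
        out.append((rec, pushed, group_id, seq))
    return out


# Hypothetical header/detail records: "H" lines begin a group.
for row in when_group(["H1", "D a", "D b", "H2", "D c"], lambda r: r.startswith("H"), lambda r: r[:2]):
    print(row)
```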

It is documented in the SyncSORT manual, and you will find examples here and in the DFSORT part of the forum, and through your favourite internet search engine.

DFSORT has KEYBEGIN for WHEN=GROUP. SyncSORT does not (or may not), but it can be emulated with a SEQNUM using RESTART=, followed by a BEGIN= that tests for zero in that position.
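The KEYBEGIN emulation described above can be sketched as two passes. This is a hypothetical illustration (`restart_seqnum`, `keybegin_groups`, and `key_of` are made-up names): the first pass numbers records, restarting whenever the key changes, as a SEQNUM with RESTART= would; the second starts a new group wherever that number equals its starting value, which is what a BEGIN= test on that position does. The start value is a parameter defaulting to 0 to mirror "zero" in the post; how the real numbering is configured is an assumption here.

```python
from typing import Callable


def restart_seqnum(records: list[str], key_of: Callable[[str], str],
                   start: int = 0) -> list[tuple[str, int]]:
    # Number records, restarting at `start` whenever the key changes.
    out: list[tuple[str, int]] = []
    prev_key = None
    n = start
    for i, rec in enumerate(records):
        k = key_of(rec)
        n = start if i == 0 or k != prev_key else n + 1
        prev_key = k
        out.append((rec, n))
    return out


def keybegin_groups(records: list[str], key_of: Callable[[str], str],
                    start: int = 0) -> list[list[str]]:
    # BEGIN= on "number == start" opens a group at every key change,
    # reproducing what KEYBEGIN would mark.
    groups: list[list[str]] = []
    for rec, n in restart_seqnum(records, key_of, start):
        if n == start:
            groups.append([])
        groups[-1].append(rec)
    return groups


print(keybegin_groups(["A1 x", "A1 z", "B1 y", "A1 q"], lambda r: r[:2]))
```

Note that, like KEYBEGIN, this marks a new group at every change of key, so a key that reappears later (A1 here) starts a separate group: `[['A1 x', 'A1 z'], ['B1 y'], ['A1 q']]`.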

Find the documentation, find some examples, experiment. If you have problems, ask a new question rather than continuing this one.

It is a very powerful and useful function.

Arunkumar Chandrasekaran

Sure, Bill. I will let you know once I have experimented. Thank you!

JAYACHANDRAN THAMPY · New User Joined: 06 Jun 2006 Posts: 8

Syncsort V1.4.2 supports KEYBEGIN for WHEN=GROUP.

Bill Woodger · Posted: Thu Nov 13, 2014 7:36 pm

Thanks. I think you mentioned it before, here or elsewhere.

It would be great to know all the things which are now in 1.4.x which weren't there previously, and which of those are documented, or just work.

If you have something of a list, we can make it a "sticky" on this forum and extend it as more information becomes available. JNFnCNTL on JOINKEYS is supported but, I think, not documented, for instance.