SYNCSORT keep MIN/MAX record

Cloink · New User Joined: 12 Sep 2018 Posts: 14 Location: UK

Hi,

I found the DUPKEYS/MIN-MAX option for SYNCSORT, but in testing it overlays the MIN/MAX value onto the first record found for each key.

The manual seems to suggest it will keep the record with the min/max value, but it is not entirely clear.

I want to keep the record which has the MIN/MAX value. Is this possible with one SORT step?

Obviously, I realise I can do it in two SORT steps, but these are incredibly large files which take hours, so I cannot afford two steps.

Thanks in advance,
Clark

expat · Posted: Mon Sep 17, 2018 4:10 pm

Can you not use one pass of the data, writing out to two different datasets, one for MIN and one for MAX?

Cloink · New User Joined: 12 Sep 2018 Posts: 14 Location: UK

Sorry, I didn't go into enough detail.

I don't need MIN and MAX, I just need MIN. I included MAX for the sake of tagging, in case other people are looking for the same thing with MAX.

But no, 99.99% of records have NO duplicates. On those odd ones with duplicate keys, I want to keep the record which has the MIN value of another field.

I can exclude all but the MIN value, but this keeps the MIN value of that other field together with all the other fields from the first record in sequence.

I can XDUP the duplicate records and send all of them to another file (including the MIN one) - but then I would need to re-sort to filter back in the one record (of each dup) that matched the MIN requirement.

I'm soooooooo nearly there but not quite...

sergeyken · Posted: Mon Sep 17, 2018 5:31 pm

It would be nice if you presented here
1) the full set of your SORT statements
2) clear short sample of possible input data (only the fields critical to this functionality, not real long, or production data)
3) clear picture of actual output with comments: what exactly does not satisfy you?

Unless people do these simple steps, the discussion looks rather like blah-blah-blah. Many readers just ignore questions in this style.

Marso · Posted: Mon Sep 17, 2018 5:55 pm

I tried the following:

Cloink · New User Joined: 12 Sep 2018 Posts: 14 Location: UK

I want to keep the middle 1111111111 record (0001 is lowest other field), the last 2222222222 record (0010 is lowest other field), and both the 3333333333 & 4444444444 record.

In between the sort-key and the other number are binary numbers to match the display numbers: this is where MIN=(12,4,BI) is looking for min value.

This code does NOT work, because it keeps the FIRST 1111111111 record (the 0003 field appears in the output) and overlays the (12,4,BI) field with the x'0001' from the 2nd 1111111111 rec.

I want to keep the whole record from where the MIN value is found.
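
To make the difference concrete, here is a minimal Python sketch of the two behaviours. It only models the logic described above and is not SORT control statement syntax. The record layout, the sample values and the function names `overlay_min` and `keep_min_record` are all hypothetical, chosen to mirror the description (first 1111111111 record carries 0003, the middle one 0001, the last 2222222222 record 0010).

```python
# Hypothetical sample: (sort key, "other" numeric field, rest of record).
# The values are illustrative only, not the original test data.
records = [
    ("1111111111", 3, "rec-A"),
    ("1111111111", 1, "rec-B"),
    ("1111111111", 2, "rec-C"),
    ("2222222222", 20, "rec-D"),
    ("2222222222", 10, "rec-E"),
    ("3333333333", 5, "rec-F"),
    ("4444444444", 7, "rec-G"),
]


def overlay_min(recs):
    """Model of the observed behaviour: keep the FIRST record per key
    and overlay only the numeric field with the group minimum."""
    out = {}
    for key, val, rest in recs:
        if key not in out:
            out[key] = [key, val, rest]
        else:
            out[key][1] = min(out[key][1], val)
    return [tuple(r) for r in out.values()]


def keep_min_record(recs):
    """Model of the wanted behaviour: keep the WHOLE record that
    carries the minimum value for its key."""
    out = {}
    for rec in recs:
        key = rec[0]
        if key not in out or rec[1] < out[key][1]:
            out[key] = rec
    return list(out.values())


if __name__ == "__main__":
    print(overlay_min(records))
    print(keep_min_record(records))
```

For key 1111111111 the first function yields the minimum value 1 attached to "rec-A" (the first record), while the second yields the whole "rec-B" record, which is the result being asked for.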

Cloink · New User Joined: 12 Sep 2018 Posts: 14 Location: UK

Hi Marso, your MIN records are always the first in sequence. Please try with duplicated key field records in opposite sequence...

expat · Posted: Mon Sep 17, 2018 6:17 pm

Cloink · New User Joined: 12 Sep 2018 Posts: 14 Location: UK

I explained that: "In between the sort-key and the other number are binary numbers to match the display numbers: this is where MIN=(12,4,BI) is looking for min value."

Cloink · New User Joined: 12 Sep 2018 Posts: 14 Location: UK

In hex display:

Arun Raj · Posted: Wed Sep 19, 2018 3:08 am

Cloink,

I don't have Syncsort, but you could try something like this to get what you want, if I understood correctly.

Cloink · New User Joined: 12 Sep 2018 Posts: 14 Location: UK

Hmm - thanks for that Arun, I might try it - how does ICETOOL compare to SYNCSORT (anyone) in runtimes/cpu? These are BIIIIIIIIIIIIIG files....! (Ok, I'll google that q myself before someone posts a link...)

Arun Raj · Posted: Wed Sep 19, 2018 7:32 pm

Cloink - You're welcome. When I was at a Syncsort site, we had pretty huge files (hundreds of millions of records, though with small record lengths) and had no issues with the run times. Another option would be to sort the records the same way as above, assign a group sequence number that restarts whenever the key (pos 1-10) changes, and write only the records with sequence number 1 to the output using a PGM=SORT/SYNCSORT step. You could run a test with your actual data and compare the results. Good luck.
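
The group-sequence idea can be sketched in Python as follows. This is only an illustration of the logic (sort on key plus the other field, restart a counter on each key change, keep counter 1). It is not SORT syntax and says nothing about run times; the sample data and the name `first_of_each_group` are hypothetical.

```python
# Hypothetical sample: (key in pos 1-10, numeric field, rest of record).
sample = [
    ("1111111111", 3, "rec-A"),
    ("1111111111", 1, "rec-B"),
    ("2222222222", 20, "rec-D"),
    ("2222222222", 10, "rec-E"),
    ("3333333333", 5, "rec-F"),
]


def first_of_each_group(recs):
    """Sort by (key, numeric field), number the records within each key
    starting at 1, and keep only the records numbered 1."""
    ordered = sorted(recs, key=lambda r: (r[0], r[1]))
    kept = []
    prev_key = None
    seq = 0
    for rec in ordered:
        # The sequence number restarts whenever the key changes.
        seq = 1 if rec[0] != prev_key else seq + 1
        prev_key = rec[0]
        if seq == 1:
            kept.append(rec)
    return kept


if __name__ == "__main__":
    for rec in first_of_each_group(sample):
        print(rec)
```

Because the numeric field is part of the sort key, the record numbered 1 in each group is the one with the minimum value, and it is written out whole.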

Rohit Umarjikar · Posted: Thu Sep 20, 2018 8:37 pm

How about a little Google search, or a search of this forum? It gives enough links to solve your problem. Try them, and if they don't work for SYNCSORT then post back.
e.g.
ibmmainframes.com/about18411.html
www.ibmmainframeforum.com/dfsort-icetool-icegener/topic8648.html
ibmmainframes.com/about56960.html

sergeyken · Posted: Fri Sep 21, 2018 4:05 am

I'm too tired to try it myself right now:

You can SORT the file by your two fields, then use OUTFIL NODETAIL,REMOVECC with SECTIONS=(...,HEADER3=(build from full first record of a group))

If you cannot, then I'll give you an example tomorrow.

P.S.
In that case you can get simultaneously MIN and MAX by using HEADER3 and TRAILER3 in the same SECTIONS group.
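
As a rough model of that idea in Python (not SORT syntax): after sorting on both fields, the first record of each key group carries the minimum and the last carries the maximum, which is what HEADER3 and TRAILER3 would pick up. The sample data and the name `min_max_per_key` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample: (key, numeric field, rest of record).
data = [
    ("1111111111", 3, "rec-A"),
    ("1111111111", 1, "rec-B"),
    ("1111111111", 2, "rec-C"),
    ("3333333333", 5, "rec-F"),
]


def min_max_per_key(recs):
    """Sort by (key, numeric field), then take the first and last record
    of every key group as the MIN and MAX records respectively."""
    ordered = sorted(recs, key=itemgetter(0, 1))
    result = {}
    for key, group in groupby(ordered, key=itemgetter(0)):
        members = list(group)
        result[key] = (members[0], members[-1])
    return result


if __name__ == "__main__":
    for key, (lo, hi) in min_max_per_key(data).items():
        print(key, "MIN:", lo, "MAX:", hi)
```

For a key with a single record, the MIN and MAX entries are the same record.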

Rohit Umarjikar · Posted: Fri Sep 21, 2018 5:32 am

sergeyken, Right. In my first link it’s there with an example.

sergeyken · Posted: Fri Sep 21, 2018 5:19 pm