IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

SYNCSORT keep MIN/MAX record


IBM Mainframe Forums -> SYNCSORT
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
Cloink

New User


Joined: 12 Sep 2018
Posts: 14
Location: UK

PostPosted: Mon Sep 17, 2018 3:00 pm
Reply with quote

Hi,

I found the DUPKEYS/MIN-MAX option for SYNCSORT, but having tested this overlays the first found record with the MIN/MAX value.

The manual seems to suggest it will keep the record with the min/max value, but it is not entirely clear.

I want to keep the record which has the MIN/MAX value. Is this possible with one SORT step?

Obviously, I realise I can do it in two SORT steps, but these are incredibly large files which take hours, I cannot afford two steps.

Thanks in advance,
Clark
Back to top
View user's profile Send private message
expat

Global Moderator


Joined: 14 Mar 2007
Posts: 8797
Location: Welsh Wales

PostPosted: Mon Sep 17, 2018 4:10 pm
Reply with quote

Can you not use one pass of the data but writing out to two different datasets, one for MIN and one for MAX
Back to top
View user's profile Send private message
Cloink

New User


Joined: 12 Sep 2018
Posts: 14
Location: UK

PostPosted: Mon Sep 17, 2018 4:29 pm
Reply with quote

Sorry, I didn't go into enough detail.

I don't need MIN and MAX, I just need MIN, I included MAX for the sake of tagging if other people are looking for same thing with MAX.

But no, 99.99% of records have NO duplicates. On those odd ones with duplicate keys, I want to keep the record which has the MIN value of another field.

I can exclude all but the MIN value, this keeps the MIN value of that other field, but all the other fields from the first record in sequence.

I can XDUP the duplicate records and send all of them to another file (including the MIN one) - but then I would need to re-sort to filter back in the one record (of each dup) that matched the MIN requirement.

I'm soooooooo nearly there but not quite...
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2012
Location: USA

PostPosted: Mon Sep 17, 2018 5:31 pm
Reply with quote

It would be nice if you presented here
1) the full set of your SORT statements
2) clear short sample of possible input data (only the fields critical to this functionality, not real long, or production data)
3) clear picture of actual output with comments: what exactly does not satisfy you?

Unless people do these simple steps the discussion looks rather as blah-blah-blah. Many readers just ignore questions in this style.
Back to top
View user's profile Send private message
Marso

REXX Moderator


Joined: 13 Mar 2006
Posts: 1353
Location: Israel

PostPosted: Mon Sep 17, 2018 5:55 pm
Reply with quote

I tried the following:
Code:
//STEP01   EXEC PGM=SYNCSORT
//SYSPRINT DD   SYSOUT=*   
//SYSOUT   DD   SYSOUT=*   
//SORTIN   DD   *           
00001 005 GOOD             
00002 005 GOOD             
00003 005 GOOD             
00004 001 GOOD             
00004 005 NOT MIN           
00005 005 GOOD             
00006 005 GOOD             
00007 005 GOOD             
00008 005 GOOD             
00009 001 GOOD             
00009 005 NOT MIN           
00010 005 GOOD             
00011 005 GOOD             
00012 005 GOOD             
00012 008 NOT MIN           
00013 005 GOOD             
00014 005 GOOD             
//SORTOUT  DD   SYSOUT=*   
//SYSIN    DD   *           
  SORT FIELDS=(1,5,CH,A)   
  DUPKEYS MIN=(7,3,ZD)     
//* ************************
and got for output:
Code:
****************
00001 005 GOOD 
00002 005 GOOD 
00003 005 GOOD 
00004 001 GOOD 
00005 005 GOOD 
00006 005 GOOD 
00007 005 GOOD 
00008 005 GOOD 
00009 001 GOOD 
00010 005 GOOD 
00011 005 GOOD 
00012 005 GOOD 
00013 005 GOOD 
00014 005 GOOD 
****************
so it looks like it works as specified in the documentation:
Quote:
the record with the lower value is retained, and the other record is deleted.
Back to top
View user's profile Send private message
Cloink

New User


Joined: 12 Sep 2018
Posts: 14
Location: UK

PostPosted: Mon Sep 17, 2018 5:58 pm
Reply with quote

I want to keep the middle 1111111111 record (0001 is lowest other field), the last 2222222222 record (0010 is lowest other field), and both the 3333333333 & 4444444444 record.

Inbetween the sort-key and the other number are binary numbers to match the display numbers: this is where MIN=(12,4,BI) is looking for min value.

This code does NOT work, because it keeps FIRST 1111111111 record (0003 field appears in output) and overlays the (12,4,BI) field with the x'0001' from the 2nd 1111111111 rec.

I want to keep the whole record from where the MIN value is found.

Code:
//DUPKEYS  EXEC PGM=SORT                                     
//SYSIN    DD *                                               
  SORT FIELDS=(01,10,CH,A),EQUALS                             
  DUPKEYS MIN=(12,4,BI),XDUP                                 
//SYSOUT   DD SYSOUT=*                                       
//SORTIN   DD *,LRECL=80                                     
1111111111      0003                                         
1111111111      0001                                         
1111111111      0002                                         
2222222222      0019                                         
2222222222      0010                                         
3333333333   rr 9999                                         
4444444444      1234                                         
*/                                                           
//SORTOUT  DD DSN=DDDB078.XDUP.TEST.SORTOUT,                 
//            DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,SPACE=(TRK,1)
//SORTXDUP DD DSN=DDDB078.XDUP.TEST.SORTXDUP,                 
//            DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,SPACE=(TRK,1)
Back to top
View user's profile Send private message
Cloink

New User


Joined: 12 Sep 2018
Posts: 14
Location: UK

PostPosted: Mon Sep 17, 2018 6:02 pm
Reply with quote

Hi Marso, your MIN records are always the first in sequence. Please try with duplicated key field records in opposite sequence...
Back to top
View user's profile Send private message
expat

Global Moderator


Joined: 14 Mar 2007
Posts: 8797
Location: Welsh Wales

PostPosted: Mon Sep 17, 2018 6:17 pm
Reply with quote

Code:

//SYSIN    DD *                                               
  SORT FIELDS=(01,10,CH,A),EQUALS                             
  DUPKEYS MIN=(12,4,BI),XDUP                                 
//SYSOUT   DD SYSOUT=*                                       
//SORTIN   DD *,LRECL=80                                     
Expats patented column number headers
0        1         2
12345678901234567890

1111111111      0003                                         
1111111111      0001                                         
1111111111      0002                                         
2222222222      0019                                         
2222222222      0010                                         
3333333333   rr 9999                                         
4444444444      1234                                         

Maybe you need to look at the contents of columns 12-15, or do you need to use columns 17-20 instead ?

We cannot tell what is in those columns without you showing us.
Back to top
View user's profile Send private message
Cloink

New User


Joined: 12 Sep 2018
Posts: 14
Location: UK

PostPosted: Mon Sep 17, 2018 8:46 pm
Reply with quote

I explained that: "Inbetween the sort-key and the other number are binary numbers to match the display numbers: this is where MIN=(12,4,BI) is looking for min value."
Back to top
View user's profile Send private message
Cloink

New User


Joined: 12 Sep 2018
Posts: 14
Location: UK

PostPosted: Mon Sep 17, 2018 8:49 pm
Reply with quote

In hex display:

Code:
--------------------
1111111111      0003
FFFFFFFFFF400004FFFF
11111111110000300003
--------------------
1111111111      0001
FFFFFFFFFF400004FFFF
11111111110000100001
--------------------
1111111111      0002
FFFFFFFFFF400004FFFF
11111111110000200002
--------------------
2222222222      0019
FFFFFFFFFF400014FFFF
22222222220000900019
--------------------
2222222222      0010
FFFFFFFFFF400014FFFF
22222222220000000010
--------------------
3333333333   rr 9999
FFFFFFFFFF400994FFFF
33333333330009909999
--------------------
4444444444      1234
FFFFFFFFFF400134FFFF
44444444440002401234
--------------------
Back to top
View user's profile Send private message
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Wed Sep 19, 2018 3:08 am
Reply with quote

Cloink,

I don't have Syncsort, but you could try something like this to get what you want, if I understood correctly.
Code:
//DUPKEYS  EXEC PGM=ICETOOL                           
//IN       DD *                                       
1111111111      0003 REST OF THE DATA-1               
1111111111      0001 REST OF THE DATA-2               
1111111111      0002 REST OF THE DATA-3               
2222222222      0019 REST OF THE DATA-4               
2222222222      0010 REST OF THE DATA-5               
3333333333   rr 9999 REST OF THE DATA-6               
4444444444      1234 REST OF THE DATA-7               
//OUT      DD SYSOUT=*                                 
//TOOLIN   DD *                                       
  SELECT FROM(IN) TO(OUT) ON(1,10,CH) FIRST USING(CTL1)
//CTL1CNTL DD *                                       
  SORT FIELDS=(01,10,CH,A,12,4,CH,A)                   
//TOOLMSG  DD SYSOUT=*                                 
//DFSMSG   DD SYSOUT=*                                 
Back to top
View user's profile Send private message
Cloink

New User


Joined: 12 Sep 2018
Posts: 14
Location: UK

PostPosted: Wed Sep 19, 2018 12:55 pm
Reply with quote

Hmm - thanks for that Arun, I might try it - how does ICETOOL compare to SYNCSORT (anyone) in runtimes/cpu? These are BIIIIIIIIIIIIIG files....! (Ok, I'll google that q myself before someone posts a link...)
Back to top
View user's profile Send private message
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Wed Sep 19, 2018 7:32 pm
Reply with quote

Cloink - You're welcome. When I was at a syncsort site, we have had pretty huge files (in terms of number of records in the order of hundreds of millions, but those were of smaller record lengths), and have had no issues with the run times. Another option would be to sort the records same as above and assign a group id-sequence when key(pos 1-10) changes and write only sequence number=1 into the output using a PGM=SORT/SYNCSORT step. You could run a test with your actual data and compare the results. Good luck.
Back to top
View user's profile Send private message
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3048
Location: NYC,USA

PostPosted: Thu Sep 20, 2018 8:37 pm
Reply with quote

How about little Google search or this forum? It gives enough links to solve your problem. Try them and if they don't work for SYNCSORT then post it back
e.g.
ibmmainframes.com/about18411.html
www.ibmmainframeforum.com/dfsort-icetool-icegener/topic8648.html
ibmmainframes.com/about56960.html
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2012
Location: USA

PostPosted: Fri Sep 21, 2018 4:05 am
Reply with quote

I'm tired now to try it myself:

You can SORT file by two of your fields, then use OUTFIL NODETAIL,REMOVECC with SECTIONS=(...,HEADER3=(build from full first record of a group))

I if you cannot, then I'll give you an example tomorrow.

P.S.
In that case you can get simultaneously MIN and MAX by using HEADER3 and TRAILER3 in the same SECTIONS group.
Back to top
View user's profile Send private message
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3048
Location: NYC,USA

PostPosted: Fri Sep 21, 2018 5:32 am
Reply with quote

sergeyken, Right. In my first link it’s there with an example.
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2012
Location: USA

PostPosted: Fri Sep 21, 2018 5:19 pm
Reply with quote

Code:
//*==========================================
//SORTMIN  EXEC PGM=SORT                     
//*                                         
//SYNOUT   DD SYSOUT=*                       
//*                                         
//SORTIN   DD *                             
2222222222    BAD! 0003     
2222222222   +MAX  0100     
2222222222    BAD! 0005     
2222222222   +MIN  0002     
2222222222    BAD! 0050     
1111111111    BAD! 0005     
1111111111   +MAX  0025     
1111111111    BAD! 0003     
1111111111   +MIN  0001     
1111111111    BAD! 0004     
3333333333   +MAX  0111     
3333333333    BAD! 0055     
3333333333   +MIN  0000     
3333333333    BAD! 0033     
3333333333    BAD! 0066     
//*-+----1----+----2----+                   
//*                                         
//SORTOUT DD  SYSOUT=*                     
//*                                         
//SYSIN   DD  *                             
 SORT FIELDS=(1,10,CH,A,                   
             20,04,CH,A)                   
 OUTFIL REMOVECC,NODETAIL,                 
        SECTIONS=(1,10,                     
                  HEADER3=(1,80),           
                  TRAILER3=(1,80))         
 END                                       
//*=========================================
 


Code:
********************************* TOP OF DATA *****
1111111111   +MIN  0001                           
1111111111   +MAX  0025                           
2222222222   +MIN  0002                           
2222222222   +MAX  0100                           
3333333333   +MIN  0000                           
3333333333   +MAX  0111                           
******************************** BOTTOM OF DATA ***
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> SYNCSORT

 


Similar Topics
Topic Forum Replies
No new posts How to split large record length file... DFSORT/ICETOOL 10
No new posts SFTP Issue - destination file record ... All Other Mainframe Topics 2
No new posts FINDREP - Only first record from give... DFSORT/ICETOOL 3
No new posts Compare only first records of the fil... SYNCSORT 7
No new posts To find whether record count are true... DFSORT/ICETOOL 6
Search our Forums:

Back to Top