View previous topic :: View next topic
|
Author |
Message |
Cloink
New User
Joined: 12 Sep 2018 Posts: 14 Location: UK
|
|
|
|
Hi,
I found the DUPKEYS/MIN-MAX option for SYNCSORT, but having tested this overlays the first found record with the MIN/MAX value.
The manual seems to suggest it will keep the record with the min/max value, but it is not entirely clear.
I want to keep the record which has the MIN/MAX value. Is this possible with one SORT step?
Obviously, I realise I can do it in two SORT steps, but these are incredibly large files which take hours, I cannot afford two steps.
Thanks in advance,
Clark |
|
Back to top |
|
|
expat
Global Moderator
Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
|
|
|
|
Can you not use one pass of the data but writing out to two different datasets, one for MIN and one for MAX |
|
Back to top |
|
|
Cloink
New User
Joined: 12 Sep 2018 Posts: 14 Location: UK
|
|
|
|
Sorry, I didn't go into enough detail.
I don't need MIN and MAX, I just need MIN, I included MAX for the sake of tagging if other people are looking for same thing with MAX.
But no, 99.99% of records have NO duplicates. On those odd ones with duplicate keys, I want to keep the record which has the MIN value of another field.
I can exclude all but the MIN value, this keeps the MIN value of that other field, but all the other fields from the first record in sequence.
I can XDUP the duplicate records and send all of them to another file (including the MIN one) - but then I would need to re-sort to filter back in the one record (of each dup) that matched the MIN requirement.
I'm soooooooo nearly there but not quite... |
|
Back to top |
|
|
sergeyken
Senior Member
Joined: 29 Apr 2008 Posts: 2024 Location: USA
|
|
|
|
It would be nice if you presented here
1) the full set of your SORT statements
2) clear short sample of possible input data (only the fields critical to this functionality, not real long, or production data)
3) clear picture of actual output with comments: what exactly does not satisfy you?
Unless people do these simple steps the discussion looks rather as blah-blah-blah. Many readers just ignore questions in this style. |
|
Back to top |
|
|
Marso
REXX Moderator
Joined: 13 Mar 2006 Posts: 1353 Location: Israel
|
|
|
|
I tried the following:
Code: |
//STEP01 EXEC PGM=SYNCSORT
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTIN DD *
00001 005 GOOD
00002 005 GOOD
00003 005 GOOD
00004 001 GOOD
00004 005 NOT MIN
00005 005 GOOD
00006 005 GOOD
00007 005 GOOD
00008 005 GOOD
00009 001 GOOD
00009 005 NOT MIN
00010 005 GOOD
00011 005 GOOD
00012 005 GOOD
00012 008 NOT MIN
00013 005 GOOD
00014 005 GOOD
//SORTOUT DD SYSOUT=*
//SYSIN DD *
SORT FIELDS=(1,5,CH,A)
DUPKEYS MIN=(7,3,ZD)
//* ************************ |
and got for output:
Code: |
****************
00001 005 GOOD
00002 005 GOOD
00003 005 GOOD
00004 001 GOOD
00005 005 GOOD
00006 005 GOOD
00007 005 GOOD
00008 005 GOOD
00009 001 GOOD
00010 005 GOOD
00011 005 GOOD
00012 005 GOOD
00013 005 GOOD
00014 005 GOOD
**************** |
so it looks like it works as specified in the documentation:
Quote: |
the record with the lower value is retained, and the other record is deleted. |
|
|
Back to top |
|
|
Cloink
New User
Joined: 12 Sep 2018 Posts: 14 Location: UK
|
|
|
|
I want to keep the middle 1111111111 record (0001 is lowest other field), the last 2222222222 record (0010 is lowest other field), and both the 3333333333 & 4444444444 record.
Inbetween the sort-key and the other number are binary numbers to match the display numbers: this is where MIN=(12,4,BI) is looking for min value.
This code does NOT work, because it keeps FIRST 1111111111 record (0003 field appears in output) and overlays the (12,4,BI) field with the x'0001' from the 2nd 1111111111 rec.
I want to keep the whole record from where the MIN value is found.
Code: |
//DUPKEYS EXEC PGM=SORT
//SYSIN DD *
SORT FIELDS=(01,10,CH,A),EQUALS
DUPKEYS MIN=(12,4,BI),XDUP
//SYSOUT DD SYSOUT=*
//SORTIN DD *,LRECL=80
1111111111 0003
1111111111 0001
1111111111 0002
2222222222 0019
2222222222 0010
3333333333 rr 9999
4444444444 1234
*/
//SORTOUT DD DSN=DDDB078.XDUP.TEST.SORTOUT,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,SPACE=(TRK,1)
//SORTXDUP DD DSN=DDDB078.XDUP.TEST.SORTXDUP,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,SPACE=(TRK,1)
|
|
|
Back to top |
|
|
Cloink
New User
Joined: 12 Sep 2018 Posts: 14 Location: UK
|
|
|
|
Hi Marso, your MIN records are always the first in sequence. Please try with duplicated key field records in opposite sequence... |
|
Back to top |
|
|
expat
Global Moderator
Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
|
|
|
|
Code: |
//SYSIN DD *
SORT FIELDS=(01,10,CH,A),EQUALS
DUPKEYS MIN=(12,4,BI),XDUP
//SYSOUT DD SYSOUT=*
//SORTIN DD *,LRECL=80
Expats patented column number headers
0 1 2
12345678901234567890
1111111111 0003
1111111111 0001
1111111111 0002
2222222222 0019
2222222222 0010
3333333333 rr 9999
4444444444 1234
|
Maybe you need to look at the contents of columns 12-15, or do you need to use columns 17-20 instead ?
We cannot tell what is in those columns without you showing us. |
|
Back to top |
|
|
Cloink
New User
Joined: 12 Sep 2018 Posts: 14 Location: UK
|
|
|
|
I explained that: "Inbetween the sort-key and the other number are binary numbers to match the display numbers: this is where MIN=(12,4,BI) is looking for min value." |
|
Back to top |
|
|
Cloink
New User
Joined: 12 Sep 2018 Posts: 14 Location: UK
|
|
|
|
In hex display:
Code: |
--------------------
1111111111 0003
FFFFFFFFFF400004FFFF
11111111110000300003
--------------------
1111111111 0001
FFFFFFFFFF400004FFFF
11111111110000100001
--------------------
1111111111 0002
FFFFFFFFFF400004FFFF
11111111110000200002
--------------------
2222222222 0019
FFFFFFFFFF400014FFFF
22222222220000900019
--------------------
2222222222 0010
FFFFFFFFFF400014FFFF
22222222220000000010
--------------------
3333333333 rr 9999
FFFFFFFFFF400994FFFF
33333333330009909999
--------------------
4444444444 1234
FFFFFFFFFF400134FFFF
44444444440002401234
--------------------
|
|
|
Back to top |
|
|
Arun Raj
Moderator
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
|
|
|
|
Cloink,
I don't have Syncsort, but you could try something like this to get what you want, if I understood correctly.
Code: |
//DUPKEYS EXEC PGM=ICETOOL
//IN DD *
1111111111 0003 REST OF THE DATA-1
1111111111 0001 REST OF THE DATA-2
1111111111 0002 REST OF THE DATA-3
2222222222 0019 REST OF THE DATA-4
2222222222 0010 REST OF THE DATA-5
3333333333 rr 9999 REST OF THE DATA-6
4444444444 1234 REST OF THE DATA-7
//OUT DD SYSOUT=*
//TOOLIN DD *
SELECT FROM(IN) TO(OUT) ON(1,10,CH) FIRST USING(CTL1)
//CTL1CNTL DD *
SORT FIELDS=(01,10,CH,A,12,4,CH,A)
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=* |
|
|
Back to top |
|
|
Cloink
New User
Joined: 12 Sep 2018 Posts: 14 Location: UK
|
|
|
|
Hmm - thanks for that Arun, I might try it - how does ICETOOL compare to SYNCSORT (anyone) in runtimes/cpu? These are BIIIIIIIIIIIIIG files....! (Ok, I'll google that q myself before someone posts a link...) |
|
Back to top |
|
|
Arun Raj
Moderator
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
|
|
|
|
Cloink - You're welcome. When I was at a syncsort site, we have had pretty huge files (in terms of number of records in the order of hundreds of millions, but those were of smaller record lengths), and have had no issues with the run times. Another option would be to sort the records same as above and assign a group id-sequence when key(pos 1-10) changes and write only sequence number=1 into the output using a PGM=SORT/SYNCSORT step. You could run a test with your actual data and compare the results. Good luck. |
|
Back to top |
|
|
Rohit Umarjikar
Global Moderator
Joined: 21 Sep 2010 Posts: 3053 Location: NYC,USA
|
|
Back to top |
|
|
sergeyken
Senior Member
Joined: 29 Apr 2008 Posts: 2024 Location: USA
|
|
|
|
I'm tired now to try it myself:
You can SORT file by two of your fields, then use OUTFIL NODETAIL,REMOVECC with SECTIONS=(...,HEADER3=(build from full first record of a group))
I if you cannot, then I'll give you an example tomorrow.
P.S.
In that case you can get simultaneously MIN and MAX by using HEADER3 and TRAILER3 in the same SECTIONS group. |
|
Back to top |
|
|
Rohit Umarjikar
Global Moderator
Joined: 21 Sep 2010 Posts: 3053 Location: NYC,USA
|
|
|
|
sergeyken, Right. In my first link it’s there with an example. |
|
Back to top |
|
|
sergeyken
Senior Member
Joined: 29 Apr 2008 Posts: 2024 Location: USA
|
|
|
|
Code: |
//*==========================================
//SORTMIN EXEC PGM=SORT
//*
//SYNOUT DD SYSOUT=*
//*
//SORTIN DD *
2222222222 BAD! 0003
2222222222 +MAX 0100
2222222222 BAD! 0005
2222222222 +MIN 0002
2222222222 BAD! 0050
1111111111 BAD! 0005
1111111111 +MAX 0025
1111111111 BAD! 0003
1111111111 +MIN 0001
1111111111 BAD! 0004
3333333333 +MAX 0111
3333333333 BAD! 0055
3333333333 +MIN 0000
3333333333 BAD! 0033
3333333333 BAD! 0066
//*-+----1----+----2----+
//*
//SORTOUT DD SYSOUT=*
//*
//SYSIN DD *
SORT FIELDS=(1,10,CH,A,
20,04,CH,A)
OUTFIL REMOVECC,NODETAIL,
SECTIONS=(1,10,
HEADER3=(1,80),
TRAILER3=(1,80))
END
//*=========================================
|
Code: |
********************************* TOP OF DATA *****
1111111111 +MIN 0001
1111111111 +MAX 0025
2222222222 +MIN 0002
2222222222 +MAX 0100
3333333333 +MIN 0000
3333333333 +MAX 0111
******************************** BOTTOM OF DATA ***
|
|
|
Back to top |
|
|
|