IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Need Help with using Sort to Omit "almost" duplica


IBM Mainframe Forums -> DFSORT/ICETOOL
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Tue May 01, 2007 9:37 pm
Reply with quote

I have a report file that has offsetting order records that the users do not need to see. The only difference between the two records is the sign field of "+" or "-" The records represent an order the Positive amount and then a cancel the negative amount. Is there a way to eliminate all the records that are equal except for the sign? The sign is a one byte field by itself. Don't ask me why, that is how I found it..... I want to try to eliminate these using sort prior to the report program, rather than using convoluted cobol logic

6480239512497668569MADONNA M KENNEDY N04232007P43A7511543 001399104+00003225858000000000000000N 0000000000000

6480239512497668569MADONNA M KENNEDY N04232007P43A7511543 001399104-00003225858000000000000000N 0000000000000
Back to top
View user's profile Send private message
William Thompson

Global Moderator


Joined: 18 Nov 2006
Posts: 3156
Location: Tucson AZ

PostPosted: Tue May 01, 2007 10:08 pm
Reply with quote

Is there always a '-' to match every '+'?
Is there always a '+' to match every '-'?
Are there any 'other' types of records?
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Tue May 01, 2007 11:04 pm
Reply with quote

What is the RECFM and LRECL of the input files?

What is the position of the + or - character in the input records?

Can there be duplicate records within input file1? (e.g. can there be two identical DONNA M KENNEDY records in input file1?).

Can there be duplicate records within input file2?
Back to top
View user's profile Send private message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Tue May 01, 2007 11:13 pm
Reply with quote

Record length . . . : 143
Block size . . . . : 27885


There are more than just these records on the file.

There are positive orders and negative orders by themselves which are fine. it's the offsetting orders that come in on the same day (order/cancel) that I need to get rid of. Like in accounting they cancel each other out so they are useless to the reviewers of the report.
Back to top
View user's profile Send private message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Tue May 01, 2007 11:14 pm
Reply with quote

Position of the sign field is 88 for a length of 1 and the order amount is 89 for 11
Back to top
View user's profile Send private message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Tue May 01, 2007 11:20 pm
Reply with quote

Also, there is only one input file.. I just want to eliminate the offsetting records. When sorted the offsetting records would be next to each other one positive and one negative. there are no true duplicate records on the file, just a possibility of multiple offsetting orders. In other words several reps may have had an order and then a cancel of that order coming in on the same day. So if I am a customer and I buy an annuity for 40k and then cancel. Two records would come in +40k and -40K all other customer info would be identical on those two orders.
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Tue May 01, 2007 11:54 pm
Reply with quote

Hello,

For my $.02, i question reading and re-writing the entire file every time this job is run just to save an hour or so's coding time. . .

It need not be convoluted at all if the +/- records are always consecutive. .
Back to top
View user's profile Send private message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Wed May 02, 2007 12:05 am
Reply with quote

I would agree if it were a huge file, but for the amount of records coming in it would be more efficient and easier to maintain if it could be done using syncsort. On average there are only 1850 or so records a day.

Which would be done in a split second. If the users ever decide they want to see the offsetting records again removing the sort process would be a snap as opposed to eliminating the code or commenting it out.

Also, I'm not big on reading ahead... read move to hold area. Read next record compare to first record in hold area, if not equal send record one through the normal processing. When done, move second record to hold area read third record etc.....
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Wed May 02, 2007 12:13 am
Reply with quote

Hello,

I hate being the bearer of "bad news". . . .

If somone was going to present a "sort solution" for your requirement, it would most likely be Frank - however, Frank supports DFSORT and your site uses Syncsort. Many of the neat features in DFSORT are not available in Syncsort.

Is it possible that you site has DFSORT installed as well (and are you able to use it)? Several centers i've supported have both installed. . .
Back to top
View user's profile Send private message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Wed May 02, 2007 12:30 am
Reply with quote

I'm sorry bad habit... We use DFSSORT
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Wed May 02, 2007 12:48 am
Reply with quote

Hi Scott,

I hate when that happens icon_smile.gif
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Wed May 02, 2007 1:02 am
Reply with quote

Quote:
When sorted the offsetting records would be next to each other one positive and one negative. there are no true duplicate records on the file, just a possibility of multiple offsetting orders.


What do you have to sort the records by to get them next to each other?

Can the output be in that sorted order or does it have to be in the original order?

What identifies two records as being identical with the exception of the + or - sign?

Please show a more extensive example of your input and output records including an example of multiple offsetting orders.

I can think of several ways to approach this, but I don't yet know enough about your data to know which of the approaches, if any, will work.
Back to top
View user's profile Send private message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Wed May 02, 2007 1:22 am
Reply with quote

Right now the file is being sorted by Customer number
SORT FIELDS=(1,10,CH,A)
the orders end up next to one another, but I wouldn't rely on that one field. A customer may have more than just the offsetting order/cancel. I would add order amount (89,11) product number (79,9) and a few other fields to ensure they are together. As the records are duplicates except for the sign field. one pos one neg...
Back to top
View user's profile Send private message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Wed May 02, 2007 1:46 am
Reply with quote

Positions 1-143

Customer Number 1,10
Sign 88,1
Order Amount 89,11

6480239512497668569MADONNA M KENNEDY N04232007P43A7511543 001399104+00003225858000000000000000N 0000000000000..........

6480239512497668569MADONNA M KENNEDY N04232007P43A7511543 001399104-00003225858000000000000000N 0000000000000..........



These are the two duplicates except for the sign now there could be 90 single positive order records after this and 1000 single negative order records before these two or a mixture of positive and negative orders which is more real world, that are not connected to each other in any way. there may also be other customer orders of an offsetting (pos/neg) variety that offset each other (order/cancel).

These are the only correlated records. They would be duplicate except for the pos/neg sign. one is the order (+) and one is the cancel(-) coming through in the same day. They are the same customer, same product, same everything with opposing signs.
Back to top
View user's profile Send private message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Wed May 02, 2007 1:52 am
Reply with quote

It's kind of hard to cut and paste mulitple records from the file as they are 143 bytes long and I have to cut part at a time and then piece it together on the forum

can I email it to you?
Back to top
View user's profile Send private message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Wed May 02, 2007 2:22 am
Reply with quote

Here is the first part of the record bytes 1 through 87

6480239512497668569MADONNA M KENNEDY N04242007P43A7511543 001399104
6480239512497668569MADONNA M KENNEDY N04232007P43A7511543 001399104
6489028115497487927EDWARD D JONES W04272007P13A4403900 033064643
6490639611513609272JUANITA L STONE W04272007710289757 ANH999319
6499039813283565903 EDWARD JONES, C/F W04292007PX0210647-AA ANP997983
6500980716513783949CLARA GLOVER W04272007712110351 41659P730
6519034414447604738EDWARD D JONES & CO CUST. W04272007310107693 ANH999350
6520835817349668782ON PLUMBIN CMB FBO PRECISIA04272007P35A4Q00528 033064635
6530866315326623975KENNETH L HUFFMAN W04272007P3797239439 033064858
6541039910404609768CHARLES R MAY W04272007710599934 ANH999B69
6550439910500929045DENNIS J DESMOND W04272007210146147 ANH999319
6550873019487544176WALDO F KLEKAMP W04272007161387 ANH999BT9
6550965716049769361HEALTH SER CMB FBO UNISON A04272007P35A2Q00296 033064635
6551166918499704115RACHEL R ZOLMAN S03122007953004196 534222815
6570933611049443280PHYLLIS F WATKINS W04272007710568729 ANH999CC5
6579450211214134690EDWARD D JONES & CO N04272007712494709 41659Q324
6579450211214134690EDWARD D JONES & CO N04272007712494708 41659Q654
6580864111259965888NANCY W BUSH A04272007710727444 416588606
6581195010503941098WILLIAM B JONES A09052006954121902 534222815
6581195010503941098WILLIAM B JONES A09052006954121902 534222815

Here is the last part of the records with the amount on them bytes 88 through 143

there are two offsetting records at the beginning and end of these samples.

+00003225858000000000000000N 0000000000000..........
-00003225858000000000000000N 0000000000000..........
-00002200000000000000000000 0000648372164..........
-00000035060000000000000000 0000000000000..........
-00000102433042920120006100 0000000000000..........
-00000025000000000000000000 0000000000000..........
-00000050000000000000000000 0000000000000..........
+00000182305000000000000000YC2007000000000..........
-00000555555000000000000000N 0000000000000..........
-00000120000000000000000000 0000000000000..........
-00001530000000000000000000 0000000000000..........
-00000008299000000000000000 0000000000000..........
+00000460464000000000000000YC2007000000000..........
+00032453017000000000000000N 0000655104121..........
-00000040000000000000000000 0000000000000..........
+00010000000000000000000000YC2007657500393..........
+00015000000000000000000000YC2007657500391..........
+00000050816000000000000000N 0000658482235..........
+00008598564000000000000000N 0000658457587..........
-00008598564000000000000000N 0000658457587..........
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Wed May 02, 2007 2:45 am
Reply with quote

Hi Scott,

If you click the "POST REPLY" button on the bottom left, you can upload an attachment from there.
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Wed May 02, 2007 3:02 am
Reply with quote

Maybe we're overcomplicating this. If you can sort the records by specific fields to get the "pairs" together and you will never have two records in a row that have the same sort fields and the same positive amount or the same negative amount, then you can use a DFSORT/ICETOOL job like this to eliminate the +/- pairs (I assumed you want to remove the + and - records, not keep one of them).

In the example, I used the customer number (1,10,CH) and the product number (79,9,CH) as the sort fields along with the amount (89,11) without the sign, so dups will be found and removed even if the sign is different. You can add more ON fields if there are more sort fields.

Code:

//S1    EXEC  PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG  DD SYSOUT=*
//IN DD DSN=...  input file
//OUT DD DSN=...  output file
//TOOLIN   DD    *
SELECT FROM(IN) TO(OUT) NODUPS -
  ON(1,10,CH) ON(79,9,CH) ON(89,11,ZD)
/*


If that doesn't work give you what you want, then show me an example of your input data for which it doesn't work (just the relevant fields). That will help me understand what else is involved.
Back to top
View user's profile Send private message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Wed May 02, 2007 3:05 am
Reply with quote

Ok, I'll try it.

On the TOOLIN DDname I can use a pds and member as opposed to instream correct?
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Wed May 02, 2007 3:57 am
Reply with quote

Yes, as long as the PDS has RECFM=FB and LRECL=80.
Back to top
View user's profile Send private message
ScottUrban

New User


Joined: 23 Apr 2007
Posts: 50
Location: Jefferson City, Mo

PostPosted: Wed May 09, 2007 8:06 pm
Reply with quote

Frank,

Thank you very much the ICETOOL worked like a charm. I had to add a few other field positions to ensure they were exact duplicates except for the sign, but that's it. Two lines as opposed to several paragraphs and working storage in a cobol program...

Again, thanks!
Back to top
View user's profile Send private message
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

PostPosted: Wed May 09, 2007 9:36 pm
Reply with quote

Glad I could help. Thanks for the feedback.
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DFSORT/ICETOOL

 


Similar Topics
Topic Forum Replies
No new posts Need to set RC4 through JCL SORT DFSORT/ICETOOL 5
No new posts How to split large record length file... DFSORT/ICETOOL 10
No new posts INCLUDE OMIT COND for Multiple values... DFSORT/ICETOOL 5
No new posts JCL sort card - get first day and las... JCL & VSAM 9
No new posts PuTTY - "User is not a surrogate... IBM Tools 5
Search our Forums:

Back to Top