IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Sort - taking duplicate records based on one key and merging


IBM Mainframe Forums -> JCL & VSAM
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
Roshnii

New User


Joined: 30 Sep 2008
Posts: 45
Location: bangalore

PostPosted: Thu Feb 10, 2011 6:57 am
Reply with quote

I have two files like this


FILE A
Code:
Name      Subject     Date 
------------------------------
Tom        Maths       2010-09-01     
John       Science     2010-07-02
John       Physics     2011-01-01
Robin      Psychology2011-10-01
Anna       Biology     2010-08-01
Alice       Geography 2010-05-02
Thomas   Geology   2011-03-04
Alice        Physics      2011-01-12
Tom       Physics   2010-10-10
Alice     Biology   2011-12-12



Output file
Code:

Name      Subject1   Subject2  subject3
----------------------------------------
Tom       Maths     Physics   
John      Science   Physics
Alice     Geography Physics    Biology

I want all records from the input file where a student has one or more than one subject and in the order when the subject
was taken
Input file is 30 lrec. and each field is 10 bytes long.

Any idea how should I go about?

I have tried taking the duplicate records from the file using sortxsum
and then joining it with the original file to have all records but somehow I am not getting the desired output. Please help
Back to top
View user's profile Send private message
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Thu Feb 10, 2011 7:39 am
Reply with quote

Quote:
I have two files like this
Roshnii,

But I could see only one input file. Are you missing something here. Which sort product/version you're using?

Please use "Code" tags while posting data/code to make it readable and to preserve the alignment.
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Thu Feb 10, 2011 9:09 am
Reply with quote

Hi Arun,

The 2 files are 1 input and 1 output.

I've "Code'd" the data and it looks like all of the entries for each "key" are to be written as one output record.

Unfortunately, the max entries on a line has not been specified. Neither has the recfm. Maybe some clarification will be provided. . .
Back to top
View user's profile Send private message
Roshnii

New User


Joined: 30 Sep 2008
Posts: 45
Location: bangalore

PostPosted: Fri Feb 11, 2011 1:49 am
Reply with quote

I apolozize for the confusion :

The files that I had given is not correct.

Here is the correct requirement
Input file

Code:

Tom        Maths         
John       Science     
John       Physics     
Robin      Psychology
Anna       Biology     
Alice      Geography
Thomas     Geology   
Alice      Physics     
Tom        Physics   
Alice      Biology 
Dennis     Biology
Deniis     Biology


Output file

Code:


Tom        Maths   
Tom        Physics       
John       Science     
John       Physics     
Alice      Geography
Alice      Physics     
Alice      Biology 



The file length of input and output file is 25
name is 15 byte long and subject is 10 bytes long.


My output file should have only those students who have more than one different subjects.


I have tried using sort and was able to get the desired result, however, i ended up having many sort steps in my jcl which is
confusing.

I was curious to find out a better way to get the result and in less number of steps.

This is what I tried:

Step1: Sort the file accoring to the student name. FILE1
Step2: Get all the duplicate rows to one file base on the student name alone. FILE2
Step3: Get all duplicate rows from FILE2 based on both the student name and subject both. output: duplicates into:FILE3 Unique records into FILE3A
Step4: Remove the duplicates from FILE3: output FILE4
Step5: Join the files from step4 and step 3 , ie FILE3 and FILE4: ouput FILE5
Step6: Remove duplicates from FILE5 based on the student name and subject name: FILE 6
Step7: Copy files from step6 and step 3 into one file. i.e FILE3A and FILE6.


The above steps gives me the desired output but it is extremely confusing. I have not give the exact code of the steps that i have followed.

I would like to know a simpler way of doing the same thing.
Back to top
View user's profile Send private message
gcicchet

Senior Member


Joined: 28 Jul 2006
Posts: 1702
Location: Australia

PostPosted: Fri Feb 11, 2011 4:31 am
Reply with quote

Hi,

try this
Code:
//S1       EXEC PGM=SYNCTOOL                         
//TOOLMSG  DD SYSOUT=*                               
//DFSMSG   DD SYSOUT=*                               
//IN       DD DSN=input-file                         
//OUT      DD SYSOUT=*                               
//TOOLIN   DD *                                       
SELECT FROM(IN) TO(OUT) ON(1,15,CH) ALLDUPS           
/*                                                   


I'm not sure whether the order of the ouput file is what you want.


Gerry
Back to top
View user's profile Send private message
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Fri Feb 11, 2011 6:41 am
Reply with quote

Hi Gerry,

Wont it pull the below records as well into the output, which the OP does not want?
Code:
Dennis     Biology
Deniis     Biology
Back to top
View user's profile Send private message
Garry Carroll

Senior Member


Joined: 08 May 2006
Posts: 1193
Location: Dublin, Ireland

PostPosted: Fri Feb 11, 2011 1:17 pm
Reply with quote

Quote:
Wont it pull the below records as well into the output, which the OP does not want?
Code:
Dennis     Biology
Deniis     Biology




Surely not since they're not duplicates?

Garry.
Back to top
View user's profile Send private message
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10873
Location: italy

PostPosted: Fri Feb 11, 2011 1:18 pm
Reply with quote

Arun !
did You watch the speeling ? icon_biggrin.gif
Back to top
View user's profile Send private message
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Fri Feb 11, 2011 3:45 pm
Reply with quote

Oh my bad, I did n't.

Nice hint. Thanks enrico. icon_smile.gif icon_smile.gif icon_smile.gif icon_smile.gif
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Fri Feb 11, 2011 8:38 pm
Reply with quote

Whew. . .

For a short while there, i thought new "fuzzy match" functionally was available . . . icon_cool.gif

Have a great weekend!

d
Back to top
View user's profile Send private message
Roshnii

New User


Joined: 30 Sep 2008
Posts: 45
Location: bangalore

PostPosted: Fri Feb 11, 2011 9:41 pm
Reply with quote

Hi enrico,

Thanks for the solution but it did not give me the desired output.

It gave me duplicates that I dint want. And I wud have to follow the same sort steps to delete duplicates on both the student and the subject.

I have modified the input file.


Code:


Tom        Maths         
John       Science     
John       Physics     
Robin      Psychology
Anna       Biology     
Alice      Geography
Thomas     Geology   
Alice      Physics     
Tom        Physics   
Alice      Biology 
Dennis     Biology
Deniis     Biology
Tom       Maths
Alice     Physics
Robin    Psychology



Then I get this (not nessecarily in this order)

Code:


Tom        Maths   
Tom        Maths
Tom        Physics       
John        Science     
John        Physics     
Alice       Geography
Alice       Physics   
Alice       Physics 
Alice       Biology 
Robin      Psychology
Robin      Psychology




which is correct according to the input file that I had given.
But with the above input file, it does not work in one step.

I would like the input to be like this:

Code:


 
Tom        Maths
Tom        Physics       
John        Science     
John        Physics     
Alice       Geography
Alice       Physics   
Alice       Biology 



I do not want these rows



Code:


Tom        Maths 
Robin      Psychology
Robin      Psychology
Alice       Physics 



I do not want Tom with Maths twice and ALice with Physics twice as that is already recorded once.
I do not want Robin as he does not have different subjects


And I carefully checked if I made any typo this time icon_smile.gif

[/code]
Back to top
View user's profile Send private message
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10873
Location: italy

PostPosted: Fri Feb 11, 2011 10:03 pm
Reply with quote

You do not have to thank me...
I did nothing, just joking with Arun on speeling mistakes icon_biggrin.gif
Back to top
View user's profile Send private message
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Sat Feb 12, 2011 5:59 pm
Reply with quote

Roshnii wrote:
Thanks for the solution but it did not give me the desired output
so the OP comes up with the corrected speeling now. icon_lol.gif

Roshni,

If you really have duplicate records for both name as well as subject, you could try some alternatives.
Back to top
View user's profile Send private message
gcicchet

Senior Member


Joined: 28 Jul 2006
Posts: 1702
Location: Australia

PostPosted: Mon Feb 14, 2011 6:59 am
Reply with quote

Hi,

moving targets are hard to hit.

Try this
Code:

//TOOLIN   DD *                                             
SELECT FROM(IN) TO(OUT) ON(1,15,CH) ALLDUPS USING(CTL1)     
/*                                                           
//CTL1CNTL DD *                                             
  SORT FIELDS=(1,25,CH,A)                                   
  SUM FIELDS=NONE                                           
/*                                                           



Gerry
Back to top
View user's profile Send private message
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Mon Feb 14, 2011 1:03 pm
Reply with quote

gcicchet wrote:
moving targets are hard to hit
I knew this was going to be one of them. icon_biggrin.gif
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> JCL & VSAM

 


Similar Topics
Topic Forum Replies
No new posts Need to set RC4 through JCL SORT DFSORT/ICETOOL 5
No new posts How to split large record length file... DFSORT/ICETOOL 10
No new posts Duplicate transid's declared using CEDA CICS 3
No new posts Compare only first records of the fil... SYNCSORT 7
No new posts Pulling a fixed number of records fro... DB2 2
Search our Forums:

Back to Top