Matching with Key at different positions.


IBM Mainframe Forums -> DFSORT/ICETOOL
rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Wed Nov 09, 2016 10:58 am

Hello,

I have a situation where I have two files to compare: one file contains all the key information, and the other file contains all the other information associated with the key, which can be at multiple locations.

Here is an example :-

The key in input file 1 is at 1,3,CH.
The key in input file 2 can be at 1,3,CH, or anywhere in columns 17 to 46 (30 bytes), i.e. a 3-byte key occurring 10 times.

Now I need to match the key from input file 1 against input file 2 to get the output below. If the key matches at any of the key positions in input file 2, that record should be written to output file 1; any key in input file 1 which is not matched at any of the key positions in input file 2 should be written to output file 2.

Any thoughts on how this can be achieved effectively?

Code:
INPUT FILE 1
----+----1----+----2----+----3----+----4----+----5----
x01                                                   
x02                                                   
x03                                                   
x04                                                   
x05                                                   
x06                                                   
x07                                                   
                                                     
INPUT FILE 2
----+----1----+----2----+----3----+----4----+----5----
x34             x01x11x21x31x41x51x61x71x81x91       
x03             x00x11x21x31x41x51x61x71x81x91       
x00             x00x12x21x35x41x52x61x71x81x91       
x11             x00x12x21x35x41x52x61x71x81x07       
                                                     
OUTPUT 1    --- records from input file 2 with a key matched from input file 1
x34             x01x11x21x31x41x51x61x71x81x91       
x03             x00x11x21x31x41x51x61x71x81x91       
x11             x00x12x21x35x41x52x61x71x81x07
                                                     
OUTPUT 2   -- the keys from input file 1 which didn't match any key in input file 2
x02                                                   
x04                                                   
x05                                                   
x06                                                   



Thank you, Rajat
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3053
Location: NYC,USA

PostPosted: Wed Nov 09, 2016 11:06 am

Have you thought of the "SS" (substring) comparison to search file 2 at the mentioned offsets?
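
For illustration, a minimal sketch of such a search with one hard-coded key, using the 1,3 and 17-46 positions from the sample data (how to feed the file-1 keys in dynamically is the open question):

Code:
  OPTION COPY
* Keep file-2 records where the hard-coded key x01 appears either
* at position 1,3 or anywhere in the 30-byte area at columns 17-46
  INCLUDE COND=(1,3,CH,EQ,C'x01',OR,17,30,SS,EQ,C'x01')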
rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Wed Nov 09, 2016 11:10 am

Yes Rohit. The only thing I don't know is how to dynamically push the keys from file 1 in as constants, to use the SS function for scanning file 2.

Regards, Rajat
rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Wed Nov 09, 2016 11:14 am

The other option is to build an SS card dynamically for all the keys in input file 1 and then use it against input file 2, but I am looking at 50K keys in input file 1, which would result in 50K lines of card being created... so it didn't sound very effective to me to follow that approach, unless there is some other way to use SS. But thanks for pointing that option out.
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Wed Nov 09, 2016 2:24 pm

Or get the input-2 keys into multiple records and match them against input-1?
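
Something like the sketch below, assuming the sample layout shown earlier (a 3-byte key at 1,3 plus ten 3-byte keys in columns 17-46); each input-2 record is split into 11 short records, the first 3 bytes carrying the record's own key so the full record can be picked up again later:

Code:
  OPTION COPY
* 11 key records per input-2 record: own key (1,3) + one candidate key
  OUTFIL BUILD=(1,3,1,3,/,1,3,17,3,/,1,3,20,3,/,1,3,23,3,/,
                1,3,26,3,/,1,3,29,3,/,1,3,32,3,/,1,3,35,3,/,
                1,3,38,3,/,1,3,41,3,/,1,3,44,3)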
rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Wed Nov 09, 2016 2:38 pm

Yes Arun, I thought of that as well, but it would potentially increase my input file 2 to roughly 10 times its original size (as a key can be at 11 positions in the record), and it already has about 100K records in it. So I was not really keen on doing that, but thanks for the suggestion.

Regards, Rajat
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10873
Location: italy

PostPosted: Wed Nov 09, 2016 2:39 pm

Quote:
The other option is to build an SS card dynamically for all the keys in input file 1 and then use it against input file 2


Given the requirement, it will not work...
How will you determine the file-1 keys not found?

I agree with Arun: create multiple records for file 2 and use JOINKEYS.

And it will not increase anything... the multiple records will be in a temporary work dataset.

100K records times 10 gives 1M records... that does not seem like anything to worry about.
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Wed Nov 09, 2016 7:37 pm

Like Enrico has pointed out already, 1M should not be a matter of concern.

Meanwhile it might help if you post some missing information.
- Are the input data sets already in sorted order on the keys shown? At least input-2 does not seem to be, going by the sample data. But are they sorted in your 'real' data sets?
- Do you need the output data sets to be sorted OR to preserve the input order for some reason?
- What about the LRECL/RECFM of these data sets?
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Nov 09, 2016 8:11 pm

Suggesting a substring search with no suggested means to achieve it... adds what to the topic?

rajatbagga,

How are you representing keys for 100,000 records where your sample data shows keys of three characters?

What is the "x" in your key representing?

Even ignoring the size of the key, do you have the possibility of multiple hits from F1 to F2? And if so, what do you want to do?
RahulG31

Active User


Joined: 20 Dec 2014
Posts: 446
Location: USA

PostPosted: Wed Nov 09, 2016 9:35 pm

Just an observation:

Quote:
but i am looking at 50K keys in input file 1

The key is 3 bytes, and if I consider only alphabetic characters and numerals then there could be 36*36*36 = 46656 values in total (which is a little less than 50K), and that is the number assuming no pattern in the key.

I have not considered special characters in coming up with that number, or whether the keys can be duplicated.

I think Bill has rightly pointed out about 'x':
Quote:
What is the "x" in your key representing?


If you have a key with 1 alphabetic character at the start and 2 numerals, then you could only have 26*100 = 2600 unique values.

So I am just not sure: how did you come up with 50K?

rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Thu Nov 10, 2016 6:15 am

Hello everyone,

For the sake of my example, I mocked up the data. In reality, my key is 5,PD.

And Arun, please refer to my comments below for your questions:

Quote:
Like Enrico has pointed out already, 1M should not be a matter of concern.


Now, here is the issue. The record length of my input file 2 is 5000/FB and it contains heaps of information, so I was thinking that even a 1M record count could result in a massive actual file size. Also, this count varies and is not static; it may not be a matter of concern now, but going forward it could potentially cause issues.

Quote:
Meanwhile it might help if you post some missing information.
- Are the input data sets already in sorted order of the keys shown? At least the input-2 does not seem to be from the sample data. But are they sorted in your 'real' data sets?
- Do you need the output data sets to be sorted OR to preserve the input order for some reason?
- What about the LRECL/RECFM of these data sets?




- Input file 1 is 80/FB, sorted on key 1,5,PD. Input file 2 is 5000/FB; it is sorted only on 11,5,PD (the first key position), and the other key positions start from column 80 to 131 (5 * 10 = 50 bytes).

- Sequence of the output datasets doesn't matter to me

- Output 1 is 5000/FB with the same data format as input file 2, and output 2 is 80/FB with the same data format as input file 1.

Yes, in input file 2 the key can be present in 11 spots on a single record (11,5,PD for the first occurrence, and the rest spanning columns 80 to 131 as 5,PD for the 10 other occurrences). The same key can occur in multiple instances within a single record, or be present in other records as well.

I am not sure if this can be done, but I was wondering whether there is a way to define these key values like an array in DFSORT: first put all the keys together right at the end of the input file 2 record, then define an array to search them for each key from input file 1, and if found, write the input file 2 record to output file 1. That way I could get all the matching records, but I would still need to figure out how to get the unmatched keys from input file 1 into output file 2.


Thanks and Regards,
Rajat
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Thu Nov 10, 2016 12:29 pm

DFSORT doesn't have arrays, nor looping constructs.

Two ways: "keys only", three steps, final output in key order; "keys+data", two steps, final output not in key order.

With the expected multiple hits, you may want to consider knowing how the match was done.

Keys Only

Extract 11 key records for each input record on the full file: main key plus the 10 additional keys. Output to include original key and key-for-matching only (10 bytes).
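
For example (a sketch only: the 11,5 main key is from your description, and the ten additional key positions are assumed to be columns 80, 85, ..., 125):

Code:
  OPTION COPY
* One 10-byte record per key position: main key (11,5) followed by
* the key being offered for matching
  OUTFIL BUILD=(11,5,11,5,/,11,5,80,5,/,11,5,85,5,/,11,5,90,5,/,
                11,5,95,5,/,11,5,100,5,/,11,5,105,5,/,11,5,110,5,/,
                11,5,115,5,/,11,5,120,5,/,11,5,125,5)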

JOINKEYS with F1 being your key file (already in sequence, so use SORTED,NOSEQCK) and F2 being the file created above (it has the duplicates, and you need to let JOINKEYS SORT it, the default action). Output can be cut in half by dropping the key that was matched on (keep it if you want to know how the match was done).
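
A sketch of that step, with F1 being the 80-byte key file and F2 the 10-byte extract created above (the DD names MATCHED and OUT2 are assumptions):

Code:
  JOINKEYS FILE=F1,FIELDS=(1,5,A),SORTED,NOSEQCK
  JOINKEYS FILE=F2,FIELDS=(6,5,A)
  JOIN UNPAIRED,F1
* 1-80 = file-1 record, 81-85 = main key of the matching file-2
* record, 86 = pairing indicator ('B' = matched, '1' = file-1 only)
  REFORMAT FIELDS=(F1:1,80,F2:1,5,?)
  OPTION COPY
  OUTFIL FNAMES=MATCHED,INCLUDE=(86,1,CH,EQ,C'B'),BUILD=(81,5)
  OUTFIL FNAMES=OUT2,INCLUDE=(86,1,CH,EQ,C'1'),BUILD=(1,80)

OUT2 is already the unmatched-keys output (output 2); MATCHED holds the main keys of the hit records (possibly with duplicates) and feeds the next step.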

JOINKEYS with your original main file (SORTED,NOSEQCK) and the output from the first JOINKEYS (which needs to be SORTed, the default action).
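
Again a sketch, with F1 being the full 5000-byte input file 2 (sorted on 11,5) and F2 the MATCHED keys from the previous step; the SORT/SUM in the main task collapses multiple hits on the same record into a single output record:

Code:
  JOINKEYS FILE=F1,FIELDS=(11,5,A),SORTED,NOSEQCK
  JOINKEYS FILE=F2,FIELDS=(1,5,A)
* Paired records only (the default): the full file-2 record is kept
  REFORMAT FIELDS=(F1:1,5000)
  SORT FIELDS=(11,5,CH,A)
  SUM FIELDS=NONE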

Keys+Data

The same as the first two steps above; there is no need to reattach the data at the end, as you've dragged it with you the whole time.

How to create the multiple records? Discussed recently, there's REPEAT on OUTFIL, ICETOOL's RESIZE, or a simple COPY with BUILD and the slash-operator (/).
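
The RESIZE route might look something like this (a sketch; the DD names IN and KEYS are assumptions, and INREC in the USING deck builds a 110-byte record of eleven main-key/candidate-key pairs which RESIZE then splits into eleven 10-byte records):

Code:
//TOOLIN DD *
  RESIZE FROM(IN) TO(KEYS) TOLEN(10) USING(CTL1)
//CTL1CNTL DD *
* Eleven (main key, candidate key) pairs = 110 bytes
  INREC BUILD=(11,5,11,5,11,5,80,5,11,5,85,5,11,5,90,5,
               11,5,95,5,11,5,100,5,11,5,105,5,11,5,110,5,
               11,5,115,5,11,5,120,5,11,5,125,5)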
rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Fri Nov 11, 2016 5:45 am

Thanks Bill and all others for showing interest. I will spin my head around it based on your suggestions.

Regards,
Rajat