Matching with Key at different positions.


IBM Mainframe Forums -> DFSORT/ICETOOL
rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Wed Nov 09, 2016 10:58 am

Hello,

I have a situation where I have two files to compare: one file contains all the key information, and the other file contains all the other information associated with the key, which can be at multiple locations.

Here is an example :-

The key in input file 1 is at 1,3,CH.
The key in input file 2 can be at 1,3,CH, or anywhere in columns 17 to 46 (30 bytes), i.e. a 3-byte key occurring 10 times.

Now I need to match the key from input file 1 against input file 2 to get the output below. If the key matches at any of the key positions in input file 2, that record should be written to output file 1; any key in input file 1 which is not matched at any of the key positions in input file 2 should be written to output file 2.

Any thoughts on how this can be achieved effectively?

Code:
INPUT FILE 1
----+----1----+----2----+----3----+----4----+----5----
x01                                                   
x02                                                   
x03                                                   
x04                                                   
x05                                                   
x06                                                   
x07                                                   
                                                     
INPUT FILE 2
----+----1----+----2----+----3----+----4----+----5----
x34             x01x11x21x31x41x51x61x71x81x91       
x03             x00x11x21x31x41x51x61x71x81x91       
x00             x00x12x21x35x41x52x61x71x81x91       
x11             x00x12x21x35x41x52x61x71x81x07       
                                                     
OUTPUT 1    --- records from input file 2 with a key matched from input file 1
x34             x01x11x21x31x41x51x61x71x81x91       
x03             x00x11x21x31x41x51x61x71x81x91       
x11             x00x12x21x35x41x52x61x71x81x07
                                                     
OUTPUT 2   -- the keys from input file 1 which didn't match any key in input file 2
x02                                                   
x04                                                   
x05                                                   
x06                                                   



Thank you, Rajat
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3053
Location: NYC,USA

PostPosted: Wed Nov 09, 2016 11:06 am

Have you thought of the "SS" (substring) comparison to search file 2 at the mentioned offsets?
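
For illustration, a minimal sketch of such a search with one hard-coded key, using the 1,3 and 17-46 positions from the sample data (how to feed the file-1 keys in dynamically is the open question):

Code:
  OPTION COPY
* Keep file-2 records where the hard-coded key x01 appears either
* at position 1,3 or anywhere in the 30-byte area at columns 17-46
  INCLUDE COND=(1,3,CH,EQ,C'x01',OR,17,30,SS,EQ,C'x01')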
rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Wed Nov 09, 2016 11:10 am

Yes Rohit. The only thing I don't know is how to dynamically push the keys from file 1 in as constants, to use the SS function for scanning file 2.

Regards, Rajat
rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Wed Nov 09, 2016 11:14 am

The other option is to build an SS card dynamically for all the keys in input file 1 and then use it against input file 2, but I am looking at 50K keys in input file 1, which would result in 50K lines of card being created... so it didn't sound very effective to me to follow that approach, unless there is some other way to use SS. But thanks for pointing that option out.
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Wed Nov 09, 2016 2:24 pm

Or get the input-2 keys into multiple records and match them against input-1?
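
Something like the sketch below, assuming the sample layout shown earlier (a 3-byte key at 1,3 plus ten 3-byte keys in columns 17-46); each input-2 record is split into 11 short records, the first 3 bytes carrying the record's own key so the full record can be picked up again later:

Code:
  OPTION COPY
* 11 key records per input-2 record: own key (1,3) + one candidate key
  OUTFIL BUILD=(1,3,1,3,/,1,3,17,3,/,1,3,20,3,/,1,3,23,3,/,
                1,3,26,3,/,1,3,29,3,/,1,3,32,3,/,1,3,35,3,/,
                1,3,38,3,/,1,3,41,3,/,1,3,44,3)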
rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Wed Nov 09, 2016 2:38 pm

Yes Arun, I thought of that as well, but it would potentially increase my input file 2 to roughly 10 times its original size (as a key can be at 11 positions in the record), and it already has about 100K records in it. So I was not really keen on doing that, but thanks for the suggestion.

Regards, Rajat
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10873
Location: italy

PostPosted: Wed Nov 09, 2016 2:39 pm

Quote:
The other option is to build an SS card dynamically for all the keys in input file 1 and then use it against input file 2


Given the requirement, it will not work...
How will you determine the file-1 keys not found?

I agree with Arun: create multiple records for file 2 and use JOINKEYS.

And it will not increase anything... the multiple records will be in a temporary work dataset.

100K records times 10 gives 1M records... that does not seem like anything to worry about.
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Wed Nov 09, 2016 7:37 pm

Like Enrico has pointed out already, 1M should not be a matter of concern.

Meanwhile it might help if you post some missing information.
- Are the input data sets already in sorted order on the keys shown? At least input-2 does not seem to be, going by the sample data. But are they sorted in your 'real' data sets?
- Do you need the output data sets to be sorted OR to preserve the input order for some reason?
- What about the LRECL/RECFM of these data sets?
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Nov 09, 2016 8:11 pm

Suggesting a substring search with no suggested means to achieve it... adds what to the topic?

rajatbagga,

How are you representing keys for 100,000 records where your sample data shows keys of three characters?

What is the "x" in your key representing?

Even ignoring the size of the key, do you have the possibility of multiple hits from F1 to F2? And if so, what do you want to do?
RahulG31

Active User


Joined: 20 Dec 2014
Posts: 446
Location: USA

PostPosted: Wed Nov 09, 2016 9:35 pm

Just an observation:

Quote:
but i am looking at 50K keys in input file 1

The key is 3 bytes, and if I consider only alphabetic characters and numerals then there could be 36*36*36 = 46656 values in total (which is a little less than 50K), and that is the number assuming no pattern in the key.

I have not considered special characters in coming up with that number, or whether the keys can be duplicated.

I think Bill has rightly pointed out about 'x':
Quote:
What is the "x" in your key representing?


If you have a key with 1 alphabetic character at the start and 2 numerals, then you could only have 26*100 = 2600 unique values.

So I am just not sure: how did you come up with 50K?

rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Thu Nov 10, 2016 6:15 am

Hello everyone,

For the sake of my example, I mocked up the data. In reality, my key is 5,PD.

And Arun, please refer to my comments below for your questions:

Quote:
Like Enrico has pointed out already, 1M should not be a matter of concern.


Now, here is the issue. The record length of my input file 2 is 5000/FB and it contains heaps of information, so I was thinking that even a 1M record count could result in a massive actual file size. Also, this count varies and is not static; it may not be a matter of concern now, but going forward it could potentially cause issues.

Quote:
Meanwhile it might help if you post some missing information.
- Are the input data sets already in sorted order of the keys shown? At least the input-2 does not seem to be from the sample data. But are they sorted in your 'real' data sets?
- Do you need the output data sets to be sorted OR to preserve the input order for some reason?
- What about the LRECL/RECFM of these data sets?




- Input file 1 is 80/FB, sorted on key 1,5,PD. Input file 2 is 5000/FB; it is sorted only on 11,5,PD (the first key position), and the other key positions start from column 80 to 131 (5 * 10 = 50 bytes).

- Sequence of the output datasets doesn't matter to me

- Output 1 is 5000/FB with the same data format as input file 2, and output 2 is 80/FB with the same data format as input file 1.

Yes, in input file 2 the key can be present in 11 spots on a single record (11,5,PD for the first occurrence, and the rest spanning columns 80 to 131 as 5,PD for the 10 other occurrences). The same key can occur in multiple instances within a single record, or be present in other records as well.

I am not sure if this can be done, but I was wondering whether there is a way to define these key values like an array in DFSORT: first put all the keys together right at the end of the input file 2 record, then define an array to search them for each key from input file 1, and if found, write the input file 2 record to output file 1. That way I could get all the matching records, but I would still need to figure out how to get the unmatched keys from input file 1 into output file 2.


Thanks and Regards,
Rajat
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Thu Nov 10, 2016 12:29 pm

DFSORT doesn't have arrays, nor looping constructs.

Two ways: "keys only", three steps, final output in key order; "keys+data", two steps, final output not in key order.

With the expected multiple hits, you may want to consider knowing how the match was done.

Keys Only

Extract 11 key records for each input record on the full file: main key plus the 10 additional keys. Output to include original key and key-for-matching only (10 bytes).
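
For example (a sketch only: the 11,5 main key is from your description, and the ten additional key positions are assumed to be columns 80, 85, ..., 125):

Code:
  OPTION COPY
* One 10-byte record per key position: main key (11,5) followed by
* the key being offered for matching
  OUTFIL BUILD=(11,5,11,5,/,11,5,80,5,/,11,5,85,5,/,11,5,90,5,/,
                11,5,95,5,/,11,5,100,5,/,11,5,105,5,/,11,5,110,5,/,
                11,5,115,5,/,11,5,120,5,/,11,5,125,5)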

JOINKEYS with F1 being your key file (already in sequence, so use SORTED,NOSEQCK) and F2 being the file created above (it has the duplicates, and you need to let JOINKEYS SORT it, the default action). Output can be cut in half by dropping the key that was matched on (keep it if you want to know how the match was done).
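
A sketch of that step, with F1 being the 80-byte key file and F2 the 10-byte extract created above (the DD names MATCHED and OUT2 are assumptions):

Code:
  JOINKEYS FILE=F1,FIELDS=(1,5,A),SORTED,NOSEQCK
  JOINKEYS FILE=F2,FIELDS=(6,5,A)
  JOIN UNPAIRED,F1
* 1-80 = file-1 record, 81-85 = main key of the matching file-2
* record, 86 = pairing indicator ('B' = matched, '1' = file-1 only)
  REFORMAT FIELDS=(F1:1,80,F2:1,5,?)
  OPTION COPY
  OUTFIL FNAMES=MATCHED,INCLUDE=(86,1,CH,EQ,C'B'),BUILD=(81,5)
  OUTFIL FNAMES=OUT2,INCLUDE=(86,1,CH,EQ,C'1'),BUILD=(1,80)

OUT2 is already the unmatched-keys output (output 2); MATCHED holds the main keys of the hit records (possibly with duplicates) and feeds the next step.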

JOINKEYS with your original main file (SORTED,NOSEQCK) and the output from the first JOINKEYS (which needs to be SORTed, the default action).
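
Again a sketch, with F1 being the full 5000-byte input file 2 (sorted on 11,5) and F2 the MATCHED keys from the previous step; the SORT/SUM in the main task collapses multiple hits on the same record into a single output record:

Code:
  JOINKEYS FILE=F1,FIELDS=(11,5,A),SORTED,NOSEQCK
  JOINKEYS FILE=F2,FIELDS=(1,5,A)
* Paired records only (the default): the full file-2 record is kept
  REFORMAT FIELDS=(F1:1,5000)
  SORT FIELDS=(11,5,CH,A)
  SUM FIELDS=NONE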

Keys+Data

The same as the first two steps above; there is no need to reattach the data at the end, as you've dragged it with you the whole time.

How to create the multiple records? Discussed recently, there's REPEAT on OUTFIL, ICETOOL's RESIZE, or a simple COPY with BUILD and the slash-operator (/).
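
The RESIZE route might look something like this (a sketch; the DD names IN and KEYS are assumptions, and INREC in the USING deck builds a 110-byte record of eleven main-key/candidate-key pairs which RESIZE then splits into eleven 10-byte records):

Code:
//TOOLIN DD *
  RESIZE FROM(IN) TO(KEYS) TOLEN(10) USING(CTL1)
//CTL1CNTL DD *
* Eleven (main key, candidate key) pairs = 110 bytes
  INREC BUILD=(11,5,11,5,11,5,80,5,11,5,85,5,11,5,90,5,
               11,5,95,5,11,5,100,5,11,5,105,5,11,5,110,5,
               11,5,115,5,11,5,120,5,11,5,125,5)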
rajatbagga

Active User


Joined: 11 Mar 2007
Posts: 199
Location: india

PostPosted: Fri Nov 11, 2016 5:45 am

Thanks Bill and all others for showing interest. I will spin my head around it based on your suggestions.

Regards,
Rajat