I assume Data Set2 (the range dataset) is not huge, and that you know how to generate the SYMNAMES dataset using a simple BUILD. The key is 6 bytes long, but in your sample data it is shown as only 5.
Quote:
2) Who, and where is going to create/generate the SYMNAMES list?
If you had paid attention, you would have seen that I said the TS should use BUILD beforehand to generate the SYMNAMES before using this solution. If the TS does not know how to do that, they can come back and I will help.
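Just to illustrate what I mean by that, here is an untested sketch. The dataset names, the 4-digit sequence number, and the RANGEnnnn_LO/RANGEnnnn_HI symbol names are my own assumptions, and I am assuming the range file is FB 80 with FROM in columns 1-6 and TO in columns 7-12; SEQNUM and the slash (/) multi-record BUILD are DFSORT-style features, so the syntax may need adjusting for your sort product and level.
Code:
//GENSYM   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=your.range.file,DISP=SHR       FROM(1-6)/TO(7-12)
//SORTOUT  DD DSN=your.symnames.file,DISP=OLD
//SYSIN    DD *
  OPTION COPY
  INREC OVERLAY=(81:SEQNUM,4,ZD)     number each range record
  OUTFIL BUILD=(C'RANGE',81,4,C'_LO,C''',1,6,C'''',80:X,/,
                C'RANGE',81,4,C'_HI,C''',7,6,C'''',80:X)
END
Each range record becomes two SYMNAMES lines, e.g. RANGE0001_LO,C'AAAAAA' and RANGE0001_HI,C'BBBBBB', which can then be referenced in the INCLUDE COND.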
Quote:
Does it make any difference in approach?
Yes. I would prefer to have very few records here, which keeps the INCLUDE simpler, and my earlier note was a correction to the TS to reference the right keys in the sample data provided.
Quote:
What if the number of ranges is varying, let's say from 1 to 1000?
Right now it's not. Even if it were, simply split RANGE1 into two records in SYMNAMES and modify the INCLUDE COND accordingly.
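For instance, with two made-up ranges (the symbol names, values, and dataset names below are only examples of the layout, not taken from the TS's data):
Code:
//PICK     EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SYMNAMES DD *
* KEY FIELD IN THE INPUT, AND ONE LO/HI PAIR PER RANGE
KEY,1,6,CH
RANGE1_LO,C'AAAAAA'
RANGE1_HI,C'BBBBBB'
RANGE2_LO,C'PPPPPP'
RANGE2_HI,C'QQQQQQ'
//SORTIN   DD DSN=your.input.file,DISP=SHR
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
  OPTION COPY
  INCLUDE COND=((KEY,GE,RANGE1_LO,AND,KEY,LE,RANGE1_HI),OR,
                (KEY,GE,RANGE2_LO,AND,KEY,LE,RANGE2_HI))
END
A varying number of ranges only means a longer (generated) SYMNAMES list and one more OR'ed pair in the INCLUDE COND per range.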
Quote:
Yes. I would prefer to have very few records here, which keeps the INCLUDE simpler, and my earlier note was a correction to the TS to reference the right keys in the sample data provided.
If the TS is not able to adapt the general approach to his own needs and specification, that means he is at the level of a copy-and-paste, ready-to-use solution.
Discussion at this level must continue at the Beginners Forum.
Quote:
2) Who, and where is going to create/generate the SYMNAMES list?
If you had paid attention, you would have seen that I said the TS should use BUILD beforehand to generate the SYMNAMES before using this solution. If the TS does not know how to do that, they can come back and I will help.
The TS has clearly explained that an existing file with the required ranges must be used. Who is going to convert it to SYMNAMES, and for what reason???
What if the keys are long (100 bytes?), and/or non-character, etc.???
The specific feature of the original topic (which is critical compared to other sort tasks) is that it requires some form of many-to-many record comparison, one by one, with a varying number of records on each side, and (desirably) an easy way to change the key type, key size, and key position within each dataset.
The solution involving SYMNAMES is no different from a simple sequential comparison of ONE dataset against a LIMITED and FIXED number of keys, whether those keys are coded as part of the SORT control statements or "hidden" inside the SYMNAMES list.
Using SYMNAMES brings no improvement beyond slightly shortening the SORT control statements themselves; it does not help achieve the original task of many-to-many comparison.
You are making your own assumptions, and I cannot code for the whole world.
What I have suggested will work for what the TS wants at this point in time, not for what you assume. If the TS wants something else, then let it come from the TS, not from you. I would rather not create a Cartesian product here without any reason; it will drive up the EXCP count when you hit millions of records.
There is no need to make your own assumptions and drag out a meaningless conversation. If you keep replying back and forth (off-topic talk, since the solutions are already posted), someone will lock this post soon, the TS will have no chance to communicate further, and that would be unfair.
Quote:
I would rather not create a Cartesian product here without any reason; it will drive up the EXCP count when you hit millions of records.
In order to get the desired result, one must perform as many pairwise comparisons as
(size of input) * (number of ranges)
whether those keys are taken straight from the original datasets or "converted" via SYMNAMES into a fixed list of logical comparisons in the SORT control statements.
Whatever solution is used, exactly the same "millions of records" must be handled!
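For example, with the ballpark figures used below (300,000 input records and 1,000 ranges), that is 300,000 * 1,000 = 300,000,000 key comparisons, no matter whether the keys sit in the range file, in the control statements, or in a generated SYMNAMES list.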
The straightforward one-step solution given in the previous example works fine for a reasonably sized input dataset (say, up to a few hundred thousand records, 100,000-300,000). The total number of ranges is also not expected to be very large; up to 1,000 ranges can be considered an acceptable amount.
That method really does produce a Cartesian product as an intermediate result, but that does not mean we must avoid it as if it were the devil's creation. It works fine as long as the amount of data is reasonable; so why not use the simple method when it really works?
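As a rough outline (not necessarily identical to the earlier post), that kind of one-step method can be sketched with a JOINKEYS cartesian pairing on a constant join key, followed by an INCLUDE that keeps only the pairs whose key falls inside the range. The dataset names and field positions below (key in 1-6 of the data, FROM/TO in 1-6/7-12 of the range file) are assumptions:
Code:
//PAIRUP   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTJNF1 DD DSN=your.input.file,DISP=SHR       data records, key in 1-6
//SORTJNF2 DD DSN=your.range.file,DISP=SHR       FROM in 1-6, TO in 7-12
//SORTOUT  DD SYSOUT=*
//JNF1CNTL DD *
  INREC OVERLAY=(81:C'1')            constant join key on every data record
//JNF2CNTL DD *
  INREC OVERLAY=(13:C'1')            constant join key on every range record
//SYSIN    DD *
  JOINKEYS FILE=F1,FIELDS=(81,1,A)
  JOINKEYS FILE=F2,FIELDS=(13,1,A)
  REFORMAT FIELDS=(F1:1,80,F2:1,12)  data record plus FROM/TO at 81-92
  OPTION COPY
  INCLUDE COND=(1,6,CH,GE,81,6,CH,AND,1,6,CH,LE,87,6,CH)
END
Every data record gets paired with every range (the Cartesian product), and only the pairs whose key lies between FROM and TO survive the INCLUDE.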
Of course, there are other cases where a huge number of records must be handled: say, 100M or more input records, and/or 100K or more valid ranges. In such a case the straightforward Cartesian-product method would very likely fail, and another method is needed, one that processes the huge input dataset only once. This can be done; the only requirement is that the input must be presorted by the key field.
The range file also needs to be sorted, but that can be done as part of the processing, since this method requires one extra manipulation of the range file anyway. Two JCL steps are needed, and as far as I know that cannot be avoided.
Code:
//**********************************************************************
//* SELECT BETWEEN - FROM A HUGE FILE
//**********************************************************************
//*
//*=====================================================================
//* SORT AND SPLIT RANGE LIST INTO TWO SEPARATE LISTS: FROM/TO VALUES
//*=====================================================================
//*
//RANGES EXEC PGM=SYNCSORT
//*
//SYSOUT DD SYSOUT=*
//*
//SORTIN DD *
999999777777
222222444444
DDDDDDDDDDDD
AAAAAABBBBBB
PPPPPPQQQQQQ
$$$$$$$$$$$$
XXXXXXYYYYYY
KKKKKKLLLLLL
//*
//FROMLIST DD DISP=(NEW,PASS),
// UNIT=SYSDA,SPACE=(TRK,(50,50),RLSE),
// DSN=&&FROMLIST
//TOLIST DD DISP=(NEW,PASS),
// UNIT=SYSDA,SPACE=(TRK,(50,50),RLSE),
// DSN=&&TOLIST
//*
//SYSIN DD *
*
INCLUDE COND=(1,6,CH,LE,7,6,CH) eliminate bad ranges (FROM greater than TO)
*
SORT FIELDS=(1,6,CH,A) sort left range ascending
*
OUTFIL FNAMES=(FROMLIST), create left range list
BUILD=(1,6,74X'00') provide merging before any real data
*
OUTFIL FNAMES=(TOLIST), create right range list
BUILD=(7,6,74X'FF') provide merging after any real data
*
END
//*
//*=====================================================================
//* MERGE HUGE INPUT, AND TWO RANGE FILES; ALL MUST BE SORTED
//*=====================================================================
//BETWEEN EXEC PGM=SYNCSORT
//*
//SYSOUT DD SYSOUT=*
//*
//SORTIN01 DD DISP=(OLD,DELETE),DSN=&&FROMLIST left ranges
//SORTIN02 DD DISP=(OLD,DELETE),DSN=&&TOLIST right ranges
//SORTIN03 DD * input data (huge)
$$$$$$123456
######234567
@@@@@@345678
AAAAAA123456
BBBBBB234567
CCCCCC345678
DDDDDD456789
EEEEEE123456
FFFFFF234567
GGGGGG345678
HHHHHH456789
IIIIII123456
JJJJJJ234567
KKKKKK345678
LLLLLL456789
MMMMMM123456
NNNNNN234567
OOOOOO345678
PPPPPP456789
QQQQQQ123456
RRRRRR234567
SSSSSS345678
TTTTTT456789
UUUUUU123456
VVVVVV234567
WWWWWW345678
XXXXXX456789
YYYYYY123456
ZZZZZZ234567
000000456789
111111345678
222222234567
333333123456
444444456789
555555345678
666666234567
777777123456
888888456789
999999345678
//*
//SELECTED DD SYSOUT=*
//*
//SYSIN DD *
*
MERGE FIELDS=(1,6,CH,A, merge using record keys
7,1,CH,A) place range records before/after data groups
*
OUTREC IFTHEN=(WHEN=GROUP, detect groups of allowed ranges
BEGIN=(7,1,CH,EQ,X'00'), marker of left range record
END=(7,1,CH,EQ,X'FF'), marker of right range record
PUSH=(81:7,1)) mark group records as valid
*
OUTFIL FNAMES=(SELECTED),
INCLUDE=(7,1,CH,NE,X'00', exclude left markers
AND,7,1,CH,NE,X'FF', exclude right markers
AND,81,1,CH,EQ,X'00') include only valid records
*
END
//*
"Two added to one--if that could but be done,"
It said, "with one's fingers and thumbs!"
Recollecting with tears how, in earlier years,
It had taken no pains with its sums.
"The thing can be done," said the Butcher, "I think.
The thing must be done, I am sure.
The thing shall be done! Bring me paper and ink,
The best there is time to procure."