Select based on a range from a different file (continued)

sergeyken · Posted: Fri Aug 16, 2019 12:37 am

The task from the original topic seems quite interesting to me, since it involves functionality I had not dealt with before.

As said before, a straightforward solution creates an intermediate set of rows as a cartesian product, whose size equals (file1 size) * (file2 size). For a limited amount of input data this approach is acceptable, and may even be preferred since it is quite simple. But for huge data files another approach is needed, one that requires only a single scan of the huge file.
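To make the single-scan idea concrete, here is a minimal sketch in Python rather than SORT. It is not the job from the original topic, only an illustration of the logic. The function name `select_in_ranges`, the key position (first 6 characters of each record) and the inclusive range bounds are all assumptions made for this example. It also assumes the ranges are already sorted and non-overlapping.

```python
from bisect import bisect_right

def select_in_ranges(records, ranges, key=lambda rec: rec[:6]):
    """Yield records whose key lies inside any (low, high) range, inclusive.

    Assumption: `ranges` is sorted by low value and has no overlaps.
    The big file is read once; each record costs one binary search
    over the (small) range list instead of a comparison with every range.
    """
    lows = [low for low, _ in ranges]
    for rec in records:
        k = key(rec)
        # Index of the last range whose low value is <= k, or -1 if none.
        i = bisect_right(lows, k) - 1
        if i >= 0 and k <= ranges[i][1]:
            yield rec

# Hypothetical sample data, for illustration only.
ranges = [("AAAAAA", "DDDDDD"), ("KKKKKK", "NNNNNN")]
records = ["BBBBBB data1", "EEEEEE data2", "KKKKKK data3",
           "NNNNNN data4", "ZZZZZZ data5"]
selected = list(select_in_ranges(records, ranges))
```

With N records and M ranges this does roughly N * log(M) comparisons, against N * M rows for the cartesian product.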

One possible solution was given at the end of the original topic. It works fine for big files, but later I found a serious drawback in that sample. When the selection key ranges are not prepared correctly, e.g. some key ranges overlap (for instance, KKKKKK-MMMMMM and LLLLLL-NNNNNN), incorrect output might be produced with no indication that something went wrong. Automatic verification and/or correction of the input ranges is not a trivial task (BTW, it might also be a good training exercise on SORT algorithms and methods).

A simple remedy is just to signal that the input ranges are not specified correctly, to prevent further use of wrong output. Only a minor update to the previous code is needed.
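The check itself can be sketched in a few lines of Python. This is not the updated SORT code, only an illustration of the rule "stop instead of producing wrong output". The name `check_ranges` is hypothetical, and treating two ranges that share an endpoint as overlapping is an assumption that follows from taking the bounds as inclusive.

```python
def check_ranges(ranges):
    """Return the ranges sorted by low value, or raise if they are unusable.

    Assumption: bounds are inclusive, so a shared endpoint counts as overlap.
    """
    ordered = sorted(ranges)
    for low, high in ordered:
        if low > high:
            raise ValueError(f"reversed range {low}-{high}")
    # After sorting, an overlap can only occur between neighbours.
    for (low1, high1), (low2, high2) in zip(ordered, ordered[1:]):
        if low2 <= high1:
            raise ValueError(
                f"overlapping ranges {low1}-{high1} and {low2}-{high2}")
    return ordered
```

Raising an exception here plays the same role as forcing a failure of the step: nothing downstream gets to use a result built on a bad range list.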

sergeyken · Posted: Sat Aug 17, 2019 2:10 am

The last solution from the previous post prevents wrong results from being produced for a huge dataset (where it becomes difficult to detect that some wrong data have been included). It causes the JCL step to ABEND when the given list of ranges is not normalized to a non-overlapping list of left-right values.

A much better approach would be to fix the pairs of ranges in the input dataset, including any combination of the following:
1) eliminate invalid ranges whose left value exceeds the right one, like ZZZZZZ-XXXXXX
2) combine fully nested ranges, like AAAAAA-DDDDDD and BBBBBB-CCCCCC, into a single AAAAAA-DDDDDD
3) combine partially overlapping ranges, like KKKKKK-MMMMMM and LLLLLL-NNNNNN, into KKKKKK-NNNNNN
4) combine adjacent ranges, like PPPPPP-QQQQQQ and QQQQQQ-SSSSSS, into a single PPPPPP-SSSSSS
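To pin down the intended result of these four rules, here is a small Python sketch of the normalization. It is only a reference for the expected output, not the SORT implementation. The name `normalize_ranges` is hypothetical, and "adjacent" is taken to mean ranges sharing an endpoint, as in the example for rule 4.

```python
def normalize_ranges(ranges):
    """Return a sorted, non-overlapping list of (low, high) ranges."""
    # Rule 1: drop reversed ranges, then sort by low value.
    valid = sorted((low, high) for low, high in ranges if low <= high)
    merged = []
    for low, high in valid:
        if merged and low <= merged[-1][1]:
            # Rules 2-4: nested, overlapping or touching ranges are
            # folded into the previous one, keeping the larger high value.
            prev_low, prev_high = merged[-1]
            merged[-1] = (prev_low, max(prev_high, high))
        else:
            merged.append((low, high))
    return merged

# The example ranges from the list above.
sample = [("ZZZZZZ", "XXXXXX"), ("AAAAAA", "DDDDDD"), ("BBBBBB", "CCCCCC"),
          ("KKKKKK", "MMMMMM"), ("LLLLLL", "NNNNNN"),
          ("PPPPPP", "QQQQQQ"), ("QQQQQQ", "SSSSSS")]
normalized = normalize_ranges(sample)
```

The logic is one sort followed by one sequential pass that compares each row with the previous one.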

Of course, this can be done using any tool: COBOL, REXX, Assembler, PL/I, … +100 others. But it is preferable to stay with the tool already used for the main task being implemented; in this case, the SORT utility + JCL. Involving many different tools for one single task is not recommended unless it is really unavoidable.

Fortunately, modern SORT utilities provide a wide variety of operations on flat tables (stored in datasets), enough to perform the required normalization of the input table. The only thing required is the ability to think in terms of table operations, very similar to those provided by languages like SQL.

Here is one possible solution, implemented as a simple sequence of JCL steps, each running the SORT utility to perform one to three of the required operations on several intermediate tables.

Let's consider the steps one by one.

sergeyken · Posted: Tue Aug 20, 2019 12:01 am

The final trick in normalizing the input ranges list.

This is for ICETOOL/SYNCTOOL lovers.