I've got the following requirement that I'd love DFSORT to do for me, although I realize I may need to use an application program.
Basically, I've got a list of streets (identified by a unique id) and house numbers on each street. My requirement is to make a list of streets with house-number intervals. Also, one record must appear for the even house numbers and another for the odd. The trick is that "holes" in the house numbers may appear. Take these examples:
results in 2 records:
streetId  HouseNoFrom  HouseNoTo  Even/Odd
1         1            7          O
1         2            8          E
whereas this input:
would result in these records (as 3 and 6 are not present in the input):
streetId  HouseNoFrom  HouseNoTo  Even/Odd
1         1            1          O
1         5            7          O
1         2            4          E
1         8            8          E
Hope this one will bring some joy to you! I welcome anything that can guide me in the right direction.
Note: The odd/even thing can be achieved using MOD - divide by 2 and see if the remainder is 0 or 1.
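For reference, the requirement itself can be sketched in a general-purpose language. This is a minimal Python sketch, not a DFSORT solution: the function name `house_ranges` and the tuple layout are assumptions made for illustration, and it assumes unique, purely numeric house numbers. The test inputs are reconstructed from the two examples (street 1 with houses 1 to 8, then the same street without 3 and 6).

```python
from itertools import groupby


def house_ranges(records):
    """Collapse (street_id, house_no) pairs into
    (street_id, house_from, house_to, 'O' or 'E') interval records."""
    # Order by street, then odd before even, then house number.
    ordered = sorted(records, key=lambda r: (r[0], r[1] % 2 == 0, r[1]))
    result = []
    for (street, is_even), group in groupby(
        ordered, key=lambda r: (r[0], r[1] % 2 == 0)
    ):
        marker = "E" if is_even else "O"
        numbers = [house for _, house in group]
        start = prev = numbers[0]
        for house in numbers[1:]:
            # Same-parity neighbours differ by exactly 2; anything else is a hole.
            if house - prev != 2:
                result.append((street, start, prev, marker))
                start = house
            prev = house
        result.append((street, start, prev, marker))
    return result


if __name__ == "__main__":
    for row in house_ranges([(1, n) for n in (1, 2, 4, 5, 7, 8)]):
        print(*row)
```

The remainder test (`house_no % 2`) is the MOD approach from the note above; everything else is a single pass over the sorted data.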
Is it safe to assume that there are no duplicate house-numbers (like for apartment blocks) or compound house-numbers like 3-5 (for a building which has consumed multiple adjacent numbers)? No sub-divided house-numbers, like 14, 14A, 14B for an old building sub-divided into apartments? No "house names" in place of the number (like Dun Roamin, Bill's Gaff)?
Another way to put it, all correct, unique, numbers?
Are they left-aligned? What is the maximum number that can exist?
There are several ways to do it (almost certainly). Do you want to optimize for performance, clarity, or something else?
I believe this could be done, but not that it should be.
As I see it, it would involve multiple sort steps (and a decent amount of logic as well).
This is what I thought of doing:
1. Separate the odd and even numbers into separate files by checking the last bit (as stated by Bill). Let's say the odd-numbers file has input like this:
2. Take this file and see if the difference in consecutive numbers is greater than 2. There will be a separate step for doing this using JOINKEYS to get records like below:
3. Put the file in reverse order and do as in step 2 to get:
4. If the records are merged again with JOINKEYS then we would get:
07 A B
11 A B
5. 'A' signifies the start of a group and 'B' signifies the end of a group. If you PUSH what is present in the records with 'A' (i.e. 01 for the first record, and so on), you should get something like this:
01 A 01
03 B 01
07 A B 07
11 A B 11
6. A simple BUILD on this will give you the required numbers.
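The six steps above can be sketched outside DFSORT to check the logic. This is a hedged Python illustration only: `mark_groups` and `build_ranges` are hypothetical names, the input is the odd-number list 01, 03, 07, 11 implied by the sample rows, and the loop stands in for what the JOINKEYS and PUSH steps would do.

```python
def mark_groups(numbers):
    """For sorted same-parity numbers, return rows of
    (number, start_flag, end_flag, pushed_start)."""
    rows = []
    start = None
    for i, n in enumerate(numbers):
        # 'A': first record, or the gap to the previous number is more than 2.
        is_start = i == 0 or n - numbers[i - 1] > 2
        # 'B': last record, or the gap to the next number is more than 2.
        is_end = i == len(numbers) - 1 or numbers[i + 1] - n > 2
        if is_start:
            start = n  # stands in for PUSHing the number from the 'A' record
        rows.append((n, "A" if is_start else " ", "B" if is_end else " ", start))
    return rows


def build_ranges(rows):
    """Keep only the 'B' rows; each one carries the start and end of a range."""
    return [(start, n) for n, _, end_flag, start in rows if end_flag == "B"]


if __name__ == "__main__":
    rows = mark_groups([1, 3, 7, 11])
    for row in rows:
        print(row)
    print(build_ranges(rows))
```

Only the records flagged 'B' are needed in the final BUILD, since each already holds the pushed start number next to its own number.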
So, it looks to me that writing an application program would be much simpler.
I am curious to know a simpler solution. Waiting for reply from Bill. :-)
2. There can be multiple ways to find the difference between consecutive numbers. It's up to you how you do it. I used two files, one with all my input records and the other with one record fewer, and joined them on sequence numbers to get:
If you look here, for the first record: 03 - 01 = 2
second record: 07 - 03 = 4
third record: 11 - 07 = 4
Place an 'A' where the difference is greater than 2 i.e.
I think this can be done with DFSORT with the experts' help, as seen. But why make things more complex than they already are, unless there is no COBOL in your shop? It is easy to write a COBOL program for such complex requirements, and it is also easy to maintain. A new person would go crazy if he is not a DFSORT expert.
Also, a SAS card could easily achieve this in case you are running out of choices.
STREET  HOUSEFROM  HOUSETO  EVEN/ODD
1       1          7        O
1       2          6        E
2       1          3        O
2       7          7        O
3       3          9        O
4       1          7        O
4       2          8        E
5       1          1        O
5       5          7        O
5       2          4        E
5       8          8        E
Now, as we agreed, I don't think this is particularly complex. I know you like symbols/SYMNAMES, and in this solution they are effective for a number of reasons: you can change the input format without having to change anything else, they have documentary value, and they save repetition of digit-code, which is prone to typos (typo a symbol and you know about it).
The main part of the solution uses the JOINKEYS-same-dsn-offset-sequence trick. This is a simple way to get data from consecutive records "side by side". Once you have the data side-by-side, that's a large part of the game.
Because odd and even numbers must be treated separately, two sequence numbers are applied in the JNFnCNTL files, for F1 starting from 1 and 5000000 (if there are more than 5m streets in Denmark, you'll have to increase this value); for F2, starting from zero and 4999999.
Since we've mentioned two ways already to tell odd/even, here's a third, which is similar to the bit-test, but just to show TRAN=BIT in operation. The last digit of the house number is converted to a character representation of the bit-pattern, and the last of those bits tells whether even (0) or odd (1).
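To make the three odd/even tests concrete, here is a small Python sketch. It only imitates the idea and is not DFSORT syntax: the function names are hypothetical, and the `cp037` codec is used as an assumption to stand in for the EBCDIC encoding of the digit on the mainframe.

```python
def is_odd_by_mod(house_no: str) -> bool:
    """First way: divide by 2 and look at the remainder."""
    return int(house_no) % 2 == 1


def is_odd_by_bit_test(house_no: str) -> bool:
    """Second way: test the low-order bit of the last digit's byte."""
    last_byte = house_no[-1].encode("cp037")[0]
    return last_byte & 0x01 == 1


def last_digit_bits(house_no: str) -> str:
    """Third way, step 1: turn the last digit's byte into a string of
    '0'/'1' characters, similar in spirit to what TRAN=BIT produces."""
    return format(house_no[-1].encode("cp037")[0], "08b")


def is_odd_by_bit_string(house_no: str) -> bool:
    """Third way, step 2: the last character of the bit string is
    '1' for odd and '0' for even."""
    return last_digit_bits(house_no)[-1] == "1"


if __name__ == "__main__":
    for number in ("07", "08", "11"):
        print(number, last_digit_bits(number), is_odd_by_bit_string(number))
```

All three agree because the low-order bit of an EBCDIC digit (X'F0' to X'F9') follows the parity of the digit itself.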
Having a mix of sequence numbers from two different series, the data must be sorted for each of the JOINKEYS, which will match on the sequence number.
The JOIN is UNPAIRED,F1 because the 0 and 4999999 records will not match and are not needed. The two final F1 of each sequence are also unmatched, but they are needed.
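The offset-sequence idea can be illustrated with a short Python sketch. This is an assumption-laden model rather than the actual control statements: `pair_with_next` is a hypothetical name, the two dictionaries play the roles of the F1 and F2 views of the same data set, and only a single sequence series is shown (the odd/even variant just uses two different starting values).

```python
def pair_with_next(records):
    """Put each record side by side with its successor by joining the
    same list to itself on sequence numbers that are offset by one."""
    # F1 view: sequence numbers start at 1.
    f1 = {seq: rec for seq, rec in enumerate(records, start=1)}
    # F2 view: same records, sequence numbers start at 0.
    f2 = {seq: rec for seq, rec in enumerate(records, start=0)}
    # Like JOIN UNPAIRED,F1: every F1 record is kept, matched or not.
    # The F2 record with sequence 0 never matches and is simply dropped.
    return [(rec, f2.get(seq)) for seq, rec in sorted(f1.items())]


if __name__ == "__main__":
    for current, following in pair_with_next([1, 3, 7, 11]):
        print(current, following)
```

The last F1 record has no partner (shown as `None` here), which mirrors the unmatched final F1 records that are still needed in the output.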
The REFORMAT statement is only presented with the required data, so that is straightforward. All the odds are together on one sequence range, and all the evens on another.
The REFORMAT statement can be amended to exclude the sequence numbers, but they are left for now for investigative purposes.
The data is then SORTed to bring the odds and evens back together.
OUTREC is then used to: subtract the adjacent house numbers (highest from lowest); make a GROUP with KEYBEGIN on the street and odd/even marker; make a group for the "distance" between numbers not being two, to get a new start-number for that group (the important thing here is to have the END for this group, as this type of group may not occur within a street, and we don't want a previous value working through); and use WHEN=(logical-expression) to pick the correct (new) start-number on key-change.
OUTFIL selects all the end-of-key records, and formats them a little.
You'll need some work to change the symbols/SYMNAMES for the input to match your data (lengths and position of streets and numbers, avoid (or keep) any other data) but hopefully it can all be done there without having to change the actual code. If that can't be done, let me know and we should be able to fix it.
Output format is also yours to play with. I just made a simple example.
The other solution I was thinking of would not require any SORTs, but there would be more code wrapped around the Main Task code above. I may code that up if I have a moment.
If we pass the data lots of times, although we can get a solution, it will be more effective to do it in a general-purpose language instead. The trick in the solution has many applications, so experiment and understand.
Since it can be maintained just by changing the symbols/SYMNAMES, I'd hope even novices could manage it fairly well.
The code may also look long (when you need to look at it), but that is to make it easier to read and understand. Here it is after DFSORT has processed the symbols:
Yes, thanks. That is the place. The problem is with consecutive "breaks" in the range within a street. This causes the GROUP to have acted on the previous record, and then the GROUP acts again before the data from the previous record can be used.
A sequence number on the GROUP, then storing in different locations depending on whether the sequence number is odd or even, would do it. On it now :-)
Forgot to mention this little trick. If your OUTFIL is not presenting all the data from the input to it (because it is redundant) as is common with this type of solution, include something like this:
It allows you to see the undisturbed input to the actual OUTFIL which is doing the work.
Needs a DD in the JCL of course, which can just be SYSOUT=...
Yes, good idea. Another trick I used to grasp your code was this:
Convert the step to ICETOOL (I generally never use SORT anymore, as with ICETOOL it's easier to add more input statements, thus easier to maintain). Then I did this to see the input to JOINKEYS: