I've got the following requirement, which I'd love DFSORT to do for me, although I realize I may need to use an application program.
Basically, I've got a list of streets (identified by a unique id) and housenumbers on each street. My requirement is to make a list of streets with housenumber intervals. Also, one record must appear for even housenumbers and another for odd. The trick is that "holes" in the housenumbers may appear. Take these examples:
Input:
Code:
streetId HouseNo
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
results in 2 records:
Code:
streetId HouseNoFrom HouseNoTo Even/Odd
1 1 7 O
1 2 8 E
whereas this input:
Code:
streetId HouseNo
1 1
1 2
1 4
1 5
1 7
1 8
would result in these records (as 3 and 6 are not present in the input):
Code:
streetId HouseNoFrom HouseNoTo Even/Odd
1 1 1 O
1 5 7 O
1 2 4 E
1 8 8 E
Hope this one will bring some joy to you! I welcome anything that can guide me in the right direction.
Note: The odd/even thing can be achieved using MOD - divide by 2 and see if the remainder is 0 or 1.
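For comparison, here is a minimal sketch of what the "application program" route might look like. It is Python, not DFSORT, and the function name `house_number_intervals` and the tuple layout are made up for illustration. It uses the MOD test from the note above and assumes duplicates can simply be dropped.

```python
from itertools import groupby


def house_number_intervals(records):
    """records: iterable of (street_id, house_no) pairs; duplicates are dropped."""
    intervals = []
    # Sort by street, then odd before even (as in the examples), then house number.
    keyed = sorted(set(records), key=lambda r: (r[0], r[1] % 2 == 0, r[1]))
    for (street, is_even), group in groupby(keyed, key=lambda r: (r[0], r[1] % 2 == 0)):
        marker = "E" if is_even else "O"
        numbers = [house_no for _, house_no in group]
        start = prev = numbers[0]
        for n in numbers[1:]:
            if n - prev != 2:  # a "hole": close the current interval
                intervals.append((street, start, prev, marker))
                start = n
            prev = n
        intervals.append((street, start, prev, marker))
    return intervals


second_example = [(1, 1), (1, 2), (1, 4), (1, 5), (1, 7), (1, 8)]
for row in house_number_intervals(second_example):
    print(*row)
```

Run against the second example input, this prints the four records listed above (1 1 1 O, 1 5 7 O, 1 2 4 E, 1 8 8 E).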
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
DFSORT can check the bits for odd/even :-)
How much data is there?
Is it safe to assume that there are no duplicate house-numbers (like for apartment blocks) or compound house-numbers like 3-5 (for a building which has consumed multiple adjacent numbers)? No sub-divided house-numbers, like 14, 14A, 14B for an old building sub-divided into apartments? No "house names" in place of the number (like Dun Roamin, Bill's Gaff)?
Another way to put it, all correct, unique, numbers?
Are they left-aligned? What is the maximum number that can exist?
There are several ways to do it (almost certainly). Do you want for performance, clarity, or something else?
Thanks for your fast reply! (thought it was night in the US now?)
Well, it's just Danish addresses, so there's not a whole lot of data - so clarity first, I guess.
There CAN be duplicate numbers - but only because of a letter (separate field) - nothing else. So we can have 14A, 14B and so on. We could remove duplicates easily using SELECT FIRST.
Housenumbers in Denmark are numeric, 3 digits; the letters (a separate field) are A-Z.
I believe this could be done, but I'm not sure that we should.
As I see it, it would involve multiple sort steps (and a decent amount of logic as well).
This is what I thought of doing:
1. Separate the odd and even numbers into separate files by checking the last bit (as stated by Bill). Let's say the odd-numbers file has input like this:
01
03
07
11
2. Take this file and see if the difference in consecutive numbers is greater than 2. There will be a separate step for doing this using JOINKEYS to get records like below:
01
03
07 A
11 A
3. Put the file in reverse order and do as in step 2 to get:
11 B
07 B
03 B
01
4. If the records are merged again with JOINKEYS then we would get:
01 A
03 B
07 A B
11 A B
5. 'A' signifies start of group and 'B' signifies end of group. If you PUSH what is present in records with 'A' (i.e. 01 for first record and so on..), you should get something like this:
01 A 01
03 B 01
07 A B 07
11 A B 11
6. A simple BUILD on this will give you the required numbers.
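To make the A/B idea concrete, here is a rough Python sketch of steps 2 to 6 (not DFSORT; `mark_groups` and `push_start` are hypothetical names). It assumes the first record always starts a group and the last record always ends one, which is what the merged listing in step 4 implies.

```python
def mark_groups(numbers, step=2):
    """Flag each number with 'A' (start of group) and/or 'B' (end of group)."""
    marked = []
    for i, n in enumerate(numbers):
        starts = i == 0 or n - numbers[i - 1] > step
        ends = i == len(numbers) - 1 or numbers[i + 1] - n > step
        marked.append((n, "A" if starts else "", "B" if ends else ""))
    return marked


def push_start(marked):
    """Carry the latest 'A' number forward and emit a range at every 'B'."""
    ranges = []
    start = None
    for n, a_flag, b_flag in marked:
        if a_flag:
            start = n
        if b_flag:
            ranges.append((start, n))
    return ranges


odd_numbers = [1, 3, 7, 11]
print(mark_groups(odd_numbers))
print(push_start(mark_groups(odd_numbers)))
```

With the sample odd numbers this gives the ranges 1-3, 7-7 and 11-11, in a single pass rather than several sort steps.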
So, it looks to me that writing an application program would be much simpler.
I am curious to know a simpler solution. Waiting for reply from Bill. :-)
I am trying to create a group for the data in vertical format, so that you can get your desired output in horizontal format.
I am marking 'A' for the start of the group and 'B' for the end of group.
Where 'A' and 'B' both are present, that means that is the only element in the group e.g. 7 and 11 in my sample data.
The points I mentioned, are part of the idea on how to do it.
2. There can be multiple ways to find the difference between consecutive numbers. It's up to you how you do it. I used 2 files, one with all my input records and the other with one record fewer, and joined them on sequence numbers to get:
0103
0307
0711
if you see here, for first record: 03 - 01 = 2
second record: 07 - 03 = 4
third record: 11 - 07 = 4
Place an 'A' where the difference is greater than 2 i.e.
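The pairing in point 2 can be mimicked in a few lines of Python (an illustration only; the variable names are made up). The "second file" is simply the same list without its first record.

```python
numbers = ["01", "03", "07", "11"]

# "File 1" is every record; "file 2" is the same list minus its first record.
pairs = [left + right for left, right in zip(numbers, numbers[1:])]

# Difference between the right-hand and left-hand number of each pair.
diffs = [int(pair[2:]) - int(pair[:2]) for pair in pairs]

# The right-hand number of any pair whose gap exceeds 2 starts a new group.
starts = [pair[2:] for pair, diff in zip(pairs, diffs) if diff > 2]

print(pairs)
print(diffs)
print(starts)
```

Here `pairs` holds 0103, 0307 and 0711, and `starts` holds 07 and 11, which are the two records carrying an 'A' in step 2.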
I think this can be done with DFSORT with the experts' help, as seen. But why make things more complex than they already are, unless there is no COBOL in your shop? It is easy to write a COBOL program for such complex requirements, and easy to maintain it too. A new person would go crazy if he is not a DFSORT expert.
Also, this could easily be achieved with SAS in case you are running out of choices.
STREET HOUSEFROM HOUSETO EVEN/ODD
1 1 7 O
1 2 6 E
2 1 3 O
2 7 7 O
3 3 9 O
4 1 7 O
4 2 8 E
5 1 1 O
5 5 7 O
5 2 4 E
5 8 8 E
Now, as we agreed, I don't think this is particularly complex. I know you like symbols/SYMNAMES, and in the solution they are effective for a number of reasons, not only that you can just change the input format and not have to change anything else, but they have a documentary value, and save repetition of digit-code, which is prone to typos (typo a symbol and you know about it).
The main part of the solution uses the JOINKEYS-same-dsn-offset-sequence trick. This is a simple way to get data from consecutive records "side by side". Once you have the data side-by-side, that's a large part of the game.
Because odd and even numbers must be treated separately, two sequence numbers are applied in the JNFnCNTL files, for F1 starting from 1 and 5000000 (if there are more than 5m streets in Denmark, you'll have to increase this value); for F2, starting from zero and 4999999.
Since we've mentioned two ways already to tell odd/even, here's a third, which is similar to the bit-test, but just to show TRAN=BIT in operation. The last digit of the house number is converted to a character representation of the bit-pattern, and the last of those bits tells whether even (0) or odd (1).
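As a rough illustration of what that bit-pattern view looks like (Python, not DFSORT), the sketch below turns the last digit into its eight bits and inspects the final one. It assumes the digits are held as EBCDIC characters and uses code page cp037 purely as an example; the function names are hypothetical.

```python
def last_digit_bits(house_no: str) -> str:
    # Assumption: the number is stored as EBCDIC character digits (cp037 here).
    last_byte = house_no[-1].encode("cp037")[0]
    return format(last_byte, "08b")


def is_odd(house_no: str) -> bool:
    # The last of the eight bits is 1 for odd digits and 0 for even digits.
    return last_digit_bits(house_no)[-1] == "1"


print(last_digit_bits("007"), is_odd("007"))
print(last_digit_bits("014"), is_odd("014"))
```

In EBCDIC the digit 7 is X'F7', so its pattern ends in 1, while 4 is X'F4' and ends in 0.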
Having a mix of sequence numbers from two different series, the data must be sorted for each of the JOINKEYS, which will match on the sequence number.
The JOIN is UNPAIRED,F1 because the 0 and 4999999 records (from F2) will not match and are not needed. The final F1 record of each of the two sequences is also unmatched, but those are needed.
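For anyone who wants to see the offset-sequence idea outside DFSORT, here is a small Python sketch for a single street. The names (`add_sequence`, `join_unpaired_f1`) are hypothetical, a dictionary lookup stands in for the sorted match on sequence number, and street handling is deliberately left out.

```python
ODD_BASE, EVEN_BASE = 1, 5_000_000


def add_sequence(house_numbers, offset):
    """Key each number by a sequence number from the odd or the even series."""
    odd_seq, even_seq = ODD_BASE - offset, EVEN_BASE - offset
    keyed = {}
    for n in house_numbers:
        if n % 2:
            keyed[odd_seq] = n
            odd_seq += 1
        else:
            keyed[even_seq] = n
            even_seq += 1
    return keyed


def join_unpaired_f1(house_numbers):
    f1 = add_sequence(house_numbers, 0)  # series start at 1 and 5000000
    f2 = add_sequence(house_numbers, 1)  # series start at 0 and 4999999
    # Keep every F1 record; pair it with the F2 record of equal sequence, if any.
    return [(n, f2.get(seq)) for seq, n in sorted(f1.items())]


print(join_unpaired_f1([1, 2, 4, 5, 7, 8]))
```

Each number ends up next to the following number of the same parity, and the last odd and last even number have nothing beside them, which corresponds to the unmatched F1 records that must be kept.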
The REFORMAT statement is only presented with the required data, so that is straightforward. All the odds are together on one sequence range, and all the evens on another.
The REFORMAT statement can be amended to exclude the sequence numbers, but they are left for now for investigative purposes.
The data is then SORTed to bring the odds and evens back together.
OUTREC is then used to: subtract the adjacent house numbers (highest from lowest); make a GROUP with KEYBEGIN on the street and odd/even marker; make a group for the "distance" between numbers not being two, to get a new start-number for that group (the important thing here is to have the END for this group, as this type of group may not occur within a street, and we don't want a previous value working through); and use WHEN=(logical-expression) to apply the correct (new) start-number on key-change.
OUTFIL selects all the end-of-key records, and formats them a little.
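A loose Python analogue of that last stage is sketched below. It is not a translation of the DFSORT statements: the row layout (key, number, next key, next number) and the name `ranges_from_pairs` are assumptions for illustration, with the key standing for street plus odd/even marker and None standing for an unpaired F1 record.

```python
def ranges_from_pairs(rows):
    """rows: (key, number, next_key, next_number) with None for unpaired F1."""
    out = []
    start = None
    for key, number, next_key, next_number in rows:
        if start is None:
            start = number  # first record of a new range
        # A range ends when the key changes or the gap to the next number is not 2.
        if next_key != key or next_number - number != 2:
            out.append((key, start, number))
            start = None
    return out


rows = [
    (("5", "O"), 1, ("5", "O"), 5),
    (("5", "O"), 5, ("5", "O"), 7),
    (("5", "O"), 7, ("6", "O"), 1),
    (("6", "O"), 1, ("6", "O"), 5),
    (("6", "O"), 5, ("6", "O"), 9),
    (("6", "O"), 9, None, None),
]
for key, first, last in ranges_from_pairs(rows):
    print(key[0], first, last, key[1])
```

Only the records that close a range are written, each carrying the start number that was held since the range began.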
You'll need some work to change the symbols/SYMNAMES for the input to match your data (lengths and position of streets and numbers, avoid (or keep) any other data) but hopefully it can all be done there without having to change the actual code. If that can't be done, let me know and we should be able to fix it.
Output format is also yours to play with. I just made a simple example.
Lightly tested.
The other solution I was thinking of would not require any SORTs, but there would be more code wrapped around the Main Task code above. I may code that up if I have a moment.
RahulG31,
If we pass the data lots of times, although we can get a solution, it will be more effective to do it in a general-purpose language instead. The trick in the solution has many applications, so experiment and understand.
Rohit,
Since it can be maintained just by changing the symbols/SYMNAMES, I'd hope even novices could manage it fairly well.
The code may also look long (when you need to look at it), but that is to make it easier to read and understand. Here it is after DFSORT has processed the symbols:
Yes, thanks. That is the place. The problem is with consecutive "breaks" in the range within a street. It causes the GROUP to have acted on the previous record, and then the GROUP acts again before the data from the previous record can be used.
A sequence number on the GROUP, then storing in different locations depending on whether the sequence number is odd or even, would do it. On it now :-)
A simpler sequence number per record, one digit long, is set. The difference is PUSHed to one or other location, depending on odd/even sequence number.
If neither of those left/right differences is two, then the start number of a sequence of one number is the current number.
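The left/right rule can be stated in a couple of lines of Python (an illustration of the rule only, with a made-up function name; it says nothing about how the values are PUSHed in DFSORT):

```python
def is_singleton(prev_no, current, next_no):
    """True when the gap to neither neighbour is exactly two."""
    left = current - prev_no if prev_no is not None else None
    right = next_no - current if next_no is not None else None
    return left != 2 and right != 2


# Street 6 in the new data has odd numbers 1, 5 and 9.
print(is_singleton(None, 1, 5), is_singleton(1, 5, 9), is_singleton(5, 9, None))
```

All three numbers on street 6 qualify, so each becomes its own one-number range.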
Output with your new data is:
Code:
STREET HOUSEFROM HOUSETO EVEN/ODD
1 1 7 O
1 2 6 E
2 1 3 O
2 7 7 O
3 3 9 O
4 1 7 O
4 2 8 E
5 1 1 O
5 5 7 O
5 2 4 E
5 8 8 E
6 1 1 O
6 5 5 O
6 9 9 O
Edit: We need some DFSORT code to support various indentation preferences :-)
Forgot to mention this little trick. If your OUTFIL is not presenting all the data from the input to it (because it is redundant) as is common with this type of solution, include something like this:
Code:
OUTFIL FNAMES=TMPD
It allows you to see the undisturbed input to the actual OUTFIL which is doing the work.
Needs a DD in the JCL of course, which can just be SYSOUT=...
Yes, good idea. Another trick I used to grasp your code was this:
Convert the step to ICETOOL (I generally never use SORT anymore, as with ICETOOL it's easier to add more input statements, thus easier to maintain). Then I did this to see the input to JOINKEYS:
I've now done a real test against the Danish streets. To do this, I first made the Streetcode 8,ZD, and the Housenumber 3,ZD - using your brilliant symbol file, that was a walk in the park.
Now, there's an issue with my colleague's street, Anyvej (we do call it Anyway). There are some strange holes on this street. Here's the input:
The curious thing is that I'd expect there to be more items listed (if something is out with the definitions) rather than have something go missing.
So I made the full change to match your data, and got this (with my HEADING1):
Code:
STREET HOUSEFROM HOUSETO EVEN/ODD
00008548 009 013 O
00008548 002 016 E
00008548 020 028 E
This is the complete symbol file I used for that output:
Code:
* INPUT RECORD TO JNFNCNTL
INJ-STREET-REF,1,8,CH
SKIP,4
INJ-HOUSE-NO,*,3,ZD
POSITION,INJ-HOUSE-NO
SKIP,2
INJ-HOUSE-NO-LAST,*,1,ZD
* EXTENSION FROM JNFNCNTL
EXTJ-LAST-OF-NUMBER-BITS,*,8,CH
POSITION,EXTJ-LAST-OF-NUMBER-BITS
SKIP,7
EXTJ-LAST-BIT-OF-BITS,*,1,CH
* OUTPUT FROM JNFNCNTL
OUTJ-RECORD,1,19,CH
OUTJ-STREET-REF,1,8,CH
OUTJ-ODDEVEN-BIT,*,1,CH
OUTJ-NUMBER,*,3,CH
OUTJ-ODDEVEN-SEQ,*,7,CH
* REFORMAT RECORD
REF-F1-STREET,1,8,CH
REF-F1-ODDEVEN,*,1,CH
POSITION,REF-F1-STREET
REF-F1-STREET-ODDEVEN,=,9,CH
REF-F1-NUMBER,*,3,ZD
REF-F1-SEQUENCE,*,7,CH
REF-F2-STREET,*,8,CH
REF-F2-ODDEVEN,*,1,CH
POSITION,REF-F2-STREET
REF-F2-STREET-ODDEVEN,=,9,CH
REF-F2-NUMBER,*,3,ZD
REF-F2-SEQUENCE,*,7,CH
* EXTEND REFORMAT RECORD
REFEX-NUMBER-DIFF,*,5,CH
REFEX-START-NUMBER,*,3,CH
REFEX-SAVE-F2-NUMBER,*,3,CH
REFEX-SEQ,*,1,BI
REFEX-LEFT-DIFF,*,5,CH
I'll try some more testing as well, I think.
I'll also see if I can improve on the symbol definitions, there are several places needing lengths to be changed: it would be nice if there were just one :-)
If your symbols match the above, can you post the previous street to the one where you see the problem, please?
That data was the same as I had. I ran it anyway, and it produced the same results as I showed previously:
Code:
STREET HOUSEFROM HOUSETO EVEN/ODD
00008548 009 013 O
00008548 002 016 E
00008548 020 028 E
That "lrecl" is a nice thing. I can't do that, as I don't have DFSORT 2.1, which greatly extended the places where a symbol could be used. It adds one unnecessary byte, but that is a small price to pay in this type of situation.
I can't imagine how the IFOUTLEN would affect the results. I'll try a manual one anyway.