michael james
New User
Joined: 23 Apr 2011 Posts: 5 Location: USA
I posted this in an old topic and it was suggested that I create a new one.
The field is 33 characters long and holds city, state and zip. I tried using UNSTRING, but some of the city names are compound and there is not always a comma separating the city from the state. Any help resolving this would be appreciated. Thanks!
Sample records from FileAid:
5  O-RC32-NAME-ADD-4A  GROUP
10 O-RC32-NAME-ADD-4   X(33)  FORT WORTH, TX 76185    (easily solved)
10 O-RC32-LINE-CODE-4  X
5  O-RC32-NAME-ADD-4A  GROUP
10 O-RC32-NAME-ADD-4   X(33)  CENTER POINT TX 78010   (problem record)
10 O-RC32-LINE-CODE-4  X
5  O-RC32-NAME-ADD-3A  GROUP
10 O-RC32-NAME-ADD-3   X(33)  STATEN ISLAND NY 10309  (problem record)
10 O-RC32-LINE-CODE-3  X
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
Well, you have not explained what you expect as a result,
nor have you provided enough examples.
BUT
it appears that you have datadatadatadata space STATE space ZIP.
I would UNSTRING, delimited by space, into five 33-character areas.
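Code:
      * Clear the work areas first: UNSTRING leaves unused
      * receiving fields unchanged.
           MOVE SPACE TO BLK1 BLK2 BLK3 BLK4 BLK5
      * ALL SPACE treats a run of blanks as a single delimiter.
           UNSTRING O-RC32-NAME-ADD-4 DELIMITED BY ALL SPACE
               INTO BLK1
                   ,BLK2
                   ,BLK3
                   ,BLK4
                   ,BLK5
               ON OVERFLOW
      * More than five words: too many to be city, state and zip.
                   PERFORM GOT-TOO-MANY-CITY-STATE-ZIPS
                   GO TO OUT-OF-THIS-MESS
           END-UNSTRING
      * The last non-blank area is the zip.
           EVALUATE TRUE
               WHEN BLK5 GT SPACE
                   MOVE BLK5 TO ZIP
                   MOVE SPACE TO BLK5
               WHEN BLK4 GT SPACE
                   MOVE BLK4 TO ZIP
                   MOVE SPACE TO BLK4
               WHEN BLK3 GT SPACE
                   MOVE BLK3 TO ZIP
                   MOVE SPACE TO BLK3
               WHEN OTHER
                   PERFORM HAVE-NO-CITY-ST-ZIP
                   GO TO OUT-OF-THIS-MESS
           END-EVALUATE
      * The last area still non-blank is the state. BLK2 covers
      * a one-word city.
           EVALUATE TRUE
               WHEN BLK4 GT SPACE
                   MOVE BLK4 TO STATE
                   MOVE SPACE TO BLK4
               WHEN BLK3 GT SPACE
                   MOVE BLK3 TO STATE
                   MOVE SPACE TO BLK3
               WHEN BLK2 GT SPACE
                   MOVE BLK2 TO STATE
                   MOVE SPACE TO BLK2
               WHEN OTHER
                   PERFORM HAVE-NO-CITY-ST
                   GO TO OUT-OF-THIS-MESS
           END-EVALUATE
      * Rebuild the city with one blank between its words.
      * Any comma after the city is still attached at this point.
           MOVE SPACE TO CITY
           STRING BLK1 DELIMITED BY SPACE
                  " "  DELIMITED BY SIZE
                  BLK2 DELIMITED BY SPACE
                  " "  DELIMITED BY SIZE
                  BLK3 DELIMITED BY SPACE
               INTO CITY
           END-STRING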
I imagine some clever indexing or reference modification may work,
but I like to keep my stuff simple,
so that I or somebody else can modify it.
But the above is just an idea... happy weekend.
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
Use FUNCTION REVERSE, then you can strip out the zip code and state fields quite easily, leaving the city field. If there's a comma in the city field, get rid of it. Just don't forget to use FUNCTION REVERSE again on the individual city, state, zip variables.
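A minimal sketch of how the FUNCTION REVERSE idea might be coded, offered as one possible reading of the suggestion rather than as Robert's own code. Every data name, the program name and the field sizes are assumptions made up for the illustration. It reverses the line, skips what used to be the trailing blanks, takes the first two "words" as zip and state, and treats the remainder as the city.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REVSPLIT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * All names and sizes here are illustrative assumptions.
       01  WS-ADDR-LINE        PIC X(33)
                               VALUE "CENTER POINT TX 78010".
       01  WS-REVERSED         PIC X(33).
       01  WS-REV-ZIP          PIC X(33).
       01  WS-REV-STATE        PIC X(33).
       01  WS-LEAD-SPACES      PIC S9(4) COMP.
       01  WS-PTR              PIC S9(4) COMP.
       01  WS-ZIP-LEN          PIC S9(4) COMP.
       01  WS-STATE-LEN        PIC S9(4) COMP.
       01  WS-CITY             PIC X(33).
       01  WS-STATE            PIC X(08).
       01  WS-ZIP              PIC X(10).
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE SPACES TO WS-CITY WS-STATE WS-ZIP
                          WS-REV-ZIP WS-REV-STATE
           MOVE ZERO TO WS-LEAD-SPACES WS-ZIP-LEN WS-STATE-LEN
      * Reversed, the zip comes first, after the old trailing blanks.
           MOVE FUNCTION REVERSE(WS-ADDR-LINE) TO WS-REVERSED
           INSPECT WS-REVERSED TALLYING WS-LEAD-SPACES
               FOR LEADING SPACE
      * Skip the whole split when the line is entirely blank.
           IF WS-LEAD-SPACES < LENGTH OF WS-REVERSED
               COMPUTE WS-PTR = WS-LEAD-SPACES + 1
               UNSTRING WS-REVERSED DELIMITED BY ALL SPACE
                   INTO WS-REV-ZIP   COUNT IN WS-ZIP-LEN
                        WS-REV-STATE COUNT IN WS-STATE-LEN
                   WITH POINTER WS-PTR
               END-UNSTRING
               PERFORM UN-REVERSE-THE-PIECES
           END-IF
           DISPLAY "CITY >" WS-CITY "<"
           DISPLAY "STATE>" WS-STATE "<"
           DISPLAY "ZIP  >" WS-ZIP "<"
           GOBACK.
       UN-REVERSE-THE-PIECES.
      * Reverse only the significant characters, so the result is
      * left-justified when moved to the target field.
           IF WS-ZIP-LEN > 0
               MOVE FUNCTION REVERSE(WS-REV-ZIP(1:WS-ZIP-LEN))
                 TO WS-ZIP
           END-IF
           IF WS-STATE-LEN > 0
               MOVE FUNCTION REVERSE(WS-REV-STATE(1:WS-STATE-LEN))
                 TO WS-STATE
           END-IF
      * Whatever follows the pointer is the reversed city.
           IF WS-PTR NOT > LENGTH OF WS-REVERSED
               MOVE FUNCTION REVERSE(WS-REVERSED(WS-PTR:))
                 TO WS-CITY
               INSPECT WS-CITY REPLACING ALL "," BY SPACE
           END-IF.
```

The sketch assumes the last word is always the zip and the one before it always the state. If the input has leading blanks, they stay on the front of the city, and a zip or state longer than its target field is simply truncated by the MOVE.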
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
The last two elements, which should I'm guessing be fixed in length, when stripped off will leave the city - with or without a comma, or other potential extraneous items.
However, I don't suppose we can/should rely on the size of those, or even on their presence.
Are you going to update these on the file so it becomes "easy" or go through this processing each time? If the latter, I'd do it in a called module.
If data inconsistencies are found, what do you want to do? You definitely have some, and they are what is causing you the problem: some records have a comma, some do not. What about a zip that is the wrong length or missing, or a State that is the wrong length or missing? "Invalid" for these would have to be something else.
Sorry, there will be a hiatus before a solution, which may give you time to answer. I have an appointment.
michael james
Thanks for the responses. I just got out of a meeting and have a couple more before the end of the day. I will respond afterwards.
Robert Sample
Quote:
which should I'm guessing be fixed in length
Bill -- actually, that'll depend. Zip codes can have two leading zeroes (US Virgin Islands and Puerto Rico) or one leading zero (Maine to New Jersey) and if the data came from (for example) Excel, the leading zeroes may have been suppressed. So there are cases where a zip code comes in with 3, 4, or 5 digits.
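If the decision is to put suppressed leading zeroes back, something along these lines might do it. This is only a sketch under stated assumptions: the names, the program name and the PIC X(10) zip field are illustrative, and it assumes that an all-numeric zip of 3 to 5 digits really is a zero-suppressed 5-digit zip rather than bad data. Anything else, including zip+4 with a dash, is left untouched.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ZIPPAD.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Illustrative names and sizes, not taken from the thread.
       01  WS-ZIP              PIC X(10) VALUE "601".
       01  WS-ZIP-LEN          PIC S9(4) COMP.
       01  WS-ZIP-5            PIC 9(5).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Count the characters in front of the first blank.
           MOVE ZERO TO WS-ZIP-LEN
           INSPECT WS-ZIP TALLYING WS-ZIP-LEN
               FOR CHARACTERS BEFORE INITIAL SPACE
      * Assumption: 3 to 5 digits means leading zeroes were lost.
           IF WS-ZIP-LEN > 2 AND WS-ZIP-LEN < 6
               IF WS-ZIP(1:WS-ZIP-LEN) IS NUMERIC
      * The move to a numeric field right-justifies and zero-fills.
                   MOVE WS-ZIP(1:WS-ZIP-LEN) TO WS-ZIP-5
                   MOVE WS-ZIP-5 TO WS-ZIP
               END-IF
           END-IF
           DISPLAY "ZIP>" WS-ZIP "<"
           GOBACK.
```

The nested IF keeps the reference modification from being evaluated when the length is zero. Whether a short zip should be padded at all, or rejected for correction at source, is a decision for whoever owns the data.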
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
And what if some have the zip+4 format?
Robert Sample
Dick -- don't get me started on zip+4 WITH dashes and zip+4 WITHOUT dashes and postal carrier routes and ... -- I've spent WAY too much time lately with address mailing lists!
Bill Woodger
Here are some test results from my code.
I have left the "," after the city if it exists.
I have included an example with leading blanks (data-entry error).
I was also thinking of data-entry errors in the length of State and Zip, but it is nice to know that there are some legitimate differences for Zip anyway. If a further element is needed ("without dashes" or "postal carrier") then things get more "interesting": to work out where the City ends (other than by noticing the increasing amount of bushes, trees and fields) you rely on there being a fixed number of elements at the end of the line. Otherwise, you'd have to reliably split at the state (before and after).
Robert (or anyone else), what do US Forces postal addresses look like? UK ones were an "exception" to the UK Postal Code system (BFPO followed by a number).
Michael, if the tests look like they are any use to you, I can post the code, or you can first answer the questions or describe better how you want the output. I'm not certain you want state/zip separated, but I did it anyway.
Code:
INPUT>C S Z <
CITY >C <>+0002<>C <
STATE>S <>+0002<>S <
ZIP >Z <>+0029<>Z <
XXXXX>C S Z <
INPUT>C S Z <
CITY >C <>+0003<>C <
STATE>S <>+0004<>S <
ZIP >Z <>+0026<>Z <
XXXXX>C S Z <
INPUT>S Z <
CITY > <>+0000<><
STATE>S <>+0002<>S <
ZIP >Z <>+0031<>Z <
XXXXX>S Z <
INPUT>Z <
CITY > <>+0000<><
STATE> <>+0000<><
ZIP >Z <>+0033<>Z <
XXXXX>Z <
INPUT>C <
CITY > <>+0000<><
STATE> <>+0000<><
ZIP >C <>+0033<>C <
XXXXX>C <
INPUT> <
CITY > <>+0000<><
STATE> <>+0000<><
ZIP > <>+0000<><
XXXXX> <
INPUT>C C C C C S Z <
CITY >C C C C C <>+0010<>C C C C C <
STATE>S <>+0002<>S <
ZIP >Z <>+0021<>Z <
XXXXX>C C C C C S Z <
INPUT>123456789012345678901234567890123<
CITY > <>+0000<><
STATE> <>+0000<><
ZIP >123456789012345678901234567890123<>+0033<>123456789012345678901234567890123<
XXXXX>123456789012345678901234567890123<
INPUT>FORT WORTH, TX 76185 <
CITY >FORT WORTH, <>+0012<>FORT WORTH, <
STATE>TX <>+0003<>TX <
ZIP >76185 <>+0018<>76185 <
XXXXX>FORT WORTH, TX 76185 <
INPUT>CENTER POINT TX 78010 <
CITY >CENTER POINT <>+0013<>CENTER POINT <
STATE>TX <>+0003<>TX <
ZIP >78010 <>+0017<>78010 <
XXXXX>CENTER POINT TX 78010 <
INPUT>STATEN ISLAND NY 10309 <
CITY >STATEN ISLAND <>+0014<>STATEN ISLAND <
STATE>NY <>+0003<>NY <
ZIP >10309 <>+0016<>10309 <
XXXXX>STATEN ISLAND NY 10309 <
INPUT> STATEN ISLAND NY 10309 <
CITY > STATEN ISLAND <>+0017<> STATEN ISLAND <
STATE>NY <>+0003<>NY <
ZIP >10309 <>+0013<>10309 <
XXXXX> STATEN ISLAND NY 10309 <
michael james
05 WS-CITY PIC X(15) VALUE SPACES.
05 WS-STATE PIC X(08) VALUE SPACES. (Mostly 2 characters long)
05 WS-ZIP PIC X(10) VALUE SPACES.
I forgot to include the above in my original post.
Bill Woodger, could you include your code?
Robert Sample
US overseas military addresses are typically a PO box with the city being APO or FPO (Army or Fleet Post Office), the state being AE for European bases, AP for Pacific bases, and so forth (there are 4 to 6 designations -- I forget the exact number), and a zip code. The zip+4 is the box number, as is usual for PO boxes. The PO box can have several formats, the most common being CMR and POB, or POB only (and I don't recall what CMR stands for -- I suppose I could look it up in my copy of the USPS manuals). Domestic military addresses generally follow the street / city / state / zip format.
Bill Woodger
I tested it as a stand-alone program (with a little compiler on my PC), and have just hacked out the other bits to leave the relevant code.
The principle is to find the lengths of the last two "words" on the line, including trailing blanks. That gives you the length of the first part of the line, with the city in it (or thereabouts). Working backwards from the end of the line, the code looks for the first non-blank, then looks for the next blank (storing the start of the word, although the displacement is used later when calculating the length). It does this for both of the last words (if present) on the line.
Using Occurs Depending On, the input address line is then split into the three elements.
The definition of the ODO items is in 3 X 33 bytes to allow for compiling with SSRANGE.
I had to choose subscript or index. It doesn't really get used much. I chose one, then chose the other, knowing that my first choice would be wrong. So it is with subscripts, or rather a subscript.
It has some belt-and-braces in case the field lengths change, but they aren't all done.
Code:
       01  W-CITY-STATE-ZIP-SPLIT.
           05  W-CSSZ-PRESUMED-CITY.
               10  FILLER OCCURS 0 TO 33 TIMES
                       DEPENDING ON W-LENGTH-OF-PRESUMED-CITY.
                   15  FILLER PIC X.
           05  W-CSSZ-PRESUMED-STATE.
               10  FILLER OCCURS 0 TO 33 TIMES
                       DEPENDING ON W-LENGTH-OF-PRESUMED-STATE.
                   15  FILLER PIC X.
           05  W-CSSZ-PRESUMED-ZIP.
               10  FILLER OCCURS 0 TO 33 TIMES
                       DEPENDING ON W-LENGTH-OF-PRESUMED-ZIP.
                   15  FILLER PIC X.
       01  W-LENGTH-OF-PRESUMED-CITY  COMP PIC S9(4).
       01  W-LENGTH-OF-PRESUMED-STATE COMP PIC S9(4).
       01  W-LENGTH-OF-PRESUMED-ZIP   COMP PIC S9(4).
       01  W-START-OF-STATE           COMP PIC S9(4).
       01  W-START-OF-LINE            COMP PIC S9(4).
       01  W-START-OF-ZIP             COMP PIC S9(4).
       01  W-LENGTH-OF-CITY-STATE-ZIP COMP PIC S9(4).
       01  W-NEED-A-DISPLACEMENT      COMP PIC S9(4) VALUE +1.
       01  W-CITY-STATE-ZIP-WORK.
           05  FILLER OCCURS 33 TIMES.
               10  W-CSZW-BYTE PIC X.
       01  W-CITY-STATE-ZIP-SUB       COMP PIC S9(4).

       XX-THE-START-UP.
           MOVE LENGTH OF W-CITY-STATE-ZIP-WORK
             TO W-LENGTH-OF-CITY-STATE-ZIP
           IF ( W-LENGTH-OF-CITY-STATE-ZIP
               NOT EQUAL TO LENGTH OF DATA-NAME-FROM-FILE )
               DISPLAY "DENMARK TIME"
               DISPLAY ">" W-LENGTH-OF-CITY-STATE-ZIP "<"
               DISPLAY ">" LENGTH OF DATA-NAME-FROM-FILE "<"
               STOP RUN
           END-IF
           .
       XX-THE-GUTS.
           MOVE DATA-NAME-FROM-FILE TO W-CITY-STATE-ZIP-WORK
           IF ( W-CITY-STATE-ZIP-WORK EQUAL TO SPACE )
               PERFORM XX-NO-ADDRESS-LINE
           ELSE
               PERFORM XX-PROCESS-ADDRESS-LINE
           END-IF
           MOVE W-CITY-STATE-ZIP-WORK TO W-CITY-STATE-ZIP-SPLIT
           MOVE W-CSSZ-PRESUMED-CITY  TO THE-CITY
           MOVE W-CSSZ-PRESUMED-STATE TO THE-STATE
           MOVE W-CSSZ-PRESUMED-ZIP   TO THE-ZIP
           .
       XX-THE-BIT-WHERE-WE-STOP.
      * Or STOP RUN, whichever suits the way the code is used.
           GOBACK
           .
       XX-NO-ADDRESS-LINE.
           MOVE ZERO TO W-LENGTH-OF-PRESUMED-ZIP
                        W-LENGTH-OF-PRESUMED-STATE
                        W-LENGTH-OF-PRESUMED-CITY
           .
       XX-PROCESS-ADDRESS-LINE.
           MOVE W-LENGTH-OF-CITY-STATE-ZIP TO W-CITY-STATE-ZIP-SUB
           PERFORM XX-FIND-START-OF-ZIP
           ADD +1 TO W-CITY-STATE-ZIP-SUB
               GIVING W-START-OF-ZIP
           IF ( W-CITY-STATE-ZIP-SUB EQUAL TO ZERO)
               MOVE ZERO TO W-START-OF-STATE
                            W-START-OF-LINE
           END-IF
           PERFORM XX-FIND-START-OF-STATE
           IF ( W-CITY-STATE-ZIP-SUB EQUAL TO ZERO)
               MOVE ZERO TO W-START-OF-LINE
               IF ( W-START-OF-ZIP EQUAL TO +1 )
                   MOVE ZERO TO W-START-OF-STATE
               ELSE
                   MOVE +1 TO W-START-OF-STATE
               END-IF
           ELSE
               ADD +1 TO W-CITY-STATE-ZIP-SUB
                   GIVING W-START-OF-STATE
               MOVE +1 TO W-START-OF-LINE
           END-IF
           COMPUTE W-LENGTH-OF-PRESUMED-ZIP
                 = W-LENGTH-OF-CITY-STATE-ZIP
                 - ( W-START-OF-ZIP
                   - W-NEED-A-DISPLACEMENT )
           IF ( W-START-OF-STATE EQUAL TO ZERO )
               MOVE ZERO TO W-LENGTH-OF-PRESUMED-STATE
           ELSE
               COMPUTE W-LENGTH-OF-PRESUMED-STATE
                     = W-LENGTH-OF-CITY-STATE-ZIP
                     - W-LENGTH-OF-PRESUMED-ZIP
                     - ( W-START-OF-STATE
                       - W-NEED-A-DISPLACEMENT )
           END-IF
           IF ( W-START-OF-LINE EQUAL TO ZERO )
               MOVE ZERO TO W-LENGTH-OF-PRESUMED-CITY
           ELSE
               COMPUTE W-LENGTH-OF-PRESUMED-CITY
                     = W-LENGTH-OF-CITY-STATE-ZIP
                     - W-LENGTH-OF-PRESUMED-ZIP
                     - W-LENGTH-OF-PRESUMED-STATE
           END-IF
           .
       XX-FIND-START-OF-ZIP.
      * By first finding the end of the last "word" on the line.
      * Which we are assuming is the ZIP code.
      *
           PERFORM XX-REDUCE-CSZ-SUB
             UNTIL ( W-CITY-STATE-ZIP-SUB EQUAL TO ZERO )
                OR ( W-CSZW-BYTE ( W-CITY-STATE-ZIP-SUB )
                     NOT EQUAL TO SPACE )
           PERFORM XX-REDUCE-CSZ-SUB
             UNTIL ( W-CITY-STATE-ZIP-SUB EQUAL TO ZERO )
                OR ( W-CSZW-BYTE ( W-CITY-STATE-ZIP-SUB )
                     EQUAL TO SPACE )
           .
       XX-FIND-START-OF-STATE.
      * By first finding the end of the 2nd-to-last "word" on the
      * line. Which we are assuming is the State code.
      *
           PERFORM XX-REDUCE-CSZ-SUB
             UNTIL ( W-CITY-STATE-ZIP-SUB EQUAL TO ZERO )
                OR ( W-CSZW-BYTE ( W-CITY-STATE-ZIP-SUB )
                     NOT EQUAL TO SPACE )
           PERFORM XX-REDUCE-CSZ-SUB
             UNTIL ( W-CITY-STATE-ZIP-SUB EQUAL TO ZERO )
                OR ( W-CSZW-BYTE ( W-CITY-STATE-ZIP-SUB )
                     EQUAL TO SPACE )
           .
       XX-REDUCE-CSZ-SUB.
           SUBTRACT +1 FROM W-CITY-STATE-ZIP-SUB
           .
Bill Woodger
I didn't include this because I don't know, yet, if my little compiler supports it, but I always like to do this.
Code:
       01  W-DISPLAY-WHEN-COMPILED PIC X(8)BX(8).
Then in the start up:
Code:
           MOVE WHEN-COMPILED TO W-DISPLAY-WHEN-COMPILED
           DISPLAY program-name "Compiled on " W-DISPLAY-WHEN-COMPILED
Then when looking at a problem it is easy to check if I have the correct compile listing.
michael james
Bill, I really appreciate all your help. I will integrate your code into my program. Thanks a lot!
Bill Woodger
When pasting the code earlier, I jammed a couple of comments in.
A better approach to those two paragraphs is:
Code:
      * The two helper paragraph names are shortened here to stay
      * within the 30-character limit for user-defined words in
      * IBM Enterprise COBOL.
       XX-FIND-START-OF-ZIP.
           PERFORM XX-ROLL-BACK-TO-END-OF-WORD
           PERFORM XX-ROLL-BACK-TO-NEXT-BLANK
           .
       XX-FIND-START-OF-STATE.
           PERFORM XX-ROLL-BACK-TO-END-OF-WORD
           PERFORM XX-ROLL-BACK-TO-NEXT-BLANK
           .
       XX-ROLL-BACK-TO-END-OF-WORD.
           PERFORM XX-REDUCE-CSZ-SUB
             UNTIL ( W-CITY-STATE-ZIP-SUB EQUAL TO ZERO )
                OR ( W-CSZW-BYTE ( W-CITY-STATE-ZIP-SUB )
                     NOT EQUAL TO SPACE )
           .
       XX-ROLL-BACK-TO-NEXT-BLANK.
           PERFORM XX-REDUCE-CSZ-SUB
             UNTIL ( W-CITY-STATE-ZIP-SUB EQUAL TO ZERO )
                OR ( W-CSZW-BYTE ( W-CITY-STATE-ZIP-SUB )
                     EQUAL TO SPACE )
           .
Anyone looking at the code can see, and say, "well, you don't need all those extra paragraphs. Just code the use of XX-REDUCE-CSZ-SUB in place of the performs of the paragraphs" (more of them further up as well).
Whilst this is true, doing it the way I have makes it "self-documenting". You can see from the paragraph names how the XX-REDUCE-CSZ-SUB is used in different situations. Without the extra paragraphs, you'd need comments. Comments aren't compiled. They are not always correct initially (as they are not compiled, and not "tested") and are not always updated when the program is maintained.
Doing it this way, the "comments" (the paragraph names) are compiled and tested.
If there are people interested in following the code, but can't, I'd like to know, as I do like my code to be understood by those who need to.
As with the index/subscript question, I chose paragraphs over sections in the same manner. If adaptation is necessary, it should not be onerous.
Since I wrote it using WordPad, the formatting of the code is not as extensive as I would like (no "logical tabs" in WordPad). Sorry about that, but it is a "diminishing returns" thing without a real editor.
Michael, having seen your target fields, I have been writing some code to analyse formatting problems in various ways (as Robert and Dick have rightly introduced complexity to the Zip, and various data-entry problems are possible). When processing a set of addresses for the first time, I'd always check them out to see how "good" the format is. The aim would be to get them corrected where possible or to pre-process them to get them into as good a state as possible for the task.
If you are sure your last address lines are good, that's OK. If not, I have a "little extension" to the code above to get you started. Let me know by PM if you'd like to see it.
michael james
Bill, I finally got a chance to integrate your code and it worked beautifully. Again I want to thank you specifically and all the others that provided input.
Bill Woodger
Thanks for letting us know and thanks for the thanks, appreciated.