suraaj
New User
Joined: 16 Apr 2009 Posts: 69 Location: Canada
Hi,
I am trying to check for the occurrence of any of the keywords in the list defined in a record, using INSPECT. Please see the code below.
Code:
01 WS-APARTMENT.
10 WS-APT-TYPE PIC X(6).
88 APT-TYPE VALUE 'APT'
'APP'
'PH'
'SUITE'
'UNIT'
'A TERR'
'BUREAU'
'UNITÉ'
'#'.
Processing:
Code:
INSPECT ADDRESS-LINE TALLYING WS-CHAR-COUNTER-1
FOR CHARACTERS BEFORE INITIAL WS-APT-TYPE
Once I have found the position where the keyword starts, I need to strip out the data from the keyword onward into another field. Please advise.
Thanks, Suraaj
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Well, depending on your data it probably won't work, but do you have a specific question?
Searching for "#" like that will really be searching for "#" followed by five spaces (WS-APT-TYPE is PIC X(6), so shorter values are padded with spaces), and the other short ones likewise.
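To make the padding point concrete, here is a minimal sketch (the program name PADDEMO and the sample address are hypothetical). It also shows that INSPECT only ever uses the current contents of WS-APT-TYPE: the 88-level VALUE list is for condition tests such as IF APT-TYPE, not a list of search strings. Limiting the operand with reference modification to the keyword's real length is what lets the search succeed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PADDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  ADDRESS-LINE        PIC X(30) VALUE '45 MAIN ST #12'.
       01  WS-APT-TYPE         PIC X(6).
       01  WS-CHAR-COUNTER-1   PIC 9(4).
       PROCEDURE DIVISION.
      *> WS-APT-TYPE now holds '#' followed by five spaces.
           MOVE '#' TO WS-APT-TYPE
      *> All 6 bytes are searched for; not found, so every
      *> character of ADDRESS-LINE is counted.
           MOVE ZERO TO WS-CHAR-COUNTER-1
           INSPECT ADDRESS-LINE TALLYING WS-CHAR-COUNTER-1
               FOR CHARACTERS BEFORE INITIAL WS-APT-TYPE
           DISPLAY 'PADDED:  ' WS-CHAR-COUNTER-1
      *> Only the first byte (the real keyword) is searched for,
      *> so the count is the number of characters before the '#'.
           MOVE ZERO TO WS-CHAR-COUNTER-1
           INSPECT ADDRESS-LINE TALLYING WS-CHAR-COUNTER-1
               FOR CHARACTERS BEFORE INITIAL WS-APT-TYPE(1:1)
           DISPLAY 'TRIMMED: ' WS-CHAR-COUNTER-1
           STOP RUN.
```

This should display `PADDED:  0030` (the whole 30-byte field) and then `TRIMMED: 0011`.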
suraaj
New User
Joined: 16 Apr 2009 Posts: 69 Location: Canada
If not INSPECT, how can this be done in COBOL? Basically, I need to parse the line to find the keywords.
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8697 Location: Dubuque, Iowa, USA
Two approaches are generally used:
1. Redefine the data as an array of PIC X(01) and look for matches one byte at a time.
2. Use reference modification of the data to match against the values.
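A minimal sketch of both approaches, using the thread's sample lines already upper-cased by hand (program and field names are hypothetical). Approach 1 walks a REDEFINES table of single bytes looking for '#'; approach 2 slides a reference-modified window, the length of the keyword, along the line.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TWOWAYS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  ADDRESS-LINE        PIC X(30).
      *> Approach 1: the same 30 bytes seen as a table of PIC X(01).
       01  ADDRESS-BYTES REDEFINES ADDRESS-LINE.
           05  ADDR-BYTE       PIC X(01) OCCURS 30 TIMES.
       01  WS-KEY              PIC X(4) VALUE 'UNIT'.
       01  WS-KEY-LEN          PIC 9(2) VALUE 4.
       01  WS-I                PIC 9(2).
       01  WS-FOUND-AT         PIC 9(2).
       PROCEDURE DIVISION.
      *> Approach 1: test one byte at a time.
           MOVE '31 HEATHER STREET #2013' TO ADDRESS-LINE
           MOVE ZERO TO WS-FOUND-AT
           PERFORM VARYING WS-I FROM 1 BY 1
                   UNTIL WS-I > 30 OR WS-FOUND-AT > ZERO
               IF ADDR-BYTE(WS-I) = '#'
                   MOVE WS-I TO WS-FOUND-AT
               END-IF
           END-PERFORM
           DISPLAY 'BYTE TABLE: # AT ' WS-FOUND-AT
      *> Approach 2: compare a reference-modified slice with the
      *> keyword; the last start position tried is 30 - length + 1,
      *> so the slice never runs past the end of the field.
           MOVE '45 CARLVIEW STREET UNIT 2512' TO ADDRESS-LINE
           MOVE ZERO TO WS-FOUND-AT
           PERFORM VARYING WS-I FROM 1 BY 1
                   UNTIL WS-I > 30 - WS-KEY-LEN + 1
                      OR WS-FOUND-AT > ZERO
               IF ADDRESS-LINE(WS-I:WS-KEY-LEN) = WS-KEY
                   MOVE WS-I TO WS-FOUND-AT
               END-IF
           END-PERFORM
           DISPLAY 'REF-MOD: UNIT AT ' WS-FOUND-AT
           STOP RUN.
```

Approach 1 suits single-byte values; for multi-byte keywords, approach 2 avoids comparing byte by byte yourself.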
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello,
How much input data is to be searched/parsed?
You will probably want two "arrays". The first holds the input data that is to be parsed. The second holds the apt-types and their lengths.
For each apt-type, use reference modification to compare that many bytes, starting at the "current byte" of the input data. If all types have been checked and none was found at this byte, increment to the next byte of the input. Make sure the compare does not go beyond the end of the text.
Keep in mind this will use LOTS of CPU per input record...
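A sketch of the table-driven scan described above, assuming 40-byte lines and an ASCII subset of the keyword list (the accented 'UNITÉ' is left out to avoid code-page issues); all names are hypothetical. Each table entry carries its own length, and the outer IF guarantees the compare never runs past the end of the line. It assumes a compiler with FUNCTION UPPER-CASE and FUNCTION TRIM (GnuCOBOL has both). The third sample line, taken from later in this thread, shows the weakness of a plain substring scan: the 'APT' inside 'CLAPTON' is found first.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. KWSCAN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Keyword table: 1-digit length followed by 6 bytes of text.
       01  KEYWORD-VALUES.
           05  FILLER          PIC X(7) VALUE '3APT   '.
           05  FILLER          PIC X(7) VALUE '3APP   '.
           05  FILLER          PIC X(7) VALUE '2PH    '.
           05  FILLER          PIC X(7) VALUE '5SUITE '.
           05  FILLER          PIC X(7) VALUE '4UNIT  '.
           05  FILLER          PIC X(7) VALUE '6A TERR'.
           05  FILLER          PIC X(7) VALUE '6BUREAU'.
           05  FILLER          PIC X(7) VALUE '1#     '.
       01  KEYWORD-TABLE REDEFINES KEYWORD-VALUES.
           05  KW-ENTRY        OCCURS 8 TIMES.
               10  KW-LEN      PIC 9.
               10  KW-TEXT     PIC X(6).
       01  SAMPLE-VALUES.
           05  FILLER          PIC X(40)
                               VALUE '31 heather street apt2013'.
           05  FILLER          PIC X(40)
                               VALUE '45 carlview street unit 2512'.
           05  FILLER          PIC X(40)
                               VALUE '31 clapton pond apt2013'.
       01  SAMPLE-TABLE REDEFINES SAMPLE-VALUES.
           05  SAMPLE-LINE     PIC X(40) OCCURS 3 TIMES.
       01  WS-UPPER-LINE       PIC X(40).
       01  WS-REST             PIC X(40).
       01  WS-LINE-LEN         PIC 9(2) VALUE 40.
       01  WS-S                PIC 9(2).
       01  WS-POS              PIC 9(2).
       01  WS-K                PIC 9(2).
       01  WS-MATCH-POS        PIC 9(2).
       PROCEDURE DIVISION.
           PERFORM VARYING WS-S FROM 1 BY 1 UNTIL WS-S > 3
               MOVE FUNCTION UPPER-CASE(SAMPLE-LINE(WS-S))
                 TO WS-UPPER-LINE
               MOVE ZERO TO WS-MATCH-POS
      *> Outer loop: each byte of the line; inner loop: each keyword.
               PERFORM VARYING WS-POS FROM 1 BY 1
                       UNTIL WS-POS > WS-LINE-LEN
                          OR WS-MATCH-POS > ZERO
                   PERFORM VARYING WS-K FROM 1 BY 1
                           UNTIL WS-K > 8 OR WS-MATCH-POS > ZERO
      *> Compare only when the keyword fits before the end of line.
                       IF WS-POS + KW-LEN(WS-K) - 1 <= WS-LINE-LEN
                           IF WS-UPPER-LINE(WS-POS:KW-LEN(WS-K))
                              = KW-TEXT(WS-K)(1:KW-LEN(WS-K))
                               MOVE WS-POS TO WS-MATCH-POS
                           END-IF
                       END-IF
                   END-PERFORM
               END-PERFORM
               IF WS-MATCH-POS = ZERO
                   DISPLAY 'NO KEYWORD'
               ELSE
      *> Strip out everything from the start of the keyword onward.
                   MOVE WS-UPPER-LINE(WS-MATCH-POS:) TO WS-REST
                   DISPLAY 'POS ' WS-MATCH-POS ': '
                           FUNCTION TRIM(WS-REST)
               END-IF
           END-PERFORM
           STOP RUN.
```

The nested loops make the CPU cost visible: for every byte of every record, up to one compare per keyword.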
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
There are a number of ways to do this. Why don't you show some sample input and the expected output, and describe the input as fully as you can?
suraaj
New User
Joined: 16 Apr 2009 Posts: 69 Location: Canada
Input:
Code:
31 heather street apt2013
45 carlview street unit 2512
For this input, I need the position where the word 'APT' starts.
Once I have found the position, I will strip out the data starting from the keyword.
Thanks, Suraaj
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
I think it is a stretch to call "apt" a keyword in this case.
Code:
31 clapton pond apt2013
31 apthill road apt2013
You have a lot of work to do if you want to find that apt2013 (or the 2013) in a "string" of text.
don.leahy
Active Member
Joined: 06 Jul 2010 Posts: 765 Location: Whitby, ON, Canada
Have you tried UNSTRING?
Code:
unstring address-line
delimited by 'APT'
or 'APP'
or 'PH'
or 'SUITE'
*> 'UNITÉ' must be listed before 'UNIT', or 'UNIT' matches first
or 'UNITÉ'
or 'UNIT'
or 'A TERR'
or 'BUREAU'
or '#'
into ws-part-1
ws-part-2
end-unstring
don.leahy
Active Member
Joined: 06 Jul 2010 Posts: 765 Location: Whitby, ON, Canada
Note that the UNSTRING approach I outlined will not handle cases like the one Bill pointed out:
Code:
31 apthill road apt2013
31 suite street suite 2013
28-31 suite street (this is a format recommended by Canada Post, where the apartment number precedes the street number)
It doesn't matter what programming language you are using; parsing addresses is not a trivial task.
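A small runnable sketch of the UNSTRING approach shows the limitation (names are hypothetical, and only an ASCII subset of the delimiters is used). The line is upper-cased into a work field first, because the sample data is lower case and UNSTRING compares delimiters exactly; DELIMITER IN records which keyword was hit. On the second line the 'APT' at the start of 'APTHILL' is taken as the delimiter, not the one at the end.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SAMPLE-VALUES.
           05  FILLER          PIC X(30)
                               VALUE '31 heather street apt2013'.
           05  FILLER          PIC X(30)
                               VALUE '31 apthill road apt2013'.
       01  SAMPLE-TABLE REDEFINES SAMPLE-VALUES.
           05  SAMPLE-LINE     PIC X(30) OCCURS 2 TIMES.
       01  WS-S                PIC 9.
       01  WS-UPPER-LINE       PIC X(30).
       01  WS-PART-1           PIC X(30).
       01  WS-PART-2           PIC X(30).
       01  WS-DELIM            PIC X(6).
       PROCEDURE DIVISION.
           PERFORM VARYING WS-S FROM 1 BY 1 UNTIL WS-S > 2
      *> Delimiters are compared exactly, so upper-case a copy first.
               MOVE FUNCTION UPPER-CASE(SAMPLE-LINE(WS-S))
                 TO WS-UPPER-LINE
               MOVE SPACES TO WS-PART-1 WS-PART-2 WS-DELIM
               UNSTRING WS-UPPER-LINE
                   DELIMITED BY 'APT' OR 'APP' OR 'PH' OR 'SUITE'
                             OR 'UNIT' OR 'A TERR' OR 'BUREAU' OR '#'
                   INTO WS-PART-1 DELIMITER IN WS-DELIM
                        WS-PART-2
               END-UNSTRING
               DISPLAY '[' FUNCTION TRIM(WS-PART-1) '] ['
                       FUNCTION TRIM(WS-DELIM) '] ['
                       FUNCTION TRIM(WS-PART-2) ']'
           END-PERFORM
           STOP RUN.
```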
suraaj
New User
Joined: 16 Apr 2009 Posts: 69 Location: Canada
In my case the apartment number comes at the end of the line, not at the beginning. By this I mean that if
Code:
31 apthill road apt2013
is the input, then the "apt" at the end is the one we should consider.
Thanks, Suraaj
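If only the last occurrence matters, one option (a swapped-in technique, not something proposed in this thread) is to reverse the line with FUNCTION REVERSE, so the last 'APT' becomes the first 'TPA', and then reuse INSPECT ... BEFORE INITIAL. A sketch with hypothetical names, assuming a compiler that supports FUNCTION REVERSE, UPPER-CASE and TRIM:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LASTAPT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  ADDRESS-LINE        PIC X(30)
                               VALUE '31 apthill road apt2013'.
       01  WS-REVERSED         PIC X(30).
       01  WS-LEN              PIC 9(2) VALUE 30.
       01  WS-COUNT            PIC 9(2).
       01  WS-START            PIC 9(2).
       PROCEDURE DIVISION.
      *> Reversed, the LAST 'APT' in the line is the FIRST 'TPA'.
           MOVE FUNCTION REVERSE(FUNCTION UPPER-CASE(ADDRESS-LINE))
             TO WS-REVERSED
           MOVE ZERO TO WS-COUNT
           INSPECT WS-REVERSED TALLYING WS-COUNT
               FOR CHARACTERS BEFORE INITIAL 'TPA'
           IF WS-COUNT = WS-LEN
               DISPLAY 'NO APT FOUND'
           ELSE
      *> 'TPA' sits at reversed positions COUNT+1 to COUNT+3, which
      *> are original positions LEN-COUNT down to LEN-COUNT-2.
               COMPUTE WS-START = WS-LEN - WS-COUNT - 2
               DISPLAY 'LAST APT AT ' WS-START ': '
                       FUNCTION TRIM(ADDRESS-LINE(WS-START:))
           END-IF
           STOP RUN.
```

This is still a substring search, so an 'apt' inside the last word (or a line like "31 apthill road") would be matched too; it does not replace checking word boundaries.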
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Start at the "back" of the line. Search for a non-blank, to ignore trailing spaces. After finding a non-blank, continue searching backwards for a blank.
Ensure that the field being searched is not entirely blank, and that your code works when there are no trailing blanks.
Having isolated the start and end, it is easy to get hold of the data and see if it is what you want.
Here's an example which is much more complicated than your task, but could be easily adapted.
What country's addresses are you looking at?
I see you have need of one with two words. Once you have the last word as above, it is easy to get the second-last word as well. Just remember not to assume that there is one.
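A sketch of the backward scan described above, in fixed-format COBOL with hypothetical names. A one-byte sentinel space in front of the line means the backward loops always stop at position 1 at the latest, which covers both the all-blank line and the line with no trailing blanks without any out-of-range reference modification. The same paragraph is performed twice to get the last and the second-last word.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LASTWORD.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> A sentinel space in front of the line: the backward scans
      *> stop at position 1 at the latest, never before the field.
       01  WS-WORK.
           05  FILLER          PIC X VALUE SPACE.
           05  WS-LINE         PIC X(30).
       01  WS-P                PIC 9(2).
       01  WS-END              PIC 9(2).
       01  WS-WORD             PIC X(30).
       01  WS-LAST-WORD        PIC X(30).
       01  WS-PREV-WORD        PIC X(30).
       PROCEDURE DIVISION.
           MOVE '45 carlview street unit 2512' TO WS-LINE
           PERFORM GET-LAST-TWO-WORDS
           MOVE '31 heather street apt2013' TO WS-LINE
           PERFORM GET-LAST-TWO-WORDS
      *> Entirely blank line
           MOVE SPACES TO WS-LINE
           PERFORM GET-LAST-TWO-WORDS
      *> Exactly 30 bytes: no trailing blanks at all
           MOVE '1 very long avenue suite 12345' TO WS-LINE
           PERFORM GET-LAST-TWO-WORDS
           STOP RUN.

       GET-LAST-TWO-WORDS.
      *> Start at the last byte of WS-WORK (the end of WS-LINE).
           MOVE 31 TO WS-P
           PERFORM FIND-WORD-BACKWARDS
           MOVE WS-WORD TO WS-LAST-WORD
           PERFORM FIND-WORD-BACKWARDS
           MOVE WS-WORD TO WS-PREV-WORD
           IF WS-LAST-WORD = SPACES
               DISPLAY 'BLANK LINE'
           ELSE
               DISPLAY 'LAST [' FUNCTION TRIM(WS-LAST-WORD)
                       '] PREVIOUS [' FUNCTION TRIM(WS-PREV-WORD) ']'
           END-IF.

       FIND-WORD-BACKWARDS.
           MOVE SPACES TO WS-WORD
      *> Skip blanks backwards until a non-blank (or the sentinel).
           PERFORM UNTIL WS-P = 1 OR WS-WORK(WS-P:1) NOT = SPACE
               SUBTRACT 1 FROM WS-P
           END-PERFORM
           IF WS-P > 1
      *> Then skip non-blanks backwards to the blank before the word;
      *> WS-P is left there, ready for the next call.
               MOVE WS-P TO WS-END
               PERFORM UNTIL WS-WORK(WS-P:1) = SPACE
                   SUBTRACT 1 FROM WS-P
               END-PERFORM
               MOVE WS-WORK(WS-P + 1:WS-END - WS-P) TO WS-WORD
           END-IF.
```

With the last word ("apt2013") or the last two words ("unit", "2512") isolated, checking them against the keyword list becomes a simple compare on a short field.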
suraaj
New User
Joined: 16 Apr 2009 Posts: 69 Location: Canada
Bill,
I am looking at addresses worldwide.
Thanks, Suraaj
Nic Clouston
Global Moderator
Joined: 10 May 2007 Posts: 2455 Location: Hampshire, UK
Quote:
I am looking at addresses worldwide.
Good luck, as a lot of countries seem to have their own format!
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8697 Location: Dubuque, Iowa, USA
My first observation is that your unit list is short: the US Post Office officially has 23 different abbreviations for the secondary unit designator, which implies you need 46 names/abbreviations in your search algorithm. There will be others for non-US addresses.
My second observation is that someone needs to do a LOT of due diligence on the address lists, since not all countries use the same format (Costa Rica, for example, does not use street numbers/names in addresses but rather distances from local landmarks).
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
200+ countries and territories?
If you are trying to fully parse addresses from around the world (or even from 10-20 major countries), you have a huge task on your hands.
I'd suggest you look at a commercial service; they have spent years developing and refining their systems, so you don't have to...