Find the occurrence of keywords using INSPECT

suraaj · New User Joined: 16 Apr 2009 Posts: 69 Location: Canada

Hi

I am trying to check for the occurrence of any of the keywords from a list defined in a record, using INSPECT. Please see the code below.

Bill Woodger · Posted: Fri Aug 30, 2013 9:42 pm

Well, depending on your data, it probably won't work, but do you have a specific question?

Searching for "#" like that will really be searching for

suraaj

If not with INSPECT, how can this be done in COBOL? Basically, I need to parse the line to find the keywords.

Robert Sample · Posted: Fri Aug 30, 2013 10:24 pm

Two approaches are generally used:
1. Redefine the data as an array of PIC X(01) and look for matches one byte at a time.
2. Use reference modification on the data to compare it against the values.
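To make approach 1 concrete, here is a minimal sketch of the byte-at-a-time idea. It is written in Python rather than COBOL, purely to show the logic: the loop plays the role of a PERFORM over a PIC X(01) table, and positions are reported 1-based to match COBOL subscripts. The function name `find_byte_positions` and the sample address are made up for illustration.

```python
def find_byte_positions(data, target):
    """Return the 1-based positions of every occurrence of one character.

    Mirrors a COBOL loop over a PIC X(01) table: one byte is examined per pass.
    """
    if len(target) != 1:
        raise ValueError("target must be exactly one character")
    positions = []
    for subscript in range(1, len(data) + 1):   # 1-based, like a COBOL subscript
        if data[subscript - 1] == target:
            positions.append(subscript)
    return positions


print(find_byte_positions("123 MAIN ST # 4", "#"))   # prints [13]
```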

dick scherrer · Posted: Sat Aug 31, 2013 12:15 am

Hello,

How much input data is to be searched/parsed?

You will probably want two "arrays": the first holds the input data that is to be parsed, and the second holds the apt-types and their lengths.

For each apt-type, use reference modification to compare that many bytes of the input data, starting at the "current byte", against the type. If all types have been checked and none is found at this byte, advance to the next byte of the input. Make sure the comparison does not go beyond the end of the text.

Keep in mind this will use LOTS of CPU per input record . . .
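The table-driven scan described above might look like the following minimal Python sketch (a COBOL version would use an OCCURS table and reference modification instead). The keyword table `APT_TYPES`, its entries, and the function name are hypothetical, and positions are 1-based.

```python
# Hypothetical keyword table: each entry holds the apt-type and its length,
# like an OCCURS table with a text field and a length field in COBOL.
APT_TYPES = [("APT", 3), ("SUITE", 5), ("UNIT", 4), ("#", 1)]


def find_keywords(line, table=APT_TYPES):
    """Return (1-based position, keyword) for every table entry found in line."""
    hits = []
    line_length = len(line)
    for pos in range(1, line_length + 1):       # pos is the "current byte"
        for keyword, length in table:
            # Skip comparisons that would run past the end of the text.
            # In COBOL the reference modification must stay inside the field;
            # Python slicing would just return fewer bytes.
            if pos + length - 1 > line_length:
                continue
            # Same bytes as LINE(pos:length) in COBOL.
            if line[pos - 1:pos - 1 + length] == keyword:
                hits.append((pos, keyword))
    return hits


print(find_keywords("10 MAIN ST APT 5"))   # prints [(12, 'APT')]
```

The nested loop tries every table entry at every byte, which is where the CPU cost comes from. It also matches inside words: `find_keywords("CAPTAIN")` reports APT at position 2, so a fuller version would check that the bytes on either side are blanks.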

Bill Woodger · Posted: Sat Aug 31, 2013 4:34 am

There are a number of ways to do this. Why don't you show some sample input, expected output, and describe the input as fully as you can?

suraaj

Input:

Bill Woodger · Posted: Sun Sep 01, 2013 9:56 pm

I think it is a stretch to call "apt" a keyword in this case.

don.leahy · Posted: Tue Sep 03, 2013 6:25 pm

Have you tried UNSTRING?
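A rough Python analogue of the UNSTRING idea is to split the line on blanks and check each piece against the keyword list. `str.split()` with no argument splits on runs of whitespace, which is only roughly comparable to UNSTRING ... DELIMITED BY ALL SPACE (the handling of leading spaces, for one, is not identical). The `KEYWORDS` set and the function name are hypothetical.

```python
KEYWORDS = {"APT", "SUITE", "UNIT"}   # hypothetical keyword list


def keyword_tokens(line, keywords=KEYWORDS):
    """Return the blank-delimited tokens of line that are in the keyword list."""
    # str.split() with no argument splits on runs of whitespace and drops
    # empty pieces, roughly like UNSTRING ... DELIMITED BY ALL SPACE.
    return [token for token in line.split() if token in keywords]


print(keyword_tokens("10 MAIN ST APT 5"))   # prints ['APT']
```

Because only whole tokens are compared, something like APT5 with no intervening blank is not found.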

don.leahy · Posted: Tue Sep 03, 2013 7:27 pm

Note that the UNSTRING approach that I outlined will not handle cases like the one that Bill pointed out:

suraaj

The apartment number in my case would come at the end of the line and not at the beginning. By this I mean if the

Bill Woodger · Posted: Wed Sep 04, 2013 1:04 am

Start at the "back" of the line. Search for a non-blank, to ignore trailing spaces. After finding a non-blank, continue searching backwards for a blank.

Ensure that the field being searched is not entirely blank, and that your code works when there are no trailing blanks.

Having isolated the start and end, it is easy to get hold of the data and see if it is what you want.
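Here is a minimal sketch of that backward scan, again in Python rather than COBOL to keep it short. The function names are hypothetical. Bounds are 1-based so they map onto a reference modification such as FIELD(START:END - START + 1). The sketch covers both edge cases mentioned above: an entirely blank field and a field with no trailing blanks.

```python
from typing import Optional, Tuple


def last_word_bounds(field: str) -> Optional[Tuple[int, int]]:
    """Return 1-based (start, end) of the last blank-delimited word, or None."""
    end = len(field)
    # Step 1: skip trailing spaces to find the last non-blank byte.
    while end > 0 and field[end - 1] == " ":
        end -= 1
    if end == 0:
        return None                     # field is entirely blank (or empty)
    # Step 2: keep going backwards until a blank or the start of the field.
    start = end
    while start > 1 and field[start - 2] != " ":
        start -= 1
    return start, end


def last_word(field: str) -> Optional[str]:
    bounds = last_word_bounds(field)
    if bounds is None:
        return None
    start, end = bounds
    return field[start - 1:end]         # same bytes as FIELD(start:end - start + 1)


print(last_word("10 MAIN ST SUITE 200   "))   # prints 200
print(last_word("     "))                     # prints None
```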

Here's an example which is much more complicated than your task, but could be easily adapted.

What country's addresses are you looking at?

I see you need one that consists of two words. Once you have isolated the word as described above, it is easy to get the second-last word as well. Just remember not to assume that there is one.
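Extending the same backward scan to the second-last word could look like this sketch (Python, hypothetical names). The second search is limited to the bytes before the last word, and it returns None rather than assuming a second word is there.

```python
from typing import Optional, Tuple


def word_ending_before(field: str, limit: int) -> Optional[Tuple[int, int]]:
    """1-based bounds of the last blank-delimited word in field(1:limit), or None."""
    end = limit
    while end > 0 and field[end - 1] == " ":      # skip blanks backwards
        end -= 1
    if end == 0:
        return None
    start = end
    while start > 1 and field[start - 2] != " ":  # back up to the word start
        start -= 1
    return start, end


def last_two_words(field: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (second-last word, last word); either may be None."""
    last = word_ending_before(field, len(field))
    if last is None:
        return None, None
    start, end = last
    last_word = field[start - 1:end]
    # Search again, only in the bytes before the last word (start - 1 may be 0).
    previous = word_ending_before(field, start - 1)
    if previous is None:
        return None, last_word                    # no second-last word exists
    p_start, p_end = previous
    return field[p_start - 1:p_end], last_word


print(last_two_words("10 MAIN ST APT 5"))   # prints ('APT', '5')
print(last_two_words("5"))                  # prints (None, '5')
```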

suraaj

Bill,

I am looking at addresses worldwide.

Thanks, Suraaj

Nic Clouston · Posted: Wed Sep 04, 2013 8:34 pm

Robert Sample · Posted: Wed Sep 04, 2013 9:09 pm

My first observation is that your unit list is short: the US Post Office officially has 23 different secondary unit names, each with an abbreviation, which implies you need 46 names/abbreviations in your search algorithm. There will be others for non-US addresses.
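If the list does grow, one way to keep names and abbreviations together is a table of pairs that is expanded into a single lookup, so either form of the word maps to one standard code. This Python sketch uses only a handful of USPS designators as an illustrative subset, not the official list, and the function names are hypothetical.

```python
from typing import Optional

# Illustrative subset of USPS secondary unit designators as
# (full name, abbreviation) pairs; NOT the complete official list.
UNIT_DESIGNATORS = [
    ("APARTMENT", "APT"),
    ("BUILDING", "BLDG"),
    ("FLOOR", "FL"),
    ("ROOM", "RM"),
    ("SUITE", "STE"),
]


def build_lookup(pairs):
    """Map every full name and every abbreviation to the abbreviation."""
    lookup = {}
    for name, abbreviation in pairs:
        lookup[name] = abbreviation
        lookup[abbreviation] = abbreviation
    return lookup


LOOKUP = build_lookup(UNIT_DESIGNATORS)


def normalize_unit(word: str) -> Optional[str]:
    """Return the standard abbreviation for word, or None if it is not a unit."""
    return LOOKUP.get(word.upper())


print(normalize_unit("Suite"))   # prints STE
print(len(LOOKUP))               # prints 10 (two entries per pair)
```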

My second observation is that someone needs to do a LOT of due diligence on the address lists, since not all countries use the same format (Costa Rica, for example, does not use street numbers / names in addresses but rather distances from local landmarks).

Bill Woodger · Posted: Wed Sep 04, 2013 10:14 pm

200+ countries and territories?

If you are trying to fully parse addresses from around the world (or even 10-20 major countries) you have a huge task on your hands.

I'd suggest you look at a commercial service; they have spent years developing and refining their systems, so you don't have to...