Binary search issue in table array

lavish.garg · New User Joined: 06 Mar 2012 Posts: 4 Location: india

Hi,

The data in the table array should be in sorted order to use SEARCH ALL. I want to know that the sorting of alphanumeric data should be according to the ascii code or any other code??

The reason I am asking this question is:
I am using an array having ascending on clause. The key has PIC clause X(20). The table array is populated with five entries which are case sensitive. These entries are populated from the table using a cursor having order by clause on the Key_Id of the table.
Now suppose I have following data in the table(all having length 20 and have trailing spaces to complete the length):
MMlavish1garg
MMlavish2garg
PlavPlagarg
Plavishgarg
Ulavishgarg

The data will be populated in the above order in the table array as 'i' has higher ASCII value then 'P'. I have vaerified this using the XPED in CICS that the data is populated in this way only.

Now when I use SEARCH ALL on the bases of Key_Id and try to get match corresponding to 'Plavishgarg', it is not able to find the entry in the table array. I have tried setting index to 1 before using 'search all' still it is not working.

Please provide me a solution to this.

Thanks..

Robert Sample · Posted: Tue Mar 06, 2012 7:03 pm

First, you keep using the term "ASCII". This is wrong. Mainframes use EBCDIC, not ASCII. If you don't know the difference -- learn it. Fast.

Second, if you look in the COBOL Language Reference manual index, you will NOT find the term "array" anywhere used. COBOL uses TABLES, not ARRAYS. Terminology is critical in IT, where similar terms may mean very different things.

Third, you did not provide enough information to answre your question. Things that we need to know are:
1) How many table elements in your table?
2) How did you ensure that the table elements were loaded in ascending order -- just saying "ascending key" in your definition does NOT impose any kind of requirement for them to be sorted at run-time. And the table elements you posted are NOT sorted in ascending EBCDIC sequence -- hence if your table was loaded as indicated, your binary sort will not work since the elements are not sorted appropriately.
3) How did you initialize the unused elements of your table?

dbzTHEdinosauer · Posted: Tue Mar 06, 2012 7:08 pm

hate to impose yet another fact on a clueless rookie,

a COBOL SEARCH ALL (binary search) will be faster
(doubly important for a CICS module,
no telling what other BS you have in this module)
if the COBOL INTERNAL table is defined as ODO.

Bill Woodger · Posted: Tue Mar 06, 2012 7:33 pm

As dbz has mentioned, if you have, say, OCCURS 128 for your table but only five entries populated, the SEARCH ALL will look four times at data which is not populated before it even starts to hit your data. As Robert has mentioned, if you do not give the "empty" table entries a high key value (HIGH-VALUES is good) then again it plain will not work.

With the ODO, as dbz mentioned, and when used correctly, only the part of the table populated will be searched, and the values in the other table entries are irrelevant.

Anything anyone forgot to mention?

enrico-sorichetti · Posted: Tue Mar 06, 2012 9:28 pm

yep.... but in ASCII the numbers come first..
as unpleasant as it might be, <ordering> in a multi-platform/distributed environment is a proper concern
the user ( AKA luser ) when asking for something to be ordered on an alphanumeric key has the right to a consistent display of the results

dbzTHEdinosauer · Posted: Tue Mar 06, 2012 9:59 pm

Enrico,

i agree with you,

but it is the user/programmer/implementation team responsibility
to insure that all data used by the program belong to the same code page.

as long as all data is of the same code page, there will be no problem.

the same cobol program on the mainframe
should have data files in ebcdic
and the appropriate encode parms (bind card) on the db2 select/insert/update
to insure the ebcdic is converted to ascii prior to inclusion in the db
and converted to ebcdic prior to presentation to the cobol program.

same is so for a pc/server/other ascii implementations.

when that is done, there is no problem

Bill Woodger · Posted: Tue Mar 06, 2012 10:03 pm

lavish.garg, where is your CICS running? ASCII machine? You say you used XPED to verify the value of the "i" and "P"....

So your problem is most likely that you do not have ODO and you have not initialised your entire table to a high-key value (like HIGH-VALUES).

The SEARCH ALL will start in the middle of the table. If you get a key hit, search ends, index set to the hit. If key in table is high, SEARCH ALL looks next at the middle of the low-half of the table; if key in table is low, SEARCH ALL looks next at the middle of the high-half of the table. Continues until finds the data, or no data can be found using the process.

If your table has undefined values or values from a previous load of data, you will likely not get a hit.

Say you have OCCURS 20 but load initially with only five items. The other 15 values are "undefined", and quite possibly binary zeros. The mid-point of the table may be a low-key, the SEARCH ALL will try the upper half of the table and stay up there until it determines you key does not exist. Even with non-binary zeros, the undefined data will only allow your data to be found by coincidence.

Either set you table entries to HIGH-VALUES before each load of the table (this is the lazy way, but it works), or use OCCURS DEPENDING ON in the table definition, from which SEARCH ALL will know the length of the table and will only search within that.

dbzTHEdinosauer · Posted: Tue Mar 06, 2012 11:03 pm

so, if the data in the cobol internal table is ascii and sorted,
he is running the module in a pc environment.

lavish.garg · New User Joined: 06 Mar 2012 Posts: 4 Location: india

Ohhh..hold on guys. Thankss to all for so many replies.

Got my answer in ur conversation..

actually what happened was i populated cobol >TABLE< variables from the database table. The query to retrieve the rows has order by clause which returns data in asc order but according to ascii codes and populates the cobol table with the data which is not in sorted order according to EBCDIC code.

As I used order by clause so thought the table should be populated in sorted order. My mistake, totally forgot about the ASCII and EBCDIC code difference.

So search all was not working.

>>To limit the table I am using depending on clause.

One more ques

I have read some where that till 50 entries SEARCH works better then SEARCH ALL. By chance I can have max of 50 entries in table and most of the time these will be less then 20. So should i go for linear search or binary search?? As it is a CICS program so performance matters.

lavish.garg · New User Joined: 06 Mar 2012 Posts: 4 Location: india

And one more thing.. As it is clear that the elements are not in sorted order. I will have to sort the table first in program itself. The no of times i will be hitting the search (in one program) can vary from 10 to 50 times in one transaction.

So Linear or Binary search???

dbzTHEdinosauer · Posted: Wed Mar 07, 2012 12:04 am

don't sort,
use search not search all. remember to set the index to 1 before the search

and the break point is about 250.
actually perform varying an index is faster than
search or search all.

Bill Woodger · Posted: Wed Mar 07, 2012 12:07 am

Depends on your data, not on the method of searching.

If a very high proportion of the matches are on a few items, then SEARCH will likely be better, with the high-hitters at the start of the table. The more even the distribution, the more SEARCH ALL will win out. With the possibility of "misses", SEARCH ALL improves again.

The only way to find out in your case, is to try them both with your data.

Anyone who thinks that just the size of the table can give you the answer, doesn't know what they are talking about.

EDIT: Err... didn't see you there, dbz. That's something I'll have to check now. Haven't looked since the Enterprise took to the seas (metaphorically speaking). SEARCH ALL didn't used to be that slow, although you could well beat it with your own binary chop. Live 'n learn, live 'n learn.

dbzTHEdinosauer · Posted: Wed Mar 07, 2012 12:37 am

a link that you may want to follow/read especially since the TS is talking about 20 or so items.:
serial vs binary search

and the attachment at the bottom of his post is sorta interesting.

by the way, JSAYLES is instrumental in the RDZ development.

Bill Woodger · Posted: Wed Mar 07, 2012 1:11 am

Well, now I'm too confused to know if I'm confused or un-confused.

5815 : 650 : 169 in milliseconds, for perform, SEARCH and SEARCH ALL. Looks to me like SEARCH ALL usesfewer miliseconds, which I'd have thought was good :-)

dbzTHEdinosauer · Posted: Wed Mar 07, 2012 1:34 am

all of the following is quoted from the link in my previous post:

5815 - for Perform Table Search 244th element
650 - Sequential Search 244th element
169 - Search All - 244th element

29 - for Perform Table Search 1st element
37 - Sequential Search 1st element

In Milliseconds. You can see that, even in a small table - to get to the 244 element,
the Search All was much more efficient than linear search -
and MUCH more efficient than PERFORM VARYING.

But if you hit the top of the table,
PERFORM VARYING actually out-performed (sorry :0 ) linear search
(I should have included a test for finding the first element with SEARCH ALL, but it would have been the worst of the three options).

end of quoted material/start of dbzbs:

point is that the extra code executed with a SEARCH ALL starts to be meaningless after the item count reaches a certain volume.

but if the item to find is in the top part of the table 1st 10 or so (with 240 some items),
then the perform varying and linear search out perform the search all.

since we know the values for the table will be unsorted,
any savings of a search all would be used up with a sort.

so, don't sort them
use linear or perform varying.

and i am last, again

Bill Woodger · Posted: Wed Mar 07, 2012 4:33 am

But I'd be repeating myself if I said that :-)

A common way to speed-up a table look-up is to not do it at all. If the current search value is the same as the previous value, don't do anything. Otherwise, look at the table.

With a small table, the SEARCH ALL won't be necessary. If the data is not in sequence, SEARCH ALL is out anyway. If the data is in sequence, there is the possibility of SEARCH with a termination when key match is no longer possible. Both SEARCH and a PERFORM VARYING will be faster at hitting the top n entries of a table, where n is less than the average number of attempts made by SEARCH ALL before finding or giving up on that one.

It really is down to the data. Collect some "frequency" figures that are likely to be representative, plus some figures for numbers of elements in the table. The bigger the table and the more even the values are distributed, the better a binary search will perform.

However, if the data is suitable for "optimisation", a binary search can be beaten for any size of table.

SEARCH ALL can be beaten by a "hand-coded" binary search, the thing is that the SEARCH ALL works out of the box, every time (assuming one knows how to use it) whereas as a "hand-coded" routine has to be proved.

It might no be so good to do dumb things to slow your CICS program down, but it is also not so good to "optimise" it so that it is no longer as easy to understand.

You know it's gonna be here, don't you? :-)