chandraBE
New User
Joined: 02 Feb 2010 Posts: 44 Location: Bangalore
Hi,
We have one table with a huge amount of data (around 80 million records) in my project. We are planning to split the data of this table into two different tables.
After splitting, we will have to search the data in both tables instead of one.
I am not sure how the performance of queries on this table will differ before and after splitting.
Can anyone tell me the impact of splitting the table into two? After splitting, will the performance of a SELECT query improve, degrade, or stay neutral?
I have tried this in a testing environment but could not see significant results.
Thanks,
Chandra
GuyC
Senior Member
Joined: 11 Aug 2009 Posts: 1281 Location: Belgium
Why don't you partition? Let DB2 do the work for you and (maybe) benefit from parallel access.
If you do it yourself, you'll need to maintain the code that decides which row goes where.
A lot depends on what kind of access you do.
For direct random access via an index, you could benefit if the levels of the new indexes are lower than the NLEVELS of the big index.
I suspect that as soon as you need to UNION ALL the two tables, or always have to query both tables, performance will be worse.
Cutting it in two won't do much. If you can cut it into five or more, and application-wise you know you don't need to search all five "partitions", you could benefit more.
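As a minimal sketch of the partitioning GuyC suggests, here is what table-controlled range partitioning can look like in DB2 for z/OS. The table and column names (ORDERS, ORDER_DATE) are purely illustrative, not from this thread:

```sql
-- Hypothetical example: let DB2 manage the split via range partitioning
-- instead of maintaining two tables by hand.
CREATE TABLE ORDERS
  (ORDER_ID    INTEGER       NOT NULL,
   ORDER_DATE  DATE          NOT NULL,
   AMOUNT      DECIMAL(11,2) NOT NULL)
  PARTITION BY RANGE (ORDER_DATE)
    (PARTITION 1 ENDING AT ('2008-12-31'),
     PARTITION 2 ENDING AT ('2009-12-31'),
     PARTITION 3 ENDING AT ('2010-12-31'));
```

Queries that predicate on ORDER_DATE can then be limited to the relevant partitions by DB2 itself, with no routing logic in the application.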
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello,
You need to consider the SELECTs that will be used (the current ones and the ones projected to be needed later) whether the table is split, partitioned, or left alone.
Several of the larger systems I've built and/or supported have tables with several hundred million rows, and the online query response time is sub-second. The trick is to access only a few rows in a transaction and to access those rows directly (proper design/implementation of the database/tables/keys).
One problem I see more and more is that organizations throw together some kind of table definitions and code, then test with a few hundred or a few thousand rows. Of course this will perform wonderfully. I believe that high-volume testing should be done even in unit test - to prevent ugly surprises later . . .
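The direct keyed access described above might look like this; the ACCOUNTS table, ACCT_ID key, and index name are hypothetical examples, not from this thread:

```sql
-- Hypothetical sketch: a unique index supporting direct access by key.
CREATE UNIQUE INDEX XACCT01 ON ACCOUNTS (ACCT_ID);

-- With the key fully qualified, DB2 can use matching index access and
-- touch only a handful of pages, so response time stays sub-second
-- even with hundreds of millions of rows in the table.
SELECT ACCT_NAME, BALANCE
  FROM ACCOUNTS
 WHERE ACCT_ID = ?;
```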
Kjeld
Active User
Joined: 15 Dec 2009 Posts: 365 Location: Denmark
You haven't mentioned what kind of data you are storing.
For example, splitting into separate tables might be feasible if you have historical rows and want the most current records (those most frequently accessed) in one table and the seldom-accessed historical rows in another.
If all rows are retrieved with equal frequency, I would definitely recommend partitioning, with a proper hashing algorithm to distribute rows evenly across the partitions.
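If the current/history split is chosen, a view can cover the rare queries that still need both tables. A minimal sketch, with hypothetical table names (TXN_CURRENT, TXN_HISTORY) assumed to have identical column layouts:

```sql
-- Hypothetical sketch: frequent queries hit TXN_CURRENT directly;
-- the view serves the occasional query that needs the full history.
CREATE VIEW ALL_TXNS AS
  SELECT * FROM TXN_CURRENT
  UNION ALL
  SELECT * FROM TXN_HISTORY;
```

As GuyC notes above, queries that routinely go through such a UNION ALL will likely perform worse than against the original single table.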
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
chandraBE
New User
Joined: 02 Feb 2010 Posts: 44 Location: Bangalore
Thank you for your quick responses. They were very useful to me.
Since we have historical data in the table, we wanted to move the old data into a separate table.
Thanks & Regards,
Chandra