Query efficiency against historical data?
IBMMAINFRAMES.com Support Forums -> DB2
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Mon Jun 13, 2011 12:46 pm    Post subject: Query efficiency against historical data?

I'm now designing a backup strategy for a massive amount of data, with the goal of achieving the best efficiency when querying this historical data.

Would you please give me some suggestions based on your rich experience?
Any possible solution is okay.

Thank you!

bhairon singh rathore

New User


Joined: 19 Jun 2008
Posts: 91
Location: banglore

PostPosted: Mon Jun 13, 2011 3:03 pm    Post subject:

We can use a partitioned table for this and store the data in different partitions by date: for example, store today's data in one partition, the next day's data in a second partition, and so on. When the maximum partition is reached, start again from the first.

The partition information can be stored in another table that contains only the partition number and the date for each day.

When querying, you can join both tables on the partition number and use the date in the WHERE clause.

Note: the date is only an example; you can use any other unique value that suits the functionality.
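
A minimal DDL/SQL sketch of this scheme (all names, such as JOURNAL, PART_MAP, and their columns, are invented for illustration):

```sql
-- Sketch only: a lookup table mapping each business date to the
-- partition that holds it (names are hypothetical).
CREATE TABLE PART_MAP
      (PART_NO   SMALLINT NOT NULL,
       DATA_DATE DATE     NOT NULL,
       PRIMARY KEY (DATA_DATE));

-- A query for one day's history joins through the map and keeps the
-- date in the WHERE clause, as described above:
SELECT J.*
  FROM JOURNAL  J
      ,PART_MAP M
 WHERE M.DATA_DATE = '2011-06-01'
   AND J.PART_NO   = M.PART_NO
   AND J.TRAN_DATE = M.DATA_DATE;
```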
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10202
Location: italy

PostPosted: Mon Jun 13, 2011 3:32 pm    Post subject: Reply to: Query efficiency against historical data?

Quote:
I'm now designing a backup strategy for a massive amount of data, with the goal of achieving the best efficiency when querying this historical data.


Please use the right terminology ...
Backup has nothing to do with historical data;
they are two completely different beasts.

Historical data is just that ... history (and it might not even be online).
There should be no worry about access performance, just about space utilization.

If You are worried about performance, it means that the data is less than historical.

How and when the data becomes history(cal) depends on the application and the relevant legal implications for data accessibility.

So, to get better help, it would be wise to reword the question.

Query??? Do You mean making old data available for processing,
or running a <query/report> on it?

Again, two different things.
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

PostPosted: Mon Jun 13, 2011 8:26 pm    Post subject:

Hello,

Quote:
I'm now designing a backup strategy for a massive amount of data, with the goal of achieving the best efficiency when querying this historical data.
This is completely contradictory. . .

Which do you think you want to do: create a massive backup, or find some way to run "efficient" queries against historical data?

Maybe there is something else you really want to do. . .
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Mon Jun 13, 2011 8:48 pm    Post subject: Reply to: Query efficiency against historical data?

hi, bhairon,

Thanks for your suggestion.
But I'd rather use a universal table space than a classic partitioned table space, since in DB2 V10 I believe a universal table space is far superior to a classic partitioned table space in every respect.

Correct me if I'm wrong.
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Mon Jun 13, 2011 8:54 pm    Post subject:

hi, enrico,

Thanks for your reply.
When I mentioned "query", I meant performing SQL query statements against this "OLD" data, for example data generated a year ago that is seldom queried.

I've been considering using a UTS for this, but I'd like to know more about possible strategies.
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Mon Jun 13, 2011 8:59 pm    Post subject:

dick scherrer wrote:
Which do you think you want to do: create a massive backup, or find some way to run "efficient" queries against historical data?

Maybe there is something else you really want to do. . .

hi, Dick,
I just want the queries on this historical data to be more efficient.
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

PostPosted: Mon Jun 13, 2011 9:18 pm    Post subject:

Hello,

Quote:
I just want the queries on this historical data to be more efficient.
Then you need to design historical database structures to support high-speed queries, possibly including ad-hoc ones. Often these are not completely normalized, in order to improve query performance.

Many places have implemented their version of a "data warehouse" to do this. Others have merely determined which data is to be available and designed new tables to support the queries rather than to optimize transaction processing.

Keep in mind that if you create these new tables, they will require considerable DASD space.
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10202
Location: italy

PostPosted: Mon Jun 13, 2011 9:31 pm    Post subject: Reply to: Query efficiency against historical data?

As I already said, one of the criteria for defining data as historical in the IT sense is the frequency of use,
which in turn determines the storage hierarchy to be used and the approach for making the data available.

If You are concerned with query performance, it means that the data is accessed frequently and is not historical any longer.

Most organizations' definition of historical is pretty dynamic, according to the access pattern.

Online data has an overall management pattern different from backup/offline/historical data.

It looks from the wording that You are just concerned with accessing tables where one of the keys is a date,
so there is really nothing to make a big fuss about.

Reword Your requirement!
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Tue Jun 14, 2011 7:36 am    Post subject:

Consider the following scenario:
Let's say we have a journal table which contains all the transaction processing details.
As the data in the journal table grows continuously, I'm considering using a partition-by-growth (PBG) universal table space for it.
Because I have no prior experience with partition-by-growth universal table spaces, I'm not sure whether a query against the table for a transaction detail record inserted a year ago will be almost as fast as a query for a record from today.

In addition, since (as I understand it) only non-partitioned indexes can be created on a PBG table space, the index data also grows large as the table grows. Will this impact query efficiency?

Using UTS, the data limit can reach 128 TB. Considering efficiency, I think migration of old data should be implemented long before the data reaches this capacity limit, that is, old data should be moved from the journal master table to another table. My problem is: after the migration, how can the original queries still be performed successfully?
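
For reference, a partition-by-growth table space is created simply by specifying MAXPARTITIONS; a rough sketch, with invented names (JRNLDB, JRNLTS, JOURNAL):

```sql
-- Sketch only (database, table space, and table names are invented).
-- MAXPARTITIONS makes this a partition-by-growth (PBG) table space;
-- partitions are added automatically as the data grows.
CREATE TABLESPACE JRNLTS IN JRNLDB
       MAXPARTITIONS 4096
       SEGSIZE 64;

CREATE TABLE JOURNAL
      (TRAN_ID   BIGINT       NOT NULL,
       TRAN_DATE DATE         NOT NULL,
       DETAIL    VARCHAR(200) NOT NULL)
       IN JRNLDB.JRNLTS;

-- Indexes on a PBG table space are non-partitioned; an index on the
-- date column keeps lookups for old rows comparable to recent ones:
CREATE INDEX JRNLIX1 ON JOURNAL (TRAN_DATE, TRAN_ID);
```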
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

PostPosted: Tue Jun 14, 2011 8:23 am    Post subject:

Hello,

Quote:
I'm not sure whether a query against the table for a transaction detail record inserted a year ago will be almost as fast as a query for a record from today.
Assuming proper keys have been designed, there will be very little or no difference between the time it takes the system to retrieve a row from last year and the time it takes to retrieve a row from today. . .

Once again it appears that you have chosen a direction whether it is needed or not. Why was UTS picked before trying other approaches?
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Tue Jun 14, 2011 9:21 am    Post subject:

dick scherrer wrote:
Why was UTS picked before trying other approaches?

hi, Dick,

Simply because I have no idea other than the UTS solution...
And that is exactly why I'm asking for help from you experts.
GuyC

Senior Member


Joined: 11 Aug 2009
Posts: 1278
Location: Belgium

PostPosted: Tue Jun 14, 2011 12:48 pm    Post subject:

Confusing use of terminology, and I'll add some more:
both PBG and PBR are UTS.

For this scenario, a PBR table space with adding/rotating partitions seems a good starting point.
The main reason IBM raised the maximum number of partitions to 4096 was so that it can contain 11 years of data at one partition per day (keeps the doctor away).
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Tue Jun 14, 2011 2:46 pm    Post subject:

hi, GuyC,

If PBR is used, a partitioning key is required; in this case, the date should be chosen as the partitioning key. Here comes a problem: the partitioning ranges must be specified when the table is defined, and since one partition holds only one day, wouldn't the partitioning key ranges have to be explicitly specified for all 4096 partitions???? A horrible task...

I've been considering using a PBR UTS, but I'm not sure whether rotating is applicable to a PBG UTS.
sushanth bobby

Senior Member


Joined: 29 Jul 2008
Posts: 1013
Location: India

PostPosted: Tue Jun 14, 2011 11:33 pm    Post subject:

Dejunzhu,

In DB2 V10 there is a feature called temporal tables, which might help you keep current and historic data separately.

Read up on it, try it out, and test it!
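
A rough sketch of the DB2 10 system-period temporal syntax, with invented names (POLICY, POLICY_HIST):

```sql
-- Sketch only: a system-period temporal table and its history table
-- (names are invented). DB2 moves old row versions to the history
-- table automatically on UPDATE/DELETE.
CREATE TABLE POLICY
      (ID        INT           NOT NULL,
       DETAIL    VARCHAR(100)  NOT NULL,
       SYS_START TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,
       SYS_END   TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,
       TRANS_ID  TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID,
       PERIOD SYSTEM_TIME (SYS_START, SYS_END));

CREATE TABLE POLICY_HIST LIKE POLICY;

ALTER TABLE POLICY
      ADD VERSIONING USE HISTORY TABLE POLICY_HIST;

-- A query as of a past point in time reads the history transparently:
SELECT * FROM POLICY
 FOR SYSTEM_TIME AS OF '2010-06-01-00.00.00';
```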

Thanks,
Sushanth
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Wed Jun 15, 2011 7:15 am    Post subject:

sushanth bobby wrote:
Dejunzhu,

In DB2 V10 there is a feature called temporal tables, which might help you keep current and historic data separately.

Read up on it, try it out, and test it!

Thanks,
Sushanth

hi, Sushanth,

Thanks for the information.
But a temporal table is used to manage different VERSIONs of data, whereas by 'HISTORICAL' data here I mean OLD data, such as journal data from a year ago.

So I'm afraid temporal tables do not fit my requirement.
GuyC

Senior Member


Joined: 11 Aug 2009
Posts: 1278
Location: Belgium

PostPosted: Wed Jun 15, 2011 12:56 pm    Post subject:

dejunzhu wrote:
hi, GuyC,

If PBR is used, a partitioning key is required; in this case, the date should be chosen as the partitioning key. Here comes a problem: the partitioning ranges must be specified when the table is defined, and since one partition holds only one day, wouldn't the partitioning key ranges have to be explicitly specified for all 4096 partitions???? A horrible task...

I've been considering using PBR UTS, but I'm not sure whether rotating is applicable to PBG UTS.

* The range doesn't have to be 1 day; it can be anything, like 1 month, 3 months, 7 days, ...
* Partitions can be added as time goes by; there is no need to define them all immediately.
* Of course rotating doesn't apply to PBG. Any reasonable understanding of the concept of PBG would have clarified that.
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Wed Jun 15, 2011 1:04 pm    Post subject:

GuyC wrote:

* The range doesn't have to be 1 day; it can be anything, like 1 month, 3 months, 7 days, ...
* Partitions can be added as time goes by; there is no need to define them all immediately.
* Of course rotating doesn't apply to PBG. Any reasonable understanding of the concept of PBG would have clarified that.


So, here comes the problem:
if I do not define all partitions immediately, that is, if I choose a PBG UTS, how can I take advantage of the rotating function, given that rotating does not apply to PBG?

So this is contradictory...
GuyC

Senior Member


Joined: 11 Aug 2009
Posts: 1278
Location: Belgium

PostPosted: Wed Jun 15, 2011 1:26 pm    Post subject:

Go back to the books.
Not defining all partitions immediately <> choosing PBG.
Every month, generate and execute this statement:
ALTER TABLE xxx ADD PARTITION ENDING AT (end-of-month)

Rotating is only interesting when you can start to delete data, i.e. for data that is no longer required.
Suppose that after 36 months (you now have 37 partitions) you start executing this statement instead of the previous one:
ALTER TABLE xxx ROTATE PARTITION FIRST TO LAST ENDING AT (end-of-month) RESET
sushanth bobby

Senior Member


Joined: 29 Jul 2008
Posts: 1013
Location: India

PostPosted: Wed Jun 15, 2011 2:57 pm    Post subject:

dejunzhu,

When you go for PBG, you accept the fact that data grows huge and you cannot predict how much data will grow in the next few months. So, inorder to reduce the maintainence work/task you go for PBG, so that whenever a partition reaches a specified size, a new partition is added.

As, GuyC mentioned, Adding partitions and rotating partitions can be automated. For that you have to use PBR.

Quote:
how can I take advantage of rotating function? as rotating does not apply to PBG.
You cannot. You have to do by other means, during REORG you can discard the older data.
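
A sketch of that approach using the REORG utility's DISCARD option (the table space, table, and cutoff date below are hypothetical):

```sql
-- Utility control statement sketch (names and date are invented):
-- reorganize the table space and discard journal rows older than
-- the cutoff instead of keeping them.
REORG TABLESPACE JRNLDB.JRNLTS
      DISCARD FROM TABLE JOURNAL
      WHEN (TRAN_DATE < '2010-06-14')
```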

And you still did not tell us:
How many months of data do you want to keep?
How many transactions per day do you have, i.e. how many rows get added per day? (since you were planning on a partition per day)
Thanks,
Sushanth