Query efficiency against historical data?
IBMMAINFRAMES.com Support Forums -> DB2
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Mon Jun 13, 2011 12:46 pm    Post subject: Query efficiency against historical data?

I'm now designing a backup strategy for a massive amount of data, with the goal of achieving the best efficiency when querying this historical data.

Would you please give me some suggestions based on your rich experience?
Any possible solution is okay.

Thank you!

bhairon singh rathore

New User


Joined: 19 Jun 2008
Posts: 91
Location: banglore

PostPosted: Mon Jun 13, 2011 3:03 pm    Post subject:

We can use a partitioned table for this and store the data in different partitions by date: for example, store today's data in one partition, the next day's data in a second partition, and so on. When the maximum partition is reached, start again from the first.

The partition information can be stored in another table that contains only the partition number and the date for each day.

When querying, you can join both tables on the partition number and use the date in the WHERE clause.

Note: the date is only an example; you can use any other unique value that suits the functionality.
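
A minimal DDL/SQL sketch of this scheme (all names, such as JOURNAL, PART_MAP, and their columns, are invented for illustration):

```sql
-- Sketch only: a lookup table mapping each business date to the
-- partition that holds it (names are hypothetical).
CREATE TABLE PART_MAP
      (PART_NO   SMALLINT NOT NULL,
       DATA_DATE DATE     NOT NULL,
       PRIMARY KEY (DATA_DATE));

-- A query for one day's history joins through the map and keeps the
-- date in the WHERE clause, as described above:
SELECT J.*
  FROM JOURNAL  J
      ,PART_MAP M
 WHERE M.DATA_DATE = '2011-06-01'
   AND J.PART_NO   = M.PART_NO
   AND J.TRAN_DATE = M.DATA_DATE;
```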
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10202
Location: italy

PostPosted: Mon Jun 13, 2011 3:32 pm    Post subject: Reply to: Query efficiency against historical data?

Quote:
I'm now designing a backup strategy for a massive amount of data, with the goal of achieving the best efficiency when querying this historical data.


Please use the right terminology ...
Backup has nothing to do with historical data;
they are two completely different beasts.

Historical data is just that ... history (and it might not even be online).
There should be no worry about access performance, just about space utilization.

If You are worried about performance, it means that the data is less than historical.

How and when the data becomes history(cal) depends on the application and the relevant legal implications for data accessibility.

So, to get better help, it would be wise to reword the question.

Query??? Do You mean making old data available for processing,
or running a <query/report> on it?

Again, two different things.
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

PostPosted: Mon Jun 13, 2011 8:26 pm    Post subject:

Hello,

Quote:
I'm now designing a backup strategy for a massive amount of data, with the goal of achieving the best efficiency when querying this historical data.
This is completely contradictory. . .

Which do you think you want to do: create a massive backup, or find some way to run "efficient" queries against historical data?

Maybe there is something else you really want to do. . .
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Mon Jun 13, 2011 8:48 pm    Post subject: Reply to: Query efficiency against historical data?

hi, bhairon,

Thanks for your suggestion.
But I'd rather use a universal table space than a classic partitioned table space, since in DB2 V10 I believe a universal table space is far superior to a classic partitioned table space in every respect.

Correct me if I'm wrong.
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Mon Jun 13, 2011 8:54 pm    Post subject:

hi, enrico,

Thanks for your reply.
When I mentioned "query", I meant performing SQL query statements against this "OLD" data, for example data generated a year ago that is seldom queried.

I've been considering using a UTS for this, but I'd like to know more about possible strategies.
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Mon Jun 13, 2011 8:59 pm    Post subject:

dick scherrer wrote:
Which do you think you want to do: create a massive backup, or find some way to run "efficient" queries against historical data?

Maybe there is something else you really want to do. . .

hi, Dick,
I just want the queries on this historical data to be more efficient.
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

PostPosted: Mon Jun 13, 2011 9:18 pm    Post subject:

Hello,

Quote:
I just want the queries on this historical data to be more efficient.
Then you need to design historical database structures to support high-speed queries, possibly including ad-hoc ones. Often these are not completely normalized, in order to improve query performance.

Many places have implemented their version of a "data warehouse" to do this. Others have merely determined which data is to be available and designed new tables to support the queries rather than to optimize transaction processing.

Keep in mind that if you create these new tables, they will require considerable DASD space.
enrico-sorichetti

Global Moderator


Joined: 14 Mar 2007
Posts: 10202
Location: italy

PostPosted: Mon Jun 13, 2011 9:31 pm    Post subject: Reply to: Query efficiency against historical data?

As I already said, one of the criteria for defining data as historical in the IT sense is the frequency of use,
which in turn determines the storage hierarchy to be used and the approach for making the data available.

If You are concerned with query performance, it means that the data is accessed frequently and is not historical any longer.

Most organizations' definition of historical is pretty dynamic, according to the access pattern.

Online data has an overall management pattern different from backup/offline/historical data.

It looks from the wording that You are just concerned with accessing tables where one of the keys is a date,
so there is really nothing to make a big fuss about.

Reword Your requirement!
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Tue Jun 14, 2011 7:36 am    Post subject:

Consider the following scenario:
Let's say we have a journal table which contains all the transaction processing details.
As the data in the journal table grows continuously, I'm considering using a partition-by-growth (PBG) universal table space for it.
Because I have no prior experience with partition-by-growth universal table spaces, I'm not sure whether a query against the table for a transaction detail record inserted a year ago will be almost as fast as a query for a record from today.

In addition, since (as I understand it) only non-partitioned indexes can be created on a PBG table space, the index data also grows large as the table grows. Will this impact query efficiency?

Using UTS, the data limit can reach 128 TB. Considering efficiency, I think migration of old data should be implemented long before the data reaches this capacity limit, that is, old data should be moved from the journal master table to another table. My problem is: after the migration, how can the original queries still be performed successfully?
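
For reference, a partition-by-growth table space is created simply by specifying MAXPARTITIONS; a rough sketch, with invented names (JRNLDB, JRNLTS, JOURNAL):

```sql
-- Sketch only (database, table space, and table names are invented).
-- MAXPARTITIONS makes this a partition-by-growth (PBG) table space;
-- partitions are added automatically as the data grows.
CREATE TABLESPACE JRNLTS IN JRNLDB
       MAXPARTITIONS 4096
       SEGSIZE 64;

CREATE TABLE JOURNAL
      (TRAN_ID   BIGINT       NOT NULL,
       TRAN_DATE DATE         NOT NULL,
       DETAIL    VARCHAR(200) NOT NULL)
       IN JRNLDB.JRNLTS;

-- Indexes on a PBG table space are non-partitioned; an index on the
-- date column keeps lookups for old rows comparable to recent ones:
CREATE INDEX JRNLIX1 ON JOURNAL (TRAN_DATE, TRAN_ID);
```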
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

PostPosted: Tue Jun 14, 2011 8:23 am    Post subject:

Hello,

Quote:
I'm not sure whether a query against the table for a transaction detail record inserted a year ago will be almost as fast as a query for a record from today.
Assuming proper keys have been designed, there will be very little or no difference between the time it takes the system to retrieve a row from last year and the time it takes to retrieve a row from today. . .

Once again it appears that you have chosen a direction whether it is needed or not. Why was UTS picked before trying other approaches?
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Tue Jun 14, 2011 9:21 am    Post subject:

dick scherrer wrote:
Why was UTS picked before trying other approaches?

hi, Dick,

Simply because I have no idea other than the UTS solution...
And that is exactly why I'm asking for help from you experts.
GuyC

Senior Member


Joined: 11 Aug 2009
Posts: 1278
Location: Belgium

PostPosted: Tue Jun 14, 2011 12:48 pm    Post subject:

Confusing use of terminology, and I'll add some more:
both PBG and PBR are UTS.

For this scenario, a PBR table space with adding/rotating partitions seems a good starting point.
The main reason IBM raised the maximum number of partitions to 4096 was so that it can contain 11 years of data at one partition per day (keeps the doctor away).
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Tue Jun 14, 2011 2:46 pm    Post subject:

hi, GuyC,

If PBR is used, a partitioning key is required; in this case, the date should be chosen as the partitioning key. Here comes a problem: the partitioning ranges must be specified when the table is defined, and since one partition holds only one day, wouldn't the partitioning key ranges have to be explicitly specified for all 4096 partitions???? A horrible task...

I've been considering using a PBR UTS, but I'm not sure whether rotating is applicable to a PBG UTS.
sushanth bobby

Senior Member


Joined: 29 Jul 2008
Posts: 1013
Location: India

PostPosted: Tue Jun 14, 2011 11:33 pm    Post subject:

Dejunzhu,

In DB2 V10 there is a feature called temporal tables, which might help you keep current and historic data separately.

Read up on it, try it out, and test it!
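
A rough sketch of the DB2 10 system-period temporal syntax, with invented names (POLICY, POLICY_HIST):

```sql
-- Sketch only: a system-period temporal table and its history table
-- (names are invented). DB2 moves old row versions to the history
-- table automatically on UPDATE/DELETE.
CREATE TABLE POLICY
      (ID        INT           NOT NULL,
       DETAIL    VARCHAR(100)  NOT NULL,
       SYS_START TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,
       SYS_END   TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,
       TRANS_ID  TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID,
       PERIOD SYSTEM_TIME (SYS_START, SYS_END));

CREATE TABLE POLICY_HIST LIKE POLICY;

ALTER TABLE POLICY
      ADD VERSIONING USE HISTORY TABLE POLICY_HIST;

-- A query as of a past point in time reads the history transparently:
SELECT * FROM POLICY
 FOR SYSTEM_TIME AS OF '2010-06-01-00.00.00';
```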

Thanks,
Sushanth
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Wed Jun 15, 2011 7:15 am    Post subject:

sushanth bobby wrote:
Dejunzhu,

In DB2 V10 there is a feature called temporal tables, which might help you keep current and historic data separately.

Read up on it, try it out, and test it!

Thanks,
Sushanth

hi, Sushanth,

Thanks for the information.
But a temporal table is used to manage different VERSIONs of data, whereas by 'HISTORICAL' data here I mean OLD data, such as journal data from a year ago.

So I'm afraid temporal tables do not fit my requirement.
GuyC

Senior Member


Joined: 11 Aug 2009
Posts: 1278
Location: Belgium

PostPosted: Wed Jun 15, 2011 12:56 pm    Post subject:

dejunzhu wrote:
hi, GuyC,

If PBR is used, a partitioning key is required; in this case, the date should be chosen as the partitioning key. Here comes a problem: the partitioning ranges must be specified when the table is defined, and since one partition holds only one day, wouldn't the partitioning key ranges have to be explicitly specified for all 4096 partitions???? A horrible task...

I've been considering using PBR UTS, but I'm not sure whether rotating is applicable to PBG UTS.

* The range doesn't have to be 1 day; it can be anything, like 1 month, 3 months, 7 days, ...
* Partitions can be added as time goes by; there is no need to define them all immediately.
* Of course rotating doesn't apply to PBG. Any reasonable understanding of the concept of PBG would have clarified that.
dejunzhu

Active User


Joined: 08 May 2008
Posts: 390
Location: China

PostPosted: Wed Jun 15, 2011 1:04 pm    Post subject:

GuyC wrote:

* The range doesn't have to be 1 day; it can be anything, like 1 month, 3 months, 7 days, ...
* Partitions can be added as time goes by; there is no need to define them all immediately.
* Of course rotating doesn't apply to PBG. Any reasonable understanding of the concept of PBG would have clarified that.


So, here comes the problem:
if I do not define all partitions immediately, that is, if I choose a PBG UTS, how can I take advantage of the rotating function, given that rotating does not apply to PBG?

So this is contradictory...
GuyC

Senior Member


Joined: 11 Aug 2009
Posts: 1278
Location: Belgium

PostPosted: Wed Jun 15, 2011 1:26 pm    Post subject:

Go back to the books.
Not defining all partitions immediately <> choosing PBG.
Every month, generate and execute this statement:
ALTER TABLE xxx ADD PARTITION ENDING AT (end-of-month)

Rotating is only interesting when you can start to delete data, i.e. for data that is no longer required.
Suppose that after 36 months (you now have 37 partitions) you start executing this statement instead of the previous one:
ALTER TABLE xxx ROTATE PARTITION FIRST TO LAST ENDING AT (end-of-month) RESET
sushanth bobby

Senior Member


Joined: 29 Jul 2008
Posts: 1013
Location: India

PostPosted: Wed Jun 15, 2011 2:57 pm    Post subject:

dejunzhu,

When you go for PBG, you accept the fact that data grows huge and you cannot predict how much data will grow in the next few months. So, inorder to reduce the maintainence work/task you go for PBG, so that whenever a partition reaches a specified size, a new partition is added.

As, GuyC mentioned, Adding partitions and rotating partitions can be automated. For that you have to use PBR.

Quote:
how can I take advantage of rotating function? as rotating does not apply to PBG.
You cannot. You have to do by other means, during REORG you can discard the older data.
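
A sketch of that approach using the REORG utility's DISCARD option (the table space, table, and cutoff date below are hypothetical):

```sql
-- Utility control statement sketch (names and date are invented):
-- reorganize the table space and discard journal rows older than
-- the cutoff instead of keeping them.
REORG TABLESPACE JRNLDB.JRNLTS
      DISCARD FROM TABLE JOURNAL
      WHEN (TRAN_DATE < '2010-06-14')
```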

And you still did not tell us:
How many months of data do you want to keep?
How many transactions per day do you have, i.e. how many rows get added per day? (since you were planning on a partition per day)
Thanks,
Sushanth