IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Timestamp difference and its average by date


IBM Mainframe Forums -> DB2
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
balaji81_k

Active User


Joined: 29 Jun 2005
Posts: 155

PostPosted: Thu May 04, 2023 10:15 pm
Reply with quote

Hi Team,

I am working on the call log table where we have connection timestamp between customer(A) and agent (B) at two different TS .Need is to calculate the difference between two connection ts and take the average by date .

Table Data looks like :-

Code:

Connection_TS   Type   Call_ID   Diff  in connection_ts_in_seconds
2023-04-20 09:49:40 UTC   A   190   11
2023-04-20 09:49:51 UTC   B   190   
2023-04-20 09:49:40 UTC   A   191   
2023-04-20 09:49:45 UTC   B   191   5
         
   AVG   11+5/2   = 8



expected output :-

Code:

DATE_Connection_TS   AVG(time_stamp_diff)
20-04-2023   8



Getting difference on TS:-

Code:


--**  Tried with adding only call_id,  i am getting difference as expected -no issues

select call_id,
timestamp_diff(max(connected_ts), min(connected_ts), second)  as ts_diff_seconds
from  db.call_log table
where call_id is not null
and connected_at is not null    
group by call_id
order by call_id
;


However if i add date extracted from connection_ts , sql considers each row as unique and getting 0.0 as output

Code:


select Date(Connected_ts) as connected_ts_date,
timestamp_diff(max(connected_ts), min(connected_ts), second)/
count(distinct call_id)    as avg_ts_diff_seconds"
from  db.call_log table
where call_id is not null
and connected_at is not null    
group by connected_ts_date
order by connected_ts_date
;



Please do help/advise me on how i can achieve the output .

Thanks
Balaji k
Back to top
View user's profile Send private message
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3051
Location: NYC,USA

PostPosted: Fri May 05, 2023 11:51 am
Reply with quote

I don’t find you are referencing column names consistently right in all your queries from the input table data .
Second , you don’t need spaces in last column so you can filter that as well .. you may represent the column names as correct as used in query.
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2020
Location: USA

PostPosted: Fri May 05, 2023 2:06 pm
Reply with quote

From you last SELECT code one cannot understand your intentions…

Definitely, one single SELECT is not able to handle several different grouping and distinct fields; you may need to use extra sub-SELECT, or VIEWS, or WITH, or some other tricks.

Please, give a clear sample of your input, and desired output.
Back to top
View user's profile Send private message
balaji81_k

Active User


Joined: 29 Jun 2005
Posts: 155

PostPosted: Fri May 05, 2023 4:18 pm
Reply with quote

Hi Sergeyken,
I tried with inner join but still i am not getting the expected output.

Table data :-
Code:

id   connected_at   call_id
1   2023-05-05 10:07:15.891023 UTC   1
2   2023-05-05 10:07:46.610037 UTC   1
3   2023-05-05 10:09:27.057297 UTC   2
4   2023-05-05 10:09:46.057827 UTC   2


SQL:-
Code:

select                       
date(prev.connected_at) as connected_date,                     
avg(round(abs(timestamp_diff(prev.connected_at,cur.connected_at,second)),2))  as average_seconds                     
from  dev.call_log_table    cur                     
inner join dev.call_log_table   prev                      
    on cur.id = prev.id + 1                      
    and cur.connected_at <> prev.connected_at                     
group by prev.connected_at                     
having prev.connected_at is not null                     
order by prev.connected_at                     
;                     


Actual result:-

I am not sure why i am getting row 2 with AVG of 100% .


Code:

connected_date   average_seconds
05-05-2023   30
05-05-2023   100
05-05-2023   19



expected output:-
Code:

connected_date   average_seconds
05-05-2023   24.5



Please advise and help.
Back to top
View user's profile Send private message
balaji81_k

Active User


Joined: 29 Jun 2005
Posts: 155

PostPosted: Fri May 05, 2023 4:51 pm
Reply with quote

Basically, i need to find the difference of connected_at which is timestamp and then group by call_id and take the average of difference by date .
Please correct me if i need to use call_id alone. I feel strongly by call_id and then find TS difference and take average by call_id and then calculate the average = TS(Difference) by call_id/ total call_id of the day
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2020
Location: USA

PostPosted: Sat May 06, 2023 2:10 pm
Reply with quote

As I’ve mentioned above, you need the construction of two SELECTs, the outer one, and the inner one.
Code:
SELECT … FROM (SELECT … FROM call_log_table …) …;

The inner SELECT must be similar to your very first example, extracting time difference values grouped by call_id. Next, the outer SELECT must calculate your average values (from what is returned by the inner SELECT), and grouped by connection date values.

Once again: A SINGLE SELECT CANNOT PROVIDE TWO DIFFERENT GROUPINGS AND/OR DISTINCTS!
Back to top
View user's profile Send private message
balaji81_k

Active User


Joined: 29 Jun 2005
Posts: 155

PostPosted: Mon May 08, 2023 3:31 pm
Reply with quote

Hi Sergeyken, I am trying the build SQL with subselect (inner and outer SQL's) but the outer SQL fails with reference column (connect_at) is not found and it is not part of inner query.
Can you please help me on how to get aggregation by date(conected_at).

Code:

select
extract(date from connected_at)  as connected_date,
avg(t.time_stamp_diff)
from
(
 select call_id, timestamp_diff(max(connected_at), min(connected_at), second)  as time_stamp_diff
 from dev.call_log_table 
 where call_id is not null
 and connected_at is not null 
 and ended_at is not null
 group by   call_id
 order by call_id
) AS t
group by  connected_date  <-  error (reference column not found in inside Query)
order by 1

;



Please do help me on how to refer correctly.
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2020
Location: USA

PostPosted: Mon May 08, 2023 4:50 pm
Reply with quote

I deliberately don’t want to give you a ready-to-copy-and-paste solution, but insist you to construct it by yourself. Otherwise you’ll never be able to complete a similar task for other projects.

Please, try to do it step by step, not throwing all together in one huge pile.

Step 1. Write and debug separately the SELECT supposed to become the inner part of your future combined query.

Step 2. Do it only after step 1 is 100% done. Write the outer SELECT, using the SQL code from step 1 within the brackets after the outer FROM keyword.

The SQL for step 1 may be like this one (though I hate to give out the finished code…)
Code:
SELECT date( min(connected_at) ) as connected_date,
       timestamp_diff( max(connected_at), min(connected_at), second) as time_diff
       // call_id value is not needed outside of this inner query
FROM dev.call_log_table
WHERE … IS NOT NULL
GROUP BY call_id
// order by is not needed here
;


ONLY AFTER YOU HAVE FINISHED THE DEBUGGING OF STEP 1 at 100%, you can try combining it as part of step 2.
Code:
SELECT t.connected_date,
       avg(t.time_diff) as interval
FROM ( insert code from step 1 here) as t
GROUP BY t.connected_date
ORDER BY 1
;

I do not have access to DB2 right now, and I am on vacation, out of the country. If something is missing or mistyped in my code hints, I hope you should be able to fix it.

Good luck!
Back to top
View user's profile Send private message
balaji81_k

Active User


Joined: 29 Jun 2005
Posts: 155

PostPosted: Mon May 08, 2023 6:59 pm
Reply with quote

Thankyou Sergeyken, its working . I understand on taking the distinct values and calculate the average as we have two entries for each call_id. I calculated randomly for 5 days and it is matching .

Code:

SELECT t.connected_date,
       round(avg(t.time_stamp_diff),2) as Avg_Time_in_seconds
FROM (  select  DISTINCT
 DATE(min(connected_at)) as connected_date,
 timestamp_diff(max(connected_at), min(connected_at), second)  as time_stamp_diff
 from dev.call_log_table 
 where call_id is not null
 and connected_at is not null 
 and ended_at is not null
 group by   call_id
) as t
GROUP BY t.connected_date
ORDER BY 1
;
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2020
Location: USA

PostPosted: Mon May 08, 2023 7:25 pm
Reply with quote

The extra DISTINCT keyword is wrong. It may eliminate required values when the same time_difference amounts do appear more than once on the same date.

This operation must be done ONLY via GROUP BY t.connected_date clause!

This situation remains unnoticed during a primitive test with limited input data, but it may produce wrong results in production run.
Back to top
View user's profile Send private message
balaji81_k

Active User


Joined: 29 Jun 2005
Posts: 155

PostPosted: Tue May 09, 2023 4:21 pm
Reply with quote

Hi Sergeyken,
I have removed the DISTINCT Clause. Thankyou .
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2020
Location: USA

PostPosted: Wed May 10, 2023 11:49 pm
Reply with quote

Just to make this tricky thing more clear. Let's say we have the following input data:
Code:
Connection_TS             Type   Call_ID   
2023-04-20 09:49:40 UTC   A      190   
2023-04-20 09:49:41 UTC   B      190   
2023-04-20 09:49:41 UTC   A      191   
2023-04-20 09:49:42 UTC   B      191 
2023-04-20 09:49:42 UTC   A      192   
2023-04-20 09:49:43 UTC   B      192 
2023-04-20 09:49:43 UTC   A      193   
2023-04-20 09:49:44 UTC   B      193 
2023-04-20 09:49:44 UTC   A      194   
2023-04-20 09:49:45 UTC   B      194 
2023-04-20 09:49:45 UTC   A      195   
2023-04-20 09:49:46 UTC   B      195 
2023-04-20 09:49:46 UTC   A      196   
2023-04-20 09:49:47 UTC   B      196 
2023-04-20 09:49:47 UTC   A      197   
2023-04-20 09:49:48 UTC   B      197 
2023-04-20 09:49:48 UTC   A      198   
2023-04-20 09:49:49 UTC   B      198 
2023-04-20 09:49:49 UTC   A      199   
2023-04-20 09:49:59 UTC   B      199 

After running the inner SELECT without DISTINCT keyword, the intermediate result should look like
Code:
connected_date   time_stamp_diff   
2023-04-20        1   
2023-04-20        1   
2023-04-20        1   
2023-04-20        1   
2023-04-20        1   
2023-04-20        1   
2023-04-20        1   
2023-04-20        1   
2023-04-20        1   
2023-04-20       10

Hence, the avg(time_stamp_diff) calculated by the outer SELECT must be (10+9)/10=1.90 sec

After running the inner SELECT with DISTINCT keyword, the intermediate result should look like
Code:
connected_date   time_stamp_diff   
2023-04-20        1   
2023-04-20       10

In that (incorrect) case, the avg(time_stamp_diff) calculated by the outer SELECT must be (10+1)/2=5.50 sec
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DB2

 


Similar Topics
Topic Forum Replies
No new posts Replacing 'YYMMDD' with date, varying... SYNCSORT 3
No new posts To get the count of rows for every 1 ... DB2 3
No new posts Modifying Date Format Using DFSORT DFSORT/ICETOOL 9
No new posts Need to convert date format DFSORT/ICETOOL 20
No new posts Need help to append a date&tsp at... DFSORT/ICETOOL 9
Search our Forums:

Back to Top