View previous topic :: View next topic
|
Author |
Message |
balaji81_k
Active User
Joined: 29 Jun 2005 Posts: 155
|
|
|
|
Hi Team,
I am working on the call log table where we have connection timestamp between customer(A) and agent (B) at two different TS .Need is to calculate the difference between two connection ts and take the average by date .
Table Data looks like :-
Code: |
Connection_TS Type Call_ID Diff in connection_ts_in_seconds
2023-04-20 09:49:40 UTC A 190 11
2023-04-20 09:49:51 UTC B 190
2023-04-20 09:49:40 UTC A 191
2023-04-20 09:49:45 UTC B 191 5
AVG 11+5/2 = 8
|
expected output :-
Code: |
DATE_Connection_TS AVG(time_stamp_diff)
20-04-2023 8
|
Getting difference on TS:-
Code: |
--** Tried with adding only call_id, i am getting difference as expected -no issues
select call_id,
timestamp_diff(max(connected_ts), min(connected_ts), second) as ts_diff_seconds
from db.call_log table
where call_id is not null
and connected_at is not null
group by call_id
order by call_id
;
|
However if i add date extracted from connection_ts , sql considers each row as unique and getting 0.0 as output
Code: |
select Date(Connected_ts) as connected_ts_date,
timestamp_diff(max(connected_ts), min(connected_ts), second)/
count(distinct call_id) as avg_ts_diff_seconds"
from db.call_log table
where call_id is not null
and connected_at is not null
group by connected_ts_date
order by connected_ts_date
;
|
Please do help/advise me on how i can achieve the output .
Thanks
Balaji k |
|
Back to top |
|
|
Rohit Umarjikar
Global Moderator
Joined: 21 Sep 2010 Posts: 3077 Location: NYC,USA
|
|
|
|
I don’t find you are referencing column names consistently right in all your queries from the input table data .
Second , you don’t need spaces in last column so you can filter that as well .. you may represent the column names as correct as used in query. |
|
Back to top |
|
|
sergeyken
Senior Member
Joined: 29 Apr 2008 Posts: 2153 Location: USA
|
|
|
|
From you last SELECT code one cannot understand your intentions…
Definitely, one single SELECT is not able to handle several different grouping and distinct fields; you may need to use extra sub-SELECT, or VIEWS, or WITH, or some other tricks.
Please, give a clear sample of your input, and desired output. |
|
Back to top |
|
|
balaji81_k
Active User
Joined: 29 Jun 2005 Posts: 155
|
|
|
|
Hi Sergeyken,
I tried with inner join but still i am not getting the expected output.
Table data :-
Code: |
id connected_at call_id
1 2023-05-05 10:07:15.891023 UTC 1
2 2023-05-05 10:07:46.610037 UTC 1
3 2023-05-05 10:09:27.057297 UTC 2
4 2023-05-05 10:09:46.057827 UTC 2
|
SQL:-
Code: |
select
date(prev.connected_at) as connected_date,
avg(round(abs(timestamp_diff(prev.connected_at,cur.connected_at,second)),2)) as average_seconds
from dev.call_log_table cur
inner join dev.call_log_table prev
on cur.id = prev.id + 1
and cur.connected_at <> prev.connected_at
group by prev.connected_at
having prev.connected_at is not null
order by prev.connected_at
;
|
Actual result:-
I am not sure why i am getting row 2 with AVG of 100% .
Code: |
connected_date average_seconds
05-05-2023 30
05-05-2023 100
05-05-2023 19
|
expected output:-
Code: |
connected_date average_seconds
05-05-2023 24.5
|
Please advise and help. |
|
Back to top |
|
|
balaji81_k
Active User
Joined: 29 Jun 2005 Posts: 155
|
|
|
|
Basically, i need to find the difference of connected_at which is timestamp and then group by call_id and take the average of difference by date .
Please correct me if i need to use call_id alone. I feel strongly by call_id and then find TS difference and take average by call_id and then calculate the average = TS(Difference) by call_id/ total call_id of the day |
|
Back to top |
|
|
sergeyken
Senior Member
Joined: 29 Apr 2008 Posts: 2153 Location: USA
|
|
|
|
As I’ve mentioned above, you need the construction of two SELECTs, the outer one, and the inner one.
Code: |
SELECT … FROM (SELECT … FROM call_log_table …) …; |
The inner SELECT must be similar to your very first example, extracting time difference values grouped by call_id. Next, the outer SELECT must calculate your average values (from what is returned by the inner SELECT), and grouped by connection date values.
Once again: A SINGLE SELECT CANNOT PROVIDE TWO DIFFERENT GROUPINGS AND/OR DISTINCTS! |
|
Back to top |
|
|
balaji81_k
Active User
Joined: 29 Jun 2005 Posts: 155
|
|
|
|
Hi Sergeyken, I am trying the build SQL with subselect (inner and outer SQL's) but the outer SQL fails with reference column (connect_at) is not found and it is not part of inner query.
Can you please help me on how to get aggregation by date(conected_at).
Code: |
select
extract(date from connected_at) as connected_date,
avg(t.time_stamp_diff)
from
(
select call_id, timestamp_diff(max(connected_at), min(connected_at), second) as time_stamp_diff
from dev.call_log_table
where call_id is not null
and connected_at is not null
and ended_at is not null
group by call_id
order by call_id
) AS t
group by connected_date <- error (reference column not found in inside Query)
order by 1
;
|
Please do help me on how to refer correctly. |
|
Back to top |
|
|
sergeyken
Senior Member
Joined: 29 Apr 2008 Posts: 2153 Location: USA
|
|
|
|
I deliberately don’t want to give you a ready-to-copy-and-paste solution, but insist you to construct it by yourself. Otherwise you’ll never be able to complete a similar task for other projects.
Please, try to do it step by step, not throwing all together in one huge pile.
Step 1. Write and debug separately the SELECT supposed to become the inner part of your future combined query.
Step 2. Do it only after step 1 is 100% done. Write the outer SELECT, using the SQL code from step 1 within the brackets after the outer FROM keyword.
The SQL for step 1 may be like this one (though I hate to give out the finished code…)
Code: |
SELECT date( min(connected_at) ) as connected_date,
timestamp_diff( max(connected_at), min(connected_at), second) as time_diff
// call_id value is not needed outside of this inner query
FROM dev.call_log_table
WHERE … IS NOT NULL
GROUP BY call_id
// order by is not needed here
; |
ONLY AFTER YOU HAVE FINISHED THE DEBUGGING OF STEP 1 at 100%, you can try combining it as part of step 2.
Code: |
SELECT t.connected_date,
avg(t.time_diff) as interval
FROM ( insert code from step 1 here) as t
GROUP BY t.connected_date
ORDER BY 1
; |
I do not have access to DB2 right now, and I am on vacation, out of the country. If something is missing or mistyped in my code hints, I hope you should be able to fix it.
Good luck! |
|
Back to top |
|
|
balaji81_k
Active User
Joined: 29 Jun 2005 Posts: 155
|
|
|
|
Thankyou Sergeyken, its working . I understand on taking the distinct values and calculate the average as we have two entries for each call_id. I calculated randomly for 5 days and it is matching .
Code: |
SELECT t.connected_date,
round(avg(t.time_stamp_diff),2) as Avg_Time_in_seconds
FROM ( select DISTINCT
DATE(min(connected_at)) as connected_date,
timestamp_diff(max(connected_at), min(connected_at), second) as time_stamp_diff
from dev.call_log_table
where call_id is not null
and connected_at is not null
and ended_at is not null
group by call_id
) as t
GROUP BY t.connected_date
ORDER BY 1
;
|
|
|
Back to top |
|
|
sergeyken
Senior Member
Joined: 29 Apr 2008 Posts: 2153 Location: USA
|
|
|
|
The extra DISTINCT keyword is wrong. It may eliminate required values when the same time_difference amounts do appear more than once on the same date.
This operation must be done ONLY via GROUP BY t.connected_date clause!
This situation remains unnoticed during a primitive test with limited input data, but it may produce wrong results in production run. |
|
Back to top |
|
|
balaji81_k
Active User
Joined: 29 Jun 2005 Posts: 155
|
|
|
|
Hi Sergeyken,
I have removed the DISTINCT Clause. Thankyou . |
|
Back to top |
|
|
sergeyken
Senior Member
Joined: 29 Apr 2008 Posts: 2153 Location: USA
|
|
|
|
Just to make this tricky thing more clear. Let's say we have the following input data:
Code: |
Connection_TS Type Call_ID
2023-04-20 09:49:40 UTC A 190
2023-04-20 09:49:41 UTC B 190
2023-04-20 09:49:41 UTC A 191
2023-04-20 09:49:42 UTC B 191
2023-04-20 09:49:42 UTC A 192
2023-04-20 09:49:43 UTC B 192
2023-04-20 09:49:43 UTC A 193
2023-04-20 09:49:44 UTC B 193
2023-04-20 09:49:44 UTC A 194
2023-04-20 09:49:45 UTC B 194
2023-04-20 09:49:45 UTC A 195
2023-04-20 09:49:46 UTC B 195
2023-04-20 09:49:46 UTC A 196
2023-04-20 09:49:47 UTC B 196
2023-04-20 09:49:47 UTC A 197
2023-04-20 09:49:48 UTC B 197
2023-04-20 09:49:48 UTC A 198
2023-04-20 09:49:49 UTC B 198
2023-04-20 09:49:49 UTC A 199
2023-04-20 09:49:59 UTC B 199 |
After running the inner SELECT without DISTINCT keyword, the intermediate result should look like
Code: |
connected_date time_stamp_diff
2023-04-20 1
2023-04-20 1
2023-04-20 1
2023-04-20 1
2023-04-20 1
2023-04-20 1
2023-04-20 1
2023-04-20 1
2023-04-20 1
2023-04-20 10 |
Hence, the avg(time_stamp_diff) calculated by the outer SELECT must be (10+9)/10=1.90 sec
After running the inner SELECT with DISTINCT keyword, the intermediate result should look like
Code: |
connected_date time_stamp_diff
2023-04-20 1
2023-04-20 10 |
In that (incorrect) case, the avg(time_stamp_diff) calculated by the outer SELECT must be (10+1)/2=5.50 sec |
|
Back to top |
|
|
|