Solving A Hard SQL Problem | SQL ON OFF Problem | Magic of SQL
Вставка
- Опубліковано 23 тра 2022
- In this video we will see the magic of SQL. You will realize how adorable and beautiful SQL is. Fall in love with SQL with this video.
Zero to hero(Advance) SQL Aggregation:
• All About SQL Aggregat...
Most Asked Join Based Interview Question:
• Most Asked SQL JOIN ba...
Solving 4 Trick SQL problems:
• Solving 4 Tricky SQL P...
Data Analyst Spotify Case Study:
• Data Analyst Spotify C...
Top 10 SQL interview Questions:
• Top 10 SQL interview Q...
Interview Question based on FULL OUTER JOIN:
• SQL Interview Question...
Playlist to master SQL :
• Complex SQL Questions ...
Rank, Dense_Rank and Row_Number:
• RANK, DENSE_RANK, ROW_...
script:
create table event_status
(
event_time varchar(10),
status varchar(10)
);
insert into event_status
values
('10:01','on'),('10:02','on'),('10:03','on'),('10:04','off'),('10:07','on'),('10:08','on'),('10:09','off')
,('10:11','on'),('10:12','off');
#sql #dataengineer #magic - Наука та технологія
Please like the video if you learnt something new 🙂
Learnt so many new things but I could like it only once 😜
event_time is varchar and still order by event_time is working?
another approach-
with temp as(
select *,
rank() over( order by event_time) as rnk,
right(event_time,1)-rank() over( order by event_time) as flag from event_status order by event_time)
select min(event_time) as login, max(event_time) as logout, count(status)-1 as cnt from temp
group by flag
Amazing Ankit. Thanks for sharing. Really appreciate all your content and hardwork that you are doing for community
such an amazing and important concept! thanks for explaining this Ankit❤
After seeing your explanation , even harder problem seems to be easier . Keep going man .🥊🥊🥊
🙏🙏
Hi Ankit, great video. I did the grouping using right(time,2)-row_number over( order by right(time,2))since it was varchar and there is no continuous time. It will fail when there is a continuous 10.05 ON event after an 10.04 off because the groups are a difference of continuius row numbers that will still be continuous. So your approach makes more sense in this case.
Brilliantly put together.. SQL baba rocks !!
Haha 😃
with cte as(
select *, sum(new_status) over(order by event_time) as running_sum from(
select *,
case when status = 'on' and lag(status) over(order by event_time) = 'off' then 1 else 0 end as new_status
from event_status)temp)
select min(event_time) as login, max(event_time) as logout, count(running_sum)-1 as cnt
from cte
group by running_sum;
Nice. This is an excellent example of Gaps & Island type problems. I have seen another way to solve where they develop a general counter & than a sequence wise counter & than take a difference to id a specific sequence. This is nice & more straightforward. Usage of case within sum makes it more simpler to grasp.
Summary as I get it:
Step 1: Get previous status using lag
Step 2: In order to create an id for the sequence of on/off, use case within SUM to create a group key. This is basically continuously checking if the order is changing from previous value.
Step 3: Use the group key to get anything.
Very well put. Thank you 😊
excellent question. after years some question stumped me
Hi bansal i used to struggle to cretae group keys but idea of running total is purely genius thank you
Great 👍
Keep going you will grow
This concept was new to me😊. Thank you!
😊
Thanks ..
@AnkitBansal, man you got brains.. every problem statement that you analyze is amazing, Your approach about a problem is very constructive and has simple plan. I am running out of praise word for you. LOVE ALL YOUR videos. Keep rocking big guy.
Thanks a lot for your kind words 🙂
My solution using lead:
select
min(event_time) as log_on
,max(next_event_time) as log_off
,count(1) as cnt
from (
select *
, sum(case when status='off' and next_status = 'on' then 1 else 0 end) over(order by event_time) as grp
from (
select
*
,lead(status,1, status) over(order by event_time) as next_status
,lead(event_time,1, event_time) over(order by event_time) as next_event_time
from event_status
) a
) b
where status 'off'
group by grp
Great video as concept of running sum with lag function to create the groups is used which is very tricky as first sight
i solved it using the traditional idea of creating a sequence of row_number order by event_time asc and take the diff of this with right(event_time,2) and do the aggregation over the groups to get the login,logout and event count for on events.
I also thought about this approach but this will not be applicable in real world.. as event time can have max of 60 in mins/sec value and if hour changes it will start repeating and hence row number starts giving discrepancies.😅
Ankit u r a SQL Revolutionist,u have created a revolution in field of SQL.
Thanks Man👍
Thank you 😊❤️
Creating group_key using case when inside sum() was ingenious.
Here is an alternative approach of using row_number() to create a group key:
with cte as (
select *
,row_number() over(order by event_time) - row_number() over(partition by status order by event_time) grp
,lead(event_time) over(order by event_time) next_event_time
from event_status )
select min(event_time) Log_in, max(next_event_time) Log_out, count(1) Cnt
from cte
where status = 'on'
group by grp
order by Log_in;
Another solution -
with cte as (
select lag(event_time,1,'10:00') over(order by event_time) as starttime, event_time as logout,status, row_number() over(order by event_time) as grp from event_status where status = 'off'
), cte1 as (
select min(event_time) as login, logout, count(event_time) as cnt, grp from cte inner join event_status on event_time between starttime and logout
where event_status.status = 'on'
group by logout, grp)
select login, logout, cnt from cte1
generating Group_key was new trick for me. Thanks for sharing
Interesting
my solution:
with cte as(
select *,minute(event_time)-row_number() over() as rn from event_status)
select min(event_time) as login,max(event_time) as logout,count(1)-1 as cnt
from cte group by rn;
Note; for my comfort calculations I tool time into numbers (ex; 10:01 as 1001) In last I changed to time
;with cte as(
select min(event_time)mtime,max(event_time)mxtime,count(9)-1 total_logins from (
select event_time,
event_time-ROW_NUMBER() over (order by event_time) grp_cl
,status from event_status)ar
group by grp_cl)
select concat(left(mtime,2),':',right(mtime,2) ) as logintime,
concat(left(mxtime,2),':',right(mxtime,2) ) as logouttime ,
total_logins
from cte
Here is a simple but less optimized query
with rank_cte as (
select e.*, s.event_time as next_off, row_number() over (partition by e.event_time order by e.event_time) as rnk
from event_status e
left join event_status s
on e.event_time < s.event_time
and s.status='off'
)
select min(event_time) as login, next_off as logout, count(1) as cnt
from rank_cte
where rnk=1 and status='on'
group by next_off;
Thank you Sir
Your You tube channel is one of the best channel to learn SQL
Keep watching
This was a very good question,showing an alternate solution :
with off_data as
(
select event_time,status,row_number() over (order by event_time asc) as rank_off
from event_status
where status='off'
)
,base as
(
select event_time
,status
,rank_off
, min(rank_off) over (order by rank_off desc ) as new_rank
from
(select t1.event_time
,t1.status
,t2.rank_off
from event_table t1
left join event_table t2 on t1.event_time=t2.event_time and t1.status=t2.status
)
group by 1,2,3
)
select login, logout,cnt from
(
select new_rank
,min(event_time) as login
,max(event_time) as logout
,count(case when status='on' then event_time end) as cnt
from base
)
Amazing Stuff Sir.
Glad you liked it!
with cte as
(select *, subs-row_number() over (order by subs) diff from
(
select *, cast(substring(event_time,4,2) as int) subs from event_status
) tab1),
cte2 as (select *,
(select count(*) from cte as i where status = 'on' and i.diff = cte.diff) cor
from cte)
select min(event_time), max(event_time), cor from cte2
group by cor order by 3 desc
It was magical how powerful SQL can be, thank you :)
Same feeling 😍🔥🔥
Thanks for sharing 😊
🙏🙏
Awesome! learnt a new trick to create group keys!
Exactly 😊
I have been working in SQL for quite long but amazed to see creating group key like this...thank you
Took time to understand the group key using sum 😀
awesome solution Ankit!
Thank you 😊
with cte as(
select *, (case when previous_status='on' then 1 else 0 end) as checks,
sum(case when status='on' and previous_status='off' then 1 else 0 end) over (order by event_time) as group_key from
(select *, LAG(status,1) over (order by event_time) as previous_status from event_status) a
)
select min(event_time) as log_in_time,max(event_time) as log_out_time, sum(checks) as on_count from cte
group by group_key
My approach for this:
with cte as
(
select *
,row_number() over(order by event_time) as rn
from event_status
),
cte2 as
(
select *,
row_number() over(order by event_time) as rn2
from cte
where status='on'
)
select min(event_time),
concat(left(replace(max(event_time),':','')+1,2),':',right(replace(max(event_time),':','')+1,2))
,count(*)
from cte2
group by (rn-rn2)
I was able to create the first part but got stuck while creating group keys, thank you so much for explaining the same, learned a new concept today. Thank you :)
🙏🙏
Same here, was able to get the LAG() part but group key concept is next LEVEL.
@ANKIT bhaiya you are giving NEW concepts in every video.
Dil se dhanyawaad.
Great Explanation Ankit. Thank You very much. Waiting for 5th September to gift you some thing.
This is the biggest compliment I have got . Thank you for the love ❤️
Hi Ankit,
here's my solution
with cte as
(
SELECT *,ROW_NUMBER() over(order by event_time) as timing FROM EVENT_STATUS
)
,cteone as
(
select event_time,format(dateadd(minute,-timing,event_time),'hh:mm') as va,status from cte
)
select min(event_time) as login,max(event_time) as logout,count(va) as cnt from cteone
group by va
This is similar to real project problem which i worked on.That time i couldn't resolve it but now no more.Thanks for the video.It really helps.
Cool
with cte as(
select event_time, status, sum(case when status='on' then 0 else 1 end) over (order by event_time) as grp
from event_status)
select min(event_time) as in_time , max(event_time) as out_time, count(1)-1 as cnt
from cte
group by (case when status='on' then grp else grp-1 end)
**Using Running Total**
with cte1 as(
select event_time,status,lag(status,1,status) over(order by event_time) as prev_status from event_status
),cte2 as(
select *,case when status='on' and prev_status='off' then 1 else 0 end as flag from cte1
),cte3 as (
select *,sum(flag) over(order by event_time rows BETWEEN unbounded preceding and 0 preceding ) as running_total from cte2
)
select min(event_time) as logintime,max(event_time) as logouttime,count(status)-1 as cnt from cte3 group BY
running_total
in count if we could have used case statement it would have been better for user understanding.I suppose we are doing count(1)-1 becuse we know status filed has the on/off values
Here is my approch. I solved this one using gaps and island concept ( row number and dense rank). Although it is lil bit lengthy :
with cte as (
select
event_time
,status
,dr
,dense_rank () over(order by rn-dr) dif
from
(select event_time
,status
,row_number() over(order by event_time) as rn
,dense_rank() over(partition by status order by event_time) as dr
from event_status)a),
cte2 as (select
dif as mindif
,min(event_time) event_time
,count(dif) cnt
from cte
where status='on'
group by dif)
select
b.event_time as login,
a.event_time as logon,
b.cnt as cnt
from cte a join cte2 b
on a.dr = b.mindif
where status ='off'
with cte as (select
*, lag(status,1, status) over(order by event_time) as prev_status
from event_status),
cte2 as (
select
*, case when status = 'on' and prev_status = 'off' then 1 else 0 end as group_id,
sum(case when status = 'on' and prev_status = 'off' then 1 else 0 end) over(order by event_time) as group_id_sum
from cte)
select
min(event_time) as login_t,max(event_time) as logout_t, count(case when status = 'on' then 1 end) as on_cnt
from cte2
group by group_id_sum;
Interesting question explained in an easy way.. Thanks🙏
It will be really helpful if you could u make video on implementation of SCD TYPE 1 and TYPE 2 in SQL.
Thanks in advance.
Yes will do that
V
❤️🩹🤜🏼😍👊😊😊😅😅😅😅😅😩😩😩pl
Using only CTEs:
with event_prev_cte as (
select *,lag(status,1,status) over (order by event_time) as prev_status from event_status),
group_cte as (
select *, sum(case when status='on' and prev_status='off' then 1 else 0 end) over (order by event_time) as group_key
from event_prev_cte)
select min(event_time) as login_time,
max(event_time) as logout_time,
count(*)-1 as cnt
from group_cte
group by group_key
order by login_time;
Thank you 😊
Hi Ankit,
;with cte_1 as
(
select *, lag(status,1,status) over (order by event_time) pre_status from event_status
)
,cte_2 as
(
select *,sum(case when status ='on' and pre_status ='off' then 1 else 0 end ) over (order by event_time) cnt from cte_1
)
select min(event_time) log_in, max(event_time) log_out ,count(cnt)-1 count from cte_2
group by cnt
Thank you
Creating Island:
with cte as (
select *, row_number() over() as rn from event_status),
cte1 as(
select *, rn- row_number() over() as grp from cte where status'off' ),
cte2 as(
select min(event_time) as 'On_time',count(grp) as cnt from cte1 group by grp),
cte3 as(select * from event_status where status='off')
select t1.On_time,t1.cnt,min(t2.event_time) as 'Off_time' from cte2 t1 join cte3 t2 on t2.event_time>t1.On_time
group by t1.On_time,t1.cnt;
super 🤩
Thank you! Cheers!
with cte as (
select *
,min(event_time)over(partition by status , new_id ) as s_time
,max(event_time)over(partition by status , new_id ) as e_time
,count(*)over(partition by status , new_id) as cnt
from (select *
,id-row_number()over(partition by status order by id ) as new_id
from (
select *
,row_number()over() as id
from event_status) as t
) as t2
)
select s_time,e_time,max(cnt)as cnt from cte
where status='on'
group by s_time,e_time
order by id
wow
with t1 as (
select *,row_number() over()-right(event_time,1) as p from event_status)
select min(event_time) as login,max(event_time) as logout ,sum(status='on') as cnt from t1 group by p
Mysql solution
with base as (select *,
cast(substr(event_time,4,2) as UNSIGNED) as sub_event_time,
row_number() over() as id from event_status),
base_diff as (
select *,sub_event_time - id as diff from base)
select min(event_time) as login,max(event_time) as logout,sum(case when status = 'on' then 1 else 0 end) as cnt from base_diff group by diff;
Great tutorial Ankit! Quick question..is there any reason that you begin a line with a comma?
Thanks John. Its just a habit. Nothing specific 😃
Hello Ankit . Questions are tricky .. see this solution without LEAD /LAG .
SELECT
MIN(event_time) AS start_time,
off_time AS end_time
FROM event_status e1
CROSS APPLY (SELECT
MIN(e2.event_time) AS off_time
FROM event_status e2
WHERE e2.event_time > e1.event_time
AND e1.status = 'on'
AND e2.status = 'off'
) t
WHERE off_time IS NOT NULL
GROUP BY off_time
Nice, Just add count(1) to get the on_count. :)
i didnt get -1 in your count(1) can u explain briefly
Please find below my solution,
with cte as
(select *,right(event_time,2) - row_number() over() as time
from event_status)
select min(event_time) as login,max(event_time) as logout,count(*)-1 as cnt
from cte
group by time;
It works for me.. can we query like above, ankit?
Bhai bhai bhai 🫡
with event_cte_2 as
(select *,lag(status,1,status) over(order by event_time asc) as prev_status from event_status),
event_cte as
(select event_time, status ,prev_status,
sum(case when status='on' and prev_status='off' then 1 else 0 end) over(order by event_time) as status_cnt from event_cte_2)
--select * from event_cte
select min(event_time) as login,max(event_time) as logout,count(0)-1 as on_cnt from event_cte group by status_cnt
Don't have enough words to thank you !!! Learning new things every day from your videos . I gave another approach using 2 CTE's .
select min(event_time), max(endt), count(1)
from (
select *, rn-ROW_NUMBER() over(order by event_time) rn2
from (
select *
,lead(event_time,1,event_time) over(order by event_time) endt
,ROW_NUMBER() over(order by event_time) rn
from event_status) t
where status = 'on') t2
group by rn2
Hi Ankit...Thanks for the content....My approach
WITH CTE_RANK AS(
SELECT event_time,status,DENSE_RANK() OVER(ORDER BY event_time ASC) AS row_no,
DENSE_RANK()OVER(PARTITION BY status ORDER BY event_time) AS status_rn
FROM event_status
),
CTE_DIFF AS(
SELECT *,CASE WHEN status='on' THEN (row_no-status_rn) ELSE (status_rn-1) END AS diff
FROM CTE_RANK)
SELECT min(event_time)AS login,max(event_time) AS logout,COUNT(*)-1 AS cnt
FROM CTE_DIFF
GROUP BY diff
Looks good 😊
Other Approach:
with cte as (
select *,ROW_NUMBER() over(order by event_time ) as rn
,DENSE_RANK() Over ( partition by status order by event_time ) as drn
from event_status
)
, cte2 as (
select * , case when status ='off' then lag(rn-drn) over(order by event_time) else rn-drn end as grpkey
from cte
)
select min(event_time) as log_in, max(event_time) as log_out, count(1)-1 as cnt
from cte2
group by grpkey
Nice challenging question for sure!
with cte as
(
select event_time,status,
case when status='on' then 0 else 1 end as sum1,
sum(case when sum1=1 then 1 else 0 end )over (order by event_time rows between unbounded preceding and current row) as sum2,
case when status='off' then sum2-1 else sum2 end as sum3
from temp.event_status )
select sum3,min(event_time) as login ,max(event_time) as logoff,count(*)-1 as ontime from cte
group by 1
Thanks for posting 👏
My solutiuon:
WITH cte AS (
SELECT
event_time,
status,
ROW_NUMBER() OVER(ORDER BY event_time) AS row_num,
CASE
WHEN status LAG(status, 1, 'x') OVER (ORDER BY event_time) THEN 1
ELSE NULL
END AS change_flag
FROM event_status
),
transitions AS (
SELECT
event_time AS logon_time,
status,
LEAD(event_time) OVER(ORDER BY event_time) AS logoff_time,
LEAD(row_num) OVER(ORDER BY event_time) - row_num AS duration
FROM cte
WHERE change_flag IS NOT NULL
)
SELECT logon_time, logoff_time, duration
FROM transitions
WHERE status = 'on';
I wrote this before seeing your solution, but was only possible due to the learnings I have done from your past videos 🙏🙂 Please let me know if it's a too complex solution?
WITH timer_data_1 AS
# CONVERTING TO ACTUAL TIME
(
SELECT
CAST(CONCAT(CURRENT_DATE(),' ',timer,':00') AS TIMESTAMP) AS timer,
RANK() OVER (PARTITION BY status ORDER BY timer) -1 AS rnk,
status,
FROM
`timer_table`
),
# FINDING RANK BASED DIFFERENCE FOR TIMER TO CREATE COMMON GROUP/BUCKET OF DATA
timer_data_2 AS
(
SELECT
timer,
timer - CAST(CONCAT(CURRENT_DATE(),' ','10:0',rnk,':00') AS TIMESTAMP) AS timer_diff,
status
FROM timer_data_1
),
# RANKING AGAIN BASED ON THE GROUPED/BUCKETED DATA
timer_data_3 AS
(
SELECT
timer,
status,
ROW_NUMBER() OVER (PARTITION BY timer_diff ORDER BY timer) AS rnk,
FROM timer_data_2
)
# EXTRACTING RELEVANT INFORMATION & EXCLUDING NULL RECORDS
SELECT
time_on,
time_off,
TIMESTAMP_DIFF(time_off,time_on, MINUTE) AS running_time
FROM
(
SELECT
(CASE WHEN status = 'on' THEN timer END) AS time_on,
LEAD((CASE WHEN status = 'off' THEN timer END)) OVER (ORDER BY timer) AS time_off
FROM timer_data_3
WHERE rnk = 1
)
WHERE time_on IS NOT NULL
ORDER BY 1
Good attempt. A bit long 😊
with cte1 as(
select *, lag(status,1) over(order by event_time) lg, lead(status,1) over(order by event_time) ld,
row_number() over(order by event_time) as rn
from event_status
),
cte2 as(
select case when status='on' and (lg!='on' or lg is NULL) then event_time else NULL end as start_time,
case when status='off' and lg='on' then event_time else NULL end as end_time,
rn from cte1
),cte3 as(
select * from(select row_number() over(order by start_time) rn1, start_time,rn as st from cte2 where start_time is not NULL) a
join (select row_number() over(order by end_time) rn2, end_time,rn as et from cte2 where end_time is not NULL) b
on a.rn1=b.rn2
)
select start_time,end_time,et-st from cte3
Bhai CTE ka name xxx mast tha😂😂😂
Hahaha
drop table event_status;
create table event_status
(
event_time varchar(10),
status varchar(10)
);
insert into event_status
values
('10:01','on'),('10:02','on'),('10:03','on'),('10:04','off'),('10:07','on'),('10:08','on'),('10:09','off')
,('10:11','on'),('10:12','off');
--input
EVENT_TIME STATUS
---------- ----------
10:01 on
10:02 on
10:03 on
10:04 off
10:07 on
10:08 on
10:09 off
10:11 on
10:12 off
--solution
with temp as ( --calculate current and previous state
select EVENT_TIME,STATUS,coalesce(lag(STATUS) over(order by EVENT_TIME),'XX') prev_status
from event_status
),
temp1 as ( select EVENT_TIME,STATUS,prev_status,
-- set flag to 1 when status changes from off to on only
case when prev_status='off' and STATUS='on' then 1 else 0 end flag
from temp
),
temp2 as (
select EVENT_TIME,STATUS,
sum(flag) over( order by EVENT_TIME) flag1 -- every change in state will get new number
from temp1 order by EVENT_TIME
)
select min(EVENT_TIME) login,max(EVENT_TIME) logout,
sum(case when upper(STATUS)='ON' then 1 else 0 end) cnt
from temp2
group by flag1
LOGIN LOGOUT CNT
---------- ---------- ----------
10:01 10:04 3
10:07 10:09 2
10:11 10:12 1
SQL>
Thanks for putting step by step solution 🙂
@@ankitbansal6 🙂