Solving a FAANG Level SQL Question | Aam vs Mentos Zindagi | Analytics
Вставка
- Опубліковано 5 лип 2024
- Today we are going to discuss a very interesting SQL problem asked in namaste SQL Instagram's page. We will solve it using 2 methods.
00:00 Understand the problem
01:50 Aam Solution
08:00 Mentos Solution
Here is the script:
create table clocked_hours(
empd_id int,
swipe time,
flag char
)
insert into clocked_hours values
(11114,'08:30','I'),
(11114,'10:30','O'),
(11114,'11:30','I'),
(11114,'15:30','O'),
(11115,'09:30','I'),
(11115,'17:30','O');
Zero to hero(Advance) SQL Aggregation:
• All About SQL Aggregat...
Most Asked Join Based Interview Question:
• Most Asked SQL JOIN ba...
Solving 4 Trick SQL problems:
• Solving 4 Tricky SQL P...
Data Analyst Spotify Case Study:
• Data Analyst Spotify C...
Top 10 SQL interview Questions:
• Top 10 SQL interview Q...
Interview Question based on FULL OUTER JOIN:
• SQL Interview Question...
Playlist to master SQL :
• Complex SQL Questions ...
Rank, Dense_Rank and Row_Number:
• RANK, DENSE_RANK, ROW_...
#sql #dataengineer
Enroll here for Python for Data Analytics LIVE bootcamp starting Jan 6th 2024:
www.namastesql.com/courses/Namaste-Python-Zero-to-hero---Jan-654f1c5fe4b0a57624c36e31
👉 Actual Course Price: ₹4999/-
👉 Grab now: ₹3350/- [code : NY2024 , 33% Discount]
Classes will be on weekends at 11am.
* Course Key Highlights*
✅ 14+ Live sessions with lifetime access to recordings.
✅ No pre-requisite required ( Learn From Absolute Scratch)
✅ Learn Functional programming
✅ Includes 2 portfolio projects on real datasets
✅ Live Doubts sessions
✅ Learn Advanced Data Analysis libraries like Numpy and Pandas
✅Learn ETL (Extract , Transform ,Load) with Python
Suppose we have one employee who forgot to swipeout , then how would we mark that employee as singleswipe?
Thanks Ankit for all these interesting questions! This is my solve using lead function:
with mycte as
(
select *,
case when flag = 'I' then (lead(swipe,1) over(order by empd_id)) else '00:00:00.0000000' end as logout_time
from clocked_hours
)
select sum(DATEDIFF(HOUR, swipe, logout_time)) as total_time_logged, empd_id
from mycte
where flag = 'I'
group by empd_id
Sir apne moj krdi.
Nice concept of Aam Zindagi and Mentos Zindagi.
I will definitely share with other
Thanks Ankit! for all the questions. Truly appreciate all your efforts!
I tried with lag,
select empd_id, sum(swipe-swipein) as total_hrs
from (select empd_id, swipe,flag , case when flag = 'O' then lag(swipe,1) over (partition by empd_id order by swipe) else swipe end as swipein
from clocked_hours)
where swipe != swipein
group by empd_id
;
make the numbers in hour
Hi Ankit sir,
My approach was to group by empid and sum(swipe) where flag is 'I' and sum(swipe) where flag is 'o' and subtract these two
Hi Ankit Sir , Love your videos
;with cte as (
select *
,case when flag='I' then lead(swipe,1)over(partition by empd_id order by swipe) else null end as punchout
from clocked_hours)
select empd_id, sum (DATEDIFF(hour,swipe,punchout)) as no_of_hours
from cte
group by empd_id
Amazing Ankit , at first i also thought of doing it by lead . Thanks for such videos. Keep it up. 😊
My pleasure 😊
Thanks Ankit.
select empd_id,
sum(Timestampdiff(hour,x.swipe,x.check_out))
from(
select *,
lead(swipe) over(partition by empd_id order by swipe) as check_out
from clocked_hours) x
where x.flag = 'I'
group by empd_id;
Thanks Ankit, I solved the mentos solution on my own and checked yours and mine were the same.
cheers
Excellent 👍
HI Ankit I solved in this way in mysql without watching answer
with cte as (select *,lag(swipe,1,0) over(partition by empd_id order by swipe) as lag_swipe,lag(flag,1,0) over(partition by empd_id order by swipe) as prev_flag from clocked_hours) select empd_id,sum(case when flag = 'O' and prev_flag = 'I' then round(timediff(swipe,lag_swipe) / 10000) else 0 end) as time_period from cte group by empd_id;
my solution using lead window function
select empd_id,sum(diff) from (
SELECT empd_id,swipe,lead,flag,datediff(hour,swipe,lead) as diff
FROM (
SELECT empd_id, swipe, LEAD(swipe) OVER (ORDER BY empd_id) AS "lead", flag
FROM clocked_hours
) AS r1 where flag='I') as r2
group BY empd_id
Thanks Ankit , here is my approach:
with cte1 as
(
select
*,
dense_rank() over(partition by empd_id, flag order by swipe) as rk
from clocked_hours
)
select
t1.empd_id, round(sum(time_to_sec(timediff(t2.swipe, t1.swipe))/3600)) as hours_loged
from cte1 t1 join cte1 t2 on t1.empd_id = t2.empd_id and t1.rk = t2.rk and t1.flag!=t2.flag
where t1.flag < t2.flag
group by 1
Thanks Ankit, this was helpful.
Hey Ankit, thanks for posting such wonderful questions that help us to sharpen our skills.
Here is my approach:
with tempi as (
select empd_id, swipe,
rank() over(partition by empd_id order by swipe) as rnk
from clocked_hours
where flag = 'I'
)
, tempo as (
select empd_id, swipe,
rank() over(partition by empd_id order by swipe) as rnk
from clocked_hours
where flag = 'O'
)
select t1.empd_id, sum(timestampdiff(HOUR, t1.swipe, t2.swipe)) as clocked_hours
from tempi t1
join tempo t2
on t1.empd_id = t2.empd_id and t1.rnk = t2.rnk
group by t1.empd_id;
Nice
Nice..😊
I tried with using CTE,
with intime as(
select empd_id,swipe as intime,row_number() over(partition by empd_id) from clocked_hours
where flag='I'
),
outtime as(
select empd_id,swipe as outtime,row_number() over(partition by empd_id) from clocked_hours
where flag='O'
)
select outtime.empd_id, sum(outtime.outtime - intime.intime) as duration
from intime, outtime
where intime.empd_id=outtime.empd_id
and intime.row_number=outtime.row_number
group by 1
Hi Ankit ..Thanks for doing the FAANG question can you provide a set of questions from FAANG which we can try it out in the mean time and can see/refer your videos later..Thanks for all the videos Ankit
with cte as
(select * , lag (flag,1,flag) over (partition by empd_id order by swipe) as position,
lag (swipe,1,swipe) over (partition by empd_id order by swipe) as restou from clocked_hours
)
select empd_id,sum(case when flag ='I' AND POSITION ='O'
then 0 else -datediff(hour,swipe,restou) end ) as calcuate from cte
group by empd_id
Awesome 👍
Thank you! Cheers!
Hello! Ankit, this is really a good question.
Please check my approach
with cte as(select *, row_number()over(partition by empd_id,flag order by swipe) as rn
from clocked_hours)
select c1.empd_id, sum(datediff(hour,c1.swipe,c2.swipe)) as clocked_hours
from cte c1
inner join cte c2
on c1.empd_id=c2.empd_id and c1.rn=c2.rn and c1.swipe
This is good !!
Hi Ankit
Thank you for posting all the question, It really helps!
can you please make a video how can we import JSON data into SQL Server
Mentos zindagi ❤❤❤🎉🎉
with f as
(select *,lead(swipe) over (partition by empd_id order by swipe) lead
from clocked_hours)
, g as(select *,substr(lead,1,2)-substr(swipe,1,2) diff from f where flag = 'I')
select empd_id,sum(diff) clocked_hours from g group by empd_id
My solution:
select empd_id, round(sum(total_time)/10000,0) tot_time from (
select empd_id, flag,round(abs(swipe-lead(swipe,1) over (partition by empd_id)),0) total_time from clocked_hours) t1
where flag='I'
group by empd_id
with cte as (
select Empd_id,
case
when
flag = 'I' then swipe
else null end as in_time,
case
when
flag = 'O' then swipe
else null end
as OUT_time
from [dbo].[clocked_hours])
select empd_id, min(in_time) as in_time, max(OUT_time) as OUT_time , datediff(hour, min(in_time) , max(out_time)) as total_hours
from cte
group by empd_id
your total hours for 11114 is coming as 7 whereas it should be 6.
with cte as(
select *,
(case when flag ='I' then
lead(swipe) over(partition by empd_id order by swipe) end) as nextswipe
from clocked_hours)
select empd_id,
sum(datediff(hour, swipe, nextswipe)) as diffs
from cte
group by empd_id
select empd_id,
sum(case when flag='I' then timestampdiff(second,swipe,nex_time)/(60*60) end) as time
from (select * ,
lead(swipe)over(partition by empd_id order by swipe) as nex_time
from clocked_hours) as t
group by empd_id
Hello My solution in Sql Server:
with flo as (
select *, sum(case when flag='I' then 1 else 0 end) over(partition by empd_id order by swipe)as grp
from tbl
)
select distinct empd_id, sum(datediff(hour, min(swipe), max(swipe)))over(partition by empd_id)as hours_worked
from flo
group by empd_id, grp
Hope it helps.🙂
My solution
with cte as(
SELECT *,
DATEDIFF(minute, swipe, dd) AS time_diff
FROM (
SELECT *,
LEAD(swipe) OVER(PARTITION BY empd_id ORDER BY swipe) AS dd
FROM clocked_hours
) AS a
where flag ='I')
select empd_id , sum(time_diff)/60 from cte
group by empd_id
with i as (select empd_id,swipe, row_number() over (partition by empd_id order by swipe) 'in' from clocked_hours where flag='I'), o as (select empd_id,swipe, row_number() over (partition by empd_id order by swipe) 'out' from clocked_hours where flag='O') select i.empd_id,left(sum(timediff(o.swipe,i.swipe)),1) from i left join o on i.empd_id=o.empd_id and i.in=o.out group by 1;
Hi Ankit Thanks for the new videos
Please check this:
with cte as (
select *
, row_number() over (partition by empd_id order by swipe) as rn
from clocked_hours
where flag = "I"
),
cte1 as (
select *
, row_number() over (partition by empd_id order by swipe) as rn
from clocked_hours
where flag = "O"
)
select cte.empd_id
, round(sum(((time_to_sec(cte1.swipe) - time_to_sec(cte.swipe))/3600))) as clocked_hour
from cte
join cte1
on cte.empd_id = cte1.empd_id and cte.rn = cte1.rn
group by 1
my solution if you are trying in MYSQL/postgresql, because in mysql datediff( ) function will only take 2 arguments unlike datediff( ) in microsoft sql server (which takes 3 arguments) is as below -
with t1 as (
select *,
lead(swipe) over(partition by empd_id order by swipe) as logout
from clocked_hours
)
select empd_id, extract(hour from time(sum(logout - swipe))) as clocked_hrs
from t1
where flag = 'I'
group by 1
Hi Ankit,
I solved question with mentos solution on my own, but i tried with some modification. let me know your thoughts on this
SQL Query:
with cte as(
select empd_id,
case when flag='O' then lag(swipe,1) over(partition by empd_id order by empd_id) end prev_val,swipe from clocked_hours order by empd_id)
select empd_id,sum(timediff('hour',prev_val,swipe)) t_hrs from cte where cte.prev_val is not null
group by empd_id;
this was my approach slightly different and lengthy
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY empd_id, flag ORDER BY swipe) AS rn
FROM faltu.clocked_hours
),
intime AS (
SELECT *
FROM cte
WHERE flag = 'I'
),
outtime AS (
SELECT *
FROM cte
WHERE flag = 'O'
)
SELECT a.empd_id,
abs(SUM(TIMESTAMPDIFF(HOUR, b.swipe, a.swipe))) AS total_hours
FROM intime a
JOIN outtime b
ON a.rn = b.rn
AND a.empd_id = b.empd_id
GROUP BY a.empd_id;
I used case and lag
WITH clocked_hours AS (
SELECT *
,CASE WHEN flag = 'O' THEN swipe - LAG(swipe) OVER(PARTITION BY empd_id ORDER BY swipe) END AS time_diff
-- ,SUM(CASE WHEN flag = 'O' THEN (swipe - LAG(swipe) OVER(PARTITION BY empd_id ORDER BY swipe)) END) AS total_time
FROM clocked_hours
)
SELECT empd_id
,SUM(time_diff) AS total_time
FROM clocked_hours
GROUP BY empd_id
I used another method. Here, we create two CTEs one for clockout and another one for clockin. Then join the ctes on employee id and row number to get required hours.
with
o as (
select empd_id, swipe, row_number() over (partition by empd_id order by swipe) seq
FROM clocked_hours where flag = 'O'
),
i as (
select empd_id, swipe, row_number() over (partition by empd_id order by swipe) seq
FROM clocked_hours where flag = 'I'
)
select o.empd_id, sum(datediff(HOUR, i.swipe,o.swipe)) hours_in_office
from o inner join i on o.empd_id=i.empd_id and o.seq=i.seq
group by o.empd_id;
This is also good 😊
with cte as(
select *, lead(flag,1) over(partition by empd_id order by swipe) as nxt_flag,
lead(swipe,1) over(partition by empd_id order by swipe) as nxt_swipe
from clocked_hours),
cte2 as(
select *,
case
when flag='I' and nxt_flag='O' and nxt_swipe>swipe then datediff(HH,swipe,nxt_swipe)
else 0
end as diff
from cte)
select empd_id, sum(diff) as clocked_hours
from cte2
group by empd_id;
Sir kindly comment on this approach
with swipe_in as
(
select * ,row_number() over(partition by empd_id order by swipe) as rn
from clocked_hours where flag = 'I')
,swipe_out as
(
select * ,row_number() over(partition by empd_id order by swipe) as rn
from clocked_hours where flag = 'O')
select i.empd_id,sum(datediff(hour,o.swipe,i.swipe)) as time_spent from swipe_in i join swipe_out o on i.empd_id = o.empd_id and i.rn = o.rn
group by i.empd_id
with cte as
(select *,lag(swipe,1,swipe) over(order by empd_id) as prev_date from clocked_hours),
cte1 as
(SELECT empd_id,
CASE
WHEN flag='O' THEN TIMESTAMPDIFF(HOUR, prev_date, swipe)
END AS clocked_hours
FROM cte)
select empd_id, sum(clocked_hours) as clocked_in_time from cte1
group by empd_id
Superb explanation...
Question:
I have an employee table with empid, empname, deptid and salary columns.
Wanted to write a query to get the N highest salary in each department without using window functions. We can use sub queries or correlated sub queries. And N is any number we can pass dynamically. If I pass 2 it will give 2nd highest salary in each department. If I pass 5 it will give 5th highest salary in each department. Please help with sinple solution and explanation.
This is what you want in Sql Server:
declare @nth_sal int=2;
SELECT id, name, salary , dept
FROM emp e1
WHERE @nth_sal -1 = (SELECT COUNT(DISTINCT salary)
FROM emp e2
WHERE e2.dept=e1.dept and e2.salary > e1.salary);
Hope it helps.🙂
SELECT empd_id,SUM(CASE WHEN flag='O' THEN swipe ELSE -swipe END) from clocked_hours GROUP by empd_id will this work in all cases ??
with cte as(select *, lag(swipe,1) over(partition by emp_id) as prev_swipe
from clocked_hours)
,
cte2 as(select *,swipe-prev_swipe as diff
from cte
WHERE Flag='O')
(select emp_id,sum(diff) as total_hrs_cloced_in
from cte2
group by 1
ORDER BY 1)
sir please review my method
In a school there are 10 classes, each class is having 50 or many students.10 subjects are available and each student has to choose a minimum of 3 subjects.
Find out the top 3 students in each class and each subject.
I have one doubt with lead rather than lead(swipe,1) we can directly take lead(swipe) rather than giving offset this will also work fine right ?
By default offset is 1 so it will work. But it's a good practice to specify the value.
@@ankitbansal6 thanks for confirmation
My solution on first attempt:
looks less impactful then even your aam-zindagi solution 🤣
with sorted_tbl as (
select
empd_id,
swipe,
flag,
lead(flag,1) over (partition by empd_id order by swipe,flag) flag_2
from clocked_hours),
tbl_outtime as (
select
* ,
case when flag='I' and flag_2='O' THEN lead(swipe,1) over (partition by empd_id order by swipe,flag) end out_time
from sorted_tbl)
select
empd_id,
sum(DATEDIFF(hour,swipe,out_time)) clocked_hours
from tbl_outtime
where out_time is not null
group by empd_id;
My Solution in PostgreSQL
WITH A AS
(
SELECT
*,
ROW_NUMBER() OVER ( PARTITION BY empd_id, flag
ORDER BY
swipe DESC ) AS out_rno,
ROW_NUMBER() OVER ( PARTITION BY empd_id, flag
ORDER BY
swipe ASC ) AS in_rno
FROM
clocked_hours
)
SELECT
empd_id,
MAX(
CASE
WHEN
(
out_rno = 1
AND flag = 'O'
)
THEN
swipe
END
) - MAX(
CASE
WHEN
(
in_rno = 1
AND flag = 'I'
)
THEN
swipe
END
) AS hours_clocked
FROM
A
GROUP BY
empd_id
how can fresher apply for data engineering job at amazon with knowladge of Python, SQL, dsa, aws, hadoop and spark?
Check the job id which suits you here Amazon.jobs
Not a good one, but this is what i tried before watching your video.
with swipe_in as (
select empd_id, swipe as swipe_in, row_number() over ( partition by empd_id order by swipe) as rn
from clocked_hours
where flag='I'
),
swipe_out as (
select empd_id, swipe as swipe_out, row_number() over ( partition by empd_id order by swipe) as rn
from clocked_hours
where flag='O'
)
select sin.empd_id,
sum(DATEDIFF(HOUR, swipe_in , swipe_out)) clocked_hours
from swipe_in sin
inner join swipe_out sout
on sin.empd_id = sout.empd_id and sin.rn = sout.rn
group by sin.empd_id
Good attempt. Can be simplified though.
In a school there are 10 classes, each class is having 50 or many students.10 subjects are available and each student has to choose a minimum of 3 subjects.
Find out the top 3 students in each class and each subject.
Can you help me on this.
Top 3 on the basis of what ? Share some data.
SELECT empd_id, sum(diff)
FROM
(SELECT *,
out- in_i as diff
FROM
(SELECT empd_id,flag,lag_flag,
case when flag = 'O' then swipe else null END as out,
case when lag_flag = 'I' then lag_time else null END as in_i
FROM
(SELECT *,
--case when flag = 'O' then swipe else null END as out,
LAG(swipe) OVER(PARTITION BY empd_id ORDER BY swipe) AS lag_time,
LAG(flag) OVER(PARTITION BY empd_id ORDER BY swipe) AS lag_flag
from clocked_hours
)
)
)
GROUP BY 1
SIMPLEST ONE ***
select empd_id,
sum(case WHEN flag ='O' then swipe else 0 END)-sum(CASE WHEN flag = 'I' then swipe else 0 END) as hours
from clocked_hours
GROUP by empd_id
It won't work as there are multiple swipes in swipe out. And we need only the clocked hours .
@@ankitbansal6 Hi Ankit, It will work. because swipe out(So) time will always be more than swipe in(Si) and when we try to calculate total working hrds then : (So1-Si1) + (So2-Si2) = (So1+So2)-(Si1+Si2) : It will work even when there are more no of in-outs because all in-outs will be in ascending order and in pairs.
I used join to solve it but the lead function method is great too:
with
cte1 as
(
select
empd_id, swipe, flag, DENSE_RANK() over (partition by empd_id order by swipe) as rn
from [samdb].[dbo].[clocked_hours]
where flag = 'I'
),
cte2 as
(
select
empd_id, swipe, flag, DENSE_RANK() over (partition by empd_id order by swipe) as rn
from [samdb].[dbo].[clocked_hours]
where flag = 'O'
)
select
a.empd_id, SUM(DATEDIFF(hour, a.swipe, b.swipe))
from cte1 a inner join cte2 b on a.rn = b.rn and a.empd_id = b.empd_id
group by a.empd_id;
We could have just taken max and min of swipe for each emp_id and flag and then subtraction
Or by creating two different cte for each flag and then using row number for each flag with emp_id partition and then join back the 1st row and the subtraction
Note: I tried playing with this data in bigquery but 24 hour format does not works well with parse_time function
Any help will be appreciated 😊
By doing this you'll get the difference between total hours between first punch-in and last punch-out. It'll not encounter the time you were out, like for empd_id 11114 you'll get 7 hours instead of 6.
Oh yea it makes sense, that we need to keep gap in the mind. Thanks for the help
@@mohammaddanishkhan7288
WITH clocked_in_data AS(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY empd_id ORDER BY swipe) AS row_rnk
FROM clocked_hours
WHERE flag='I'),
clocked_out_data AS(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY empd_id ORDER BY swipe) AS row_rnk
FROM clocked_hours
WHERE flag='O'),
get_hours_diff AS(
SELECT
A.empd_id,
A.swipe AS in_time,
B.swipe AS out_time,
DATEDIFF(hour, A.swipe, B.swipe) AS hours_diff
FROM clocked_in_data A LEFT JOIN clocked_out_data B
ON A.row_rnk=B.row_rnk AND A.empd_id=B.empd_id)
SELECT
empd_id,
SUM(hours_diff) AS clocked_hours
FROM get_hours_diff
GROUP BY empd_id
Good real world question. Thanks Ankit
Here is my approach in MySQL, using lag, do let me know if it might miss any case.
select empd_id,sum(lag_swipe) as working_hours
from(
select *,
case when flag='O' then round((swipe - lag(swipe,1,swipe) over(partition by empd_id order by swipe))/10000,2) else 0 end as lag_swipe
from clocked_hours) a
group by empd_id
Good one !!
with cte1 as (
select *,
row_number() over() as cc
from clocked_hours
where flag = 'o'),
cte2 as (
select *,
row_number() over() as cc
from clocked_hours
where flag = 'I')
select i.empd_id,sum(round((o.swipe - i.swipe)/10000)) as time
from cte2 as i
join cte1 as o on i.cc = o.cc
group by empd_id
Tried to simplify it using CTE's and then rank filter
with cte1 as (select *, rank() over (partition by empd_id order by swipe) as inrank from clocked_hours
where flag='I'),
cte2 as (select *,
rank() over (partition by empd_id order by swipe) as outrank from clocked_hours
where flag='O' ),
cte3 as (
select cte1.empd_id,inrank,outrank,datediff(hour,cte1.swipe,cte2.swipe) as inbetweenhours from cte1 join cte2 on cte1.empd_id=cte2.empd_id
and inrank=outrank)
select empd_id,sum(inbetweenhours) as clocked from cte3
group by empd_id
using lead
with ech1 as (select *,lead(swipe) over (partition by empd_id order by swipe) as nextops
,lead(flag) over (partition by empd_id order by swipe) as nextswipe from clocked_hours)
select empd_id,sum(datediff(hour,swipe,nextops)) from ech1
where flag='I' and nextswipe='O'
group by empd_id
select empd_id,sum(latest_time) total_time from
(SELECT empd_id,swipe,lead(swipe) over(partition by empd_id order by swipe),
timestampdiff(hour,swipe,lead(swipe) over(partition by empd_id order by swipe)) latest_time,flag
FROM ankit_vedios.clocked_hours) a where flag="I" GROUP BY empd_id
using MYSQL
PySpark Approach and Solution Explanation video for this problem
ua-cam.com/video/CYewGmSPbx8/v-deo.htmlsi=pqRyTRBxb_HuGwaP
My answer without checking for the solution
---------------------------------------------------------------------------------------
select empd_id, SUM(duration) as clocked_hrs from (
select *,
case when flag='I' then DATEDIFF(hour,swipe,(LEAD(swipe,1) OVER (PARTITION BY empd_id ORDER BY swipe ))) end AS duration
from clocked_hours) t
group by empd_id;
---------------------------------------------------------------------------------------