pwc question and answer in sql | top interview questions and answer in sql |
Вставка
- Опубліковано 5 січ 2024
- In this video we will be solving very interesting problem which was asked in the "PwC interview" in SQL.
"pwc question and answer" in sql
"sql question and answer"
Create table Statement :
====================
CREATE TABLE pwc_attandance_log (
emp_id INT,
log_date DATE,
flag CHAR
);
insert into pwc_attandance_log(emp_id,log_date,flag) values
(101,'02-01-2024','N'),
(101,'03-01-2024','Y'),
(101,'04-01-2024','N'),
(101,'07-01-2024','Y'),
(102,'01-01-2024','N'),
(102,'02-01-2024','Y'),
(102,'03-01-2024','Y'),
(102,'04-01-2024','N'),
(102,'05-01-2024','Y'),
(102,'06-01-2024','Y'),
(102,'07-01-2024','Y'),
(103,'01-01-2024','N'),
(103,'04-01-2024','N'),
(103,'05-01-2024','Y'),
(103,'06-01-2024','Y'),
(103,'07-01-2024','N');
PySpark Solution for this problem : • pwc question and answe...
Need Help ? Connect With me 1:1 - topmate.io/dewithdhairy
Let's connect on LinkedIn : / dhirajgupta141
PySpark Installation and Setup : • Spark Installation | P...
DSA In Python Interview Series : • dsa for data engineer ...
PySpark Interview Series : • pyspark interview ques...
Pandas Interview Series : • pandas interview quest...
SQL Interview Series : • sql interview question...
#pwc #pyspark #sql #interview #dataengineers #dataanalytics #datascience #StrataScratch #Facebook #data #dataengineeringinterview #codechallenge #datascientist #pyspark #CodingInterview
#dsafordataguy #dewithdhairy #DEwithDhairy #dhiarjgupta #leetcode #topinterviewquestion
Awesome solution😊
We could have also done it using Lead-Lag functions, these would make it little more easier!
Thanks for everything
Thanks,
Do share in your network 😉
with cte as
(select *,
row_number() over() as pw
from pwc_attandance_log),
cte1 as (select emp_id,log_date,flag,pw,
pw - (row_number() over() ) as ps
from cte
where flag = 'Y'),
cte2 as (select *, count(ps) over(partition by ps) as vv
from cte1 )
select emp_id,min(log_date),max(log_date),vv as days
from cte2
where vv >= 2
group by vv,emp_id
order by emp_id
If a employee attend 1-05-2024, then again login 1-06-2024. Then what will happen?
we can do this with one cte
with cte as (
select emp_id,log_date,flag,date-rn as group_val from (
select *,ROW_NUMBER() over (partition by emp_id order by emp_id) as rn
,DATEPART(MONTH,log_date) as date
from pwc_attandance_log where flag = 'Y'
) x
)
select emp_id,MIN(log_date) as start_date, MAX(log_date) as end_date ,COUNT(1) as con_date
from cte group by emp_id,group_val having count(1) > 1
But if some one login on 31sth of the month then next month 1st day then day logic will be 31 and 01. These two days will not come under same group
with cte as
(select *, row_number() over(partition by emp_id order by log_date) as r,
DAY(log_date) AS "DAY", cast(DAY(log_date) as int) - row_number() over(partition by emp_id order by log_date) as diff
from pwc_attandance_log
where flag='Y')
select emp_id,min(log_date) as start_date, max(log_date) as end_date,count(diff)
from cte
group by emp_id,diff
having count(diff) >=2
order by emp_id;
What if the consecutive date are 31&1 ?
I think in that case this solution will not work.
Thanks for the video.
I have one doubt . Lets say we have log_date as 02-01-2024 and the next date is 03--02-2024.
If we take day part out of them we will still get 2 and 3
Thanks if the data Is like that then logic remains the same we just need to apply the group clause on the year and month also then to make it unique.
Got it. Thanks for the response.@@DEwithDhairy
@@DEwithDhairyCan we take difference between current date and date column
@@apk1999 I don't think it will work.
What if the date contains value like 31JAN and 01FEB as consecutive days, in that case it will give wrong group_ value.
I think we can go for "day-of-year" in that case & then we can do the grouping.
Yes, instead of taking month from date we should take difference between current date and date column
can anyone provide solution using lag functionality
using one cte
with cte as(
Select * ,day(log_date) - row_number()over(partition by emp_id order by log_date) as diff_ ,
row_number()over(partition by emp_id order by log_date) as rn,
day(log_date) as day_
from pwc_attandance_log
where flag='Y')
select emp_id,diff_, min(log_date) as start_date ,max(log_date) as end_date,count(*) as consucatie_date
from cte group by emp_id, diff_
having count(*)>1
Great 😃
Gonna try this
The Solution seems to have one issue when the dates are from different month .
My solution :
WITH CTE AS (select *,LAG(log_date) OVER(partition by emp_id order by log_date) as previous_log_date from pwc_attandance_log where flag='Y' )
,PQR AS (select *,CASE WHEN previous_log_date-LAG(previous_log_date) OVER(partition by emp_id order by previous_log_date)>1 then 1 else 0 END as group_no from CTE where DATE_PART('day',log_date)- DATE_PART('day',previous_log_date)=1 )
,XYZ AS (select *,SUM(group_no) OVER(partition by emp_id order by emp_id,previous_log_date) as g1 from PQR)
,last_1 as (select emp_id,g1,count(*)+1 as cnt ,min(previous_log_date) as min_date ,max(log_date) as max_date from XYZ group by emp_id,g1)
select emp_id,cnt as consecutive_days,min_date as start_date,max_date as end_date from last_1;