I think the below one also works.
with cte as
(select *,
row_number() over(partition by buckets) as rn
from(
select *,
ntile(3) over(partition by distance) as buckets
from src_dest_distance))
select source,destination,distance
from cte
where rn
@@kartikeytyagi9119 in the given query two state names are repeated vice versa and i need to fetch the row which was inserted first. if i create buckets, i can get three buckets for three pairs of states, each bucket will have two rows with vice versa state names and if i apply row number for each bucket, i can be able to get first inserted rows(by using rn
So clear ! Thank you Thoufiq. Keep it coming man. More to watch and learn.
I think we can simply write the query using least , Greatest funtions (in Oracle Database)
SELECT DISTINCT
least(SOURCE, DESTINATION) SOURCE,
greatest(SOURCE, DESTINATION) DESTINATION,
DISTANCE
FROM table1;
It will give the desired results
Thanks for sharing these functions. I'm trying to think through the logic of the code to see what the output would be. I'm not sure but it looks like your code will perform 6 sets of comparisons returning 6 rows of data. How do you reduce it down to the 3 rows desired?
@@olu0mg
Least function will always select the first word based on alphabetical order
You may watch some videos on these functions, it is quite easy.
And As we are using 'Distinct' in the query above, it will remove the duplicate records and return only 3 rows
desired output is hyd, delhi, pune in destination column
above query gives hyd, mumbai, Pune as output
Thanks Taufiq, I think visualization of output is necessary in such scenarios
Best of the Best Techtfq, Awesome delivery consistency. great job ciao
Explanation in Excel did the whole trick with focus on ID column, thanks a lot sir
A self anti semi join sounds easier to me:
select *
from src_dest_distance a
where not exists
(select 1 from src_dest_distance a where a.source=b.destination and a.destination=b.source and b.source
I really enjoyed how you explain the win func and make easy for us to understand
Great work and delivery 🎉❤
Very clean and crisp explanation..Thanks Taufiq
thanks for your efforts Toufiq! Jazakallohu hairon.
I guess this can be also one more approach with greatest and least function:
select distinct greatest(source, destination) as source, least(source, destination) as destination, distance from src_dest_distance;
Appreciate it. Could you tell me, is it MySQL or SQL server or others ? Thank you
Thank you for this alternative. I was thinking about something similar. This approach to me is more clever and avoids using the row number, which I feel is a bit hacky (even if it works)
It's great to see tutorials like this posted. However, it should be noted that the solution presented only works for a very specific dataset. Two potential failure cases come to mind:
1) Adding the city pair Bangalore Chennai will fail. As others have noted, the fix for this is to make the where clause check both T1.Source = T2.Dest and T1.Dest = T2.Source
2) It assumes that every city pair shows in both orders. If a city pair is only listed in one direction, it won't match on the self join.
Others have mentioned using least() and greatest(). Not just an alternative, but using such functions solves both of the problems noted above:
SELECT DISTINCT GREATEST(source, destination) AS source, LEAST(source, destination) AS destination, distance FROM cities;
Note: tested with MariaDB, other engines may vary.
This has to be one of the best IT channels out there.
For this data set it's fine but it would be cool to incorporate the possibility of the same city showing up in a different pair.
I guess you could just also add the condition that t2.source = t1.destination
I wouldn't have done it like that but it is a quite neat solution. Thanks.
My senior @jinal used to explain the data in the same way how you do here...loved your presentation.
Thank you TFQ....
This question is asked by me.
Thank you for your reply 🙏 it helps a lot to others as well.
Very Nice explanation 👌 thank you 😊
Actually @techTFQ it would be better if you add t1.dest = t2.source bcz in the sample data we don't have repetition of source but that is mostly possible to uniquely identify we could add this condition
Also creation of ID is not necessary as text can also be compared directly, here source or destination
@@yaminurrahmantopiwala6207 it is required, query isn’t giving required output without id column instead it’s giving whole data
@@aakriti_100 I still disagree. You share your sample data and I will give you perfect query without redundancies
You are a true gem for data community ❤️
you have one of the best channels for practicing SQL queries...great going
Nice explanation Taufiq 👌 👍 👏
well laid out..thanks
thank you bro.........can you do more videos on interview point of view
Nice Presentation.. Keep it up..
Hi sir,
Can you please make a video on how to update data of one table from another table using the merge concept.
It will be very helpful for me.
Looking at the output, it is a alternative row of the input table. In this instance, I believe we can divide the row number by 2. Where ever the value is not equal to zero, the result is the output table.
Select *
From (Select *, row_number() over () as row
from input_table) t1
where row % 2 0
approach for specific dataset
select * from src_dest_distance
where source
we can solve it through not exists operator also right.
with cte as
(select *, row_number() over as id from src_dest_distance)
select * from cte AS t1 where not exists (select * from cte AS t2 where t1.source = t2.destination and t1.destination = t2.source and t1.id>t2.id);
Thanks bro
great explanation ..
We can also run a simple query
select * from src_dest_distance
where source > destination
I would prefer source < destination, that way the source comes first alphabetically.
But neither will return the desired data, where it seems to want to use the first trip in the table.
Great solution, if the source and destination are not strict, we can do simple trick-
select * from src_dest_distance where source
@@jacksparrow3595 You can certainly compare strings in SQL. Protap’s suggestion is the best solution to the stated problem. It’s extremely simple, gives correct results, and even has the benefit that the selected rows will always have the two endpoints in alphabetical order, which, if the actual source were quite long, would make the output easier to use.
Hey
My solution for the same is
;with cte as (
select *,
case when source destination then source else destination end as destination1
from src_dest_distance)
select source1,destination1,max(distance)
from cte
group by source1,destination1
We can also use lead function in Table and then comparing in with the 1st Column. That will result the Unique output.
Could you not in this case also do a bitwise OR of the two fields and select distinct?
select * from src_dest_distance where source < destination;
Again an awesome video with great explanation. Just one suggestion to the join condition. Should we include T1.source = T2.destination and T1.destination = T2.source and T.id < T2.id so that we match only the required rows. The condition T1.source = T2.destination and T.id < T2.id may join non required rows as well which is not present in your data example. Like Bangalore -> Hyderabad and Chennai -> Bangalore. Let me know your thoughts. Thanks
Yes you need that condition T1.destination=t2.source and also you can replace t1.id
agree, the best answer sees beyond the data given to the broader nature of the data and request, and provides a solution that also is robust for handling future edge cases. This is "find all the unique trips" not just "return these 3 specific rows"...
I'd add on that doing an inner join is less ideal for this same reason. It assumes that a "return trip" record will always be present to pair against.
Instead, using a left join with the filter condition placed in the WHERE clause (WHERE T2.id is null) would better handle the potential situation of an unpaired entry down the road. Retain a record when no "return trip" match is found is more robust, assuming that "find all the unique trips" is the mission.
@@mudassirasaipillai6584 That does not avoid duplicates. That will gives the same duplicate values.
For any city pair, you will have 2 rows, they will have ids of id1 and id2 which will be different, so one will be larger than the other; and they will have a different source.
Then for the join in one case you will have T1.id < T2.id, and in the other case you have T1.id > T2.id.
By having one of those as the condition you will only get one of the pair.
But having t1.source t2.source, the condition will be true in both cases and both directions of the journey will be returned.
Does this query work if we later, add another row with source Bangalore but a different destination? We are only checking that Bangalore is present in both columns, but not that they have the same corresponding source.
You're assuming you're getting sorted pairs, can you do it with unsorted source and no ID?
Amazing 👏 keep it up 😇😇
Hi Toufik,
Can you make a video on "case-insensitive pattern matching in PostgreSQL". I recently faced this issue when using wildcard, unlike mySQL Postgre isn't case-insensitive.
Thanks for the resources & study and practice materials you provide, they are very helpful.
Helpful.
Fabulous
What if you have two paths to the same destination from 1 city?
Here is my alternate solution: use append to combine start and destination cities, then use append to combine destination and start cities. Now you have a pair of "unique" (for a given pair of cities) identifiers. Using the same self join and row number, check if start-destination in destination-start before current row.
We can concatenate both the columns and on that column we can get the duplicates rt?
This might sound daft, but given the problem/solution set out at 2:20 why not just SELECT * FROM INPUT WHERE SOURCE IN ('Bangalore','Mumbai','Chennai');
This is what I came up with before I watched your solution:
select src, dest, dist
from cities c1
where (
select count(*)
from cities c2
where c2.src = c1.dest
and c2.src < c1.src) = 0;
This is an easier solution @techTFQ
with cte as (select *, lead(destination,1,destination) over() as sc1 from src_dest_distance)
select source, destination, distance from cte where source = sc1;
Why not filter the original input for SOURCE < DESTINATION? That would eliminate one of the 2 records, and would be much more efficient than a self join...
yep, although it wouldn't change the results, just make sure to use lower function on both sorce and destination to avoid headaches down the line when someone introduces "delhi" and someone's script breaks and nobody knows why
I think wasy way to remove the duplicate records in the table is to use the union set operator.
Can't we write source> destination instead of complex join,row_ num
Hi I think this can be done by using row number function and selecting the odd rows will give the desired output?
This would do the job
SELECT DISTINCT IIF(source > destination, source, destination) AS source, IIF(source < destination, source, destination) AS destination, distance
FROM src_dest_distance
with cte as (
select (case when sourcedestination then source else destination end) as destination,distance
from src_dest_distance ) ,
cte2 as (
select source,destination,distance, row_number()
over(partition by source order by distance) as rnk from cte )
select * from cte2
where rnk = 1
I think we simply do like below.
Select * from tablename
Where source > destination
I tried to run it on Oracle SQL Developer, and I got an error : "FROM keyword not found where expected" ? I also run it at Google's BigQuery, the result is "there is no data to display". Can anyone help me with that? Thanks!
Can we represent or operator with || in SQL
Sir i hv one question. Syntax kaise likhe ..i mean basics may kuch sikhate hay or yaha pe or kuch likhte hay
hi everything is excellent, but you mentioned the concept used in the title, so anyone who saw this video didn't get the opportunity to think on their own coz as the concept used is already mentioned, but you are doing a great job, to data community
very valid point. Hence I have just renamed the title to remove it so it help the future viewers.
What if we just take the odd rows.
Select * from (Select *,row_number() over() as id from table) where (id&1)=1;
I would not hire you if you gave that answer. It solves the problem one time only, based on assumption about the input data. What if the order of input rows was randomized?
select s.source,s.destination,s.distance from src_dest_distance as s
join
src_dest_distance as s1
on s.source=s1.source and ascii(s1.source)>ascii(s.destination)
Suppose there are another data also bangalore to mumbai and mumbai to bangalore. Then how join will be helofull because bangalore will be maped with two data.
Why can't we go for inner join
please share another way of doing that?
select distinct greatest(source,destination),least(source,destination),distance from src_dest_distance
IT IS NOT WORKING IN ORACLE WITH OUT OVER() CLAUSE
Hi, could please give solution for SQL server as well..
getting issue -
The function 'row_number' must have an OVER clause with ORDER BY.
Also dont think this solution will work for all type of combo as below -
can please check once?
insert into src_dest_distance values ('Bangalore', 'Hyderbad', 400);
insert into src_dest_distance values ('Hyderbad', 'Bangalore', 400);
insert into src_dest_distance values ('Bangalore', 'Kolkata', 500);
insert into src_dest_distance values ('Kolkata', 'Bangalore', 500);
insert into src_dest_distance values ('Mumbai', 'Delhi', 400);
insert into src_dest_distance values ('Kasmir', 'Mumbai', 1000);
insert into src_dest_distance values ('Delhi', 'Mumbai', 400);
insert into src_dest_distance values ('Mumbai', 'Kasmir', 1000);
insert into src_dest_distance values ('Chennai', 'Pune', 400);
insert into src_dest_distance values ('Chennai', 'HYD', 100);
insert into src_dest_distance values ('Pune', 'Chennai', 400);
insert into src_dest_distance values ('HYD', 'Chennai', 100);
I think my solution is much simpler and therefore more understandable:
select source c1, dest c2, dist from input where sourcedest
Another Simpler way to solve this :
with xyz as (
select * , lead(destination) over() as LD
from src_dest_distance )
Select source , destination , distance from xyz where source = LD ;
Hi,
My approach to this Question
With cte as
(Select *,ROW_NUMBER() over (order by Distance) as RN from Distance)
Select Source,Destination,Distance from cte where RN %2 =1
I got it output....but is the approach right??
But it won't match if the input table is scrambled. Like delhi to mumbai is not just below mumbai to delhi
Hi sir,
I am trying to solve in sql server but its getting following error. Then which col should i consider for order by clause
Msg 4112, Level 15, State 1, Line 147, The function 'row_number' must have an OVER clause with ORDER BY.
I also got the error even though after i gave order by clause i am still facing that error.
select distinct s,d,distance from
(select *, case when source >destination then source else destination end s,
case when source < destination then source else destination end d
from src_dest_distance)x
one doubt. what happens if data is not on this order
with cte as (
select * , lead(destination) over(order by distance) as Lead_dest
from src_dest_distance )
Select source , destination , distance from cte where source = Lead_dest ;
We can solve it by where function too and much more easier and faster
Easier:
select
source as destination,
destination as source,
distance
from input
union
select
destination,
source,
distance
from input
Explanation: union will automatically remove duplicate rows ;)
Hi sir I know SAS, sql, python, vba, excel guide to get online work I'm unemployeee since a decade
Take a shot every time he says OK, challenge.
I'm trying same but it's shows error
with cte as(
Select *,row_number() over(order by distance ) as rn
from src_dest_distance),
cte2 as (Select *,row_number() over(order by distance ) as rn
from src_dest_distance)
select c1.source,c1.destination,c1.distance,c1.rn
from cte c1,cte2 c2
where c1.rn
Select hash(destination) + hash(source) unique, first(destination), first(source), max(distance)
From table
Group by hash(destination) + hash(source);
is below query work ?
select * from src_dest_distance
where source > destination
Done
Table name assumed - Location
WITH cte as
(SELECT * , LAG(Source,1,0) OVER() as comp
FROM Location)
SELECT Source,Destination,Distance
FROM cte
WHERE Source NOT IN (SELECT Source
FROM Location
WHERE Destination=comp)
There is duplicate name scott 3 time, smith 2 time so.... On.... i want to update like scott1, scott2, scott3, smith1, smith2 so on..... How to update in table without copy with other table
You should use a union which automatically removes duplicates, just inverse the destination and source while unioning. Far lighter, easier above all much much faster.
I love the way bring content and training to the people but I noticed you are really record focused, try to solve issues in sets of data this will eliminate a lot of headache with growing data.
with cte as(
select src_dest_distance.*,ROW_NUMBER()OVER() as "x" FROM src_dest_distance
)
select source,destination,distance from cte where x%2!=0;
Select source , destination, distance from table
Union
Select destination as souce, source as destination, distance from table
? Is it possible ?
OK
with cte as(
select source,destination,distance,row_number() over(order by distance) as row from src_dest_distance)
select source,destination,distance from cte where row in (1,3,5)
This Query also gave the same result. Is that correct?
No, as in the example the distances are all the same, the order is more or less random. With real distances it would be still a problem, because it is possible that two different pairs of cities have the same distance.
Taufiq, what if the entry is like:
bangalore hyderabad 400
delhi bangalore 1200
will this condition not treat these as duplicate records ?
Can we use the condition: T1.source=T2.destination and T1.destination=T2.source ?
select source,destination,distance From (select *,
LEAD(Source,1,Source) OVER() as Source1,
LEAD(Destination,1,Destination) OVER() as Destination1
FROM src_dest_distance) as a
WHERE source=destination1 AND destination=source1;
Is that correct please check?
if you can use row numbering and sub-clauses, just do:
select *
from
( select *, row_number() as rid
from table)
where modulo(rid,2) = 0
And if modulo not allowed just do a diff even check, as wonky as:
round(rid/2) = rid/2
Solution is written in the video header. This spoils the opportunity of trying to solve the question.
got it. Hence I have just renamed the title to remove it so it help the future viewers.
Write a sql query on input table is like Item, no_items Apple, 8 Potato, 4 Banana,6 Tomato,2 And Output table should be like Item, sum_item Vegitables, 8 Fruits, 14
Problem with MS Sql is it wont let you create row_number() without putting an order by in the over() clause. I solved like this:
with CTE as
(
select t.*, row_number() over( partition by distance order by distance) as [id]
from travel_routes t
)
select source, destination, distance from CTE where id=1
May be the CTE is overkill, but for someone who might find this useful.
Thank you for the Video. This SQL only works if there are always two sets of records (A->B, B->A). If there is only a single source - destination pair (S->D) it will not be part of the result.
I just learned JOIN today in SQL and this was a great lesson to add to my earlier lesson. I also liked the addition of the ID column. Thank you!