50 min lecture material wrapped in a 7 min video. Honestly amazing!
In 6 and a half minutes, you managed to teach me the fundamentals of something I thought would take a lot of time. Amazing, thank you!
This is the greatest visualization of nearest neighbors I've ever seen. You sir, are a blessing beyond words, I'm not religious but I might as well be after watching your video. Thank you.
Thanks for such a great compliment! I am happy to hear that the video resonated with you so much!
Same here. I have never seen such a teacher. He is GOD level.
Awesome tutorial on the KD-tree data structure! The nearest neighbor search algorithm is easy to follow here. Thanks, keep em coming!
Thanks for the compliment! More tutorials are on the way =)
One of the most useful search algorithms in data science.
Definitely a good one to know. I have seen it pop up in interviews as well.
This video hits all the right notes: short, useful and clear. Keep up the excellent work!
This tutorial was very precise, direct and comprehensive, and it really helped me understand how to use k-d trees. Thank you so much sir!
You are very welcome!
I came to you from 2 independent great courses not understanding a specific scenario where they ignore an entire section without clarifying the reason but you explained it extremely better than both; Well done! Thank you :)
Thanks for leaving a few good words!
This cleared up k-d Trees for me. Thanks for putting up the video.
Great visualization of a complex concept
Concise and yet very informative. Enjoyed. Thanks!
Thanks for the compliment! Glad to know that it was helpful.
Simple, short and precise. Awesome!
This is an interview question. How to design an algorithm for a flight computer to find closest airport during the flight in an emergency? Well explained along with lucid diagrams... Thanks for sharing your knowledge...
Wow! New episode! I already can't wait to learn it! Thank you so much.
=) I hope you liked it!
Fantastic explanation and visualisation, thank you so much! :)
Thanks for leaving a compliment!
I love this channel, please upload more!
Thank you
Nice explanation, I have a question though. Since this is not a balanced binary search tree, the worst case is still O(n), right? Or is there a way to use balancing techniques (like in a treap) with k-d trees?
Excellent question and you are right on the money - a KD-tree has the potential to be heavily unbalanced, which results in O(n) search. The simplest approach to ensure a balanced (statistically speaking) tree that I could think of, in the case where all of the points are specified at the start, is to just shuffle the points before constructing the tree. But there are more sophisticated techniques that pick a good median value on each insertion. Also check out K-D-B-trees (en.wikipedia.org/wiki/K-D-B-tree) - they are balanced by design.
Alternatively, if you have lots of inserts and deletes to a KD Tree, you could keep track of the number of such operations and then just rebuild the whole tree after some threshold.
I am curious if there are better approaches - let me know what you find!
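For the case where all points are known up front, the median-split idea boils down to something like this sketch (my own illustration, not the video's source; it assumes the KDNode/KDPoint classes shown further down in this thread, with get(i) returning the i-th coordinate as an int):
import java.util.List;
// Hypothetical sketch: build a (statistically) balanced KD-tree by sorting on the
// current axis and recursing on the median point.
class BalancedKDTreeBuilder {
    static KDNode build(List<KDPoint> points, int depth, int k) {
        if (points.isEmpty()) return null;
        final int axis = depth % k; // cycle through the dimensions, same as during search
        points.sort((a, b) -> Integer.compare(a.get(axis), b.get(axis)));
        int median = points.size() / 2;
        KDNode node = new KDNode();
        node.point = points.get(median);                                     // median becomes this subtree's root
        node.left = build(points.subList(0, median), depth + 1, k);          // smaller coordinates on this axis
        node.right = build(points.subList(median + 1, points.size()), depth + 1, k); // larger or equal ones
        return node;
    }
}
Roughly speaking this construction costs an extra sort per level, while the shuffle-then-insert approach trades that guarantee for simplicity.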
Great video! Thank you for making math visible :)
Your code saved me.
Good to hear it!
Really nice explanation, thank you!
Thanks for this video. Could we adapt the algorithm to not get only the closest point but rather all points under a certain distance ?
+++
Nice vid and nice animation! Short and digestible😃
How does this compare to quad-trees? I’ve always wondered why they chose 4 partitions instead of 2. KD-trees seem the closest to this idea, but still a bit different.
Amazing tutorial. Why does the point (2, 6) split the space into top and bottom, and not into left and right as the root node (5, 4) did?
Thanks for your clear explanation!
Glad you liked it! Cheers!
So if this was a 3D tree, would we potentially have to check all 4 quadrants the neighbour might be in? And if I had a 10D tree, would I have to check 2^9 quadrants?
Good question. The tree nodes still have only two child nodes, even if you have 10 properties per node. Take a look at the source code - distSquared(point0, point1) uses all properties (3, 10, whatever), but since there are only two child nodes, the space is always split into just two halves. I hope this answers your question, but let me know if I can clarify further.
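For illustration, a distance function like that could look as follows (just a sketch in the spirit of the code below; size() returning the number of dimensions is my assumption):
// Works for any number of dimensions - only the number of coordinates changes,
// never the number of child branches.
long distSquared(KDPoint p0, KDPoint p1) {
    long total = 0;
    for (int i = 0; i < p0.size(); i++) {   // size() is assumed to return the dimension count
        long diff = p0.get(i) - p1.get(i);
        total += diff * diff;               // sum of squared differences over every coordinate
    }
    return total;
}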
Thank you from UC Berkeley!
Thank you, I think it solved my problems with making kd-tree and searching it.
Great video. Especially due to explaining r'.
Thanks for the compliment! Yeah, the radius concept is the tricky part.
Thank you very much for making this video !
Nice and clear explanation, thanks!
Thanks for the compliment! Cheers!
Amazing explanation!!!
Thanks for the compliment! Glad to hear that you enjoyed it!
Did you notice the corn cob in a plastic pot? Some parsley would have worked better. Great video btw
I'll try parsley in my next video =)
Thank you for great explanation !
You are very welcome!
Just an amazing explanation! Thanks so much!
Great explanation!! Thanks.
0:44 If all points share the same x-coordinate then it becomes easier, as we only need to check at most 2 points.
Thanks! I can't believe it - what an amazing way of explaining, it's kind of unbelievable. Again, thank you sir, and lots of love.
How come the point (13,3) is closer to (9,4) than (8,7)? And because of that we never traverse to the point (10,2) - why does this happen?
nextBranch or otherBranch can be null, then temp will be null, and then we pass null into the function that gives us the closest point, so we pass null instead of a point. That seems like a problem with this pseudocode to me. What do we do in this situation? Just output the non-null one as best, or do we go to the other branch?
Hi. Your videos are great. Do you have any other new channel or web page to follow you? Thanks.
Nice explanation. Thanks!
Cheers!
Awesome!❤️ Can you please make videos on B and B+ trees?
Thanks for the compliment and a great suggestion. I'll add B and B+ trees to the "to do" list =)
Great video, thank you for doing it!!
Thanks! I am glad to hear it!
Can you make a video on R-Trees and Multi-Dimensional Indexes?
That's a great topic and it'd definitely fit with the current discussion of KD-Trees. Thanks for suggesting it - adding it to the list!
Great explanation. It took my prof 1h and it was not even half as clear.
8-) Thanks!
Well done, great explanation!
Thanks for the compliment!
Excellent lesson! Thank you!
Thanks!
What about N_Nearest_Neighbors? The logic for Nearest_Neighbor only works for the first.
First of all, very nice explanation. Do you have any video or code reference on designing and implementing a binary space partitioning (BSP) tree, with functions for adding nodes, printing the nodes in descending order, and searching for the spatially closest node (along one dimension)?
Nice
And good visualization with the radius r
Great video! Thanks!
Cheers!
Beautiful explanation. Thank you. )))
Excellent! Thank you so much!!
So easy to explain. So good.
Amazing, thanks~ Looking forward to the next one.
You are very welcome!
Great video.
Please post the code in C++ or complete pseudocode.
I do have the full source code in Java: bitbucket.org/StableSort/play/src/master/src/com/stablesort/kdtree/KDTree.java
@@stablesort Data structures in Java are another trouble for me.
I do data structures in C++.
Please give the pseudocode at least.
@@shivamverma-mt6kp OK, here is a quick rundown:
so you have a tree node, with left and right pointers (I called it KDNode). The "value" in that node could be a vector/array of integers, or encapsulated in some other object that just has that array/vector. I went with the second option, calling it KDPoint.
class KDNode {
KDNode left;
KDNode right;
KDPoint point;
...
}
class KDPoint {
List<Integer> props; // this could just be an array of int
...
}
// this is your main class with the algorithm to find the nearest neighbor:
class KDTree {
// reference to the root is all we need
KDNode root;
// main method, described in the video:
private KDNode nearestNeighbor(KDNode root, KDPoint target, int depth) {
if (root == null) return null;
KDNode nextBranch = null;
KDNode otherBranch = null;
// compare the property appropriate for the current depth
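// note: depth can exceed the number of dimensions, so get() is assumed to wrap the index (i.e. use depth % k internally)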
if (target.get(depth) < root.point.get(depth)) {
nextBranch = root.left;
otherBranch = root.right;
} else {
nextBranch = root.right;
otherBranch = root.left;
}
// recurse down the branch that's best according to the current depth
KDNode temp = nearestNeighbor(nextBranch, target, depth + 1);
KDNode best = closest(temp, root, target);
long radiusSquared = distSquared(target, best.point);
/*
* We may need to check the other side of the tree. If the splitting plane is closer to the
* target than the current best radius, the true nearest neighbor could still be on the other
* side, so we must recurse into that branch as well. 'dist' is the horizontal or vertical
* distance from the target to the splitting line that passes through the root point.
*/
long dist = target.get(depth) - root.point.get(depth);
if (radiusSquared >= dist * dist) {
temp = nearestNeighbor(otherBranch, target, depth + 1);
best = closest(temp, best, target);
}
return best;
}
// just picks the better option between n0 and n1, but does not traverse anywhere else
KDNode closest(KDNode n0, KDNode n1, KDPoint target) {
if (n0 == null) return n1;
if (n1 == null) return n0;
long d1 = distSquared(n0.point, target);
long d2 = distSquared(n1.point, target);
if (d1 < d2)
return n0;
else
return n1;
}
...
}
I believe this should be very close to C++ code. I hope this helps!
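For completeness, a hypothetical usage could look like this (the KDNode and KDPoint constructors are not shown above, so I am assuming ones that take the fields directly; and since nearestNeighbor is private, in practice you would call it through a public wrapper inside KDTree):
// Hypothetical example - constructors for KDNode and KDPoint are assumed.
KDTree tree = new KDTree();
tree.root = new KDNode(new KDPoint(5, 4));          // root splits on x
tree.root.left = new KDNode(new KDPoint(2, 6));     // x = 2 < 5, goes left
tree.root.right = new KDNode(new KDPoint(13, 3));   // x = 13 >= 5, goes right
KDNode nearest = tree.nearestNeighbor(tree.root, new KDPoint(3, 5), 0);
// nearest now holds the node with (2, 6), the closest point to the target (3, 5)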
awesome video!
Thanks for the compliment!
Well, great work.
Cheers!
Excellent!
Great content!
What if the data has more than 2 dimensions, let's say 3? If the dimension chosen in the first layer is x, what will the dimensions chosen in the second and third layers be?
Thanks for posting a question. KD-Tree sequentially goes through each of the dimensions and then loops back. For example, if there are 3 dimensions, then it'd go in this sequence:
1, 2, 3, 1, 2, 3, 1, 2, 3, ...
Typically this is easily implemented using the MOD operator and the depth of the tree. So as you traverse deeper and deeper into the tree, you pass the 'depth' counter along. Then for a K-dimensional tree, you'd pick dimension depth % K.
Take a look at the source code linked in description for a sample implementation. Hope this helps!
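In code, that boils down to something like this tiny sketch (my own illustration of the depth % K idea, not the exact source):
// Pick the splitting axis from the current depth for a k-dimensional tree.
// depth 0 -> axis 0 (x), depth 1 -> axis 1 (y), depth 2 -> axis 2 (z), depth 3 -> axis 0 again, ...
static int splittingAxis(int depth, int k) {
    return depth % k;
}
// e.g. for k = 3: splittingAxis(0, 3) == 0, splittingAxis(1, 3) == 1, splittingAxis(2, 3) == 2, splittingAxis(3, 3) == 0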
Great video 👍🏻 could you please upload the code for deleting a node from the tree?
Thanks for the compliment. Deleting a node is not trivial, since you would have to either re-index all of the child nodes recursively or come up with some other scheme. Here are a couple off the top of my head:
-just mark the node as deleted but don't actually remove it, then rebuild the tree once there are too many of these fake-deleted nodes (a rough sketch of this is below)
-move one of the leaf nodes into the place of the deleted node
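A rough sketch of the first idea, lazy deletion with a periodic rebuild (only an assumed illustration, not the video's code; KDNode is assumed to gain a boolean 'deleted' flag):
class LazyDeleteKDTree {
    KDNode root;
    int size = 0;          // nodes currently in the tree, including fake-deleted ones
    int deletedCount = 0;  // nodes flagged as deleted but still hanging in the tree

    void insert(KDPoint p) {
        // normal KD-tree insert goes here (omitted), then:
        size++;
    }

    void delete(KDNode node) {
        node.deleted = true;   // don't restructure anything, just flag the node
        deletedCount++;
        if (deletedCount * 2 > size) {
            rebuild();         // more than half of the nodes are stale - rebuild from the live points
        }
    }

    void rebuild() {
        // collect every non-deleted point and build a fresh tree from them
        // (e.g. with a median-split construction), then:
        deletedCount = 0;
    }
}
The nearest-neighbor search would then just skip nodes whose deleted flag is set when picking the best candidate.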
Thanks for the explanation in the video. Please don't add background music, it's distracting and annoying.
Hi, thank you! I'm just asking myself what I need to do if I want to find not only my nearest neighbour but, for example, the 5 or 6 nearest neighbours. Thanks :)
Sure, that would be a natural extension to this problem. I think the most intuitive solution would be to use an additional max-heap into which you keep adding points. Restrict the size of the heap to K (5 or 6 in your example). Whenever you add a point to the heap and the size becomes greater than K, remove the top element. The top element will be the one with the greatest distance. So ultimately you'll end up amassing K points with the shortest distance.
@@stablesort That would be a great addition to this video though :D Thank you for that one, now I need to learn about heaps.
@@nchm2625 If you are coding in Java, take a look at PriorityQueue class - it's Java's heap implementation. It can do exactly what you are looking for.
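A minimal sketch of that bounded max-heap idea with PriorityQueue (my own illustration; KDPoint and the coordinate access via get()/size() are assumed from the code elsewhere in this thread):
import java.util.Comparator;
import java.util.PriorityQueue;

class KNearestCollector {
    final int k;
    final KDPoint target;
    // max-heap: the point farthest from the target sits on top, ready to be evicted
    final PriorityQueue<KDPoint> heap;

    KNearestCollector(int k, KDPoint target) {
        this.k = k;
        this.target = target;
        this.heap = new PriorityQueue<>(
                Comparator.comparingLong((KDPoint p) -> distSquared(p, target)).reversed());
    }

    void offer(KDPoint candidate) {
        heap.add(candidate);
        if (heap.size() > k) {
            heap.poll(); // drop the current farthest point, keeping only the k closest seen so far
        }
    }

    // same idea as distSquared in the code above, repeated here so the sketch stands alone
    static long distSquared(KDPoint p0, KDPoint p1) {
        long total = 0;
        for (int i = 0; i < p0.size(); i++) {
            long diff = p0.get(i) - p1.get(i);
            total += diff * diff;
        }
        return total;
    }
}
During the tree traversal, once the heap holds k points, the pruning radius becomes the distance to the heap's top element instead of the single best distance.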
@@stablesort Do you know if there is any literature about this specific problem? Because I can't find anything. I'm grateful for every idea!
@@nchm2625 Try searching for "k nearest neighbors using kdtree", such as:
stackoverflow.com/questions/34688977/how-do-i-traverse-a-kdtree-to-find-k-nearest-neighbors
Beautiful
Why do you have corn in a flower pot???
thank you!🤞🤞
How do you search for the second nearest neighbor?
Have you found a solution? I need to find the 5 closest points, but I can't figure out how to do it
@@Артем-х7п6с Still not. One crude method would be to delete the first node that you found from the tree and then search for the nearest node again - that will be your second.
we just gonna ignore the corn in the flower pot?
:-p
The shown code is incorrect!!!
So cool
Cheers!
Thanks!
Cheers!
Is this Mlepnos?
Jokes aside, nice explanation.
ty for video
10/10 :-)
Awesome
3:59 good
🫡
Kevin Durant tree lesss go
orz