💡 GRAPH PLAYLIST: ua-cam.com/video/EgI5nU9etnU/v-deo.html
Thanks for the video! Wondering if there could also be a DFS version of this video, since it would follow what was previously done in Course Schedule 1 and 2, somewhat creating a template kind of answer and building on it.
class Solution:
    def findRedundantConnection(self, edges: List[List[int]]) -> List[int]:
        graph = collections.defaultdict(list)

        def has_cycle(node1, node2):
            """
            Performs a depth-first search (DFS) to check if there is a path between node1 and node2
            in the current graph, excluding the edge between node1 and node2.
            """
            visited = set()

            def dfs(current_node, target_node):
                # Mark the current node as visited
                visited.add(current_node)
                # If we reached the target node, a cycle exists
                if current_node == target_node:
                    return True
                # Explore the unvisited neighbors of the current node
                for neighbor in graph[current_node]:
                    if neighbor not in visited:
                        if dfs(neighbor, target_node):
                            return True
                return False

            # Start the DFS from node1, targeting node2
            return dfs(node1, node2)

        for node1, node2 in edges:
            # If there is already a path between node1 and node2 in the current graph,
            # the edge between node1 and node2 would create a cycle, so it's the redundant edge
            if has_cycle(node1, node2):
                return [node1, node2]
            # Otherwise, add the edge to the graph
            graph[node1].append(node2)
            graph[node2].append(node1)

        # If no redundant edge is found, return None
        return None
This is how you can do it with DFS, but it is different from Course Schedule 1 or 2.
@@PedanticAnswerSeeker Should we add a condition to check whether a visited neighbor is just the parent, though, since this is an undirected graph?
@@rrt19254 Yes, that's one of the conditions required. Otherwise, you'd be revisiting and detecting fake cycles imo
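Something like this minimal sketch (a hypothetical helper, assuming `graph` is an adjacency dict of the undirected edges and there are no parallel edges, as in this problem): passing the parent down and skipping it prevents the trivial two-node back-edge from being reported as a cycle.

def has_cycle_undirected(graph, start):
    visited = set()

    def dfs(node, parent):
        visited.add(node)
        for nb in graph[node]:
            if nb == parent:
                continue  # ignore the edge we just arrived on
            if nb in visited or dfs(nb, node):
                return True  # reached an already-visited node another way
        return False

    return dfs(start, -1)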
DFS can be O(N) and beats >90%. I wouldn't recommend learning rare patterns by heart if not required. My solution uses a dict instead of a set to track the cycle because it preserves insertion order and is also O(1) to delete/add/find:
adj_list, cycle = {}, {}
for a, b in edges:
    adj_list.setdefault(a, []).append(b)
    adj_list.setdefault(b, []).append(a)

def dfs(node, parent):
    if node in cycle:
        for k in list(cycle.keys()):
            if k == node:
                return True
            del cycle[k]
    cycle[node] = None
    for child in adj_list[node]:
        if child != parent and dfs(child, node):
            return True
    del cycle[node]
    return False

dfs(edges[0][0], -1)
for a, b in edges[::-1]:
    if a in cycle and b in cycle:
        return (a, b)
The part with removing all the keys up to the point of the detected cycle is brilliant!
What's the logic behind the del cycle[k] line? I don't quite understand the intention there.
This is genius actually! no need to learn any new algorithm.
@germanguisado1276
The del cycle[k] line can be read as follows:
"When I find a cycle, remove the nodes I visited before reaching the one that starts the cycle.
At the final step (after dfs), consider only the nodes visited from the cycle-starting node onward."
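A quick worked trace of the snippet above on a hypothetical input, edges = [[1,2],[2,3],[3,4],[4,2]]:

# DFS from 1 walks 1 -> 2 -> 3 -> 4, so cycle = {1, 2, 3, 4} in insertion order.
# At 4, the neighbor 2 is already a key, so the del loop removes 1 (visited
# before the cycle entry), hits k == 2, and returns True: cycle = {2, 3, 4}.
# Scanning edges in reverse, [4, 2] is the first edge with both endpoints
# in cycle, and that is the answer returned.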
You loop over every edge and do a cycle check, revisiting all adjacency-list nodes using DFS. I don't think your solution is O(V); it is O(V^2 + E).
Although Neetcode explains in his other graph video that the union-find find operation is amortized O(log V), it can be further reduced to almost constant time (the inverse Ackermann function α(V)) if we join the disjoint set under the larger branch like Neetcode does in this video (balancing the parent search tree).
I think the reason your solution finishes the given problem in a similar runtime to optimal union-find is that the problem only asks for one redundant edge, so your algorithm doesn't always iterate V^2 times and terminates early.
That del operation does some optimization too. And while the cycle check can be up to O(V), it often isn't in practice.
The DFS function does visit all adjacent nodes in the worst case, but that just means it visits all vertices in the worst case. The check `if child != parent` stops the function from looking at vertices that have already been considered (except when we rejoin the cycle and `node in cycle` is true). This makes the DFS function O(V).
He then loops through every entry in edges, but each iteration of the loop performs two hash-table lookups (which are constant time), making each iteration constant time. This makes the loop O(E), since in the worst case there are E iterations.
The loop happens after the DFS function has been called, meaning the overall time complexity is O(V+E); V = E in this case, so the time complexity simplifies to O(V). @@illu1na
Thanks for the effort. Here is a note on the path compression: this code doesn't make par[n] point to the ultimate root parent of the subtree, it only makes its parent do that!
12:20 the path compression doesn't make par[n] point to the ultimate root parent of the subtree; it only makes its parent, par[par[n]], have this property, because you used par[p] = par[par[p]], where p is initially par[n]. Ideally, you want all intermediate parents to point to the root parent, but the code skips one level.
It's interesting to compare with ua-cam.com/video/8f1XPm4WOUc/v-deo.html, where you do it correctly by changing L7 (the first line under find()) to p = n. That way, you don't skip one level.
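For reference, a minimal sketch of that fix (same par array as the video; this is still the path-halving variant, but it starts from the node itself so no level is skipped):

def find(n):
    p = n  # start at the node itself, not at its parent
    while p != par[p]:
        par[p] = par[par[p]]  # point p at its grandparent
        p = par[p]
    return p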
oh... this video gave me a headache, and the LeetCode 323 explanation cured it. This is the worst explanation I've ever seen from NeetCode.
Nice catch! I didn't even notice that until I saw your comment.
I am really grateful that you have explained the Union Find Algo brilliantly using this example. Thanks buddy 😄👍👍
This dude is changing lives. I wonder if he grasps the full extent of it.
Here is the simplified C++ version; the core idea is to find the redundant edge, and we don't really need the rank here.
Hope it helps

vector<int> findRedundantConnection(vector<vector<int>>& edges) {
    vector<int> parent(edges.size() + 1, -1);
    for (auto e : edges) {
        int x = findRoot(e[0], parent);
        int y = findRoot(e[1], parent);
        // found a cycle
        if (x == y) {
            return {e[0], e[1]};
        } else {
            parent[y] = x;
        }
    }
    return {};
}

int findRoot(int n, const vector<int>& parent) {
    return parent[n] == -1 ? n : findRoot(parent[n], parent);
}
oh my god this is actually beautiful!!
The path compression you have done looks like return par[par[par[n]]]; I think the find function should be implemented recursively to update the parents of each node up to the root. Also, the question asks to return the edge that satisfies the condition and is nearer to the end of the input array; has that been done above? Please let me know if I'm missing something.
Hey, since here we have |E| = n and |V| = n, the DFS approach would take O(|E|+|V|) = O(n). Union-Find doesn't particularly do any better in time; maybe it's slightly better memory-optimized.
At 12:45: wouldn't your path compression code only shorten the links (up to the root) by 1? For example if we have:
5 -> 4 -> 3 -> 2 -> 1, where 1 is the root and we call find() on 5. At the end of your find() function, I believe the parent of 5 will be 3, not 1, is that correct?
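If I'm reading the video's loop right, p starts at par[5] = 4, so 5 itself is never rewritten:

# par: 5 -> 4, 4 -> 3, 3 -> 2, 2 -> 1, 1 -> 1; call find(5)
# iteration 1: par[4] = par[par[4]] = 2, then p = par[4] = 2
# iteration 2: par[2] = par[par[2]] = 1, then p = par[2] = 1
# par[1] == 1, so the loop exits and returns 1 (the correct root)
# afterwards: par[5] is still 4, par[4] is now 2, par[2] is 1
# so the parent of 5 actually stays 4 (not 3); it's 4 and 2 whose links get shortened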
def get_parent(node):
    temp = node
    while parent[node] != node:
        node = parent[node]
    parent[temp] = node
    return node
yeah this one is shorter and easier to understand too
@neetcode is there any other video of yours that you would recommend to watch union find? I feel like union find is not explained in detail very much in this video.
ua-cam.com/video/ayW5B2W9hfo/v-deo.html
i also want the same
@@BurhanAijaz ua-cam.com/video/wU6udHRIkcc/v-deo.html
The explanation keeps getting better 🔥 BRO!
Great explanation, thanks!
Btw, when we join parent 2 to parent 1, we update the rank of parent 1 to be the sum of both ranks. I wonder, should we also update the rank of parent 2, setting it equal to zero?
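A way to see why that isn't needed, with a minimal union-by-size sketch (same par/rank idea as the video): rank is only ever read at roots returned by find(), and p2 stops being a root the moment it is attached, so its stale rank is never consulted again.

def union(n1, n2):
    p1, p2 = find(n1), find(n2)
    if p1 == p2:
        return False          # redundant edge
    if rank[p1] < rank[p2]:
        p1, p2 = p2, p1       # make p1 the root of the larger set
    par[p2] = p1              # p2 is no longer a root...
    rank[p1] += rank[p2]      # ...so rank[p2] will never be read again
    return True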
Thank you so much. I learnt union find because of you.
This is the best channel, your explanation is great and code is clear. Keep it up
At 3:14, he added 4 to the graph, but that was not possible. The question said the graph was a tree without a cycle before adding an extra edge, and the extra edge had to connect two nodes from the already existing list. You can't just add a 4. Am I reading the question correctly?
You could implement path compression in the find() function to make it more efficient if you have a long parent path.
he has done that, right?
He did
This is so easy to understand! But could you also cover 685. redundant connection II and compare these two..?
This solution is easier to understand, hope it helps.
class Solution:
    def findRedundantConnection(self, edges: List[List[int]]) -> List[int]:
        # variables
        n = len(edges)
        parents = [i for i in range(n + 1)]

        def find_group_root(node):
            group_root = node
            while group_root != parents[group_root]:
                group_root = parents[group_root]
            return group_root

        for v1, v2 in edges:
            group_root1 = find_group_root(v1)
            group_root2 = find_group_root(v2)
            if group_root1 == group_root2:
                return [v1, v2]
            # union two groups
            parents[group_root2] = group_root1
I think you can do O(N) with DFS. You shouldn't tackle the problem "which edge should we add?" You should tackle the problem: "traverse the whole graph, find all the edges part of a cycle and then figure out which one is the last edge from the input." Traversing the whole graph while finding all the edges part of a cycle is O(V+E). Figuring out which edge is the last one is O(E) thanks to storing the edges that are part of a cycle in a cycle set.
I use a stack and a set combined to keep track of the cycle path. I keep the edges that are part of a cycle stored in a set and then I just iterate from the last input edge towards the first input edge and check my cycle set. That should be O(V+E)
The JavaScript code is a bit messy, I'll refactor it for more learnings.
I use closures a lot so I can have "scoped globals". It means I don't need to worry about return semantics, I can just store whatever result I need in my "scoped global".
/**
 * @param {number[][]} edges
 * @return {number[]}
 */
var findRedundantConnection = function(edges) {
    const cycleSet = new Set() // stores edges found in cycles

    // create adjacency list
    const adjList = {}
    for (const [source, dest] of edges) {
        adjList[source] = new Set()
        adjList[dest] = new Set()
    }
    for (const [source, dest] of edges) {
        adjList[source].add(dest)
        adjList[dest].add(source)
    }

    const visited = new Set()
    findCycles(1) // assuming graph is connected

    for (let i = edges.length - 1; i >= 0; i--) {
        const edgeKey = edges[i].sort((a, b) => a - b).join(',')
        if (cycleSet.has(edgeKey)) return edges[i]
    }
    return // done

    function findCycles(node) {
        const pathSet = new Set()
        const pathStack = []
        dfs(node)

        function dfs(node) {
            if (pathSet.has(node)) { // cycle found
                const last = pathStack[pathStack.length - 1]
                const edge = [last, node].sort((a, b) => a - b).join(',')
                if (pathStack.length === 1) return
                cycleSet.add(edge)
                for (let i = pathStack.length - 1; i >= 0; i--) {
                    const prev = pathStack[i - 1]
                    const curr = pathStack[i]
                    if (curr === node) break
                    const edgeKey = [prev, curr].sort((a, b) => a - b).join(',')
                    cycleSet.add(edgeKey)
                }
                return
            }
            if (visited.has(node)) return
            pathSet.add(node)
            pathStack.push(node)
            visited.add(node)
            for (const neighbor of adjList[node]) {
                adjList[neighbor].delete(node)
                dfs(neighbor)
            }
            pathSet.delete(node)
            pathStack.pop()
        }
    }
};
Two easier approaches to that 'find' function: (TypeScript)
//1
function findParent(node: number): number {
    if (node != par[node])
        return findParent(par[node])
    else return node;
}

//2
function findParent(node: number): number {
    while (node != par[node]) {
        node = par[node]
    }
    return node;
}
why is the result guaranteed to be the last redundant edge in the input?
I am also wondering this.
It's because you only return the edge when the edge being added connects two nodes who have the same parent.
By definition of the algorithm above, connecting two nodes who have the same parent creates a cycle.
Also by definition, the edge that creates a cycle is the last edge that was needed to close the cycle. So the result is guaranteed to be the last edge.
Neetcode goes through this from 5:30- about how it's guaranteed we get a cycle when we have n edges and n nodes.
@hanjiang9888 a simple explanation, you are iterating through all edges and slowly "connecting" the graph. The first edge you encounter that creates a cycle is the last eligible edge in the input that could create a cycle, since you are building it up from fully disconnected.
Consider the first example. Any one of the edges could be a correct answer since they all create a cycle. But as we go through our algorithm, it's only the last added edge that confirms it is indeed a cycle. Since our algorithm is essentially building/connecting the graph up from scratch, the one that confirms the cycle is guaranteed to be the final edge of the possible correct answers.
And since edges is limited to size n and the graph was originally a connected tree, we know there aren't enough edges to create any more cycles down the line.
Create a hashset to remember nodes and just go through the edges array, checking whether both nodes are already in the hashset; if not, add them; if yes, then that is the edge causing the issue?
Consider this counter-example: (1,2), (3,4), (1,3), (1,4). By the second edge, the hashset would be (1,2,3,4). Now for (1,3), both nodes are in the hashset but adding it doesn't create a cycle. Yet your code would return (1,3).
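For contrast, a quick union-find trace on that same input: after (1,2) and (3,4) there are two sets, {1,2} and {3,4}; the edge (1,3) finds two different roots, so it just merges them into {1,2,3,4}; only (1,4) finds both endpoints under the same root and is returned, which is the correct answer.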
@Neetcode, can you explain this to me please? The code for path compression in recursive format is

def find(n):
    if par[n] != n:
        par[n] = find(par[n])
    return par[n]

In iterative format, like the one you did with the while loop, shouldn't it be

def find(n):
    while n != par[n]:
        par[n] = par[par[n]]
        n = par[n]
    return par[n]
This is 10 months late, but I'm assuming you were referring to `return par[n]` in your code compared to `return p` in the video. Both will have the same result because `n == par[n]` when the while loop exits, so it doesn't matter which one is returned (The variables p and n are basically equivalent).
@@oacl are you referring to my iterative code? What i meant is the structure of the iterative code used in the video is different from i what found on google
Great job explaining this! I finally understand it.
Hi @neetcode, will this approach take O(n log n) time in the worst case because of the find operation?
Please correct me if my analysis is wrong.
On average, when doing M union actions and N find actions, the time complexity of this algorithm is Θ(M+N), which means each action on average takes Θ(1).
Time complexity: O(α(n)*n) which is O(n) for any practical input size, α(n) is the inverse Ackermann function which grows extremely slow.
Space complexity: O(n)
very good explanation, thanks for the idea
feels like you jumped a bit too much on this one man.
is keeping track of rank necessary? This solution works for me:
class Solution:
    def findRedundantConnection(self, edges: List[List[int]]) -> List[int]:
        parent = list(range(len(edges) + 1))

        def find(x):
            if x != parent[x]:
                parent[x] = find(parent[x])
            return parent[x]

        def union(x, y):
            rootX, rootY = find(x), find(y)
            if rootX == rootY:
                return [x, y]
            parent[rootX] = rootY

        for x, y in edges:
            if union(x, y):
                return [x, y]
Thanks for the video, nice explanation. Path compression can be improved here:

def find(self, p):
    if self.parent[p] != p:
        self.parent[p] = self.find(self.parent[p])  # Path compression
    return self.parent[p]
Excellent video. We don't need to do par[p] = par[par[p]];

while n != par[n]:
    n = par[n]
return n

would suffice too.
It is needed to improve both the find and union time complexity. It is called path compression.
Omg I'm terrified. What a coincidence. I have a SC interview in a week. 😭
Good luck, you're a viewer of mine so I'm sure you're gonna kill it 😅
how was your interview?? any suggestions
What is SC? Standard Chartered?
@@vineethsai1575 snap chat
@@GetUpAndWatchVideos oh wow. Haven’t seen the tagged company😂, was preparing this for Google
Nice Explanation
Can we apply union find for both directed and undirected graphs?
no
Can also construct the disjoint set DS with negative values meaning parents, and absolute value in the parent as the number of nodes in that disjoint set.
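A minimal sketch of that encoding, with hypothetical standalone helpers for n nodes: parent[i] < 0 means i is a root and -parent[i] is the size of its set.

parent = [-1] * (n + 1)  # every node starts as its own root, set size 1

def find(x):
    while parent[x] >= 0:  # non-negative entries point to a parent node
        x = parent[x]
    return x

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return False  # already connected: redundant edge
    if parent[ra] > parent[rb]:  # ra holds the smaller set (less negative)
        ra, rb = rb, ra
    parent[ra] += parent[rb]  # sizes add up (both stored as negatives)
    parent[rb] = ra
    return True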
How is this O(n), though? As we loop through every edge, we use the find function, which has a while loop. Shouldn't it be O(n^2)?
Wow, very clear explanation ~ help me a lot, TY!
oh, i get it, there is a condition that two nodes have the same parent which leads to a cycle by connecting them since they form a triangle.
Hey!
Why are we connecting the subtree root having a lesser rank under the one having a larger rank? Can't we connect them the opposite way?
Study the disjoint set union data structure; this is the optimization technique, union by rank.
Thanks for the video. I have one comment regarding the rank array: do we really need it for this problem specifically? As we already know there's one solution and we return the first redundant edge, we can return the redundant edge in the union method when we find p1 == p2. Please let me know if that's the case.
3 months late, but I thought so too, until I attempted to code up the solution by returning [n1, n2] if n1 and n2 have the same parents.
I couldn't find a way to do this. The thing is, you want to run union(n1, n2) until you find two nodes whose parents are the same, but how do you make the caller call union(n1, n2) until it returns [n1, n2] when it finds a redundant edge? You have to return something for the non-problematic cases as well (in this solution's case, a boolean True).
The easiest way to do it is to run it until it returns a boolean (False), and then return [n1, n2].
I did it without the rank array and without the UNION function.
Just let n1 become n2's parent if n1 != n2.
Here's the summary I wrote out for my Anki flashcard in case anyone else finds it helpful:
---
union-find
Explanation via analogy: imagine you have chopsticks and Play-Doh balls with alphabetical letters on them, and you have a list of instructions that specify two balls to connect via a chopstick, similar to the instructions in a Lego kit. You connect two Play-Doh balls with a chopstick as you read each new instruction (each new edge). As you go through the instructions (edges), you'll notice that early on you will have multiple different groups of balls-and-chopsticks (aka disjoint sets), but as you go on you will start combining the different groups. With each new instruction you will be combining two groups (where a ball by itself is considered a group), UNTIL you get to a very special instruction that asks you to combine two Play-Doh balls that are already in the same group. That's the instruction (edge) you will return as output.
My summary: If a question ever mentions a tree, you should remember that a disjoint set is one type of tree data structure. Iterate through the edges, adding each edge one-at-a-time to a disjoint-set data structure (aka a union-find data structure); this involves maintaining one array to keep track of the reference/parent vertex that each vertex has as its ultimate parent, and another array to keep track of the size of the different sets (so you know how to most-efficiently combine sets to make it fast to look up the reference/parent vertex for any given vertex). When you come across an edge that would be redundant (both vertices are already in the same set / have the same reference vertex), that's the one you want to return. One optimization you can use in this case is path compression, since we don't need to maintain the original shape of the graph to answer the question of which edge is redundant.
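One way that summary might look as code, as a minimal sketch (parent array plus size array; return the edge whose endpoints already share a root):

def findRedundantConnection(edges):
    n = len(edges)
    par = list(range(n + 1))  # each vertex starts as its own reference vertex
    size = [1] * (n + 1)      # size of the set rooted at each vertex

    def find(x):
        while x != par[x]:
            par[x] = par[par[x]]  # path compression: original shape not needed
            x = par[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return [a, b]  # both endpoints already in the same set
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        par[rb] = ra  # attach the smaller set under the larger one
        size[ra] += size[rb]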
best channel forever
Thank you so much for providing the best explanation ever
@Neetcode At 14:15 why did you use rank[p1] += rank[p2] instead of rank[p1] += 1?
Because when taking the union of two graphs we don't know if there is only one node in that second graph. That is the reason we add the rank of p2 (i.e., all the nodes of the p2 graph).
@@blaiserodrigues2990 it still passes the test cases
@Neetcode At 14:15, did you take care of the cases where the rank of both are the same?
Doesn't the else condition (line 22 of the code) take care of that?
@@ychennay no, that code is incorrectly adding the ranks but in the equality case the rank should only be increased by one and handled separately. Also, path compression is buggy in this code too.
i got it, you are right. Sorry about the misconception.
Awesome explanation
I have a request. I love your solutions, and they are intuitive, but I have trouble coming up with the solutions myself. Could you feature more of the hints you pick up when reading a question that help you arrive at the solution? Something like: oh, the question is asking for unique entries in the answer, so perhaps we can just use a set. This is just an example, but you get what I mean?
Like in this question, you outright tell us we can use union-find, and after that I was able to code the solution myself. But stuff like: hey, we are given edges, so if we start grouping them together (union), we may find out when a vertex is already in the group we were adding it to, and the edge that caused us to detect this is the answer. See, I was struggling with this question, but as soon as you said union-find I was like, duh, how stupid am I. What does it take to get to the point where I can arrive at the answer on my own?
@@chaitanyasharma6270 You just need more practice, that's how most people get good at this. I've seen a lot of very good competitive programmers who always say the same thing, if you want to get good like them you need a lot of practice. In time you will get better.
Great video!
Is time complexity O(nlogn) and Space Complexity O(n)?
Time complexity: O(α(n)*n) which is O(n) for any practical input size, α(n) is the inverse Ackermann function which grows extremely slow.
Space complexity: O(n)
what is the time and space complexity here @NeetCode ?
Thank you mate, you are great!
Thanks!
So when we update the parent, we are actually updating the parent of the node being updated to point to the parent of the other node. This took me a while to understand, since it isn't seen in every test case.
simpler solution
class Solution {
    int[] par;
    int[] rank;

    public int[] findRedundantConnection(int[][] edges) {
        int n = edges.length;
        par = new int[n + 1];
        rank = new int[n + 1];
        for (int i = 1; i <= n; i++) {
            par[i] = i;
            rank[i] = 1;
        }
        for (int[] edge : edges) {
            if (!union(edge[0], edge[1])) {
                return edge; // first edge that closes a cycle
            }
        }
        return new int[0];
    }

    public int find(int n) {
        while (n != par[n]) {
            n = par[n];
        }
        return n;
    }

    public boolean union(int n1, int n2) {
        int p1 = find(n1);
        int p2 = find(n2);
        if (p1 == p2) {
            return false; // Already in the same set
        }
        if (rank[p1] > rank[p2]) {
            par[p2] = p1;
            rank[p1] += rank[p2];
        } else {
            par[p1] = p2;
            rank[p2] += rank[p1];
        }
        return true; // Successfully unioned
    }
}
Correct "Path compression" code alternatives here:
1. Recursive:
def find(node):
if par[node] != node:
par[node] = find(par[node]) # Path compression, 1 line recursive
return par[node]
2. Iterative:
def find(node):
root = node
while root != par[root]:
root = par[root]
while root != node: # Path compression, while loop
nxt = par[node]
par[node] = root
node = nxt
return root
Hmm, I didn't feel like the ranks and the compressed-path find were essential parts of the solution, and the algorithm kinda works just fine without them? I think it feels simpler without them, wdyt?
def findRedundantConnection(self, edges: List[List[int]]) -> List[int]:
    n = len(edges)
    par = [i for i in range(n + 1)]

    def find(n):
        p = par[n]
        while par[p] != p:
            p = par[p]
        return p

    def union(n1, n2):
        p1, p2 = find(n1), find(n2)
        if p1 == p2:
            return False
        par[p2] = p1
        return True

    for a, b in edges:
        if not union(a, b):
            return [a, b]
While the code is indeed simpler, the time complexity suffers - which is obviously not ideal in an interview setting. By removing union by rank and path compression the find (and as a consequence, the union) functions become a linear time complexity operation. In the scope of this question that brings us from an O(nlogn) algorithm up to an O(n^2) algorithm.
Path compression and union by rank are making the tree as flat as possible, making the find/union functions have nearly constant time complexity for any practical input size.
With path compression/union by rank: find/union have O(α(n)) amortized time complexity where α(n) is the inverse Ackermann function which grows extremely slow.
Without those optimization, the find/union functions would have linear complexity in the worst case
Pre-req to solve this problem: Know union() and find(), know union find by rank, know Path compression
Also, isn't it UNION BY SIZE instead of UNION BY RANK? Because we are using the size of the disjoint sets/trees and not the height/depth.
you messed up the drawing
based on your code, the first iteration would make 1 a child of 2.
par = [1, 2, 3]
rank = [1, 1, 1]
Processing [1, 2]:
Since the ranks are equal, node 1 becomes the child of node 2.
par = [2, 2, 3]
rank = [1, 2, 1]
This is more like By Size, not by Rank and path compression wastes a cycle by not going to parent first. Also we can use maps/objects instead of arrays for more speed. Here is a JS implementation employing both path compression and ranking:
```js
function findRedundantConnection(edges) {
    const par = {}, rank = {}
    for (let i = 1; i <= edges.length; i++) par[i] = i, rank[i] = 1
    const find = (x) => par[x] === x ? x : par[x] = find(par[x])
    function union(n1, n2) {
        let p1 = find(n1), p2 = find(n2)
        if (p1 == p2) return true
        if (rank[p1] < rank[p2]) par[p1] = p2
        else if (rank[p1] > rank[p2]) par[p2] = p1
        else par[p1] = p2, rank[p2]++
    }
    for (const [n1, n2] of edges) if (union(n1, n2) === true) return [n1, n2]
}
```
How would. you explain the time complexity for this problem during an interview?
and would you use path compression at all?
So I'm getting ready for an interview and trying to find the time complexity; I think he forgot to mention it. In this case the find function is O(log V) because of line 9, where we assign the grandparent to the current parent, shortening the search. It works without that line too, but then the time complexity would be O(V). So thanks to it, the time complexity of finding the redundant connection is O(E log V).
Yeah that's the only issue I have with Neetcode, he skips time complexity analysis too often
In his other video "Top 5 Most Common Graph Algorithms for Coding Interviews". He did go over time complexity analysis for union find.
@@illu1na what? He literally went over it at the beginning, 0:21
The part where he assigns grandparent as the parent is essentially halving the height so that future find operations will be cheaper, the operation always has h accesses where h is height, the key point is that h is only big occasionally, most of the time the find function will take near constant time. The amortized time complexity of find function is α(n) as proven by Robert Tarjan.
Time complexity: O(α(n)*n) which is O(n) for any practical input size, α(n) is the inverse Ackermann function which grows extremely slow.
Space complexity: O(n)
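Stated as a bound in the same notation: m find/union operations on n nodes cost O(m * α(n)) total, i.e. amortized O(α(n)) per operation, and since α(n) <= 4 for any input size that could realistically occur, each operation is effectively constant time.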
Hi there, thanks for the video explanations, which help a lot. Am I right that you mentioned n_edges == n_nodes leads to a cycle? I doubt it, since for n_edges == n_nodes == 5 we can have a disconnected graph.
where would the 5th unique edge go on a disconnected graph?
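It's worth spelling out the standard fact behind that reply: an acyclic graph (a forest) on n nodes has at most n - 1 edges, so any graph with n nodes and n distinct edges must contain a cycle, connected or not. In this problem the input is additionally a tree plus one extra edge, so it is connected as well.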
great tutorial....thanks :)
Thanks
Looks like you don't need ranks for this particular problem
thanks
very nice
def find(x):
    if par[x] != x:
        return find(par[x])
    else:
        return x
What an explanation 😃
I would just add: if par[par[p]]: par[p] = par[par[p]] on row 9.
This part is very confusing:
while p != par[p]:
    par[p] = par[par[p]]
    p = par[p]
awesome
I think this should be a hard problem.
There is actually no reason to use a rank array in this problem.
It doesn't matter which component's root node will end up being the root node of two previously unconnected components.
No, more nodes will have to search 1 iteration longer during the find function to find the parent, so it is a (slight) optimization
How does this problem have 64% submission rate?
685. Redundant Connection II
yeah
I just really wish you didn't use the term "rank" here. A graph's rank is a completely unrelated concept in graphs.
p = par[n], while p != par[n] doesn't make sense. Do you mean n != par[n]?
If someone is not able to understand the above explanation, below is an easier one:
ua-cam.com/video/P4alLDv9rCk/v-deo.html
And for those of you who are confused about line 9: the code works just fine without that line too.
Nice name
I was also confused about it. thanks
Five months late, but line 9 is what conducts path compression. It's not supposed to change the behavior; it optimizes the 'find' function.
@@nghiavo6263 after doing some examples, I think it helps you traverse up the hierarchy quicker for the current find(n). Additionally, since it updates par[p], the next time you need to find par[p], it will be done faster.
Dang gave me spoilers before explaining the question. :(
Leetcode 968 please
First time I’ve had to slow down playback speed on YT. I’d suggest going slower with complex things like graphs
Why are ranks getting added to each other?
As per my understanding, it is something like:
if rank[p1] < rank[p2]:
    parent[p1] = p2
elif rank[p1] > rank[p2]:
    parent[p2] = p1
else:
    parent[p2] = p1
    rank[p1] += 1
# Rank of a tree only increases by 1 if both trees have equal ranks.
# If the ranks are unequal, then we append the smaller tree below the larger tree's root and the larger tree's rank stays the same.
It shouldn't stay the same, you've just appended a subtree. This is an optimization, weighted or ranked union find.
You are correct, that's how union by rank is done. The optimization done in this video is union by size.
Please solve leetcode 749: contain virus
def findParent(node):
    while node != parent[node]:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node

or

def findParent(node):
    p = parent[node]
    while p != parent[p]:
        parent[p] = parent[parent[p]]
        p = parent[p]
    parent[node] = p
    return p
us
Check this video out for better solution!
ua-cam.com/video/P4alLDv9rCk/v-deo.html
nope
1489,1553,1011,124,684,2328
I think we don't need both rank and parent to track the cycle; with the parent array itself we can achieve it. Below is my solution, which beats 98% on LeetCode with simple array updates + find.
------------------------------------------------------------------------------------------------------------------------------------
def findRedundantConnection(self, edges: List[List[int]]) -> List[int]:
    parents = [-1] * (len(edges) + 1)

    def find_parent(node):
        if parents[node] == -1:
            return node
        return find_parent(parents[node])

    def union(node1, node2):
        parents[node1] = node2

    for edge in edges:
        node1, node2 = edge
        node1_parent, node2_parent = find_parent(node1), find_parent(node2)
        if node1_parent == node2_parent:
            return edge
        union(node1_parent, node2_parent)
-----------------------------------------------------------------------------------------------------------------------------------------
Thank you so much for all your videos. From them, I can see a good improvement in my thinking towards a solution.
Using rank is more optimized: with rank you always attach the smaller tree to the root of the bigger tree, making a shorter path. The performance on LeetCode is misleading since their test cases are small; on large test cases rank will be faster.
Thanks!
thanks