Using my ultra instincts I have watched the entirety of this half-hour-long video and have concluded that it is GOOD.
Lol
I concluded it in the first 15 secs
I concluded that it is magic
@@karanrao116 hello person with the same default profile pic as me
Having recently taken Linear Algebra, I got so excited at around 15:55 when I realized he was about to talk about eigenvectors! What a cool connection!
Or even earlier ca. 6:50 regarding stationary states -- admittedly it's been years since I took LA myself, but I remember this point fondly from e.g. 3b1b's videos: a stationary point, or angle or vector or what have you, is exactly what the eigenvector&value tells you. :)
I'd like to learn linear algebra
Wow excellent idea and video quality. PageRank hits on a lot of hugely important topics.. Monte Carlo.. finding stationary distributions.. creating an internet empire! Seriously top notch.
Love seeing the technical details too. Reminds me there is demand for that on UA-cam
Just finished my freshman year at UC Berkeley as a CS student, and while watching this video my “Oski senses” were tingling. Having found out 2 of my favorite UA-camrs were Cal alumni, I dug around for a bit and found out you too went to UC Berkeley and even taught CS61A.
Keep making great videos and providing intuitive explanations to really complex problems.
Go Bears!
Ha, nice investigation! A lot of the inspiration for the approach to this video was actually from EE126: Probability and Random Processes.
Go bears!!
gob ears
I think it's the first time I've seen a video showing the movement of a Markov chain so dynamically. It really helps to get a lot of intuition. thank you!
Yeah, when I was looking at other videos on the topic, I was rather surprised to see that no one had shown that way of thinking about it. For me, it just felt like an absolute necessity to understand it!
@@Reducible which software do you use to make such wonderful videos?
@@DevKumar-dg2vo I believe he uses manim which is the same animation engine that 3b1b uses for his videos :)
There actually isn't any computational difference between the "System of Equations" method and the "Eigenvalues/Eigenvectors" method.
Given that you already know that the matrix has an eigenvalue of 1, the eigenvector calculation boils down to the exact same system of equations. Eigenvectors are just an interesting way of framing the problem.
19:21 I think the "brute force" method can also be interpreted as the power method for calculating the largest eigenvalue/eigenvector. Since you already mentioned beforehand that the stationary solution is unique for aperiodic, irreducible Markov chains, the corresponding matrix also has eigenvalue 1 with a unique (up to scaling) eigenvector: the stationary solution.
Overall, great explanation of the algorithm!
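For anyone who wants to see that power-method framing concretely, here's a minimal numpy sketch (the transition matrix is made up for illustration, not the one from the video):

```python
import numpy as np

# Made-up row-stochastic matrix: P[i, j] = P(next state = j | current state = i)
P = np.array([
    [0.0, 0.7, 0.3],
    [0.5, 0.0, 0.5],
    [0.4, 0.6, 0.0],
])

pi = np.full(3, 1 / 3)            # start from the uniform distribution
for _ in range(1000):
    new_pi = pi @ P               # one step of the chain = one power-method iteration
    if np.allclose(new_pi, pi, atol=1e-12):
        break
    pi = new_pi

print(pi)                         # stationary distribution: pi @ P == pi
```

No renormalization is needed inside the loop, since a row-stochastic P preserves the total probability of 1.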
Yup, that's indeed a more formal and accurate way to put it, but thinking about it from the perspective of someone who has just started learning Markov chains, I really do feel that it is, in a sense, the first and most simple thought of what you might do to compute the stationary distribution. That thought paired with the fact that it's generally the most computationally practical method to solve for it makes it super satisfying.
@@Reducible Agreed. Monte Carlo simulations also came to mind, though in this case we are guaranteed that just one simulation is required to get the answer.
I’m an incoming freshman at Carnegie Mellon University who is well excited but also somewhat intimidated by the vast field of computer science. I just want to say that this video is beyond amazing. You are doing God’s work.
Elegant MANIM visualization, lucid explanations, and the structure of the video all come together to create this powerful delivery of clarity.
Although your name is Reducible, I don’t want to reduce you down to “The Grant Sanderson of Computer Science.” You’re much more than that. A great explanation is indistinguishable from art and you’re an excellent artist. Your videos inspire me in many ways and fuel my passion and interest in CS. I believe I’m not the only one and there are millions more to be inspired. So please keep it up. We appreciate your work. Thank you.
One of the most heart-warming comments I've received Shawn! Thank you so much! As cheesy as it does sound, I really do try to create something that feels like art to me. It's amazing to hear that others interpret it as such.
In some ways, I think of the videos I make as the explanations I wish I had when I was in your position right now, an incoming student that was excited, but also worried about thoughts of whether I would be able to learn and understand the complex ideas in the field. Or the student that had just been completely lost by something in lecture that if given the right perspective, would have clicked. One thing that really helped me get through those worrisome and stressful thoughts was finding joy and beauty in the topics themselves, and the "art" I create hopefully conveys that.
Good luck at CMU -- you're at the start of an incredible journey, and no matter where it takes you, it's a time to cherish.
@@Reducible You might want to make a video about the "string art algorithm".
Clearer than the MIT lecture on Markov chains.
As a computer science major in college, this amazing channel goes so much more in depth than my courses could ever hope. Thank you so much! :)
Lol then just get a graduate level book on the topic
@@mastershooter64 plus I'm sure if he speaks with his professors they'd be more than happy to go in depth on their subject.
People often fail to realize that formal education requires standards and that these are optimized for limited time and relative importance of topics.
@@NottoriousGG oh yes talking to professors too! like yes i get it there are some bad and unfriendly professors but most professors explicitly encourage you to ask them questions and to talk to them, they absolutely love it whenever a student shows genuine interest in something they teach and would love to teach you more about that topic
I did my master's project on MCMC algorithms and you have beautifully explained everything I learned in 6 months.
Awesome explanation!
Just a couple of remarks:
- Related to the power method (for the largest eigenvalue) that some people are commenting on: when all entries of P are nonzero, it can be proven that the limit of P^n as n tends to infinity is a matrix where all rows are exactly equal to the asymptotic distribution vector. This property is a consequence of the Perron-Frobenius theorem applied to "strongly connected" graphs.
"Strongly connected" means here that each "website" can always be reached from another in a (finite) number of steps, which I think is related to the aperiodic and irreducible property that you explain (I'm not sure).
- Assuming P with nonzero entries, we can prove that the eigenvalue 1 of P^T has multiplicity 1 (both algebraic and geometric). With this, it can be deduced that only one extra equation is needed (sum of probs = 1) to get exactly one solution. I remark on this because it is not immediately clear that just adding the extra equation makes the solution unique.
Also, this eigenvalue 1 happens to be the largest one, which is related to the fact that P^infinity does converge (again, power method without normalizing).
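A quick way to see that P^n claim numerically (sketch only; the all-positive matrix below is made up):

```python
import numpy as np

P = np.array([
    [0.1, 0.6, 0.3],
    [0.4, 0.2, 0.4],
    [0.5, 0.3, 0.2],
])

P_inf = np.linalg.matrix_power(P, 50)
print(P_inf)
# Every row is (numerically) the same vector: the asymptotic/stationary distribution.
# So P^n applied to ANY starting distribution ends up at that same vector.
```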
0:48 was a cooooool intro!
I thought I was ready for this level of math when I clicked on the video, but nope. It went right over my head, though. Sounds very interesting, I must watch it again someday.
It is technically correct to call that the brute force method, but it's actually the power method, the simplest iterative eigenvalue/eigenvector finding algorithm. It would have been nice if you stated that the largest eigenvalue of the matrix is 1 which is why it works, and that actually lines up with the physical nature of the process quite well.
Wait, why is the largest eigenvalue 1?
@@grekiki As it is a stochastic matrix, each row sums to 1. We can see that if we have the vector v = (1, 1, ..., 1)^T, then the elements of Pv will all be 1, so Pv = 1v, and 1 is an eigenvalue with v as its eigenvector.
To show it is the largest eigenvalue possible, suppose there is a larger one, lambda > 1, with eigenvector w (scaled, flipping its sign if necessary, so that its largest component w_k is positive). Because each row sums to 1, each component of Pw is a convex combination of the components of w, and a convex combination of numbers is at most the largest of them. So every component of Pw is at most w_k. But by assumption (Pw)_k = lambda*w_k > w_k, since lambda > 1 and w_k > 0. Therefore, w_k > w_k, which is a contradiction.
Considering I have proved this myself before and have final exams around the corner, it's a little worrying I had to get this off stack exchange lol (math.stackexchange.com/questions/40320/proof-that-the-largest-eigenvalue-of-a-stochastic-matrix-is-1)
@@henryginn7490 Thanks for summarizing; still not extremely comfortable with linear algebra here either.
@@henryginn7490 That's a great summary. One note I'll mention to avoid misunderstanding is that v = (1, 1, ..., 1)^T is a right eigenvector of P (since Pv = v). Whereas for the stationary problem, we need the left eigenvector (pi * P = pi). That's why we can't just use (1, 1, ..., 1)^T as the stationary solution. And I think there's a proof that the left and right eigenvectors share the same set of eigenvalues, but I can't recall it.
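To make the left/right distinction concrete, here's a small numpy sketch (made-up matrix): the eigenvalue-1 eigenvector of P^T, rescaled to sum to 1, is the stationary distribution, while (1, 1, ..., 1)^T is only a right eigenvector of P.

```python
import numpy as np

P = np.array([                       # made-up row-stochastic matrix
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
    [0.6, 0.1, 0.3],
])

vals, vecs = np.linalg.eig(P.T)      # eigenvectors of P^T = left eigenvectors of P
k = np.argmin(np.abs(vals - 1))      # pick the eigenvalue closest to 1
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                   # rescale so the entries sum to 1
print(pi, pi @ P)                    # pi @ P == pi, so pi is stationary

print(P @ np.ones(3))                # = (1, 1, 1): the all-ones vector is only a RIGHT eigenvector
```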
@@grekiki Another way to think about it is that every vector can be seen as a weighted combination of the eigenvectors. So when you keep multiplying by the probability matrix, the vector will morph towards the eigenvector with the largest eigenvalue, since its weight will grow fastest and so it will take up more and more of the share of the vector.
We want our initial distribution to end at a stationary distribution, which by our definition requires an eigenvalue of 1. So if 1 is not the largest eigenvalue, then over time the vector will shoot off to some other vector that's not the stationary distribution for the problem.
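Written out (assuming the matrix is diagonalizable; pi_0 is the starting distribution and the w_i are the left eigenvectors of P):

$$\pi_0 = \sum_i c_i\, w_i \quad\Longrightarrow\quad \pi_0 P^n = \sum_i c_i\, \lambda_i^{\,n}\, w_i \;\longrightarrow\; c_1 w_1 \quad (n \to \infty),$$

since for an ergodic chain $\lambda_1 = 1$ and $|\lambda_i| < 1$ for every other eigenvalue, so all the other terms die out geometrically.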
This is EASILY one of the best videos I've ever watched on UA-cam
I remember PageRank being shown in one of my Linear Algebra classes at my university ~16 years ago. Thanks for the flashback.
I studied Linear Algebra last semester and it was wonderful to see how what I learned is actually applied on such a large scale!
I randomly clicked on this out of pure curiosity and was completely stunned to see exactly the same math that I use every day in matrix population models for modelling wildlife populations! The scalar lambda is considered the population growth rate, and if it's >1 the population is growing.
Thank you for making Markov Chains less scary to learn about and making all of us a little bit smarter.
Seriously, this is top notch content. As an engineer, when I see a big problem I always think: how in the world would I ever solve it? But seeing this guy's and 3b1b's videos, I always feel that I should have developed a skill of analysis: observe the properties, solve smaller problems, try to predict generalizations, and repeat the process until you end up with a satisfactory solution (actually that is the process for most education, but generally we do not go as far as these guys in understanding a concept). Never in the world would I have thought that I would sit and listen to a class about Markov chains for 30 minutes. When I saw the textbook, there was a big diagram with some matrix multiplications and that's it. But now, as I see this beauty unravel before my eyes, I realize how much better I could have learned Maths and CS when I was in college.
16:36 This connection here.
This is still the "click" moment that I treasure the most from my time at uni.
You have nicely covered Markov chains; I wasn't expecting this much intuition from YouTube.
I am so happy that I finally grew enough in my knowledge to watch this video and understand 100% of it! I am so glad! Thanks Reducible, and thanks Jesus that I can learn so much in this era! This is now one of my favourite videos!
The whole world is held up by individuals like you!
Outstanding video dude. I've been self-studying linear algebra and it's very cool to see the concepts applied.
glad to see you're still alive. As per usual excellent work.
Just discovered you today and binged lots of your work. Super excited to see your library grow! Here's to many more videos (:
I love this channel so much! Happy to say I've been here since the dynamic array video :)
I want you to make more videos, please, especially on graphs and tough topics that students run away from (like graphs, DP, tries, trees, etc.). A single video of yours is actually enough to imprint the logic needed to solve problems and how to think of an approach. Now I can do them myself; this is the power of an extraordinary but simple explanation, world-class animation for understanding, and a good voice.
I love how the first couple of sentences would just be absolute nonsense to someone 100 years ago.
Another amazing video! Love how well you explained Markov Chains while still getting into some of the more gritty details- that was beautifully done :)
Just in time for my exam tomorrow including Markov chains, thanks for the great intuitive explanation!
I swear you could teach TAX LAW and make it interesting. You are a born teacher. Thanks for another amazing video!
Watching these videos with no knowledge of advanced algebra or whatever other math classes is fun because I just get to listen to the guy talk about funny magic with cool drawings to go with it
yo that new intro is sick
Timestamps (Powered by Merlin AI)
00:06 - PageRank uses Markov chains to rank web page importance based on user navigation probabilities.
02:33 - Markov chains can rank web pages based on user visit probabilities.
04:43 - Initial uniform distribution in Markov chains influences state transitions.
06:58 - Stationary distributions in Markov chains can be unique or multiple depending on their properties.
09:09 - Convergence in Markov chains leads to a unique stationary distribution.
11:28 - Understanding periodic and aperiodic Markov chains in web ranking.
13:44 - Methods to find stationary distribution in Markov chains.
15:58 - Eigenvalues and eigenvectors are crucial for finding stationary distributions in Markov chains.
18:05 - Brute force methods sometimes outperform complex algorithms in web networks.
19:59 - Addressing challenges in ranking web pages using Markov chains.
21:56 - Damping factor in PageRank ensures proper probability distribution.
23:56 - PageRank algorithms transform web networks into actionable rankings for search engines.
You keep uploading videos exactly when we have the topics at our uni
I used PageRank for social networks in my thesis and found it really interesting how the engineering solution was just "We don't care about the exact solution as long as the power method is deterministic"
Also love the new intro
You don't only explain the "what" of the mathematics, but also the "why".
Absolutely mind blowing. Thank you for your work on explaining things in an easy way. My take from the video: eigenvectors can be seen as the stationary vectors of a transformation which represents a dynamic system changing over time.
This was an assignment question for my intro linear algebra class. This video takes it a lot deeper
Two easy refinements for the PageRank-specific use case of these Markov chains:
1. There are no actual dead ends, since the user can hit the back button, effectively making all links two-way. (Perhaps back-button clicks, instead of counting as a new edge in the graph, could decrease the weight of the edge taken before backing up, or simply undo the record that the user took that edge at all?)
2. A jump isn't quite random: users don't want to jump to where they already are. So you can eliminate all links from any node to itself in the graph, and force a jump to a new node by setting the main diagonal of the random jump matrix R to 0 (R[j, j] = 0); a rough sketch of that modified jump matrix follows below.
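A rough sketch of what refinement 2 might look like (this is the commenter's proposed variant, not what the video or the original PageRank formulation does; numpy, and the alpha value is just an example):

```python
import numpy as np

def google_matrix_no_self_jump(P, alpha=0.15):
    """Damped transition matrix where a random jump never lands on the current page.

    Assumes P is an n x n row-stochastic link matrix (no dead-end rows).
    alpha is the probability of jumping instead of following a link.
    """
    n = P.shape[0]
    R = np.ones((n, n)) / (n - 1)       # uniform jump to every OTHER page...
    np.fill_diagonal(R, 0.0)            # ...never to the page we're already on
    return (1 - alpha) * P + alpha * R  # rows still sum to 1

P = np.array([                          # tiny made-up link matrix
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
])
G = google_matrix_no_self_jump(P)
print(G.sum(axis=1))                    # each row sums to 1
```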
Read a paper a month back and chad Reducible walks in to better describe it
Really an excellent video with incredible visuals.
Again, my mind was blown
Really well explained and now I want to know more, time to dive in the rabbit hole
Wow! Finally someone explaining those boring things from my syllabus the way they should be explained. 👍 Hope to see Poisson processes and queueing chains next.
If you want to know where the connection to the Eigenvalue decomposition comes from, check out how to compute powers with matrix exponents and why that works. The same connection exists there, related to computing a limit of higher and higher powers of a matrix.
Your videos always blow me away. What an amazing explanation.
Man, I'm doing my thesis. This is an excellent explanation.
Excellent video and cool idea to present. As a follow-up suggestion, you could explore what happens with the Markov Chain when we have a continuous set of states instead of a discrete one. This ends with the Perron-Frobenius operator, and the Markov Chain is a discrete approximation obtained through the Ulam method. One application example is mapping the flow structure of the ocean's currents.
Ooh, that looks super interesting -- I'll definitely have to read up on that, but sounds like a great topic!
Great video and explanation!!
I love the Video, I love the new intro! keep up the amazing work!
Just so you know, I doooooo have notifications on sir.
The assumption in the "simplified real world use case" (!?) that you mention at 2:03 stating that "...these web pages will often reference each other..." does not tally with my experience in the last, say, 20+ years. I would even be initially inclined to believe that, in 2022, it is demonstrably not the case. Has this aspect been the subject of some formal research pointing to that conclusion? Anyone? Thank you so much for uploading the video!
I could really use some help with the ergodicity property at 12:40. 1. Is there a practical test for the aperiodic and reducible properties? 2. W.r.t convergent behavior, what alternatives are considered _in principle,_ except for periodic oscillation and convergence (unique or not)? I can think only of chaotic dynamics and divergence to infinity. Since the transition matrix is a linear map in the distribution space, chaos is impossible (it would require both non-linearity and dissipation); since it's both locally and globally non-dilating w.r.t the distribution support, divergence is also impossible. I cannot understand what makes the statement about convergence of the difference equation non-trivial; it seems to be simply the only remaining possibility (uniqueness of the fixpoint _is_ non-obvious tho).
Maybe that way of thinking wasn't that obvious back then; graph theory and Markov processes were born in the last century, if I'm correct anyway
Really great videos, loved them. Wondering if you could increase the frequency of the uploads.
Ah yes, another video from Reducible, another piece of quality content
Fantastic video, thank you so much! I would love to see more videos on search algorithms, I find it endlessly fascinating how all these purely mathematical concepts combine to an algorithm that seems like it understands language and what is relevant to human beings.
8:42 "Roll Credits." Eyy!
Could've released this video a couple months earlier and saved me a ton of headache brother
markov chains are such a rich topic for explain videos
I'd love to see more of them around :(
Hey man!! Great video.... really appreciate your effort!
Awesome video! One question: if the solution to periodic/irreducible chains is to use some value alpha to transition to a random state, wouldn't that result in that particular state being seen as ranking higher? Since a user will be "trapped" there for a few steps before breaking out with alpha probability?
Fantastic question! From a super high level perspective, PageRank is fundamentally recursive in nature. When a page is ranked highly, it's usually because several other highly ranked pages link to it. So in the case where we have an absorbing state/cycle, if many high-ranking pages link to it, the page should generally have a high rank anyway. So in that case, we are fine.
And when the web pages pointing to it settle to a lower rank, in general, if we think from the user perspective, there won't be many users that actually end up being absorbed in that state, just because the neighboring states have a low number of users reaching them. In this case, there might be a bit of an artificial increase in density due to the absorbing nature, but as long as the alpha is tuned appropriately, studies have found it doesn't actually end up affecting rankings too much.
The nice thing about this too is that if you are worried about this adversely affecting the ranking, you can crank up the alpha parameter to prevent too much of that absorption from happening. As another side note, I think there was some paper I read that mentioned that absorbing states/cycles actually end up being a relatively small proportion of the web network topology, so that notwithstanding, this is unlikely to make a massive impact on the global rankings.
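If anyone wants to poke at this numerically, here's a tiny experiment sketch (numpy; the 4-page graph and the alpha values are made up) showing how the probability mass stuck at a dead-end page shrinks as the random-jump probability alpha grows:

```python
import numpy as np

# Made-up 4-page web where page 3 is a dead end that only links to itself
P = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])

def stationary(G, iters=2000):
    pi = np.full(G.shape[0], 1 / G.shape[0])
    for _ in range(iters):
        pi = pi @ G
    return pi

n = P.shape[0]
for alpha in (0.05, 0.15, 0.5):
    G = (1 - alpha) * P + alpha * np.ones((n, n)) / n   # mix in uniform random jumps
    print(alpha, stationary(G))
# The rank of page 3 drops as alpha increases, matching the description above.
```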
What I thought at 16:11: "Wait, isn't that...?"
I never realized why we learn to calculate eigenvalues and eigenvectors. This is a great example of why, thx.
What the heck. I was just reading Sergey's 1998 paper on PageRank (literally hours ago)
Amazing Video
Perfect video. I recently built a web crawler, and I want to use the data to build a search engine. I watched the video through, I don't quite get it all, but I'm just working on creating a list of important information to collect. I'm thinking I'll probably use pagerank first for my page sorting algorithm.
I’m only now realizing why learning eigenvectors in school felt like pulling teeth - the curriculum never mentioned any practical applications. Great Video
okay this is slightly abstracted because people don't "randomly" pick links to visit. They have intention. Also I can't tell you how many times the number 1 hit wasn't what I was actually looking for.
Also your "user" started at state 2 so per the path set by the Markov chain's pathing the Stationary Distribution will always be state 2. If you change the pathing it will change.
TL;DR the entire video should be taken with a grain of salt.
3b1b levels of good... maybe better
“Long term behaviour of systems is tied to eigenvalues/eigenvectors“ **war flashbacks to my control systems class in uni**
So amazing. Thank you for this!
17:15 I'm glad to report that my university (UC Berkeley) intro Linear Algebra class for engineers uses this exact example about the stationary points of transition matrices as one of the first introductions to learning about eigenvalues/eigenvectors. I guess you and that course staff see eye to eye on this example.
Haha, where do you think I went to school?
Back when I was in Berkeley, I learned this in EE16A without direct reference to Markov chains. This video has flavors of inspiration from that class and some of the more intense Markov chain theory in EE126.
@@Reducible this whole video gave me flashbacks to EE16A homeworks lol
Over a decade ago, google results would slowly become less relevant as you changed pages. Nowadays, there is a sudden drop in quality from page 1 to 2. Once upon a time if you copied 7 words that weren't a common phrase and put them in without quotes, odds are high it would end up on the first 7 pages but now often it won't appear if you don't quote it. Google certainly isn't using PageRank anymore. Is anyone else annoyed at how things were changed to make the pages beyond the first pointless? Like I can put in a few lines about a star trek episode and the first page is all about that episode and you'd think as I go down the search results I get an increasing number of non Star Trek results sprinkled in, but now if I don't use quotes odds are very high the second page will be filled with irrelevant results.
Excellent video!
Legendary. Thank you!
Great content and work. Thank you.
Well, you can also think of Markov chain as water flowing through the pipes. Or light oscillating through space. Or just any sort of particle/finite amount traversing different spatial cells.
I was not expecting my mind to be blown today
We usually define the so-called Markov chain as a directed graph, because treating it as a graph is more convenient for further computations.
Your videos are so good
Excellent video
I gotta say man, procedurally generated videos always look cool when it's about CS, AI, or math.
Thanks for this video
22:00 this kind of reminds me of adding a prior when predicting sports team's true strength
Excellent video! I took an information retrieval course last semester and this was kind of a refresher to me, I somehow have forgotten about half the stuff from it ^^
Btw, can I recommend a topic for your next videos? I think posit number system is a good topic, I haven't seen many explanations of it on UA-cam.
kinda insane that I get this video in my recommended after missing my class lecture about pagerank
Most linear algebra courses don't mention probability at all. I think it would be best to have classes showing connections between different math topics.
Hi @Reducible, thanks for the video, I'm already 3 minutes in... I want to share some feedback on the sound processing. It looks like the volume of the background music is reduced automatically when the voice-over track is active (it's called "ducking"). The music loudness swings are excessive: when the narrator pauses, the music blasts through, perceivably as loud as the voice was, only to be subdued half a second later, time and time again. I noticed that even my muscles are involuntarily tightening up as soon as the narration pauses, as if I'm expecting a blow from the next music blast. If you guys aren't mixing manually, maybe you could consider increasing the timeout in the effect processor, and also matching the background music loudness ceiling to ~6 dB below the voice, so it really stays in the background?
Really interesting video! A picture quality note: I see some flashing of grey visual artifacts in the black background on the left side when the diagrams transition up until around minute 3.
Honestly, this was one I couldn't quite figure out. In Final Cut Pro, I did not see these artifacts, but when I uploaded to UA-cam, I noticed them. Pretty strange, UA-cam compression should not result in artifacts like that.
@@Reducible yeah, I wish I could help, but I don't know anything about video authoring software or UA-cam recompression. Thanks for the great content though!
The 3blue1brown of computer science
14:00 immediately i think: let A = the matrix. let v = the vector. at each step: A = A * A; v = v * A; compare v to the previous v. This exponentially increases the number of matrix steps until it performs so many steps so fast that it MUST converge
re-thinking it: performing A = A * A one time costs as much as performing v * M about n times (one vector-matrix product per row of the matrix), so for a 1000 x 1000 matrix a single squaring already costs the same as 1000 plain iterations of v * M, and each extra squaring costs that much again while only halving the number of remaining vector steps. Squaring therefore only pays off when the chain needs far more iterations to converge than roughly n times the number of squarings, which essentially never happens. So this trick is only interesting for very small matrices, and for n < 5 it's already easy to solve for the eigenvectors directly, since explicit formulas exist for all n < 5
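Roughly, the flop counts work out like this (back-of-the-envelope sketch for dense n x n matrices; the numbers are illustrative only):

```python
# Cost model: one v @ M is ~n^2 multiplications, one M @ M is ~n^3,
# i.e. one squaring costs as much as n plain vector-matrix products.

def cost_plain_iteration(n, steps):
    return steps * n**2                  # keep doing v = v @ M

def cost_repeated_squaring(n, squarings):
    return squarings * n**3 + n**2       # m squarings reach 2^m steps, then one v @ M

n = 1000
print(cost_plain_iteration(n, steps=200))       # ~2e8: a couple hundred iterations usually suffice
print(cost_repeated_squaring(n, squarings=8))   # ~8e9: 2^8 = 256 steps, but ~40x the arithmetic
# Squaring only wins when the number of iterations needed dwarfs n * (number of squarings).
```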
I wish mathematicians constructed computationally coherent formal structures/transforms, because what if there exists a transform that reduces the computation to a couple of operations that can beat Monte Carlo methods
Here before this goes viral lol
An Eigenvalue decomposition of the transition matrix will tell you everything: reducibility, periodicity and all stationary distributions.
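As a sketch of what that diagnosis could look like in practice (numpy; the tolerance and the example matrix are mine, and when eigenvalue 1 has multiplicity greater than 1 the returned vectors only span the stationary set rather than forming a nonnegative basis):

```python
import numpy as np

def chain_diagnostics(P, tol=1e-9):
    vals, vecs = np.linalg.eig(P.T)                 # left eigen-decomposition of P
    on_unit_circle = np.abs(np.abs(vals) - 1) < tol
    at_one = np.abs(vals - 1) < tol
    stationary = []
    for k in np.where(at_one)[0]:
        v = np.real(vecs[:, k])
        stationary.append(v / v.sum())              # normalize to sum to 1
    return {
        "num_recurrent_classes": int(at_one.sum()), # > 1 means multiple stationary distributions
        "periodic": bool(on_unit_circle.sum() > at_one.sum()),
        "stationary_distributions": stationary,
    }

# Example: a deterministic 2-cycle (periodic chain)
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(chain_diagnostics(P))   # periodic=True, one recurrent class, stationary = [0.5, 0.5]
```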
lemme just download this before google decides to delete it
Would have loved a visualization of what happens to the stationary distributions after adding the alpha term/step
@pyropulse I got that part. Maybe I phrased it weirdly. Like how would a sink-node look (in terms of visitor-percentage) when you add a probability to leave that node.
I love how eigenvectors pop up...