Wow, this really got a lot of attention... thanks everyone!!
I'd normally engage with the comments directly but I'm about as efficient as the naïve Fibonacci algorithm at that sort of thing... since there are some trends in the comments, I figured I'd at least address the most common questions/concerns that I came across:
1. *Runtime of the "linear" algorithm.* I swept the detail under the rug (didn't seem like the right time, but hindsight is far acuity), but it's explained a bit more in the definitely-not-hard-to-find greyed-out paragraph at 10:53. Briefly, the linear algorithm is O(n^2), where n is the *index* in the Fibonacci sequence, because the nth Fibonacci number has O(n) digits thanks to Binet's formula!
2. *Colour palette.* Classic "works on my machine" moment; the colours looked a *lot* better on my computer before it got uploaded to YouTube. Sorry about that! I won't be using the same colour scheme in future videos; lesson learned. You can still read the actual source code at github.com/GSheaf/Fibsonicci
3. *Choice of number base.* Speaking of source code, there are some comments regarding my choice of using base-256 for my big integers. Just to clarify, I only made this restriction when dealing with Fourier transforms, with the justification being that double-precision floats wouldn't be able to handle larger bases. For the other, simpler algorithms, I used larger bases! This is summarised in the README for the source code. Most algorithms use base-2^32 (so that I could cast to 64-bits to do digit-wise products), and the "linear" algorithm uses base-2^64.
4. *Avoiding precision errors.* Many people mentioned the Number-Theoretic Transform as a correction to the FFT that doesn't suffer from the precision error. This would be a natural next step, at the cost of having to figure out a way of getting a sufficiently large prime p that is congruent to 1 modulo the length of the sequences being convolved (a headache I didn't want to get into after 25min of video). Alternatively, you can also implement "adjustable fixed-precision floats" to account for this.
5. Binet's formula doesn't render this problem "solved": how do you compute phi^n?
6. Memoisation is *not* a typo, and I'll die on this hill.
Anyway, definitely enjoying reading all of the comments here!
Hello! I really like this video, but some of the concepts are beyond what I’ve learned. What would you recommend to first look at to get a greater understanding of the video?
@@SheafificationOfG are you sure you can't go bigger than base 256 for the Fourier algo? Doubles have 51-bit mantissas. 256 is 8 bits.
Memoisation is correct
@somatia350 I think it'll depend on what concepts you want to go in more detail with, but a safe bet is probably The Book on algorithms (i.e., Cormen, Leiserson, Rivest, Stein). Not sure what it says regarding Fourier, but the book is excellent for giving you all the foundations in this kind of stuff, and you can build from there pretty easily.
@tolkienfan1972 The reason for reducing the base to 2^8 is to ensure that the mantissa is large enough to hold the (implicit) *sums* of digits in the underlying convolution. Decreasing the digit size allows for larger sums. If I were to use 16-bit digits, I might see FFT break down after only the 47000th Fibonacci number, based on the same rough calculation in the video (granted this might be too pessimistic of a bound). On the other hand, if I used 4-bit numbers, I would be able to take the computations much further (past the 48 millionth Fibonacci number)... if I could compute that far.
@@SheafificationOfG if you add n 16bit (16 to hold the products) numbers you need ceil(16+lg(n)) bits. 51 - 16 is 35. That's about 32 billion limbs. Did I make a mistake?
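The counting argument in this exchange can be made concrete. What follows is a minimal sketch, assuming the only requirement is that each convolution coefficient (a sum of n products of two b-bit digits) be exactly representable in a double. IEEE 754 doubles carry 53 significant bits (52 stored plus one implicit), slightly more than the 51 quoted above. The function name `max_limbs_exact` is made up for illustration, and the sketch deliberately ignores the rounding error that accumulates inside the FFT itself, so it gives an optimistic upper bound rather than a safe operating limit.

```python
DOUBLE_SIGNIFICAND_BITS = 53  # IEEE 754 binary64: 52 stored bits + 1 implicit bit


def max_limbs_exact(digit_bits: int,
                    significand_bits: int = DOUBLE_SIGNIFICAND_BITS) -> int:
    """Largest n such that a sum of n products of two digit_bits-bit digits
    is guaranteed to fit in significand_bits bits.

    Assumption: only exact representability of the final sums matters;
    rounding error inside the FFT butterflies is not modelled.
    """
    product_bits = 2 * digit_bits           # a b-bit digit times a b-bit digit
    spare_bits = significand_bits - product_bits
    if spare_bits < 0:
        return 0                            # a single product already overflows
    return 2 ** spare_bits


if __name__ == "__main__":
    for bits in (4, 8, 16):
        print(f"{bits}-bit digits: up to {max_limbs_exact(bits)} limbs")
```

Calling `max_limbs_exact(8, 51)` reproduces the "51 - 16 is 35" estimate from the comment (2^35 limbs). The much smaller limits quoted from the video come from a different, more pessimistic calculation that this sketch does not attempt to reproduce.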
This video is extremely refreshing! I've seen people claim that you can compute Fibonacci numbers in O(log n) time, because they saying that arithmetic operations take constant time, and there are only O(log n) operations. This approximation is often useful, but in the Fibonacci case, you cannot discount the added cost of adding/multiplying large integers. The way you showed the actual runtime increasing with graphs really sells this point.
@@Avighna which is basically useless considering Fibonacci has linear memory requirements. It's similar in nature to the fallacy of O(1) hashmaps (maximum memory access speed is O(n^(1/3)) for n bits, which has practical effects in the case of the various cache levels of a CPU and RAM)
@@palmberry5576 It is a common task in competitive programming. And besides, why is knowing the 2 millionth Fibonacci number not under a modulus useful? It’s all theoretical anyway.
@@tolkienfan1972 Certified "Mathematicians don't know how to program" moment. 1:55 You are going to run out of memory before your slightly better scaling algorithm catches up in speed to one that was actually well written to utilize the computer well.
@@tolkienfan1972 You'd be surprised how much speed-up and overall efficiency you can gain when you're conscious of things like memory allocations, cache locality, and the hardware in general (that the program must run on in the end after all). The linear speed-up tends to be several orders of magnitude. Furthermore, there are absolutely cases where low-level details can affect the time complexity itself. On the flip side, even when the algorithm has a lower order, the constants that are left out of Big O notation can make it much worse for any inputs of interest. And I'd argue that actually implementing the thing and analysing it can absolutely help with coming up with a better algorithm, especially if you find out all the redundant things a given program is doing.
This reminds me of the time our discrete math course had a quiz that asked us to "compute f_300" and I, naive and brave, unironically tried doing it by hand. I was and still am pissed off to an unprecedented degree that "compute" apparently meant "Express the general term of the sequence as a linear combination of exponentials and substitute 300 into the free variable without doing any reduction". They could have just told us to do that yk
@@landsgevaer Proving it and using it was my original strategy but the computation part was left half finished as I handed the paper in. I also attempted it after the fact on a blank sheet and figured it would have taken far too much time at that point anyway. But after this whole ordeal I am now less naive than I used to be regarding how computations scale as numbers grow
Currently taking a discrete math summer course, our textbook linearized the recursive Fibonacci formula, it looked very complicated, can’t imagine doing that on a test lol
When you got in the lecture to 'an F for you, and F's for your five closest friends', YouTube cut to a commercial beginning 'An actual letter to [advertiser]'. I stuck around in the ad far too long waiting for your punchline, when I realized that it wasn't your joke, it was a practical joke from The Algorithm. I'm a computer scientist. I'm familiar with Schönhage-Straßen, and with Moler and Van Loan's 'nineteen dubious ways to compute the matrix exponential,' but your discussion is hilarious, in the flavour of Carl Linderholm's 'Mathematics Made Difficult'. Bravo!
If you want to calculate the multiplication of very large integers as fast as possible, use the GMP library. The authors have done a huge amount of work to make it as efficient as possible.
For the matrix method, you can have a 4x4 matrix derived from F(n), F(n-1), F(n-2), F(n-3). This is nice because you can express those with coefficients as powers of 2, which means you can use SIMD and process multiple numbers at the same time. You could even reasonably do this for an 8x8 matrix and get to use AVX2, but it's a tradeoff. Asymptotically nothing would change, but having a 4-8x speedup because of SIMD sure is helpful in real life. This is getting deep into the territory of optimizing big numbers (and at that point, why handroll your implementation instead of wrapping GMP?)
I'm not sure SIMD would be helpful, moving data in and out of registers has quite an overhead. But the Number class should have been 2^32 based, that's basically just free speedup, because uint8_t is still done on the same ALU unit.
SIMD would be a bit more work to set up since the number class and its arithmetic is nontrivial, but those are ideas (and yeah, reinventing the wheel is definitely never worth it unless you think you're smarter than the many people who developed GMP haha but where's the fun in that). Also @janisir4529, while I used uint8_t for my FFT-based multiplications (for the sake of controlling the errors), everything else was done with uint32_t (the number class is actually templated)!
Using Schönhage-Strassen multiplication (basically FFT over a finite field, so no using doubles and being limited by precision) and matrix exponentiation by squaring, I get the 67108864th Fibonacci number in 1.02s. Doing the same using arithmetic over the number field gets the same in 993ms, basically not improving. Pari somehow computes the 200000000th number in under a second. Would be interesting to show how this was achieved
@@spacebusdriver If we know the spec of each processor and memory, you could probably make some kind of generic average based off the performance stats, i.e. your processor is 2GHz and you get the 1,000,000th number, and I have a 3GHz processor and get the 1,800,000th number, we could scale these down to 1GHz on each machine, we would find you get the 500,000th number, and I get the 600,000th number, thus my solution is better. In saying that, it wouldn't be that accurate, but it might give a decent estimation of performance comparison. True performance equivalence is to have some standardized machine that people could have or test it on and run it. Could be a nifty website idea, you slap in your code and see how it performs compared to others.
Your channel is truly one of the best math channels around right now. I know I know, opinions might vary depending on what's your level of math, but I can say that it is perfect for me. And you do not lie to your audience that everything is EASY and then hit them with axioms they are supposed to absorb in 5sec. You know your math and you are not afraid to show it.
Do template meta programming. Technically it just prints out a number, the compilation taking a very long time doesn't matter, as the task was ill defined.
The real answer is that you hardcode the largest number into the binary. Then the 1s time limit is mostly spent reading a number from disk and printing it again.
Man I loved this video! Though I didn't understand much past the linear algebra, it was still interesting to see your analysis of the runtime and the possible solutions to improve it. Kudos!
This video is packed with easter eggs that are barely visible on top of a rapid if smooth delivery. I have not laughed so hard at a mathy video ever. Nor rewound so many times. Straight talk from a meme master.
Great video, had a great time watching it. Looking forward to your next one! One piece of feedback though, dark blue text on a black background is very hard to read due to the contrast. It was difficult to read your code sometimes.
Yeah, the video definitely looked better on my computer ("It runs on my computer" moment). I'll be changing my choice of colours for code in the future for sure, thanks!
As soon as you started explaining digital multiplication, I immediately realized you were going the Karatsuba route A few months ago I started working on an arbitrary precision integer library (for Fun and Profit™), and spent a whole bunch of time benchmarking exactly where the crossover should be to switch back to doing traditional multiplication vs the crazy allocation cost of doing recursive Karatsuba
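The crossover mentioned above can be sketched in a few lines. This is an illustrative Karatsuba on Python integers, not the commenter's library: the name `karatsuba` and the default `cutoff_bits` value are assumptions, and the real crossover point has to be found by benchmarking on the target machine. Below the cutoff it simply falls back to the built-in multiplication, standing in for a schoolbook routine.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 2048) -> int:
    """Multiply two non-negative integers with Karatsuba's three-product trick.

    cutoff_bits is a hypothetical tuning knob: operands at or below this size
    are handed to the built-in multiplication instead of recursing further.
    """
    if cutoff_bits < 8:
        raise ValueError("cutoff_bits must be at least 8 so the recursion shrinks")
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y

    # Split both operands at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three recursive products instead of four.
    low = karatsuba(x_lo, y_lo, cutoff_bits)
    high = karatsuba(x_hi, y_hi, cutoff_bits)
    cross = karatsuba(x_lo + x_hi, y_lo + y_hi, cutoff_bits) - low - high

    return (high << (2 * half)) + (cross << half) + low


if __name__ == "__main__":
    a, b = 3 ** 5000, 7 ** 4000 + 12345
    print(karatsuba(a, b, cutoff_bits=64) == a * b)
```

Timing this for several values of `cutoff_bits` is one way to locate the point where the extra additions and temporaries stop paying for the saved multiplication.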
Loved this video! I'm a math & cs student, I learned a lot from watching how you connected all of these different areas in math/cs to solve a deceptively simple sounding problem! Please do more stuff like this, it's invaluable how you seamlessly showcased the usage of linear algebra, complexity analysis, complex numbers, Fourier transforms, bit/byte representation of the numbers, optimizing multiplications (and anything else I missed) for optimizing this. I've read and studied these concepts but it was never made THIS clear to me how they could be utilized in practice in such a cohesive video. If you read this, I'm curious how long did it take you to optimize this and get all the material for the video?
Really appreciate the comment! I kinda threw the code together a month ago, and then did some major refactoring halfway through before making it public. I kinda kept things honest (except for the bit-reversal in the FFT implementation), so I tried not to stress myself out with fine-tuning my optimisations, and I was already aware of the algorithms I was going to use before I put the video together.
This is essentially a speedrun in computer science. Well done! Imagine having this class on the first day of computer science and then learning all the details about this masterpiece.
Even knowing everything in the video already, the humour was quite good and I was thoroughly entertained, and seeing the runtime graphs was pleasing. Another banger from my favourite sheaf!
You don't need to compute every Fibonacci number, only the largest - so your exponential matrix multiplication can just keep doubling for the entire second to get something huge
I haven't been as stimulated and entertained and educated by a video as by this in the past 6 years. I felt like a kid again having newly discovered numberphile and minutephysics on YouTube. love it. thank you so much. love you man.
If you want an easy quick followup video: see how long it takes each other function to hit the number reached by the gold medalist. (feel free to not calculate it with the recursive function... pretty sure we'll hit the heat death of the universe before that one gets done)
This has been one of the best videos I've seen on YouTube. While I'm already familiar with all of the steps you've taken, the way you merged them together neatly while still respecting and addressing the imprecisions added when you use the Fourier transform made the video a very enjoyable and elegant demonstration. By addressing the issues at the end you scratched that itch at the back of my head and I thank you for that.
This is great work! I loved seeing more and more complex math theory appear to solve a seemingly simple problem faster and faster. Thank you for taking the time to produce this video and share it with us!
As soon as you said the problem could be written using matrices I immediately thought "It could be a good idea to diagonalize the matrix!" and kept going crazy because you just wouldn't do it (until the end). Good video!
i remember when and how i was taught the fibonacci sequence, it was year 4 and we were learning about sequences of numbers and the teacher said that this is a sequence not even mathematicians could figure out until they were told it and wrote the fibonacci sequence on the board, she gave us an attempt to figure out the pattern and no one did it
@@thefunseeker9545 i had never heard of that but looking it up i guess so, it was more that we (or at least i) hadnt been taught yet how flexible sequences were, as in, i had never seen a sequence before that required n-1 to work out n, the previous ones had been things like nx2 or n^2 or n+8 if you get what i mean, everything could be reduced down to a formula that could be worked out without the previous numbers (even though we didnt know how to do that, im just explaining the difference)
This Python code gets past the four millionth Fibonacci number in half a second on my laptop. Normally, Python would be disastrous for speed, but most of the time is spent inside CPython's schoolbook(?) multiplication doing the last three squarings. The way I wrote this code was by starting from repeated squaring of {{0,1},{1,1}} and then simplifying by realizing the intermediate matrices always had the form {{a,b},{b,a+b}}.
def fib_power_of_2(exponent: int) -> int:
    a, b = 0, 1
    while exponent:
        a2 = a**2
        b2 = b**2
        ab2 = (a+b)**2
        a = a2 + b2
        b = ab2 - a2
        exponent -= 1
    return b
Yeah, I very conveniently left out how easy it is to outdo my implementation using well-established large number classes like those used in CPython or GMP :^)
@@fplancke3336 Hey, you're right! I thought Python used schoolbook but I searched "karatsuba" in the CPython GitHub repo and found where they switch to it. They also seem to be making decisions based on if it's squaring instead of multiplying. They don't seem to be using Schönhage-Strassen or SSE instructions, though.
This is an incredibly neat demonstration of optimisation techniques that typical programmers like myself aren't familiar/comfortable with. Great video, well done!
Liked it! The last solution was beyond what I knew from school. You've inspired me to start studying maths again because i haven't thought of an eigenvalue in years.
Mhmm, you hit a spot in my soul I never knew existed. All I have is this sub and this like. The love I give you freely. This was a wonderful experience.
The analysis at 7:30 is incorrect. The sum of numbers is proportional to the length of the number, not its size, so it doesn't grow with n, rather with log(n). So the algorithm isn't O(n^2) but O(nlog(n)). Huge difference.
Nah, it is correct, actually. It indeed grows with n if you define n as the number of digits, like you say. Now, since the Fibonacci numbers grow asymptotically exponentially, their number of digits relates linearly to the index (roughly one extra digit every five steps), and that index is used as n in the video. So the video looks correct to me.
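The "one extra digit every five steps" claim can be checked numerically. A small sketch, assuming the approximation F(n) ≈ phi^n / sqrt(5) from Binet's formula; the helper names `fib` and `predicted_digits` are made up for this illustration.

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def fib(n: int) -> int:
    """Plain iterative Fibonacci with F(0) = 0, F(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def predicted_digits(n: int) -> int:
    """Decimal digit count of F(n) predicted from F(n) ~ phi**n / sqrt(5)."""
    return math.floor(n * math.log10(PHI) - 0.5 * math.log10(5)) + 1


if __name__ == "__main__":
    print("steps per extra digit:", 1 / math.log10(PHI))
    for n in (10, 100, 1000):
        print(n, len(str(fib(n))), predicted_digits(n))
```

Since log10(phi) is about 0.209, the digit count grows by one roughly every 4.8 indices, which is why adding two neighbouring Fibonacci numbers costs O(n) digit operations and the "linear" loop costs O(n^2) overall.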
I thought you were going to explain finite-field FFT (a.k.a. Number Theoretic Transform) at the end. FFT can be suitably modified to work on Z_p instead of C, for certain primes p. The main requirement on p is that 2^k | p-1 for some k > log2(N), because k bounds how many times you can do the FFT trick of splitting into even and odd parts. Not only does NTT not have precision issues, it is also usually faster because it uses half as much space and basic operations are done on integers.
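To illustrate the requirement on p, here is a minimal sketch of a number-theoretic transform. It assumes the commonly used prime 998244353 = 119 * 2^23 + 1 with primitive root 3, which is a choice made for this example and not something from the video or its source code; the names `ntt` and `convolve` are likewise illustrative. The recursion is the same even/odd split as the FFT, with a root of unity in Z_p replacing the complex one.

```python
MOD = 998244353        # = 119 * 2**23 + 1, so 2**23 divides MOD - 1
PRIMITIVE_ROOT = 3     # generates the multiplicative group modulo MOD


def ntt(values: list[int], invert: bool = False) -> list[int]:
    """Recursive radix-2 transform over Z_MOD (length must be a power of two)."""
    n = len(values)
    if n == 1:
        return values[:]
    root = pow(PRIMITIVE_ROOT, (MOD - 1) // n, MOD)  # primitive n-th root of unity
    if invert:
        root = pow(root, MOD - 2, MOD)               # its modular inverse
    even = ntt(values[0::2], invert)
    odd = ntt(values[1::2], invert)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % MOD
        out[k] = (even[k] + t) % MOD
        out[k + n // 2] = (even[k] - t) % MOD
        w = w * root % MOD
    return out


def convolve(a: list[int], b: list[int]) -> list[int]:
    """Exact convolution of a and b, valid while every true coefficient < MOD."""
    result_len = len(a) + len(b) - 1
    size = 1
    while size < result_len:
        size *= 2                                    # must stay <= 2**23 for this MOD
    fa = ntt(a + [0] * (size - len(a)))
    fb = ntt(b + [0] * (size - len(b)))
    product = ntt([x * y % MOD for x, y in zip(fa, fb)], invert=True)
    inv_size = pow(size, MOD - 2, MOD)               # undo the factor of size
    return [x * inv_size % MOD for x in product[:result_len]]


if __name__ == "__main__":
    print(convolve([1, 2, 3], [4, 5, 6]))
```

The results are only exact when the true convolution coefficients stay below the modulus, so a big-integer multiplication built on this would need small enough digits, a larger prime, or several primes combined with the Chinese remainder theorem.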
I fucking love watching videos that delve into topics that I clearly don't/shouldn't understand I don't even know how this crept into my recommended. But I love this
@@lih3391 I kinda want to multithread this, but I don't think it's possible. The matrix multiplication could be parallelized theoretically, but by the time starting a thread for a single multiplication becomes worth it, we no longer fit into memory.
I saw bionicle and had to like. Moreover, awesome video in regards to the consequences of abiding by ‘Big O’ notation for efficiency while ignoring practical limitations of memory. It also shows a good peek into the depths of optimization for beginners in the realm of coding. Thanks for the treat.
Wow I never considered the field method at the end, it can come out useful for other stuff whenever one knows you're working with just specific roots! I wonder how one can generalize this for fast diagonalization of any matrix, since eigenvalues will always be roots of polynomials, I will think about it, right after liking and subscribing!
This is a really cool video! In particular a very obvious reason for why the vanilla "linear" Fibonacci is O(n^2) rather than O(n), which I didn't realise at first. Also having the direct form of the nth Fibonacci number via diagonalisation is so neat! I knew the proof for it from a different kind of proof (en.wikipedia.org/wiki/Recurrence_relation), but the diagonalisation is much more intuitive. Nicely edited as well! Might have forgotten this, but would have liked to know a bit more about the specs of your laptop
The Binet formula was always going to win this competition. However, you perhaps ought to have started by examining the Lucas equations to find better quick relations for obtaining large Fibonacci terms.
I love the idea that someone would come across this as their first introduction to the Fibonacci sequence, be able to immediately understand what it means that it's a "recurrence relation," and then make it through the whole video.
I started watching this video shortly after it was posted, and decided to implement this all in Rust using benchmarking. I thought this would be a fun project since I am new to Rust. 6h later, and things are getting off the ground. I'll edit this comment and add a link to my repo when it is finished :)
@@SheafificationOfG I started in python and C# as my main, and then got into Haskell. I'm loving Rust, but it's definitely a chore to learn. It was difficult to find the time to learn between semesters 😩
Man the world of Math is truly wild. I'd have done a+b=c, then b->a, c->b and repeat. Seeing you use vastly more complex things that I am unable to comprehend was just as fascinating as it was confusing to me. I learned absolutely nothing, understood even less than that and somehow, I was still entertained. Incredible.
You can still use FFT, just do it over a finite field of some kind iirc. Pretty sure you can do it in the ring mod 2^2^n+1 as well which works nicely because you can use 4 as a root of unity or something?
@@DarthWho01 Yeah that's basically what number theoretic transform is, if I remember right. Though 257 is also a nice prime because it turns things into bytes. And it may be faster just to use a prime near 2^64.
This was such an amazing video, thank you so much for making it. I have always wondered where the closed-form formula came from for the Fibonacci numbers.
“[It] is known as the Cooley-Tukey algorithm, so-called because these insights are due to none other than the same person who discovered the Fourier transform… Gauss.” LMAOOO
9:53 I am sad to say I did actually notice this in less time than it took for it to calculate 2^19-1 fibonacci numbers, and furthermore knew precisely what powers of two they were
Great video, one criticism though: dark blue is really not that legible on a black background, changing the colours of the code highlighting to something more contrasting (e.g. around 3:54) would help a lot!
The 6 million Fibonacci number limit is just because you're using double floating point numbers, right? If you swapped to quads or arbitrary precision you could go past that then. Though that's probably a little bit too much of a rabbit hole for the scope of this video...
An alternate way to get from 4 to 2 numbers is to realize that calculating Mⁿ is the same as evaluating the polynomial Xⁿ at M. Because M² = M + 1 this evaluation factors through Z[X]/(X² − X − 1), which means you can just calculate X^n in this ring (with exponentiation by squaring) instead and then evaluate at M. The last step is the same as adding the coefficients in front of X⁰ and X¹. The same can easily be done for a general linear recurrence by calculating X^n in R[X]/(p), where p is the characteristic polynomial of the recurrence, and linearly mapping to R by mapping Xⁱ to the i-th starting value.
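The quotient-ring idea can be sketched directly. This is an illustration under the assumption that the ring is Z[X]/(X² − X − 1), with elements stored as pairs (a, b) meaning a + bX; the function name `fib_via_quotient_ring` is made up here. With starting values F(0) = 0 and F(1) = 1, mapping Xⁱ to the i-th starting value sends a + bX to b.

```python
def fib_via_quotient_ring(n: int) -> int:
    """Return F(n) by computing X**n in Z[X]/(X**2 - X - 1).

    Elements are pairs (a, b) standing for a + b*X, reduced with X**2 = X + 1.
    """
    def mul(p: tuple[int, int], q: tuple[int, int]) -> tuple[int, int]:
        a, b = p
        c, d = q
        bd = b * d                      # coefficient of X**2, folded back in
        return (a * c + bd, a * d + b * c + bd)

    result = (1, 0)                     # the constant polynomial 1
    base = (0, 1)                       # the polynomial X
    while n:
        if n & 1:
            result = mul(result, base)
        base = mul(base, base)          # exponentiation by squaring
        n >>= 1
    # X**n = F(n-1) + F(n)*X, so the X coefficient is F(n).
    return result[1]


if __name__ == "__main__":
    print([fib_via_quotient_ring(n) for n in range(11)])
```

Each step multiplies two pairs of big integers instead of two 2x2 matrices, which is the "4 to 2 numbers" saving the comment describes.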
I was waiting for the Binet algorithm from the start, but the journey was actually interesting, so I stayed. I honestly never considered the fact that if you work with numbers with undetermined bit size, you'd need fft just to compute a product of two integers, that's pretty crazy
50 lines of C code with GMP and matrix approach (4 multiplications) with -O0 can go for 67'108'864-th Fibonacci number in one second. Life lesson: do not rewrite highly optimized code yourself.
@RuslanKovtun used -O0 to really show me up, since I used -O3 and -march=native But you're right, there's unfounded hubris in thinking you can outdo the work of well-established large number libraries, even Python can put my golden output to shame.
@@SheafificationOfG , you and @luigidabro missed that GMP is a library that is statically linked as -lgmp and has all optimizations in it. Yes, it is like in python where your code just calls it, but I will argue that python is still slow even for single "for" loop.
The thing is, GMP has like ten different multiplication algorithms to choose from, and it selects the presumed fastest given the input arguments. There is no purpose in doing a Fourier transform for 6 * 4.
what the hell this video is sick so many great jokes, and a lot to learn. I'd never have thought to use Master Theorem to analyze an algorithm like this, and I would never have guessed that Binet's formula actually would turn out to be faster in the end, given how much floating point multiplication I assumed would have to be done
The whole point was that Binet's formula uses an implementation without floating point multiplication by expressing it as a field (except that there *is* floating point multiplication hidden in the fast Fourier transform)
the second I saw the 2x2 matrix equation at 5:28, matrix diagonalisation immediately came to mind. Probably PTSD from math class. I wondered when you would talk about this and I was not disappointed when it came out !
Before watching: My immediate thought would be to use x86 assembly and run the following pseudocode:
int x = 0
int y = 1
while(onTheClock):
    x += y
    y += x
print(lastModified)
print(numIterations*2)
There’s obviously some cleanup to be done, but essentially, just adding and storing the result repeatedly between two 64-bit registers.
After finishing the video: How could I have forgotten the fast Fourier transform, and the myriad of other things that definitely did not fly over my head! How silly of me (incredible video!!)
@23:30 You can use Number Theoretic Transform to achieve the same time complexity as FFT without loss of precision. You just need to find a very large prime number (slow) and hard code it (fast).
I finished watching the video and was like, great video! And then "the ugly truth" started playing, addressing the only potential criticism I had. Instant subscribe.
Great video! My only qualm is the choice of "Which axis is which" on the graph. Like, the huge slowdowns in the graph at 9:45 look like huge JUMPS in progress, lol -Paintspot Infez Wasabi!
You need to be both a math major and a CS major to understand this video properly. I suck at both: I am a junior in math and I don't know much C++, and what I do know is old, which made it even more confusing at first 😳
This video is 25 minutes not one second
Yeah! If I had 1 second I would shout 89, not turn on a computer and start making a video.
The main problem is to find the best algorithm for calculating Fibonacci numbers in one second
@thekatdev6007 Savantism. That’s, where it’s at.
Lies
it's in slow motion so we can follow along
diamond medalist: storing the number in code and printing it
Ah yes it got optimized out by the programmer
everything is a lookup table if you optimize long enough
found you
@@IGoByLotsOfNames 👁h⍜lα👁
@@IGoByLotsOfNames why is my skyblock GOA T here
Let's all just appreciate that this man used DFT, FFT, Binet formula, Karatsuba's multiplication, Linear algebra, Complex numbers and Galois groups just to compute some Fibonacci numbers, whereas SIMD just left the chat
SIMD can only give a constant factor improvement though
@@asdfghyter True, but that doesn't mean it's going to be slower. The problem statement is how many can be calculated in 1 second, not which algorithm had the most efficient computation logic. Although that is what the video is about. Technically a SIMD or GPU solution could be faster even with a naïve implementation
@@pumpkinhead002 not with the most naive solution, no. that would be very impossible, since it's exponential. with any of the better algorithms it could indeed be faster though, as long as it's at least polynomial
some of the last steps only gave improvements by a factor, so those might very well be surpassed by a SIMD or GPU implementation
though, it's not completely obvious how to parallelize this problem, as the key part of the definition is a recursion. the main component that is parallel is the basic arithmetic operations, which are basically inherently SIMD already, but SIMD might be used to make bigint implementations faster. (and of course the matrix operations, which i forgot when first writing this)
@@pumpkinhead002 I'd love to see GPU-parallelized multiplication
@@asdfghyter The fact that the most naive solutions are exponential still does not mean that they can't be faster during one second after a constant speedup from SIMD, it is in this case unlikely given the large problem size, but not impossible. You don't understand asymptotical performance measures.
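A back-of-the-envelope sketch of the point being argued here, in Python. It assumes the plain two-branch recursion with base cases n < 2 and a hypothetical constant speedup factor; `naive_calls` and `extra_indices` are illustrative names. Because the call count grows roughly like phi^n, a constant speedup of s only buys about log_phi(s) extra indices within the same time budget.

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def naive_calls(n):
    """Number of calls made by fib(n) = fib(n-1) + fib(n-2) with base cases n < 2."""
    prev, curr = 1, 1                # calls(0), calls(1)
    for _ in range(n - 1):
        prev, curr = curr, prev + curr + 1
    return curr if n >= 1 else prev


def extra_indices(speedup):
    """Roughly how many more indices a constant-factor speedup reaches in fixed time."""
    return math.log(speedup, PHI)
```

Under these assumptions, an 8x speedup (for example from wide SIMD lanes) moves the reachable index of the naive algorithm by only about four, which is consistent with both sides of the argument: possible in principle, unlikely to matter at this problem size.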
This video is extremely refreshing! I've seen people claim that you can compute Fibonacci numbers in O(log n) time, because they say that arithmetic operations take constant time, and there are only O(log n) operations. This approximation is often useful, but in the Fibonacci case, you cannot discount the added cost of adding/multiplying large integers. The way you showed the actual runtime increasing with graphs really sells this point.
I think people are generally talking about the n-th Fibonacci number modulo some integer when they say that.
@@Avighna which is basically useless considering Fibonacci has linear memory requirements. It's similar in nature to the fallacy of O(1) hashmaps (maximum memory access speed is O(n^(1/3)) for n bits, which has practical effects in the case of the various cache levels of a CPU and RAM)
@@palmberry5576 It is a common task in competitive programming. And besides, why is knowing the 2 millionth Fibonacci number not under a modulus useful? It’s all theoretical anyway.
@@Avighna I meant it is useless to talk about modulo some integer considering the Fibonacci sequence’s length grows linearly with n
@@palmberry5576 Can you elaborate on the n^(1/3) result? Where does that come from?
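For the modular variant mentioned in this thread, a minimal Python sketch looks like the following. It assumes the usual matrix [[0, 1], [1, 1]] and a caller-chosen modulus `m`; `mat_mul` and `fib_mod` are illustrative names. Every intermediate value stays below m, so each of the O(log n) matrix products really does cost a bounded amount of work, which is the setting where the "O(log n)" claim is reasonable.

```python
def mat_mul(a, b, m):
    """Product of two 2x2 matrices (nested tuples) with entries reduced mod m."""
    return (
        ((a[0][0] * b[0][0] + a[0][1] * b[1][0]) % m,
         (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % m),
        ((a[1][0] * b[0][0] + a[1][1] * b[1][0]) % m,
         (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % m),
    )


def fib_mod(n, m):
    """F(n) mod m via exponentiation by squaring of [[0, 1], [1, 1]]."""
    result = ((1 % m, 0), (0, 1 % m))    # identity matrix
    base = ((0, 1 % m), (1 % m, 1 % m))
    while n:
        if n & 1:
            result = mat_mul(result, base, m)
        base = mat_mul(base, base, m)
        n >>= 1
    return result[0][1]                   # M^n = [[F(n-1), F(n)], [F(n), F(n+1)]]
```

For example, `fib_mod(10, 1000)` returns 55.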
My mind was wandering towards memory management and SIMD... but this is a math channel, and like you said - mathematicians can't program!
@@disquettepoppy this needs a cuda implementation somehow
Find the right algo first. The rest is linear speedups.
@@tolkienfan1972 Certified "Mathematicians don't know how to program" moment. 1:55
You are going to run out of memory before your slightly better scaling algorithm catches up in speed to one that was actually well written to utilize the computer well.
@@tolkienfan1972 You'd be surprised how much speed-up and overall efficiency you can gain when you're conscious of things like memory allocations, cache locality, and the hardware in general (that the program must run on in the end after all). The linear speed-up tends to be several orders of magnitude.
Furthermore, there are absolutely cases where low-level details can affect the time complexity itself.
On the flip side, even when the algorithm has a lower order, the constants that are left out of Big O notation can make it much worse for any inputs of interest.
And I'd argue that actually implementing the thing and analysing it, can absolutely help with coming up with a better algorithm, especially if you find out all the redundant things a given program is doing.
simd doesn't implement a carry bit
This reminds me of the time our discrete math course had a quiz that asked us to "compute f_300" and I, naive and brave, unironically tried doing it by hand. I was and still am pissed off to an unprecedented degree that "compute" apparently meant "Express the general term of the sequence as a linear combination of exponentials and substitute 300 into the free variable without doing any reduction"-they could have just told us to do that yk
I hope you knew how to use
f(2n)= f(n+1)^2 - f(n-1)^2
at some point...
😉
@@landsgevaer Proving it and using it was my original strategy but the computation part was left half finished as I handed the paper in. I also attempted it after the fact on a blank sheet and figured it would have taken far too much time at that point anyway. But after this whole ordeal I am now less naive than I used to be regarding how computations scale as numbers grow
Currently taking a discrete math summer course, our textbook linearized the recursive Fibonacci formula, it looked very complicated, can't imagine doing that on a test lol
LMAO, I'm pretty sure the TA showed your paper to everyone in their lab and they had a ton of laugh about it xDD
If i had graded your assignment i would have still given you 4/5 points.
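The doubling identity suggested in the replies above, f(2n) = f(n+1)^2 - f(n-1)^2, is enough to reach f_300 in roughly nine halving steps instead of 300 additions. Here is a small Python sketch, assuming the convention F(0) = 0, F(1) = 1 and pairing the identity with the companion formula F(2k+1) = F(k)^2 + F(k+1)^2; `fib_pair` is an illustrative name.

```python
def fib_pair(n):
    """Return (F(n), F(n+1)) by halving n and applying the doubling identities."""
    if n == 0:
        return (0, 1)
    f, g = fib_pair(n // 2)          # f = F(k), g = F(k+1) with k = n // 2
    prev = g - f                     # F(k-1)
    even = g * g - prev * prev       # F(2k)   = F(k+1)^2 - F(k-1)^2
    odd = f * f + g * g              # F(2k+1) = F(k)^2 + F(k+1)^2
    if n % 2 == 0:
        return (even, odd)
    return (odd, even + odd)         # shift by one when n is odd
```

`fib_pair(300)[0]` gives f_300 directly; doing the same by hand still means squaring numbers with dozens of digits, which is presumably why the quiz expected the unevaluated closed form.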
Man that "1 F for you, and 5 F's for your closest friends" joke had me cackling. solid CS and math humour here lol
i dont get it
i don't get it
Jim Rohn once said "You are the average of the five people you spend most of your time with".
@@vari1535 hexadecimal
@@dinhero21 because plus 1 is a power of 2 and it will have a big increase in time.
love how i came to the channel for set theory and stayed for computer science
Noooooo!!!!! Do not use this video as any kind of teaching of computer science. This is a train-wreck as far as computer science is concerned!
I definitely came for the computer science, a lot of the second half of the math *woosh*
@@SteveBakerIsHere why?
I’m watching this for no reason at all
As a Russian, I don't blame you for thinking Karatsuba is Japanese
All I had to do was google it smh
Just actually finished watching it - great video!
As a Russian, i- i thought its japanese too
@@filo8086 Yeah
as an asian i thought it was japanese too
19:00 "a true measure of success is when you manage to unmake a name for yourself"
LMAO
Yes, 😂.
Gauss things
I expected Euler to be honest
When you got in the lecture to 'an F for you, and F's for your five closest friends', YouTube cut to a commercial beginning 'An actual letter to [advertiser]'. I stuck around in the ad far too long waiting for your punchline, when I realized that it wasn't your joke, it was a practical joke from The Algorithm.
I'm a computer scientist. I'm familiar with Schönhage-Straßen, and with Moler and Van Loan's 'nineteen dubious ways to compute the matrix exponential,' but your discussion is hilarious, in the flavour of Carl Linderholm's 'Mathematics Made Difficult'. Bravo!
you got played
If you want to calculate the multiplication of very large integers as fast as possible, use the GMP library. The authors have done a huge amount of work to make it as efficient as possible.
and at that point you can just use provided fibonaci function
For the matrix method, you can have a 4x4 matrix derived from F(n), F(n-1), F(n-2), F(n-3). This is nice because you can express those with coefficients as powers of 2, which means you can use SIMD and process multiple numbers at the same time. You could even reasonably do this for an 8x8 matrix and get to use AVX2, but it's a tradeoff. Asymptotically nothing would change, but having a 4-8x speedup because of SIMD sure is helpful in real life. This is getting deep into the territory of optimizing big numbers (and at that point, why handroll your implementation instead of wrapping GMP?)
I'm not sure SIMD would be helpful, moving data in and out of registers has quite an overhead.
But the Number class should have been 2^32 based, that's basically just free speedup, because uint8_t is still done on the same ALU unit.
SIMD would be a bit more work to set up since the number class and its arithmetic is nontrivial, but those are ideas (and yeah, reinventing the wheel is definitely never worth it unless you think you're smarter than the many people who developed GMP haha but where's the fun in that).
Also @janisir4529, while I used uint8_t for my FFT-based multiplications (for the sake of controlling the errors), everything else was done with uint32_t (the number class is actually templated)!
@@SheafificationOfG Okay, that makes sense.
Using Schönhage-Strassen multiplication (basically FFT over a finite field, so no using doubles and being limited by precision) and matrix exponentiation by squaring, I get the 67108864th Fibonacci number in 1.02s. Doing the same using arithmetic over the number field gets the same in 993ms, basically not improving. Pari somehow computes the 200000000th number in under a second. Would be interesting to show how this was achieved
Does it really make sense to compare these numbers when all calculations are (presumably) done on different machines?
@@spacebusdriver If we know the spec of each processor and memory, you could probably make some kind of generic average based off the performance stats, i.e. your processor is 2GHz and you get the 1,000,000th number, and I have a 3GHz processor and get the 1,800,000th number, we could scale these down to 1GHz on each machine, we would find you get the 500,000th number, and I get the 600,000th number, thus my solution is better.
In saying that, it wouldn't be that accurate, but it might give a decent estimation of performance comparison. True performance equivalence is to have some standardized machine that people could have or test it on and run it. Could be a nifty website idea, you slap in your code and see how it performs compared to others.
It's done using diagonalization
i wish to know how to do that, for a implementation i've made using optimized reverse binary exponentiation, ive got:
$ ./x 200000000
Time: 4.0576
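#include <gmp.h>

/* Sets out_b = F(x), scanning the bits of x from the most significant down. */
void f(mpz_t x, mpz_t out_b) {
    mpz_t a, b, c, t, tmp;
    unsigned long len;
    mpz_inits(a, b, c, t, tmp, NULL);
    /* Invariant: (a, b, c) = (F(k+1), F(k), F(k-1)), starting from k = 0. */
    mpz_set_ui(a, 1);
    mpz_set_ui(b, 0);
    mpz_set_ui(c, 1);               /* F(-1) = 1 */
    len = mpz_sizeinbase(x, 2);
    /* bit is unsigned, so the loop is left through the break at bit == 0. */
    for (unsigned long bit = len - 1; bit < len; --bit) {
        if (mpz_tstbit(x, bit)) {
            /* k -> k + 1 */
            mpz_set(c, b);
            mpz_add(a, a, b);
            mpz_sub(b, a, b);
        }
        if (bit == 0) break;
        /* k -> 2k */
        mpz_mul(t, b, b);           /* t = F(k)^2 */
        mpz_add(tmp, a, c);         /* tmp = F(k+1) + F(k-1) */
        mpz_mul(c, c, c);
        mpz_add(c, t, c);           /* F(2k-1) = F(k)^2 + F(k-1)^2 */
        mpz_mul(a, a, a);
        mpz_add(a, t, a);           /* F(2k+1) = F(k+1)^2 + F(k)^2 */
        mpz_mul(b, b, tmp);         /* F(2k) = F(k) * (F(k+1) + F(k-1)) */
    }
    mpz_set(out_b, b);
    mpz_clears(a, b, c, t, tmp, NULL);
}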
Your channel is truly one of the best math channels around right now. I know, I know, opinions might vary depending on your level of math, but I can say that it is perfect for me. And you do not lie to your audience that everything is EASY and then hit them with axioms they are supposed to absorb in 5sec. You know your math and you are not afraid to show it.
Do template meta programming. Technically it just prints out a number, the compilation taking a very long time doesn't matter, as the task was ill defined.
Template metaprogramming is a pathway to many abilities some consider to be unnatural.
@@SheafificationOfG constexpr should also work
The real answer is that you hardcode the largest number into the binary. Then the 1s time limit is mostly spent reading a number from disk and printing it again.
@@JoJoModding I'm already literally cheating with template meta programming, but even I have standards to not sink that low.
@@janisir4529 I mean, it produces the same program, but just gets there faster
Man I loved this video! Though I didn't understand much past the linear algebra, it was still interesting to see your analysis of the runtime and the possible solutions to improve it. Kudos!
This video is packed with easter eggs that are barely visible on top of a rapid if smooth delivery. I have not laughed so hard at a mathy video ever. Nor rewound so many times. Straight talk from a meme master.
Great video, had a great time watching it. Looking forward to your next one!
One piece of feedback though, dark blue text on a black background is very hard to read due to the contrast. It was difficult to read your code sometimes.
Not just contrast, but also video compression, where colors have less resolution (particularly pure blue and pure red).
Yeah, the video definitely looked better on my computer ("It runs on my computer" moment).
I'll be changing my choice of colours for code in the future for sure, thanks!
As soon as you started explaining digital multiplication, I immediately realized you were going the Karatsuba route
A few months ago I started working on an arbitrary precision integer library (for Fun and Profit™), and spent a whole bunch of time benchmarking exactly where the crossover should be to switch back to doing traditional multiplication vs the crazy allocation cost of doing recursive Karatsuba
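The crossover being benchmarked here can be sketched in a few lines of Python. This is an illustration only: it assumes non-negative inputs, splits on bit length rather than on limbs, and uses Python's built-in `*` as a stand-in for the "traditional" multiplication below the cutoff. The name `karatsuba` and the `cutoff_bits` parameter are hypothetical.

```python
def karatsuba(x, y, cutoff_bits=64):
    """Karatsuba product of non-negative ints, falling back to x * y for small operands."""
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y                           # base case: stand-in for schoolbook
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    hi = karatsuba(x_hi, y_hi, cutoff_bits)
    lo = karatsuba(x_lo, y_lo, cutoff_bits)
    # Three recursive products instead of four: the cross term is recovered
    # from (x_hi + x_lo) * (y_hi + y_lo) - hi - lo.
    mid = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff_bits) - hi - lo
    return (hi << (2 * half)) + (mid << half) + lo
```

Timing this for a range of `cutoff_bits` values is one way to see the crossover, though in a real library the cutoff is measured in limbs and the allocation pattern matters at least as much as the operation count.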
Loved this video! I'm a math & cs student, I learned a lot from watching how you connected all of these different areas in math/cs to solve a deceptively simple sounding problem! Please do more stuff like this, it's invaluable how you seamlessly showcased the usage of linear algebra, complexity analysis, complex numbers, Fourier transforms, bit/byte representation of the numbers, optimizing multiplications (and anything else I missed) for optimizing this. I've read and studied these concepts but it was never made THIS clear to me how they could be utilized in practice in such a cohesive video. If you read this, I'm curious how long did it take you to optimize this and get all the material for the video?
Really appreciate the comment! I kinda threw the code together a month ago, and then did some major refactoring halfway through before making it public. I kinda kept things honest (except for the bit-reversal in the FFT implementation), so I tried not to stress myself out with fine-tuning my optimisations, and I was already aware of the algorithms I was going to use before I put the video together.
14:37 I get it! 1FFFFFF is 33,554,431 in hexadecimal.
I have no idea how I ended up here, but this is one of the best videos I've seen. I look forward to your future videos :)
I also loved your editing !
This is essentially a speedrun in computer science. Well done!
Imagine having this class on the first day of computer science and then learning all the details about this masterpiece.
Duuuude great video quality! I was impressed this channel didn't have more subscribers... got me at the end, though! For sure subscribing
Even knowing everything in the video already, the humour was quite good and I was thoroughly entertained, and seeing the runtime graphs was pleasing. Another banger from my favourite sheaf!
You don't need to compute evey Fibonacci number, only the largest - so your exponential matrix multiplication can just keep doubling for the entire second to get something huge
I haven't been as stimulated and entertained and educated by a video as by this in the past 6 years. I felt like a kid again having newly discovered numberphile and minutephysics on YouTube. love it. thank you so much. love you man.
5 minutes in, already had to flip a table on the affine joke. Great video, subbed.
Very nice! I spent the first 20 minutes or so waiting for the Binet formula, did not expect you to get close to the limit for fft multiplication...
If you want an easy quick followup video: see how long it takes each other function to hit the number reached by the gold medalist. (Feel free to not calculate it with the recursive function... pretty sure we'll hit the heat death of the universe before that one gets done)
This has been one of the best videos I've seen on YouTube. While I'm already familiar with all of the steps you've taken, the way you merged them together neatly while still respecting and addressing the imprecisions added when you use the Fourier transform made the video a very enjoyable and elegant demonstration. By addressing the issues at the end you scratched that itch at the back of my head and I thank you for that.
stumbled upon this randomly only to be flashbanged by the wysi at the 17:27 timestamp, probably the last video I expected to see the reference
Not even math videos can save me, it follows me wherever I go.
This is great work! I loved seeing more and more complex math theory appear to solve a seemingly simple problem faster and faster. Thank you for taking the time to produce this video and share it with us!
I thought there would be a joke about constexpr and compile-time calculations to create a constant time result at runtime. Nice video
Missed opportunity
When you said, "divide... and conquer," my CompSci brain went "ah, multithreading, of course."
11:02 EXPANDED FIBONACCI NUMBERS INTO THE REALS
Akchually it's only expanding into the rationals 🤓☝️
@@edsaid4719 complex numbers:
As soon as you said the problem could be written using matrices I immediately thought "It could be a good idea to diagonalize the matrix!" and kept going crazy because you just wouldn't do it (until the end). Good video!
Thank you for the subs at 11:40 💜! People that just copy their script have spoiled their jokes to me before
Glad the effort was worth it!
What a fascinating number, great video!
Happy birthday to your dad from Australia!
i remember when and how i was taught the fibonacci sequence, it was year 4 and we were learning about sequences of numbers and the teacher said that this is a sequence not even mathematicians could figure out until they were told it and wrote the fibonacci sequence on the board, she gave us an attempt to figure out the pattern and no one did it
Pygmalion effect
@@thefunseeker9545 i had never heard of that but looking it up i guess so, it was more that we (or at least i) hadnt been taught yet how flexible sequences were, as in, i had never seen a sequence before that required n-1 to work out n, the previous ones had been things like nx2 or n^2 or n+8 if you get what i mean, everything could be reduced down to a formula that could be worked out without the previous numbers (even though we didnt know how to do that, im just explaining the difference)
As someone who did Complexity Analysis in college, I love your video. Brings me back! You're not a YT programmer, you are a computer scientist!
This python code gets past the four millionth Fibonacci number in half a second on my laptop. Normally, python would be disastrous for speed, but most of the time is spent inside CPython's schoolbook(?) multiplication doing the last three squarings. The way I wrote this code was by starting from repeated squaring of {{0,1},{1,1}} and then simplifying by realizing the intermediate matrices always had the form {{a,b},{b,a+b}}.
def fib_power_of_2(exponent: int) -> int:
    a, b = 0, 1
    while exponent:
        a2 = a**2
        b2 = b**2
        ab2 = (a+b)**2
        a = a2 + b2
        b = ab2 - a2
        exponent -= 1
    return b
Python integer multiplication is quite optimised: it uses Karatsuba when warranted.
Yeah, I very conveniently left out how easy it is to outdo my implementation using well-established large number classes like those used in CPython or GMP :^)
@@fplancke3336 Hey, you're right! I thought Python used schoolbook but I searched "karatsuba" in the CPython GitHub repo and found where they switch to it. They also seem to be making decisions based on whether it's squaring instead of multiplying. They don't seem to be using Schönhage-Strassen or SSE instructions, though.
Wow, great video!
I really hope there would be more algorithm and performance focused videos in the future :)
Ok, the last method I did not see coming. Nice job!
This is an incredibly neat demonstration of optimisation techniques that typical programmers like myself aren't familiar/comfortable with. Great video, well done!
5:29 hawk tuah?
The whole time, I was wondering why you weren't using the closed form. Great video!
3:59 lmao best math joke I've heard lol
Liked it! The last solution was beyond what I knew from school. You've inspired me to start studying maths again because i haven't thought of an eigenvalue in years.
I have no idea what you are talking about at 3:47. So I HAVE TO subscribe.🤣🤣🤣🤣🤣🤣🤣
Mhmm, you hit a spot in my soul I never knew existed. All I have is this sub and this like. The love I give you freely. This was a wonderful experience.
The analysis at 7:30 is incorrect. The sum of numbers is proportional to the length of the number, not its size, so it doesn't grow with n, rather with log(n). So the algorithm isn't O(n^2) but O(nlog(n)). Huge difference.
Nah, it is correct, actually.
It indeed grows with n if you define n as the number of digits, like you say. Now, since the Fibonacci numbers grow asymptotically exponentially, their number of digits relates linearly to the index (roughly one extra digit every five steps), and that index is used as n in the video.
So the video looks correct to me.
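The linear relationship between the index and the digit count is easy to check numerically. The Python sketch below compares the actual number of decimal digits of F(n) with the estimate from Binet's formula, F(n) ≈ phi^n / sqrt(5); the function names are illustrative, and the floating-point estimate could in principle be off by one for an unlucky n.

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def fib(n):
    """F(n) by repeated addition (F(0) = 0, F(1) = 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_digits(n):
    """Exact number of decimal digits of F(n), for n >= 1."""
    return len(str(fib(n)))


def binet_digit_estimate(n):
    """Digit count predicted by F(n) ~ phi**n / sqrt(5), for n >= 1."""
    return math.floor(n * math.log10(PHI) - 0.5 * math.log10(5)) + 1
```

Since log10(phi) is about 0.209, the digit count grows by one roughly every 4.8 indices, so adding F(n-1) and F(n) costs O(n) digit operations, and doing that n times gives the O(n^2) total.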
He also mentions this at 10:54 , but I agree it should've been mentioned earlier. Stumped me as well
Completely agree. I just typed out something similar and then saw your comment
@@landsgevaer If you define n as the number of digits you perform 2^n sums of n digits so it is n*2^n not n^2.
@@luminica_ 2^n sums of digits? How is that?
you're like a dank version of 3blue1brown
Yah
I thought you were going to explain finite-field FFT (a.k.a. Number Theoretic Transform) at the end. FFT can be suitably modified to work on Z_p instead of C, for certain primes p.
The main requirement on p is that 2^k | p-1 for some k > log2(N), because k bounds how many times you can do the FFT trick of splitting into even and odd parts
Not only does NTT not have precision issues, it is also usually faster because it uses half as much space and basic operations are done on integers.
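As a rough illustration of the finite-field variant described in this comment, here is a Python sketch of multiplication via a number-theoretic transform. The prime 998244353 = 119 * 2^23 + 1 with primitive root 3 is a commonly used choice, not something taken from the video; it satisfies the 2^k | p - 1 requirement for k up to 23. The function names and the base-2^8 digit split are assumptions, and a serious implementation would use a larger prime or several primes combined with the Chinese Remainder Theorem.

```python
MOD = 998244353      # 119 * 2**23 + 1, so transforms up to length 2**23 exist
ROOT = 3             # a primitive root modulo MOD


def ntt(values, invert=False):
    """Iterative radix-2 NTT modulo MOD; len(values) must be a power of two."""
    a = list(values)
    n = len(a)
    j = 0
    for i in range(1, n):                      # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)   # inverse root via Fermat
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * w % MOD
                a[k] = (u + v) % MOD
                a[k + length // 2] = (u - v) % MOD
                w = w * w_len % MOD
        length *= 2
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def ntt_multiply(x, y, bits=8):
    """Exact product of non-negative ints using an NTT convolution of base-2**bits digits."""
    mask = (1 << bits) - 1
    dx = [(x >> (bits * i)) & mask for i in range(max(1, -(-x.bit_length() // bits)))]
    dy = [(y >> (bits * i)) & mask for i in range(max(1, -(-y.bit_length() // bits)))]
    # Each convolution coefficient must stay below MOD, or it silently wraps.
    if min(len(dx), len(dy)) * mask * mask >= MOD:
        raise ValueError("operands too long for this modulus and digit size")
    size = 1
    while size < len(dx) + len(dy):
        size *= 2
    fx = ntt(dx + [0] * (size - len(dx)))
    fy = ntt(dy + [0] * (size - len(dy)))
    conv = ntt([p * q % MOD for p, q in zip(fx, fy)], invert=True)
    return sum(c << (bits * i) for i, c in enumerate(conv))
```

The guard in `ntt_multiply` is where the "sufficiently large prime" headache shows up: with this modulus and 8-bit digits the shorter operand is limited to roughly 15,000 digits.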
I fucking love watching videos that delve into topics that I clearly don't/shouldn't understand
I don't even know how this crept into my recommended. But I love this
Dude this channel is so good
Curious about parallelization 👀
You have to do the computations in order though right?
@@lih3391 yes
@@lih3391 I kinda want to multithread this, but I don't think it's possible. The matrix multiplication could be parallelized theoretically, but by the time starting a thread for a single multiplication becomes worth it, we no longer fit into memory.
I saw bionicle and had to like. Moreover, awesome video in regards to the consequences of abiding by ‘Big O’ notation for efficiency while ignoring practical limitations of memory. It also shows a good peek into the depths of optimization for beginners in the realm of coding. Thanks for the treat.
Wow I never considered the field method at the end, it can come out useful for other stuff whenever one knows you're working with just specific roots!
I wonder how one can generalize this for fast diagonalization of any matrix, since eigenvalues will always be roots of polynomials, I will think about it, right after liking and subscribing!
This is a really cool video! In particular a very obvious reason for why the vanilla "linear" fibonacci is O(n^2) rather than O(n), which I didn't realise at first.
Also having the direct form of the nth Fibonacci number via diagonalisation is so neat! I knew it from a different kind of proof (en.wikipedia.org/wiki/Recurrence_relation), but the diagonalisation is much more intuitive.
Nicely edited as well! Might have forgotten this, but would have liked to know bit more about the specs of your laptop
The Binet formula was always going to win this competition. However, you perhaps ought to have started by examining the Lucas equations to find better quick relations for obtaining large Fibonacci terms.
I love the idea that someone would come across this as their first introduction to the Fibonacci sequence, be able to immediately understand what it means that it's a "recurrence relation," and then make it through the whole video.
I started watching this video shortly after it was posted, and decided to implement this all in Rust using benchmarking. I thought this would be a fun project since I am new to Rust. 6h later, and things are getting off the ground. I'll edit this comment and add a link to my repo when it is finished :)
Even if (when?) you beat my golden record, I still won't convert to rust 🦀
@@SheafificationOfG I started in python and C# as my main, and then got into Haskell. I'm loving Rust, but it's definitely a chore to learn. It was difficult to find the time to learn between semesters 😩
Man the world of Math is truly wild. I'd have done a+b=c, then b->a, c->b and repeat. Seeing you use vastly more complex things that I am unable to comprehend was just as fascinating as it was confusing to me.
I learned absolutely nothing, understood even less than that and somehow, I was still entertained. Incredible.
FFT is usually done using number theoretic transform, rather than real numbers. And there's probably a fast way to do this using Chinese Remainder.
You can still use FFT, just do it over a finite field of some kind iirc. Pretty sure you can do it in the ring mod 2^2^n+1 as well which works nicely because you can use 4 as a root of unity or something?
@@DarthWho01 Yeah that's basically what number theoretic transform is, if I remember right. Though 257 is also a nice prime because it turns things into bytes. And it may be faster just to use a prime near 2^64.
“Mathematicians don’t know how to program” said a mathematician who started counting life lessons from 0
@0:44 Aah I see you have the Manga Guide to Statistics
This was such an amazing video, thank you so much for making it. I have always wondered where the closed-form formula came from for the Fibonacci numbers.
“[It] is known as the Cooley-Tukey algorithm, so-called because these insights are due to none other than the same person who discovered the Fourier transform… Gauss.” LMAOOO
9:53 I am sad to say I did actually notice this in less time than it took for it to calculate 2^19-1 fibonacci numbers, and furthermore knew precisely what powers of two they were
What is that you said at 5:28 ?
hawk ____?
this video has proper captions thankfully. he says 'ad hoc'
Bro that was really the best video that I ever watched!! Loved it
Great video, one criticism though: dark blue is really not that legible on a black background, changing the colours of the code highlighting to something more contrasting (e.g. around 3:54) would help a lot!
The 6 million fibbonaci number limit is just because you're using double floating point numbers, right? If you swapped to quads or arbitrary precision you could go past that then. Though that's probably a little bit too much of a rabbit hole for the scope of this video...
this is much deeper of a topic than I anticipated, cool stuff!
5:29
Say that again?
An alternate way to get from 4 to 2 numbers is to realize that calculating Mⁿ is the same as evaluating the polynomial Xⁿ at M. Because M² = M + 1, this evaluation factors through Z[X]/(X² − X − 1), which means you can just calculate Xⁿ in this ring (with exponentiation by squaring) instead and then evaluate at M. The last step is the same as adding the coefficients in front of X⁰ and X¹. The same can easily be done for a general linear recurrence by calculating Xⁿ in R[X]/(p), where p is the characteristic polynomial of the recurrence, and linearly mapping to R by mapping Xⁱ to the i-th starting value.
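A minimal Python sketch of this quotient-ring idea, assuming the relation X² = X + 1 and the indexing F(0) = 0, F(1) = 1 (with starting values 1, 1 the same code returns the next Fibonacci number, matching the "add the two coefficients" description). Elements of the ring are stored as pairs (c0, c1) meaning c0 + c1·X; the function names are illustrative.

```python
def mul_reduced(p, q):
    """(p0 + p1*X) * (q0 + q1*X), reduced using X**2 = X + 1."""
    p0, p1 = p
    q0, q1 = q
    top = p1 * q1                    # coefficient of X**2, folded back into 1 and X
    return (p0 * q0 + top, p0 * q1 + p1 * q0 + top)


def x_power(n):
    """X**n in Z[X]/(X**2 - X - 1), by exponentiation by squaring."""
    result = (1, 0)                  # the constant polynomial 1
    base = (0, 1)                    # the polynomial X
    while n:
        if n & 1:
            result = mul_reduced(result, base)
        base = mul_reduced(base, base)
        n >>= 1
    return result


def recurrence_term(n, start0=0, start1=1):
    """n-th term of a sequence with a(k+2) = a(k+1) + a(k), a(0) = start0, a(1) = start1."""
    c0, c1 = x_power(n)
    return c0 * start0 + c1 * start1     # linear map sending X**i to the i-th starting value
```

With the default starting values `recurrence_term(n)` is F(n); passing `start0=2, start1=1` gives the Lucas numbers from the same two coefficients, which shows why only two numbers need to be carried around instead of four matrix entries.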
Didn't expect a channel doing fast Fibonacci algorithms to be into osu! but somehow I'm not surprised.
Wait he plays osu as well? What’s the osu channel
@@nikplaysgames4734 My hint was 'wysi' in the chapter name for 17:27
I was waiting for the Binet algorithm from the start, but the journey was actually interesting, so I stayed.
I honestly never considered the fact that if you work with numbers with undetermined bit size, you'd need fft just to compute a product of two integers, that's pretty crazy
50 lines of C code with GMP and matrix approach (4 multiplications) with -O0 can go for 67'108'864-th fibonacci number in one second. Life lesson: do not rewrite yourself highly optimized code.
Why would you tell us your result by running it in debug mode? Try -O3 (obviously) -march=native
I mean, I have managed to write a better printf than printf. It could only print out integers, but it was very fast.
@RuslanKovtun used -O0 to really show me up, since I used -O3 and -march=native
But you're right, there's unfounded hubris in thinking you can outdo the work of well-established large number libraries, even Python can put my golden output to shame.
@@SheafificationOfG, you and @luigidabro missed that GMP is a library that is statically linked as -lgmp and has all optimizations in it. Yes, it is like in Python where your code just calls it, but I will argue that Python is still slow even for a single "for" loop.
The thing is, GMP has like ten different multiplication algorithms to choose from, and it selects the presumed fastest given the input arguments. There is no purpose in doing a Fourier transform for 6 * 4.
Love the video! I was impressed by your addition of "Ad Hoc 2". Nothing escapes brainrot.
18:30 radiohead reference
Can't believe that its over 25 minutes long. It was so engaging, it felt like a second
what the hell this video is sick
so many great jokes, and a lot to learn. I'd never have thought to use Master Theorem to analyze an algorithm like this, and I would never have guessed that Binet's formula actually would turn out to be faster in the end, given how much floating point multiplication I assumed would have to be done
The whole point was that Binet's formula uses an implementation without floating point multiplication by expressing it as a field (except that there *is* floating point multiplication hidden in the fast Fourier transform)
this video is a humbling reminder i am not good at programming and am just in my first year of my comp sci degree
Whats your osu name
Wanna come and play gacha together
Idk
the second I saw the 2x2 matrix equation at 5:28, matrix diagonalisation immediately came to mind. Probably PTSD from math class. I wondered when you would talk about this and I was not disappointed when it came up!
When you see it
when you fucking see it
Before watching:
My immediate thought would be to use x86 assembly and run the following pseudocode
int x = 0
int y = 1
while(onTheClock):
    x += y
    y += x
print(lastModified)
print(numIterations*2)
There’s obviously some cleanup to be done, but essentially, just adding and storing the result repeatedly between two 64-bit registers
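One catch with the two-register plan above is that 64 bits run out almost immediately. The Python sketch below is not the commenter's code; it just finds the largest Fibonacci number that fits in an unsigned register of a given width, with F(0) = 0 and F(1) = 1. The function name is illustrative.

```python
def largest_index_fitting(bits=64):
    """Return (n, F(n)) for the largest F(n) below 2**bits."""
    limit = 1 << bits
    a, b, n = 0, 1, 0            # a = F(n), b = F(n+1)
    while b < limit:
        a, b = b, a + b
        n += 1
    return n, a
```

For 64 bits this returns index 93, so after a few dozen additions the registers overflow and everything beyond that needs a big-integer representation, which is where the rest of the video's machinery comes in.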
After finishing the video:
How could I have forgotten the fast Fourier transform, and the myriad of other things that definitely did not fly over my head! How silly of me (incredible video!!)
nah im gonna stick to just doing it the simple way if that's okay with you...
also you sound weirdly similar to physics for the birds btw
I'll take that as a compliment! I like his content
@23:30 You can use Number Theoretic Transform to achieve the same time complexity as FFT without loss of precision. You just need to find a very large prime number (slow) and hard code it (fast).
5:30 hawk tuah
I heard it
Nice
I finished watching the video and was like, great video! And then "the ugly truth" started playing, addressing the only potential criticism I had. Instant subscribe.
Love your video ❤❤❤
Jokes on you, I'm just going to take one second to generate the largest number I can then claim it's a Fibonacci number.
"I'm a pure mathematician. I don't care about the real world."
That cracked me up.
😂❤😅
Great video! My only qualm is the choice of "Which axis is which" on the graph. Like, the huge slowdowns in the graph at 9:45 look like huge JUMPS in progress, lol
-Paintspot Infez
Wasabi!
Bro i didnt understand ANYTHING
I liked when the lines moved.
You need to be both a math major and a CS major to understand this video properly. I suck at both: I'm a junior in math and I don't know much C++, and what I do know is old, which made it even more confusing at first 😳
This video had me smiling from the math and references all the way through, loved it :^)