I remember my professor saying n log n might be possible, and then a few years later Harvey and van der Hoeven proved it! A video on the discrete Fourier transform would be awesome. I want to understand how decomposing wave functions has applications in things like multiplication, factoring, and discrete logarithms.
Cool! Ya at some level it should be n log n just based on the complexity of fast Fourier transforms, so all the extra baggage in the early algorithms was just that
@@DrTrefor isn't FFT also a lot of multiplications?
Radix algorithm
I believe 3blue1brown did a very solid video on FFT a month or so ago
The multiplication algorithm is a bit like a convolution which can be replaced with a single multiplication in the transform domain.
The most brilliant part of Karatsuba's algorithm was that he took it up as a challenge to prove Kolmogorov wrong (Kolmogorov had conjectured that multiplication can't be done faster than n^2) and came up with this brilliant manipulation. Like damn dude, how confident do you have to be to challenge and disprove perhaps one of the greatest mathematicians of the USSR, or rather the world! He did all this while he was just a student.
Very cool!
It was a very clever manipulation- maybe he went through the steps I outlined in my comment.
@@wyattstevens8574 bro I can't find your comment
cool
The original technique of reducing it to 3 multiplications instead of 4 is often attributed to Gauss. What Karatsuba did was add in the recursive divide and conquer step to continuously reduce the size of the problem.
4:10 "you can immediatly tell what 7 times 8 is" me: 64... wait no...
Me: 72!!!
Me: Hold on. So 8^2=64 so 64-8=60-4=56 FIFTY SIX!
@@jaydentplays7485 lol, that's how I did it in my mind.
30!!
me, thinking he was going to say 7*6: 42, yes.
Note that in a program multiplying by a power of two can be just a bit-shift. But you have to recognise when it's a power of two
The similar algorithm for an arbitrary constant is shift-and-add (the multiplication cousin of binary exponentiation). For example, if you had to multiply x by 24, you first multiply x by 16 with a bit shift, then multiply x by 8 with another bit shift, then add those 2 numbers
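A minimal Python sketch of that shift-and-add idea (just an illustration, not anyone's production code): since 24 = 16 + 8, the whole multiply is two shifts and one add.

def times_24(x):
    return (x << 4) + (x << 3)   # 16*x + 8*x

assert times_24(7) == 7 * 24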
@@8ightfold Yeah. But that if statement is branching. If you care about speed then clearly you are handling a lot of data so instead you can use AVX/SIMD to multiply many elements, and an unrolled loop to do many of those. The micro-code of your processor is very well coded to stream many of these operations back to back.
Or you can ruin it all with an if statement.
Every old software multiply I see is like this:
def multiply(a, b):
    total = 0
    while b:
        if b & 1:
            total += a   # add a when the low bit of b is set
        a <<= 1          # left shift a
        b >>= 1          # right shift b
    return total
That's n time where a and b have n bits. Not sure why you guys are doing anything else.
So multiplication is O(1) or O(n) (depending on whether wires or RAM are in play) if your number is in base b and you multiply by b^k. However, since you must convert to and from that number system, the complexity of the base conversion will again be no better than n log n. So the fact you point out, though practically useful in digital systems, isn't really going to improve things.
@David Hand Addition is O(n), and therefore you just described grade-school multiplication, i.e. an O(n^2) algorithm. Not sure what universe you are from, but in this one addition is not free. Not to mention bit-by-bit is terrible for computing performance: all real implementations work a machine word at a time, treating a hardware machine multiply as O(1), which is massively faster than what you wrote, e.g. 1024 times faster on a 32-bit system. After n is around 30 to 50, Karatsuba will start to outperform. You might want to try it and do performance measurements before jumping to conclusions.
Recently, I wrote a program in C implementing the grade-school multiplication algorithm. It works fine for multiplying two numbers up to around 130 digits long, but the result for numbers longer than 130 digits starts to differ slightly from Wolfram Alpha's. I don't know what the issue is. Well, I guess it's time to try the Karatsuba algorithm and see how it goes. And as usual, great video professor!!
hmm interesting, I wonder if it is some memory issue?
@@DrTrefor yeah it might be the reason. As a beginner I am very much satisfied with it hahaha, I will work on it later
What types are you working with? I know different types have different memory sizes for them. For example BOOL, LONG, and INT types use different numbers of bits to store their information.
Yeah, it sounds like you might be getting an overflow error somewhere along the way for large numbers. It might be useful to examine the start and end of the binary representation of the correct answer and the answer your program spits out. If the end agrees and the beginning doesn't, that supports the idea that it's an overflow error.
@@GhostyOcean I am using a character array to take input and then an integer array of the same size, converting the string input into integers by ASCII adjustment, then carrying out the multiplication just like we humans do, and storing the result in a dynamically allocated result array.
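Not the original poster's code, but a small Python reference sketch of digit-array grade-school multiplication with the carries propagated immediately. One common culprit for drift at larger sizes is letting a column accumulate past the range of the type holding it, so comparing intermediate arrays against something like this can help localize where a C version starts to go wrong.

def grade_school_multiply(x_digits, y_digits):
    # Digits are stored least-significant first, e.g. 56 -> [6, 5].
    result = [0] * (len(x_digits) + len(y_digits))
    for i, xd in enumerate(x_digits):
        carry = 0
        for j, yd in enumerate(y_digits):
            # Carry immediately so every cell stays a single decimal digit.
            carry, result[i + j] = divmod(result[i + j] + xd * yd + carry, 10)
        result[i + len(y_digits)] += carry
    return result  # least-significant digit first

assert grade_school_multiply([6, 5], [2, 3]) == [2, 9, 7, 1]  # 56 * 32 = 1792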
The best part about O(n log n) multiplication isn't doing it (as is clear from the fact that we don't actually do it), it's analysing other things that use multiplication and just having it be O(n log n) and not muddy the entire analysis with more complicated terms.
There is this lovely tension between theory and practicality in maths.
The Greeks: there are infinitely many primes, because any finite list can be multiplied together; add 1 and factorise to get a prime not on the list.
Cryptographers: take 2 large primes and multiply, bet you can't find the original primes.
Karatsuba: multiplying those primes is hard, let's make it faster.
Schönhage-Strassen, and Harvey and van der Hoeven: for numbers so big we can't in fact work with them, let's make it faster.
Lovely - now we need a theoretical proof that n log(n) is indeed the limit.
I believe you may have been slightly misleading in the discussion about the time complexity of Karatsuba's algorithm. While it is clear from your analysis that the asymptotic number of multiplications (or subproblems) is roughly n^1.58, I think it is also necessary to address the fact that the additional cost to combine the recursive results for each of these subproblems is not constant (it depends on the size of the subproblem). Namely, once the recursive multiplications have been obtained, we must add/subtract them with each other (and shift various results by some amount of place values), which together costs O(m) for a size m subproblem. When you do the full analysis and sum the total amount of work done at each level of the recursion, it does indeed end up being O(n^1.58), but that is because the recurrence for the work of this algorithm is "leaf-dominated" (the total amount of work is comparable to the number of subproblems). There are other examples where this would end up mattering (for example, in the recurrence for mergesort, there are roughly n subproblems for an input array of size n, but the additional cost of merging the two recursive results costs O(m) for a size m subproblem, which, because the recurrence is "balanced" causes the overall time complexity to be O(n log n)).
Am I missing something?
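A quick numeric check of exactly that level-by-level sum (assuming the combine work for a size-m subproblem is about m): level i has 3^i subproblems of size n/2^i, so it costs roughly n*(3/2)^i, and the total stays within a constant factor of n^(log2 3) ≈ n^1.58, i.e. the recurrence really is leaf-dominated.

import math

def karatsuba_total_work(n):
    # Sum the add/shift work over all recursion levels of T(n) = 3*T(n/2) + n.
    levels = int(math.log2(n))
    return sum(n * (3 / 2) ** i for i in range(levels + 1))

for n in (2 ** 10, 2 ** 16, 2 ** 22):
    print(n, karatsuba_total_work(n) / n ** math.log2(3))  # ratio settles near a constant (~3)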
Thanks, sticking to Karatsuba.
the fun thing about hardware multiplication is they often continue to break the problem down even further. from what i've heard, modern hardware generally breaks it down to 8x8 bit multiplications and then uses the fastest 8x8 bit multiplication known to mankind: a 65k entry table. and you even have a separate table for every 8x8 bit multiplication that has to be done. and while you could do 64x64 bit multiplication (which is implemented in x86_64, i have no idea how to get the full answer in C though) using 27 tables instead of 64 using Karatsuba, they're all done at the same time so in terms of speed that's just more layers of addition. so in case you were wondering how we manage to keep doubling those transistor counts...
what i think is really interesting though is that in grade school you're effectively left to assume that division isn't much slower than multiplication. but in reality, multiplication has all these different ways you can improve, while with division it's things like "wow we cut it down by a factor of 4, still 10x slower than multiplication though because none of it is parallel" for hardware and "so basically you multiply a bunch of times because multiplication is just that much faster" for low complexity algorithms (just checked before posting this, turns out dividers can achieve the same complexity as multipliers, though once again, this is by turning it into multiple multiplications). also doesn't help division's case that integer dividers have to provide two almost unrelated answers at once (quotient and remainder), not just one big one.
I'd suggest you look up Dadda and Wallace trees. Additionally, you might find Booth's multiplication algorithm useful. One nice thing about Booth's method is that it handles signed numbers directly and with a minor modification, handles unsigned as well (basically extend the unsigned numbers an additional significant bit with a value of 0). At the hardware level, this allows for both signed and unsigned multiplies to be performed with essentially the same hardware and microcode.
But as for this video, it's only useful for LARGE numbers. For smaller numbers such as used natively by 32, 64, and 128 bit computers, it's generally faster to use the straight forward O(n^2) algorithms or lookup tables.
One minor note. If you look at Booth's method, a lot of naïve people will claim that it reduces the number of add operations for performing multiplication. This is wrong. On average the total number of adds will be the same for Booth and conventional. The big advantage to Booth's method is that for multi-bit versions, the values added (or subtracted) are simpler than the values used for conventional multi-bit multiplication. For instance, Radix-4 Booth uses the values 1x, 2x, -2x, -1x. All of which is a simple addition or subtraction of either the value directly, or a single bit shifted value. Whereas radix-4 conventional uses the values 1x, 2x, 3x. The values 1x and 2x are simple. But 3x can't be calculated via a simple shift. It requires both a shift and an addition. So it's considered "complex". And for Radix-8 Booth, the values required are 1x, 2x, 3x, 4x, -4x, -3x, -2x, -1x. So the only complex values needed is 3x, whereas for Radix-8 conventional, the complex values required are 3x, 5x, and 7x (6x is a simple shift of the 3x value).
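For anyone curious what that recoding looks like, here is a small Python sketch of radix-4 Booth recoding for unsigned operands (an illustration of the comment above, not any particular hardware's implementation): note how only 0, ±1x, ±2x multiples of x ever get added.

def booth_radix4_digits(y, nbits):
    # Recode an unsigned nbits-wide multiplier into digits in {-2,-1,0,1,2},
    # least significant first, so that y == sum(d * 4**i for i, d in enumerate(digits)).
    y &= (1 << nbits) - 1
    digits, prev = [], 0              # prev plays the role of the implicit bit y[-1] = 0
    for i in range(0, nbits + 1, 2):  # one extra group covers the unsigned zero-extension
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits

def booth_multiply(x, y, nbits=32):
    # Each step adds a shifted copy of 0, +x, -x, +2x or -2x -- the "simple" multiples above.
    return sum((d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(y, nbits)))

assert booth_multiply(56, 32, nbits=8) == 1792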
Fascinating. Stumbled onto this and I’d never really thought about the rate determining steps in larger multiplication tasks. Thanks
I love learning how things actually work in mathematics for larger numbers.
The love only increased further during my electrical engineering days where we had a subject, aptly, called Advanced Engineering Mathematics (AEM) 1 & 2.
For sure it was more difficult than anything I had done till then, but at the same time it felt so cool to understand it and solve problems.
In the end, AEM was amongst my top 4 highest scoring subjects. 😅
"We all learn how to do this in high school."
We learned it in elementary school. By high school, most people were getting rusty at it because we could use calculators, though I always lost my calculator and so got better. (I still was able to write this before I "instantly" knew what 7*8 was, though. It's 56, I remember now.) On the other hand, I like the way you explicitly write all 4 1-digit multiplications instead of just writing out 2 multiplications, one for each of the bottom digits, like I learned to do.
The first time I heard about the FFT algorithm for multiplication, it seemed... impossible. What did multiplication have to do with the frequency domain after all? I didn't really look into it further since I didn't need to multiply giant numbers anyway. I think it was a 3blue1brown video talking about convolutions in probability that used long multiplication as an example to introduce it. That was a giant AHA! moment. Of course multiplication is just convolution followed by the carry additions! I immediately opened up a python prompt and had to try it. Mind blown! What a beautifully weird algorithm. Using the FFT to convolve digits of a number in the frequency domain? Brilliant! Maybe a bit useless for everyday computing, but still!
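In case anyone wants to recreate that experiment, here is a rough Python sketch of the idea (a toy using floating-point numpy FFTs, so rounding limits it to modestly sized inputs; real large-number code uses number-theoretic transforms or careful error bounds):

import numpy as np

def fft_multiply(x, y):
    a = [int(d) for d in str(x)][::-1]   # decimal digits, least significant first
    b = [int(d) for d in str(y)][::-1]
    size = 1
    while size < len(a) + len(b):
        size *= 2
    # Pointwise product in the frequency domain == convolution of the digit lists.
    conv = np.fft.ifft(np.fft.fft(a, size) * np.fft.fft(b, size)).real
    column_sums = [int(round(float(c))) for c in conv]
    digits, carry = [], 0
    for s in column_sums:                # carry propagation turns column sums into digits
        carry += s
        digits.append(carry % 10)
        carry //= 10
    while carry:
        digits.append(carry % 10)
        carry //= 10
    return int("".join(map(str, reversed(digits))))

assert fft_multiply(56, 32) == 1792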
also the same Strassen as in the Solovay-Strassen primality test!
That's right, the dude was busy!
Hey Dr, I hope you are doing well! Missed your classes (during the COVID era) and videos very much! Your videos helped me get an A in Calculus. I'm also glad you got over 300k subscribers.
I am also recommending your videos to my first year friends, because they are extremely helpful.
5:30 This reminds me of the DeepMind matrix multiplication thing, or the one-up to the 2x2 decomposition of larger matrices. That saves up to 33% of the calculation. I forget that guy's name.
9:40 yep. That all just came together.
I can't believe I've never seen 1:09 multiplication where the "cross" of the bottom one's digit to the top ten's digit is completely stored for the additive solution section. Only the tens addition of a phantom ten's on the top ten's digit. This visual makes more sense of large number multiplication.
Binary multiplication is a different beast altogether. It's adding shifted (power-of-two) copies of the multiplicand to a running sum according to the ones and zeros of the multiplier, read from the least significant end.
But I'll definitely be trying this method on binary numbers.
I had tried it before in binary using bitshifts + recursion, but native multiplication will still be faster, which makes sense since it doesn't involve any function calls. I guess if it's done in a low-level language the difference can be seen.
@ashutoshmahapatra537 If your compiler is any good then function calls to short functions should be removed during the optimisation stage. Native multiplication will however always be faster because it's done in hardware, not software. Algorithms mentioned in this video come into play when you have to multiply those enormous numbers which you can't represent with native data types.
I was taught a slightly different method for multiplying multi-digit numbers, but the point of the video still stands.
Instead of just looking at individual digits and having n^2 numbers that need adding together at the end, I was taught to run the entire top number against each digit of the bottom number, with proper carrying, which gave me just n numbers to add together, with n being the number of digits in the bottom number.
Ya fair enough, I sort of think of this as equivalent (you are still doing n^2 single-digit multiplications, just organizing your work on the page a bit differently in terms of when you do the additions)
@@DrTrefor That's also fair enough.
Here are the missing steps that Karatsuba (probably) went through to improve this: (i.e. here's how ad+bc equals the new 10s term)
(a*10^n+b)(c*10^n+d)=
ac*10^2n+(ad+bc)*10^n+bd
Focusing on the middle term alone, ad+bc= ad+bc+ac+bd-ac-bd= (a+b)(c+d)-ac-bd
ac*10^2n+((a+b)(c+d)-ac-bd)*10^n+bd
And the product at 5:00 should be (if I'm right- I went Karatsuba all the way down) 1,117,262,504,544,938. Well- that's what I got.
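For anyone who wants to check that derivation by running it, here's a minimal recursive Python sketch of exactly this identity (splitting on decimal digits for readability; real implementations split on machine words and fall back to ordinary multiplication below a cutoff):

def karatsuba(x, y):
    if x < 10 or y < 10:               # single digit: just multiply directly
        return x * y
    n = max(len(str(x)), len(str(y))) // 2
    p = 10 ** n
    a, b = divmod(x, p)                # x = a*10^n + b
    c, d = divmod(y, p)                # y = c*10^n + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    middle = karatsuba(a + b, c + d) - ac - bd   # = ad + bc, with only one extra multiply
    return ac * p * p + middle * p + bd

assert karatsuba(56, 32) == 1792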
Usually when I see a software multiply in old code, computing a*b goes like this:
i = 0
total = 0
while i < n:            # n = number of bits in the operands
    if b & 1:
        total += a      # add a to the running sum
    a <<= 1             # shift a left
    b >>= 1             # shift b right
    i += 1
The adds, the shifts, and the AND are all O(k) apiece, so the full multiply is O(n). Am I missing something here? Why are we mucking about with n log n?
I suppose it only works out this way in base 2, but only base 2 matters if we're computing complexity. In base 2, a pair of digits can only multiply to 1 or 0, whereas a pair of base 10 digits has to be an actual multiply instead of logical and, and the multiply in that case is not O(k).
You can replace the if with a bitmask on a if you don't like the jump, too
Yes, you are missing something. The key thing to remember is that they are talking about LARGE numbers. So addition is an O(n) operation where n is the number of bits, not O(1). And for your multiply operation, you're doing n additions at a cost of O(n) each, giving an overall cost of O(n^2). Now, for modern processors, operations on the basic word size are O(1). But once again, in the grand scheme of things, those values that are basic to the CPU are small, whereas the values being talked about in the video are large values composed of thousands to millions of bits.
4:09 It's multiplication table
Would the Harvey and van der Hoeven algorithm be useful for checking Fermat number primality? I know F_33 is the first candidate, and 2^(1729^12) would be on the order of 2^(2^129), thus F_129 (which doesn't appear to have known factors currently) might be a candidate to try it on with Pepin's test.
In regards to the 32 bit by 32 bit multiplication, you probably would not want to break them up into 32/31 bit segments. The reason being, you will get a 32 bit integer back. Assuming you are multiplying 2 random 32 bit numbers, the odds of needing more than 32 bits to represent the product are very high. The computer would then just roll over, give you the last 32 bits, and discard the rest.
Imagine the 2 integers you are multiplying are 2^31 * 2^31 = 2^62. This is clearly not representable using 32 bits. You would want to break them up into smaller segments and the best way to segment the number without overflowing would be 16 bit segments since worst case scenario is
(2^16 - 1) * (2^16 - 1) < 2^16 * 2^16 = 2^32
so the product always fits within 32 bits and the 16 bit segments will never overflow.
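A small Python sketch of that segmentation (illustration only; Python ints can't overflow, but the partial products are exactly the ones a 32-bit machine would compute, and in C you'd accumulate them into a 64-bit result or handle the carry out of the middle sum explicitly):

def mul32_full(x, y):
    # Full 64-bit product of two 32-bit values from 16x16 -> 32-bit partial products.
    xl, xh = x & 0xFFFF, x >> 16
    yl, yh = y & 0xFFFF, y >> 16
    low, mid1, mid2, high = xl * yl, xl * yh, xh * yl, xh * yh   # each fits in 32 bits
    return (high << 32) + ((mid1 + mid2) << 16) + low

assert mul32_full(0xFFFFFFFF, 0xFFFFFFFF) == 0xFFFFFFFE00000001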
many computer instruction sets have a high_mul instruction that will give you the upper bits of the multiplication so the way presented is often going to be faster.
0:48 not sure if it's just us, but we instead do 6x2, 6x3 and merge them while calculating, then we do 5x2, 5x3 and merge those while calculating, then we add the two results
whaa we do 32x50+32x6
This video was really helpful. I think I know enough to code it up on my own.
Nice!
In high school? I'm sorry, I learned multiplication in grade 2
Multiple digit multiplication.
I think in my country it's grade 3-4
high school? or primary school?
3:00 Correct me if I am wrong: what if a+b and c+d become greater than 10? Now you have more than 3 single-digit multiplications. In the worst case you have 6 single-digit multiplications.
0:06 highschool?
3:00 In binary, you don't have any "single digit" multiplies at all, since they are all 1 or 0. Instead, you have the decision to add or skip the add. The number of adds (and decisions on whether to perform or skip) is equal to the number of bits in the smaller factor.
So if you consider n to be the number itself, multiplication has k log n steps.
If you consider n to be the size (length) of the input; that is, the number of bits in the factors to be multiplied. then multiplying has kn steps.
Really, if you wanted to make multiplication of large numbers fast on a computer, it would do to implement a special type of memory that has an adder and a shifter built-in, that joins the individual cells together for as large of a number as you want.
That is, given two numbers stored in this memory A and B, it could perform B+=A in constant time and shift A right one bit in constant time. It's important to note that the number of logic gates is constant for each cell (word or byte) so scales linearly with the number of cells.
Would have loved to learn about the newer theorems but still was very interesting.
I am hoping to do a video at some point really diving into the discrete and fast Fourier transforms and how they are used for the more advanced algorithms :)
What you're doing is invaluable Dr?
Thanks once more
It's fascinating how we used logarithms to make calculations faster, and now we use them to measure how much faster a method does those calculations
2:59 a+b and c+d are not necessarily single digits.
I've implemented fft multiply. I have a suspicion that a careful implementation with hand tuning can make it beat out the n^2 algo for much smaller n. I had to work on something else tho. Maybe I'll get back to it. The thing is, a naive implementation includes a lot of redundant operations, affecting the hidden constant.
Thanks for this great vid, Professor!
Do you also plan to branch out to modular multiplication algorithms in one of your next videos?
Regards, Rob.
good idea!
I've just read the description in the "About" tab, and I think I'll like you 😄
Hi Dr. Bazett!
I bet these new algorithms will find a use eventually -- if not in this millennium, then certainly in subsequent ones!
it already has, i'm pretty sure (in python for example)
Hello professor. Maybe you can make another playlist on another math course in the future which would help us a lot! Thank you.
Lots more coming...what would you like to see?
@@DrTrefor maybe calculus of complex variables or real analysis
Have you tried the Vedic way of multiplication?
For the example you gave, 32*56,
a Vedic mathematician would do
(3*5) for the hundreds part,
(3*6+5*2) for the tens,
then (2*6) at the end.
So the 28 you get can be obtained from 3*6+5*2
He says that's basically the grade-school n² algorithm.
Make a video about fast multiplication using fast fourier transform
I plan to!
0:07 I learned it in elementry
Hi, can you make another video about modular multiplication, especially Montgomery modular multiplication?
I think everyone should figure out the most efficient algorithms for multiplication and addition and why they are what they are. I feel like that would create a lot of value.
@9:00 "For numbers after about 10 to the 96th..." Is that a typo? 10^96? That is a grokking huge number, far far huger than 2^64. If this is true then this algorithm is not worth it even for x64 architecture.
If you do many multiplications by hand, you should soon notice that once you’ve multiplied for a particular digit, you never have to repeat it.
You just copy it shifted over, the same as when multiplying by 1.
It’s merely a copy operation.
Never are n² digit-wise multiplications required.
The maximum is 8n.
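A tiny Python sketch of that observation (just an illustration): compute the top number times each digit once, and every row of the long multiplication becomes a cached copy shifted into place.

def long_multiply_cached_rows(x, y):
    rows = {d: x * d for d in range(10)}           # at most 9 digit-times-number products
    total = 0
    for place, digit in enumerate(reversed(str(y))):
        total += rows[int(digit)] * 10 ** place    # copy the cached row, shifted over
    return total

assert long_multiply_cached_rows(56, 32) == 1792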
keep up the good work
Hi, you said at 3:00 that we need to do 3 single-digit multiplications. But couldn't a+b and c+d also be 2-digit numbers? Thank you for this video, an awesome explanation of this algorithm!
Edit: Ok I get it. a+b or c+d is guaranteed to be much smaller than a*10 + b for very large numbers. Thanks again!
Sorry, high school? I learned this in 3rd grade
4:32 "A little bit"...? (No offense.)
Fascinating like always ❤❤❤
Since, in binary, everything is bits instead of digits, how does the fact that multiplying bits (a single AND operation) is much simpler than adding bits (two XOR operations [one for the bits, one for the previous term's carry] and an AND operation [for the new carry]) factor into this? It feels like reducing bit-additions would be the most direct way to improve speed, in this framing.
How does the first algorithm work when a number has an odd number of digits, or when one has an odd and the other an even number of digits?
I’m wondering if you can use a Fourier Transform on the entries.
Indeed, that's the basis for the more modern methods
0:08 bro you learned the multiplication in high school? bro...
if i'm not mistaken, and its kind of cheating, but i'm pretty sure early mathematicians multiplied numbers with logarithm lookup tables. for example, say you wanted to do:
31415*92653
what you can do is:
ln(31415*92653)=ln(31415)+ln(92653)
look up these logarithms in your lookup table to get:
10.35504076483705 + 11.43661661111373
which is much easier to do:
= 21.79165737595078
then use your logarithm lookup table in reverse:
= 2,910,693,995
Really, nowadays you'll always have your phone with you as a calculator, and if you don't have your phone you probably don't carry around a logarithm lookup table either (although that would be so cool to own, actually). Still, it's super useful to know in some oddly specific circumstances.
i think i might buy a pocket logarithm lookup table just in case
The problem is that the lookup of the logarithm has built-in accuracy limits - my paper copy is 5 digits, so it's completely superseded by computer calculation.
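A quick Python illustration of that precision limit, mimicking a 5-figure table by rounding the natural log to 5 decimal places (numbers taken from the comment above):

import math

def table_ln(v):
    return round(math.log(v), 5)   # pretend we only have 5 decimal places, like a paper table

x, y = 31415, 92653
estimate = math.exp(table_ln(x) + table_ln(y))
print(f"{estimate:,.0f} vs exact {x * y:,}")   # off by a few thousand out of ~2.9 billion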
Do you have a course on Brilliant, Dr. Bazett?
Interesting algorithm!
I love your videos!
Isn't Toom-Cook O(n^(1+ε)), with epsilon as small as you want?
Can you make a video explaining where the log actually came from? I mean, how did we start using log and ln in the first place??
It's just a natural, necessary extension of exponentiation. Exponentiation without logarithms would be like addition without subtraction.
The story of Euler's number _e_ and the natural log is slightly more interesting. _e_ was chosen so that _e^x_ would be its own derivative.
Can you do a course on Markov / Semi Markov / Hidden Markov / Semi Hidden Markov models please.
Is there any scientific literature analyzing the sizes of numbers we use in scientific or non-scientific computation, from 0 BC to 2023?
In the case of 32-bit multiplication hardware... you can use such HW to multiply any two 16-bit numbers and get a 32-bit result without losing any information.
Did you just say you learn multiplying in HS 0:08
I learned it way back in the before times, which is all a murky blur :D
I re watched the beginning of this video like 5 times because I thought I was missing something important because of this lol
super professor!📚😃🏃♀️
Thank you!!
I also studied math at University of Toronto. Back when Prof Eric Moore was just Senior Tutor Moore.
When you said enormous numbers and recursively apply the algorithm the first thing I thought was stack overflow lol
”We all learn this in high school, for instance”
Well we usually learn that in grade school. Or I'd hope so.
What a super teacher!
Reminds me of the FFT algorithm for computing the DFT.
Feels like this is a hint for a handful of Project Euler solutions.
Yeah so much nicer to write the carry inline, never thought of that.
3:15 mentions computers, but the title doesn't specify this. Fastest multiplication algorithm for computers would be a better description.
Multiplication is first being taught in high school, now?
...
I like the topology picture on his shirt
04:06 I've memorized single-digit-multiplication, but recently I'm trying not to use the memorization. I'm trying to imagine a rectangle to multiply.
You have to also count the shift operations.
Shifts (i.e. multiplication by 2 in binary) are computationally cheap from what I understand
I think the shifts won't actually end up getting implemented; rather, they just kind of fall into place by getting the indexes right. To illustrate how this kind of thing works with a toy model, notice that the fastest way to compute x*(2^64) + y isn't to start by computing x*(2^64); it's actually to do the whole thing in one traversal by repeatedly computing x[i+1] + y[i]. I'm not an expert, but I have a feeling that Karatsuba's algo would typically be implemented in a similar way.
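A little Python sketch of that index-offset idea on base-B limb arrays (an illustration of the comment above, not how any particular bignum library does it): the "shift" never happens, x's limbs are simply written k positions higher before one carry pass.

def add_shifted(x_limbs, y_limbs, k, base=1 << 64):
    # Compute x*base**k + y on little-endian limb arrays without materializing a shift.
    out = [0] * max(len(x_limbs) + k, len(y_limbs))
    for i, limb in enumerate(y_limbs):
        out[i] += limb
    for i, limb in enumerate(x_limbs):
        out[i + k] += limb                 # the shift is just an index offset
    carry = 0
    for i in range(len(out)):              # single carry-propagation pass
        carry, out[i] = divmod(out[i] + carry, base)
    if carry:
        out.append(carry)
    return out

assert add_shifted([3, 2], [5, 1], 1, base=10) == [5, 4, 2]   # 23*10 + 15 = 245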
If you multiply two binary numbers of length n, you have to perform n shifts left on one number (padding 0 as least significant bit), and at most n additions to it of the other number, one per each one found in the latter, zeros don't count, looking from the least significant bit. Isn't this the fastest multiplication? Order n, for sums, no multiplications.
You are talking about n additions of arbitrary length numbers, when you break that down into additions of fixed size integers you are back to O(n^2).
@@gdclemo Yes, whether you use breaking integers down (for conventional computers) or hybrid conventional with Turing processors, a sum requires going along successive pairs of digits n or n+1 times, thus O(n) steps. And O(n) such sums therefore require O(n²) elemental operations.
But my point was that, with binary numbers, you don't need multiplications! Whatever the multiplication method, whatever the base, you still need O(n) sums of numbers, therefore O(n²) elemental operations just for the sums, plus the elemental or combined multiplications in bases greater than two.
@@wafikiri_ Well you've just broken the multiplications down to their elemental components to the extent that you're just multiplying by zero or one, which are of course trivial. But those operations still add up to O(n^2) in total which is much slower than the optimal algorithm. Big-O notation doesn't care what the operations are or how long they take individually, only how they grow.
I believe every problem has n log n solution.
it's not a bad null hypothesis!
Mine is way slower but works! 66*12=6*11*12=2*3*11*12=2*2*3*2*3*11=8*3*3*11=24*3*11=72*11=72*10+72=720+72=722+70=792
Hmm.. in your first example, two digits X two digits, you had to do four multiplications using the high school method. But if you choose to represent your numbers in a base greater than the larger of the two numbers, then it will always be a one digit number X one digit number, and you never have to do more than one multiplication. Multiplications can be done in constant time. Q.E.D. 🙂
Haha true! Ya if we just built hardware multipliers of arbitrary size it would multiply in one step!
In the first question, I'd just do (56 × 3 × 10) + (56 × 2)
56*32 is easy to do... it's just (40+16)*(40-8), or 40² + (16-8)*40 - 16*8: 3 multiplications and 3 additions.
I mean, if we did numbers like 96*93 it's stupidly easy: it's 8900+28. I did 1 multiplication, a simple one, -7*-4... That's all I needed. And how I came up with those numbers, oh boy.
96*93 = (100-4)(100-7) = 100² + 100*((-4)+(-7)) + (-4)(-7). And you still say that's 3 multiplications... and I say it's not: 96-100=-4, 93-100=-7, 96-7=89, -4×-7=28, and 89 concatenated with 28 is 8928...
1 multiplication, the rest is addition.
12:30 Such an incredibly precise number is unlikely to ever be necessary. That tiny extra gained precision is cancelled out by random variation at least in real world processes, and on a computer you still won't find much use for it because all computational tasks in some way are tied to the real world.
Interesting :)😃
Why does he randomly start to talk in Italian?
Is Karatsuba this TikTok method with crossing lines?
Who does it the way you first presented? You of course first multiply 32 by 6 and then 32 by 5, shifted one place. Then you add.
Well, this is the same; you are just writing less down. When you multiply 32x6 you are presumably doing 2x6 and 30x6 in your head and adding them. And I do too; I normally only write down 2 lines, not 4, but I'm writing all 4 to make explicit that there are 4 single-digit multiplications involved.
“Seven times eight and you immediately know the answer” 🤔 😬
What I once thought was some yokel's ignorant misunderstanding of language, I now realize is the common way Americans pronounce 'Fourier'. I get it if you feel this is petty and that math should stand above such matters of culture and language - by all means, call it "wave decomposition" or somesuch. But if you're going to name something after a person - might as well get their name right.
Go be insufferable somewhere else, Daniel. Maybe you should learn Ancient Aramaic so you can say that Biblical name of yours properly. After all, if you're going to name something after a person, you might as well get their name right.
Nah, better to do a 1729-dimensional Fourier transform
Given enough storage space you could just look at a table with the solutions for any multiplication. Guess that is a bit of cheating, but O(1) is technically possible.
At some point, searching within the table would be the hard part!
Isn't binary multiplication almost as fast as addition though?
multiplication by 2 is very fast in binary specifically, but not any two numbers.
@@DrTrefor can't you multiply by any number you want by doing bitshifts and addition really easily?
I.e. to multiply by 5, shift the bits 2 left and add the original value once,
or to multiply by 6, shift the bits 2 left and add the original value twice?
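Spelled out in Python (a tiny sketch): the catch is that for an arbitrary multiplier you add one shifted copy per set bit, which is exactly the O(n)-additions shift-and-add scheme discussed elsewhere in the comments, so it doesn't beat the general algorithms for huge numbers.

x = 7
assert (x << 2) + x == 5 * x        # multiply by 5: shift left by 2, add x once
assert (x << 2) + x + x == 6 * x    # multiply by 6: shift left by 2, add x twice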
This looks like recursive functions from coding.
Nope, that algorithm is faster for humans too if you can get used to doing math with an accumulator rather than trying to hold onto both operands for the whole operation. In fact, my own algorithm works almost exactly like this but in reverse.
With hand? Use log and log tables. That's what they are invented for originally.
Si that's what the CPUs use
Anyone else hungry for carrot soup now?
First view professor. ❤
Nice one!