I think thats pretty funny because most of the times the algorithm will stop not because it ran "the correct optimal algorithm" but just print the answers and they happen to be right... It would only start to do better than "print all pairs of 2 numbers and check" when the code for a better algorithm is shorter than just one that prints the answer bc the answer is too long.
"when the code for a better algorithm is shorter than just one that prints the answer bc the answer is too long" -- it's funny because a lot of programming is exactly that, writing a piece of code that's shorter than the answer
Presumably, there exists an algorithm that requires a finite amount of code to implement regardless of the size of the input or output. That algorithm may be longer than the shortest "print()" solutions up to a point, but again: it doesn't grow as the input size grows while "print()" does. Just like how the program Gzip ramains the same size whether you want to compress a Megabyte file or a Petabyte file. There will be some largest number where "print()" is more efficient, beyond that, there are infinite numbers where implementing the actual algorithm is more efficient. Therefore the phrase "most of the time" is misleading here, because finite
Let's also admire the fact that this algorithm implements and executes perfectly aligned AGI. And of course all the paperclip maximizers and alike. The slowest doomsday device
@@PolylogCS you try to catch it, but what about the escape key on your key chain, did you remember to remove it after locking the AGI in it's block chain.
It'll also eventually output a 4K video of you and the grandma next door starring in a hardcore adult film directed by Michael Bay and narrated by Morgan Freeman
@@user-dh8oi2mk4f If we're talking about things with finite memory, they're technically not Turing Machines, which have an infinite amount of memory, so it's technically a different problem (hence why saying this doesn't break math and win you a Turing award)
Seems like for composite numbers where the string containing the solution program is big enough, before reaching that solution, the universal search algorithm might at one step generate a program which is also a universal search algorithm itself, searching for the same answer, which would also create another nested universal search algorithm program before reaching the answer and so on. That would be pretty cool :D
Yes, that's absolutely correct! Unfortunately, that other nested instance of itself won't have any chance of finishing faster than the high-level universal search, because it needs to do at least as many steps. What could be even more funny is that it'll implement some better universal search that might use some better programming language and its simulator which will run much faster and generate the correct program faster...
This reminds me of the Library of Babel for some reason. Somewhere in there, anything you can think of exists. Good luck finding it lol. (Although in that case there is a straightforward mapping between text and location.) In the case of factoring primes, I'm pretty sure it would be faster to brute force all the numbers < sqrt(p*q) than to randomly generate a program that would do it. I also wonder if the universal search algorithm could be quantum-ified to check all possibilities of, say, length n, at once. But of course a sophisticated quantum computer like that could crack large primes in its sleep anyway. Perhaps other problems would benefit from it though.
Interesting connection with the library of babel! Regarding quantum computers, one (relatively boring) thing you can do is to adapt universal search for them so that it is as fast as any quantum algorithm. But as you say, factoring is one of a few problems that quantum computers are known to solve quickly, so the question about the fastest quantum algorithm for them is perhaps not so interesting.
It isnt the same. Universal search is an algorizhm that implements every algorithm a turing machime can simulate, and so it can solve every problem. Moduloing a large number by every number smaller then it WILL solve prime factorization, but it will only solve time factorization.
> _"Library of Babel"_ i was thinking same as well... aside: when i watched vid about that weeks ago; i never thought that i would face that concept again so soon.
Amazing video, I didn't know about the universal search algorithm before and found it very interesting. I wish you had done the derivation for why doing exponential steps on previous machines had complexity 2^L + f(n) since I would've imagined a log_2 to show up, but I managed to figure it out (I think) for anyone interested: Since each level grows by powers of 2 when the next machine gets added, you need to add log_2(f(n)) new machines to the search before the machine at level L will finish (2^log_2(f(n)) steps = f(n) steps). This means the total height you get to is L + log_2(f(n)) and since the total number of operations is the sum of many consecutive powers of 2 2^0 + 2^1 + 2^2 + ... + 2^(L - 1 + log_2(f(n))) = 2^(L + log_2(f(n)) ) = 2^L + f(n) You get the final complexity in the video.
Yes! One reason why we did not go into this is that there is one subtlety lurking there that Ondřej Bouchala promptly uncovered. Check out also his comment! :) Also, there is a small mistake in your derivation: 2^(L + log(f(n))) = 2^L * f(n).
Syntax errors are really the same thing as program termination. So, it isn't hard to come up with a system where any code is valid code, i.e. syntax errors are impossible. In Brainfcuk, there exists only one syntax error: Incorrect brackets, like for example "][". If one redefines [ and ] to have well defined behavior in these cases, any string of BF commands will be a valid program. For example, redefine ] such that in the case of no matching [ it moves the instruction pointer to the beginning of the program, and redefine [ such that in the case of no matching ] it moves to the end of the program, i.e. just terminates it.
Great video! I'd like to add that "Polynomial time?"-level of abstraction is useful in areas like complexity theory (well, at least "old school" style sub-areas of it) since you might want to study time complexity as a property of a problem and not of an algorithm that solves it, so you would like to use a measure of complexity that depends as little as possible on computational model that you use. Different models of computations often disagree on the power of a polynomial, but very rarely disagree on polylogarithimc/polynomial/superpolynomial/subexponential/exponential/etc. levels of precision.
Generating AST instead of code would cut a lots of non runnable code. Assymptotically the same, but makes a little bit more sense. Oh, and also its funny, that once in a while it is going to generate itself, so its a kinda quine)
Asymptotic complexity is great most of the time, you only have to not throw out the constant factor and it might not be the best idea to lexicographically order all programs by it
the moment I got my head around it my brain instantly went to a version of universal search that uses assembly commands and numbers instead of all characters, with the goal of finding the most optimal algorithms for math functions through brute force.
That actually exists. Compilers encounter multiplication by constants (e.g. size * 13) or other math functions. Compilers that generate highly optimized code have libraries of small chunks of code that compute specific math functions. They build that library by trying all possible small chunks of machine language.
This is why we don't use big O in our course about algorithms. We use something like tilde notation witch adds the coefficient to the highest exponent term and you get a way better representation to the performance of an algorithm to compare to other algorithms.
You might enjoy looking at the video description! :) turns out you can make the multiplicative constant 1.01 and hide the horrible stuff to additive constant!
Any notation that hides information has to be assumed to only be useful in certain cases. The tilde that gives you slightly more information is nice, but is a level of abstraction that you sometimes don't need when simply analyzing an algorithm; I really don't care if an algorithm is 1000(n log n), since that's worse than being n^2 regardless, and the coefficient isn't going to be higher in any reasonable algorithm.
Well, worse bogo sort for all algorithms. It's basically a random output generator where for each output you check if it is correct. If you get a correct output from an algorithm, it doesn't mean it will give correct output for a different input. You could optimise it for the problem you are solving, for example generating pairs of 2 INTEGERS which would never output 'banana' but that is no fun given the context of universal search. This also works only for problems which have a constant time complexity for checking the solution, otherwise it wouldn't be "asymptoticaly optimal".
It's actually better than bogosort, because this algorithm is at least guaranteed to give you the correct answer in a finite amount of time while the time complexity for bogosort is unbounded, meaning it could just never land on the correct permutation if you're "lucky"
@@postmodernist1848 It's better than random bogosort, but roughly equivalent to deterministic bogosort. Deterministic bogosort works in effectively the same way as universal search, by iterating through all the possible solutions.
I find it weird that the most common measure of complexity in computer science is big O, ignoring the constant factor of the highest order term. In mathematics, I would only ever mean the equivalence relation "~" by the phrase "assymptotic to...", i.e. the ratio of the two functions tends to 1 in some limit (which could be n tends to infinity, as is always meant in computer science). And I think this is a much more generally useful term for algorithms where the behaviour is well understood (i.e., all the constants are easily measured and/or bounded). Even just giving an order of magnitude upper/lower bound to the coefficient of the leading term is enough information to tell you if the algorithm is practical or impractical. For example, someone might see that heap sort and merge sort are both O(n log n) comparison sorts, so would just think "they are both optimally fast, so I'll just use heap sort because merge sort takes a higher space complexity". But this misses the point that merge sort takes ~ n*log_2(n) comparisons, while heap sort takes ~ 2n*log_2(n) comparisons, in the worst case (can be optimised, but this is the simple case to demonstrate the point). So the truth is that heap sort takes O(n log n) *more* comparisons that merge sort, which might help explain the trade off that merge sort is making with space for time.
I totally agree that this way, you get more information about the algorithm! But it also gets hard to compute the leading constant, unless you decide to measure something as simple as "number of comparisons". Also, you don't capture cache behavior (important for quicksort) etc. Finally, it turns out (check the video description) that there is a universal search with the property that if there is an f(n) algorithm, the search has complexity 1.01f(n) + O(1), so even in your way of measuring complexity, universal search looks like an amazing algorithm.
A simple way to speed up the search in this specific case, is to simply start from the other end. We know that in encryption, it will be two fairly large numbers of fairly similar magnitude, there won't be a factor of, say, 17. So, just start at the sqrt(x) and work downwards. To speed it up a bit more, remove simple to detect factors (2, 3, 5...). It won't speed up the general factorization case, but for cryptography, it'll make a big difference.
You gave so little spotlight to the fact that Universal Search is an asymptotic limit to all those sequences of better and better algorithms... It is truly a wonder problems don't define inferior open boundaries in the algorithms space with the asymptotic complexity topology!
Nice catch! If you want to get rid of this, you have to think of checking together with the simulated program as one program. By that I mean that your allocated time is counted also for checking, so maybe you run out of it and you have to wait until the next iteration to continue checking. If you do it this way, everything sums up nicely and you get O(f(n)). We really did not want to get into that, since anyway for f(n) > n log(n) this term is dominated by O(f(n)), so you can also think of this imperfect search as achieving O(f(n) + n log n).
So the guarantee on any problem (say P problems) should be O(f(n)*log(f(n))) right? Checking can only be as hard as just solving the problem all over again!
@@yanntal954 We assume that, in this case, checking works in linear time and f(n) is the running time of the algorithm we compare ourselves with. Then, I claim the universal search with exponentially allocated steps with the trick I mentioned above is O(f(n)). Does it make sense?
@@PolylogCS I was talking more in general not just factoring, You should get this as a guarantee. It may not be tight but it will always be a valid upper bound! I think assuming Factoring is Omega(n*log(n)) is pretty reasonable and thus for factoring you should get O(f(n)). In general the checking process can take at most O(f(n)) (yet can't be more than that!) thus you get the aforementioned guarantee I mentioned
So you can theoretically apply this idea to find the asymptotically fastest algorithm for Matrix Multiplication? Take the input size to be N = n^2 where we are dealing with square matrices, then we know that there exists an (probabilistic) algorithm running in linear time (linear in N, the input size) that can check whether the resulting matrix C is in fact the solution to A * B (two matrices of size N. Then we can simply use this search to generate C and use this linear time check and get the "best" algorithm!
not really - what if you got lucky and found a random program that happened to find the solution for A*B, but it's useless for multiplying other matrices? you would need to run Universal Search for each new pair of matrices A and B you need to compute.
What is the smallest number for which this algorithm ends with the result of a sub algorithm that didn't simple guessed the right result but did really use some calculation? ^^
Good question. My guess is that the naive algorithm we showed, written as concisely as possible, could be among the first nontrivial algorithms to finish.
Imo it would be fun to use Symmetric Interaction Nets (which are useful for implementing something like Lamping's Abstract Algorithm) as the basis for Universal Search.
9:19 Wouldn't you get something that's roughly O(f(n)*log(f(n)))? Perhaps even O(f(n)*log^2(f(n))) by analogy to the previous construction, even though you won't get a triangle as before but you could perhaps rearrange the runs in such a manner?
@@PolylogCS I am not sure why that extra n*log(n) addition came from? but to my knowledge, log-RAM machines aren't physically possible because log(n) for every n means your machine has an unbounded word size that it can get constant time access to. For that reason I think the O(nlogn) time you get from the Harvey and Van DER Hoeven algorithm is (currently) your best bet, meaning the assumption that multiplication takes linear time probably isn't yet realizable?
@@yanntal954 If we take this discussion further, even Harvey and and der hoeven algorithm is not physically realizable, since it assumes you can do random access, which is not possible in our 3D space! (if you put too much information to one place, it collapses to a black hole) So maybe the Turing machine after all is the right model? But for large enough n anyway the universe suffers heat death before you finish computing. What I want to say is that for every model of computer, you can make an argument that it is not physically possible. In practice, Word RAM or pointer machine (there is btw no assumption on log(n) word size in the pointer machine model) are good models and maybe just view our video as a mathematical statement about these models.
There's a paper called "A note on probabilistically verifying integer and polynomial products" by Michael Kaminski from 1989 that basically let's us test in linear time probabilistically (with high probability of course), given integers a, b and c whether a * b = c. The reason why it's possible faster than Harvey and Van Der Hoevens algorithm is because it simply checks, it doesn't actually perform the calculation (at least not over Z). So, if a good enough pseudo-random generator is created, you can, in linear time do this using the mutitape Turing machine model!
A problem I see with this is that as the number to be factored gets larger, any program using actual variables will start producing wrong results due to variable size limitations. So while L is a constant for each program, it might diverge towards infinity if there is no program that will accurately factor _all_ numbers at the fastest possible time complexity. If there is a variable type that can handle any unlimited amount of digits, and that can be used in a program with the fastest possible time complexity, then that might be solved. But I don't know whether such a variable exists and can be used in the programming language this algorithm runs in
This is a great question! It ultimately leads to "what is your mathematical model of computer?" Let's say that our model is Turing machine. There, there are no "variable types" and so on, so I hope that with this model of computer, everything we say in the video makes sense and you can see that we prove a mathematically correct statement there. Turing machine is not a good model of computers, more usual model is the so-called WordRAM model, which is basically C, but you assume that if the input has n bits, the basic variables you use have log(n) bits, so that you can e.g. index your input. For this model, our statement is also correct. Your problem as I see it is basically that for very large inputs, maybe the WordRAM model is not a good model of computer. And in that you are right, similarly to asymptotic complexity, wordRAM is just another model. Not sure how much sense I make :)
@@PolylogCS yeah, if you have a variable type that can have any power of 2 as its size (and thereby display any number in a finite variable) then this works out
Рік тому+3
Python's supports arbitrarily large integers out of the box.
@ likely its not real if you can get beyond 512,1024 bit integers. Meaning that its likely in part software giving you the abstraction of super larger integers by in some form or fashion a collection of more standard 32,64bit ones. Reason i say likely is, you likely do have some 128 bit registers now adays, theres a flag you can throw at your C compiler to see. And as well since we do have sse,avx and their variants it may be python if when its built on a system passes some flags to the C compiler to build with avx extensions and maybe you can get some 512 to 1024 ones. But typically this is something ive seen when installing pandas/numpy. More broadly though, you cant address smaller than a byte on modern systems so an 8 bit integer. It would be technically impossible. So if you can have "odd" bit integers or anything beyond the max register size among all the avx/sse registers, then i do think it has to be a software abstraction. Probably 0 padded ints for bit sizes less than 64, and some arrays/linked lists as a software implementation with software defined operators of integer operations under the hood for crazy large beyond your max avx/sse register size.
@@stevepa3416 it doesn't matter if those numbers are not like that in hardware, as long as they are useful in software, so calling them "not real" is a disservice to them =). those numbers being able to be "arbitrary large" basically means that instead of being limited by architecture-defined word size, these numbers are limited by how much RAM you (can) give to the program. So they can be useful for universal search.
What if the algorithm, by pure chance, found the RSA factorization to the bank account with the most money in it, and then, by pure chance, moved all that money into your account, and then, by pure chance, made it seem like no money ever left (just duplicated it), and then, by pure chance, any flags that would pick up on this on your account were 'disabled,' and then, by pure chance, any possible way for this to be noticed was 'programmed out,' all via 'pure chance.' So... what then?
@@PolylogCS Hopefully we find a 'computer virus' that hacks the world banks and "clones" money directly into your bank account... all without being traced!
How much would it be possible to optimize Universal Search with AST-based code generation rather than text strings? If you start iterating over the input space of all possible grammatically valid programs rather than all possible text strings, this would cut down your constant factor by a fair degree. Of course, Python's not a particularly metaprogramming-friendly language, but I bet it could be done with some package somewhere...
Let us know if you manage to make this work :) Perhaps you can reach the naive factoring algorithm in such a universal search, but not much farther, I guess...
This is basically what Program Synthesis has been doing since the 1980s. How it usually works in practice is that Python is too big, you usually define a custom DSL - a specialized mini language - tailored for the algorithm you're trying to generate. For example, if you're trying to find a Sorting Algorithm, you define a mini langauge that has lists and common operations on them (swapping, iterating,...) as primitives. More importantly, the mini language *doesn't* have anything else not relevant to Sorting, no exceptions, no OOP, etc... You're essentially 'cheating', making things easier for the search algorithm by restricting its search space. Come to think of it, this bears resemblance to neutral network architectures as well, they have certain human-designed hyper parameters, and the training algorithm can focus on merely tweaking trainable parameters within those constraints. Program Synthesis is fascinating, with plenty of applications and analogs. SQL databases already have a Program Synthesizer that millions use everyday (the query planner).
What if you find a non asymptotically optimal program so long before you find an asymptotically optimal one, that it succesfully factors your number before you have a chance to find the better one?
Let's just run this algorithm on a supercomputer with something other than python. After all we need to find an algorithm for it only once to break encryption.
I think this is analogous to asking a neural network to do it. The reward function would be checking if the output is correct and rewarding the network accordingly
yes, because the output of training is a model, and those models could represent the outputs of any function on a finite domain (to even put the input into a computer requires that the domain is finite)
I feel like if you actually write this program, the simplest program to “factor” a number will literally just print its factors without actually factoring
if you wanted to generate a program which multiplies two large prime numbers, you would have to factorise each result you get, but you don't know to do itz so you can run a universal search for it
funnily enough if i had to guess this algorithm will probably just generate a print function before actually generating something that solves the problem since the print would probably be shorter character wise
Idk what needs two videos to make and explain when all you have to do to find all the factors is this: let num = 4; function betterFactorFinder(n){ let factors = new Set(); for (let i=1; i
I wonder if it would be "useful" to have some sort of hyper threaded supercomputer somewhere just constantly running universal search looking for solutions to famous problems. And it just runs forever. And spits out what it finds for us to look at.
Problems for which no optimal solution exists aren't necessarily bizarre/pathological: it's thought that matrix multiplication is such a problem. Strassen like algorithms have been able to push the complexity down from O(n^3) to O(n^2.3728596). However, it is know that the infimum O(n^w) complexity is not achieved by any algorithm. I don't know enough to know if this precludes optimal algorithms with complexities like O(n^a * log(n)) or others not of the form O(n^a)
Hm, you can actually check the result of matrix multiplication in quadratic time, modulo randomness, and probably throwing one log factor to the complexity. This should imply via our universal search that there exists asymptotically optimal algorithm, modulo randomness, and one log factor. Is this contradicting your claim or not? Do you have a link to a discussion about this topic?
@@PolylogCS I don't have a better source than wikipedia / Computational_complexity_of_matrix_multiplication, which references a 1981 paper by Coppersmith and Winograd. And yeah as far as I can tell, this doesn't mean there's no optimal algorithm with a log factor or something like that. There's a few ways that the impossibility of achieving the infimum w could be reconciled with the universal optimality of universal search: - w is a more rough measure than O since it ignores log factors - I only know of a randomized algorithm to certify matrix multiplcation in quadratic time like you alluded to, and I'm not sure how that interacts with universal search - Universal search has the runtime of the optimal algorithm, so if there is no optimal algorithm, ie for every algorithm with a runtime of O(n^a) with a > w, there is another algorithm with runtime O(n^b) with a > b > w, then the time complexity of universal search simply wouldn't be well defined
@@hacatu Regarding your last point: the complexity of universal search is always well-defined in the sense that it is some function of n (I am not claiming it is a simple function or anything like that, just that it is well defined in the sense that it makes sense to talk about it). So, you can view the argument in our video as saying that under certain circumstances I can actually prove that there exists an asymptotically optimal algorithm for a given problem. For example, whenever you give me a sequence of algorithms with complexities n^a1, n^a2, ... with a1>a2>... for matrix multiplication, I look at my Universal search (checking by Freivald's trick which I assume should be in O(n^2 log(n)) randomized complexity) and if I then go through the argument in our video, I conclude that there exists a concrete algorithm (universal search for this problem) that has asymptotic complexity O(n^a1 + n^2 logn), O(n^a2 + n^2 logn), ... In particular, assuming that all a_i's are strictly bigger than 2, my algorithm is asymptotically as fast as any of the algorithms in the list. I conclude that there exists a "limit" algorithm at least as good as any algorithm in your list. Does it make sense?
@@PolylogCS I think that the closer an algorithm gets to the limit w, the longer the length of the shortest implementation L, and so if a program actually achieved the limit w, it would have infinite length. Universal search usually sweeps the length of the program under the rug, because it hugely affects the runtime but is independent from the input size, but it can't sweep an infinite length under the rug. But that's not a fully satisfying argument, because universal search matrix multiplication obviously has some fixed length K, so how could L(a) ever exceed K? If any program has length L ≥ K and is O(n^a), then by construction the complexity of universal search is at most O(n^a). So I'm not sure
@@hacatu Think of it in this way. Given an algorithm A, denote with A(n) the maximum time it takes to solve any instance of a problem of size n. The construction presented in the video shows that the algorithm US has the property limsup US(n)/A(n) < +oo, for every algorithm A. This implies that there cannot be any algorithm which is better than US in the sense that lim A(n)/US(n) = 0, i.e. US is optimal.
If you apply this algorithm to a case where there is no answer, (like factoring 5) your program would never finish. We know that none of its trial programs can ever output a factorization of 5, i.e. the algorithm will keep spawning new trial programs indefinitely; and it would never tell you that 5 is not factorizable. This means that the algorithm would be limited to inputs of which we _know_ ahead of time that they can be factored into at least two integers (i.e. only non-primes). And it doesn't seem like the primality question is solvable with a universal search. It would only work for tasks like sorting, where we know that every input has _an_ answer that satisfies the "is this sorted" test. One further bonus problem that you would care about if you wanted to run this algorithm in practice is that the algorithm would also eventually write all _viruses_ that are possible in python or brainfuck. I.e. your computer might crash before it gets to the factorization of 4. [Insert "In soviet Russia, factoring numbers hacks YOU" style joke.]
There are no viruses in Brainfuck, since the only operations that interact with the environment are "get a character from the input" and "write a character to the output." And remember, we can't use `eval`, since we need to run things one step at a time, so we actually need to write our own interpreter. So, just leave out all the file/network access, and everything that can be used to make a virus.
My favorite universal search algorithm is very simple and very powerful and it runs in the same time complexity as it takes to check if an answer is correct. step 1, create a universe for every possible input. step 2, check if the answer is correct. step 3, if not correct. Destroy the universe. Leaving you with only one universe with the correct answer right away
Hmm, in practice, how often does base US actually find a general algorithm? Ignoring the input and just emitting the constants that happen to be the factors for the given instance of factoring seems like it would be shorter in length + execution time than a general algorithm that reads the input case, does work to factor it, and then emits the factors. At least with the improved one and sufficiently large inputs, the actual size of an algorithm discovered by US is likely to be shorter than the length of the input, so, that is good I'm reminded of Permutation City, which, you have almost certainly read, but, if you haven't, check it out. It is an excellent book by Greg Egan
So, the only problem is O? You can solve that fairly easily, if you design the hardware specifically for the task at hand, even more than CPU and GPU are designed for graphics and computation, then you can cut down on O by huge amounts and even on the time complexity itself by streamlining processes. This is the exact method the NES used to avoid the lag so common on other systems of its time and it shows.
I love how it's just a matter of having an infinite time span filling an infinite memory bank processed through an infinite computanional unit amount to have the best search algorithm. 👍
This is interesting! I always thought of running some kind of Genetic Algorithm for a long time on a PC to generate a code to solve a very simple problem. I was curious how long it takes for an evolutionary process to come up with a solution (read find a species that fits the environment). Maybe that idea can be used here?!
I implemented this idea in lisp a few years ago, i can give you the github if interested. Althought it was very poorly implemented, say, for example, finding a program that could solve aquadratic equation took something like 20 minutes if i remember correctly
To avoid syntax errors being generated from your random program you simply need to define a way to store the program such that syntax errors aren't possible byte code for instance, can be good at this.
wouldn't universal search in general keep finding hardcoded solutions before any actual algorith? Because if that's the case the algorithm should have the same complexity as a brute forcing algorithm
For practical purposes, yes. You have to imagine what happens when the number to factor is so large it does not fit into the universe!
Рік тому+3
Think about what happens if hardcoding the answer takes more bytes than computing the answer. As an example, instead of thinking about factoring, think about finding the asymptotically optimal algorithm to decode an mp3 file to wav. For the sake of argument, let's assume that the optimal algorithm takes 1 GiB to write down, then you will find it before hitting the hardcoded solution, if your output wav-file is longer than 1 GiB.
Рік тому
@Patrick Lenihan Well, hardcoding is just a special case of your 'incorrect algorithm'.
The only problem with this solution is that you may have a program that sends your nudes to all your friends on Facebook before you get the one you want. It's a risky game.
I wonder if you can improve Universal Search? if you have N=p*q, then factoring N will have a program of the sort "print(p,q)" show up. Since the length of p and q is roughly the length of N, that means the order of the algorithm is faster than O(n). (I mean, of course we knew that - even guessing each number is O(sqrt(n)) ) But if you knew that a factoring algorithm existed and ran in "K" steps, you could provide a lower bound on time, and any program running more than K steps can be stopped. But i suppose the idea of that would be to find the holy grail factoring algorithm, and not just have to search every time. if you set a bound with the exponential growth as mentioned previously, you would have a runtime of K*L + 2*f(n) While L is ungodly big, as in the video, the factor in front isn't as bad.
If the upper bound K is based on some known factoring algorithm (e.g. guessing each number) then K is a function of n. Let's call that K(n). Then the runtime bound you give for the search algorithm is K(n)*L + 2*f(n) = O(K(n)), i.e. this expression is actually dominated by K(n). The algorithm with the upper bound is actually still faster than universal search, it's just that you don't actually run all those (L-1) bad programs the full K(n) steps, since the optimal algorithm L may terminate before you simulate all those extra steps in all those bad programs. This makes a big difference, if n is large enough. So the runtime is still O(f(n)) overall, and the constant factor is probably still something like 2^L if I had to guess.
yes, i think that's mostly correct. However, I don't think the constant factor is of order 2^L. The number of operations is bounded by K(n)*L + 2*f(n), and since a simple algortihm is to guess, K(n)=n in this scenario. So n*L + 2f(n) is the time, and of course, n*L dominates, but is far less than 2^L. (Would it be appropriate to call this O(nL)?)
@@purplenanite n is variable, 2^L is constant, so n gets much larger than 2^L as n gets arbitrarily large (i.e. asymptotically). In general, K(n) is asymptotically larger than f(n), so a*K(n) + b*f(n) is on the order of K(n), regardless of constants a and b. The point is that this bound isn't better than the original bound of 2^L * f(n), which still applies.
For an algorithm that works, there will be numbers such that writing print(p, q) is an even longer query thanks to how many digits it takes to represent p and q.
I didn't really understand one thing: if the universal search has time complexity O(f(n)) if there is an algorithm with time complexity f(n), why should it "pick" the fastest algorithm? Like say there are two algorithms, A and B, to solve problem P. A has time complexity O(x) and B has time complexity O(x^2). Why would universal search have a time complexity of O(x) instead of O(x^2) when used to solve problem P?
This is universal valid language generator, valid on one criteria and one leanguage set. Many program make some, but not always you think. Universal search is anyway very interesting thing. My noob opinion in searching.
I've been thinking a bit about this vid and while I do appreciate the concept, I'd like to point out a flaw. It's important to note that any brainf*ck program can be represented exactly by a one-tape turing machine (so is bound in speed). The extended church turing thesis only yields that a 1-tape turing machine can implement an algorithm in any other computational system to some polynomial slowdown, and that polynomial slowdown can be pretty trivially shown to be problematic for even basic problems like classifying palindromes where any matching 1-tape machine is at best O(n^2) where with a multi tape setup we can see a simple O(1) method. You can say that if factoring is in P, you've found a poly time algo for it, but you can't be more fine grained than that without more care.
Small correction: the typical one-tape turing machine model involves pre-pasting the input onto the work tape, so brainf*ck could be faster in some very specific instances due to its relative freedom in i/o. We can properly bound the problem with a 3-tape turing machine with a work tape, input tape and output tape where the head on i/o tapes can only go right, though the analysis does not change (and the example of palindromes is not deprecated either due to our restriction on the input tape head movement and lack of knowledge of the string length)
What if it just finds the program: Printf("2 2") Wouldnt that also be considered a solution? Or do you just have to check the program with multiple numbers so it is unlikely that the simple printf cheat would work?
This is absolutely possible and very likely. The search will just terminate there and will be happy with the valid solution. Note that we are only making a worst-case argument about the time complexity of the algorithm, but it is very possible that it will coincidentally finish faster when it encounters mostly completely wrong algorithm that accidentally produces the correct answer. The point is that in the extremely unlikely case that no such wrong algorithm that produces the correct answer exists, it will still eventually get to the 'correct' algorithm that produces the correct answer.
@@PolylogCS I see. So an upper limit to how much time is needed to find the wrong or right solution since it only has a flawed acceptance function to check. This feels a lot like training a neural net with a naive fitness function 😂
That is ... like shooting an atom bomb at sparrows. It feels absurd to even call it a solution to the problem, yet it feels absurdly funny to even think about using it
We're glad you enjoyed it and had some fun thinking about it! That was exactly the point, not to show something immediately practical, but make you think about some theoretical concepts that can in return widen and deepen your insights.
If we assume that this algorithm will go over all possible programs, then for each composite number there exists a program that returns its factors... thus making the asymptotic time O(1) ? For example, in the case of 4, the algorithm might find the following program "return 2, 2"
Not quite - if your number is **really large**, the search will find a "correct" algorithm before any other one. It's not O(1) because going through all the algorithms to find it is really difficult.
simply using the fastest multiplication algorithm for checking isn't good enough, since you're not just multiplying numbers of length n. You have to check numbers of any length that can be generated in the available number of steps, which will put the time taken around f(n)log(f(n)) (multiplication can be done in nlogn for numbers of length n) for the biggest multiplications. To get there your checking function can just count the length of the factors, since if they're too long they can't be correct. That will make it around nlogn, rather than f(n)logf(n). doing all checks puts you around n*logn*log(f(n)) (overestimate for log(f(n)) checks of numbers with length f(n), in reality the max output length is decreasing for the later programs). this gives you O(f(n)+n*logn*log(f(n))) in the very end, for factoring we can assume that the first term will outgrow the second, so O(f(n)).
Actually, multiplication can be done in linear time in the standard ram model or on pointer machine. But you are totally right that if you choose a more fine grained model, then we are not optimal and probably losing at least one log factor for multiplication.
1:55 slight error: these functions are actally *elements* of (and thus not *equal* to) O(n^2) since the O-Notation actually gives you the set of all functions with that asymptotic growth.
Thanks for pointing that out! This is actually what I don't like on this notation - so many times I want to write things like n < O(n^2) or O(n) < O(n^2) but if you want to be formal, you have to overload the meaning of "=" or write n \in O(n^2) or O(n) \subset O(n^2) which looks crazy.
@Numerius Aha, I thought the error was in the audio. If Spulg argues just about what is on the screen, I also go with that it's just a different definition of "=" :)
I got used to using in/subset in school... but now whenever I'm working I always just go with =. Never considered using < before, but that notation makes sense, too.
But there might be a super genius algorithm that could factorize in O(log n). That case, multiplying takes O(log n log log n) and thus your algorithm will take O(log n (log log n)²) to terminate which is not the fastest.
My nondeterministic computatortron always solves problems in the least possible finite number of steps, or it runs forever. It runs all possible programs against all possible inputs in parallel. Each cycle of the computatortrob runs one cycle of each program/input. As soon as one of them spits out the solution, the computatortron halts.
You should probably run it in a sandbox and simulate a language that is not allowed to execute any system commands and can only access a virtual memory :-)
@@deltamico If an algorithm that solves the problem exists, eventually (for high enough n) the universal search will reach said algorithm and thus won't check the stuff past that point. And we don't care about small n for an asymptotic analysis.
But does it actually solves the problem? When search is performed, we search for the algorithm, which just passes one test case, but not really computes the solution. We can just run into the "print("2, 2"") program, can't we?
Indeed! Most likely the search will find print("2 2") much earlier than the real optimal algorithm and just uses that one. The point is, that for some super obscure super large input, it might not be the case. The main point is that it could find a wrong algorithm that accidentally gives the correct answer, but in the worst case it will find the optimal algorithm and never be "much" slower than that (it can be much faster by some "accident").
To accurately say, that this is asymptotically accurate algorithm, there is a need to explain, why we wouldn't stop at some solution with complexity g(n), such that O(g(n)) != O(f(n)). Suppose L is the so called `number` of an algorithm with complexity g(n) within the list of all `compiling` programs, and L' is the similiar number of an algorithm with O(f(n)), where f is the optimal complexity. Number of steps, performed to algorithm g(n) is actually number of steps performed to f(n), multiplied by some constant (it is somewhere near 2^(L' - L) ) So, as n approaches infinity g(n) * C > f(n), due to O(g(n)) != O(f(n)) and f(n) - is the optimal algorithm. Also, to me it is unclear the latest statement, that there is no infinity sequence of decaying complexity functions. We assumed, that L in analysis is a finite value, but if there is exists such sequence, wouldn't be this assumption wrong? Correct me if I am mistaken somewhere. Also sorry for my poor English, it is not my native :sadge:.
Great question! I agree that the argument about the infinite sequence of better and better algorithms is not explained in very much detail. Let me give you a small hint: Imagine there would be such a sequence of algorithms, our "Universal Search" algorithm is asymptotically not slower (or faster) than all of them - well, then this very algorithm, which we implemented in Python, is also somewhere in the list and that algorithm itself would be the finite one algorithm with some very specific (maybe unknown) complexity. Does that help to understand better?
I don't think I understand. With this, you only find the first program that solved your input. You do not find one which solves *all* inputs. Also, assymptotical is (also) about running time *versus* different input size. If you have only one input it doesn't make sense.
Ok, now I understand, the algorithm is "run an universal search on whatever input you receive." It doesnt matter if it is "print 2,2" for 4 and then a complete sieve for 6, we are only talking about the aggregate algorithm, not the intermediate one, right?
You got it! That's exactly how it works, there's no guarantee that it actually finds the optimal algorithm before it finishes (most likely not, it will likely find some incorrect algorithm before that that accidentally produces the correct answer). The only promise we give is that at the very worst case it won't take much longer than eventually finding the optimal algorithm which is still at some constant index in the list of all algorithms!
There's something I'm a bit puzzled by. In order to check whether an algorithm is a factoring algorithm, it would have to work for every input. Not just "4". If all we want is to factor 4, then the program that always print "2 2" is likely the shortest that outputs the correct answer, but is not a factoring algorithm.
Correct. This _factorization algorithm_ doesn't ever need to run a _factorization program_ to work. That's indeed counter-intuitive, but all that counts is that the _overall_ algorithm works for every input, not that it ever tries a _trial program_ that does. I would say that "a set of instructions that is guaranteed to select a correct output" is actually a perfectly valid definition for what an algorithm is. Keep in mind that we are talking about _asymptotic_ complexity here. Short inputs (as long as it only applies to a finite number of inputs) are perfectly valid to have runtimes that don't follow the curve described by the function that describes its algorithmic complexity. And if you think of factoring long numbers, compare about how the program shown at 7:03 that computes the factorization has a fixed length to how any program that naively outputs a hard coded solution to a sufficiently long input will have to contain that answer and is thus longer than the most concise "actual" factorization program. And the naive program will then be started only after the actual factorization program was started. The naive one _might_ still finish first. It depends on how the optimal "actual" factorization scales and how much of a head start it has over the naive one. [edit:] In fact, for a number with d digits, going through all "naive" programs has a complexity of 10^(2d). While even the most straightforward algorithm has only complexity 10^(d/2). For another way to think about this paradox, consider the algorithm that for any input is supposed to compute the square and give you the remainder mod 4 of that square. But we can use our brains before writing that algorithm and consider the even and odd numbers separately: Squares of even numbers have at least 2 factors of 2 in them, or at least 1 factor of 4, which means their square mod 4 is 0. Odd numbers are either 1 mod 4 - which squared is 1 mod 4; or they are 3 mod 4 - which squared is also 1 mod 4. I.e. the actual program at runtime only has to check if a number is even or odd and then it can simply output the pre-computed results of 0 or 1, without ever "actually" doing the calculation of the result. If the universal search algorithm is able to pick a program like this that "has the actual work done ahead of time"; then why shouldn't it instead be "doing the actual work" after running its little program, when it runs the output check.
@@Pystro Ah, yes. I misunderstood that step. The universal search algorithm *is* the factoring algorithm (as long as the check is correct). It does not *find* it. And the interesting part compared to a simple exhaustive search (producing all bit strings of increasing length and testing if it's the right answer) is that it has an asymptotically optimal complexity. Assuming the problem has a finite-size optimal algorithm to solve it. Which seems to be usually the case. But is it always? (It feels like there could be a link with undecidability but even checking an answer to the halting problem is problematic.) Actually, it looks like the universal search algorithm is related to the Kolmogorov complexity of the solution for the given input. Except it does not exactly compute it as it prefers a longer program that runs exponentially faster.
Blog post with more clarifications/details: vasekrozhon.wordpress.com/2024/09/03/the-most-powerful-and-useless-algorithm/
I think thats pretty funny because most of the times the algorithm will stop not because it ran "the correct optimal algorithm" but just print the answers and they happen to be right... It would only start to do better than "print all pairs of 2 numbers and check" when the code for a better algorithm is shorter than just one that prints the answer bc the answer is too long.
"when the code for a better algorithm is shorter than just one that prints the answer bc the answer is too long" -- it's funny because a lot of programming is exactly that, writing a piece of code that's shorter than the answer
Presumably, there exists an algorithm that requires a finite amount of code to implement regardless of the size of the input or output. That algorithm may be longer than the shortest "print()" solutions up to a point, but again: it doesn't grow as the input size grows while "print()" does. Just like how the program Gzip ramains the same size whether you want to compress a Megabyte file or a Petabyte file. There will be some largest number where "print()" is more efficient, beyond that, there are infinite numbers where implementing the actual algorithm is more efficient.
Therefore the phrase "most of the time" is misleading here, because finite
Waiting for this algorithm to finish is more akin to gambling than anything else. I love it 0_0
gambling addictions can be very srious 0_0
@@wadswa6958 there is a reason why we do computer science
Well..... doooof! It is 'randomly generating,' so of course it is 'akin to gambling.'
I would love to introduce you to ✨Bogo Sort✨
@@abraxas2658 bogos sorted 👽
Let's also admire the fact that this algorithm implements and executes perfectly aligned AGI. And of course all the paperclip maximizers and alike. The slowest doomsday device
This is why we added the try catch block to the checking function, we don't want the paperclip maximizers to escape!
Shoot I didn't think of that! QUICK someone stop the factoring machine!
@@PolylogCS you try to catch it, but what about the escape key on your key chain, did you remember to remove it after locking the AGI in it's block chain.
Only if they are generated before an algorithm that generates a valid solution (even in an invalid way), and execute before that altogether outputs
It'll also eventually output a 4K video of you and the grandma next door starring in a hardcore adult film directed by Michael Bay and narrated by Morgan Freeman
at 4:09, you can simply write a function that inputs the program and outputs if it runs infinitely in a loop or not ;)
One problem is that there is no program that could determine whether an input program halts or not (this is a so called halting problem)
this is one of the funniest either dumb or bait comments I've read in a while. "At this point, you could have simply solved the halting problem ;)"
@@PolylogCS Actually, it's possible for computers with finite memory!!!!!!
@@user-dh8oi2mk4f If we're talking about things with finite memory, they're technically not Turing Machines, which have an infinite amount of memory, so it's technically a different problem (hence why saying this doesn't break math and win you a Turing award)
Haha, good one, lol
Seems like for composite numbers where the string containing the solution program is big enough, before reaching that solution, the universal search algorithm might at one step generate a program which is also a universal search algorithm itself, searching for the same answer, which would also create another nested universal search algorithm program before reaching the answer and so on. That would be pretty cool :D
Yes, that's absolutely correct! Unfortunately, that other nested instance of itself won't have any chance of finishing faster than the high-level universal search, because it needs to do at least as many steps. What could be even more funny is that it'll implement some better universal search that might use some better programming language and its simulator which will run much faster and generate the correct program faster...
This reminds me of the Library of Babel for some reason. Somewhere in there, anything you can think of exists. Good luck finding it lol. (Although in that case there is a straightforward mapping between text and location.)
In the case of factoring primes, I'm pretty sure it would be faster to brute force all the numbers < sqrt(p*q) than to randomly generate a program that would do it. I also wonder if the universal search algorithm could be quantum-ified to check all possibilities of, say, length n, at once. But of course a sophisticated quantum computer like that could crack large primes in its sleep anyway. Perhaps other problems would benefit from it though.
Interesting connection with the library of babel! Regarding quantum computers, one (relatively boring) thing you can do is to adapt universal search for them so that it is as fast as any quantum algorithm. But as you say, factoring is one of a few problems that quantum computers are known to solve quickly, so the question about the fastest quantum algorithm for them is perhaps not so interesting.
Well, maybe you can use quantum computers to find the fastest algorithm so that it can later on be used by traditional computers
Because it is exactly the same. Which proves the P, NP problems are all bunk science.
It isnt the same. Universal search is an algorizhm that implements every algorithm a turing machime can simulate, and so it can solve every problem. Moduloing a large number by every number smaller then it WILL solve prime factorization, but it will only solve time factorization.
> _"Library of Babel"_
i was thinking same as well...
aside: when i watched vid about that weeks ago; i never thought that i would face that concept again so soon.
Amazing video, I didn't know about the universal search algorithm before and found it very interesting.
I wish you had done the derivation for why doing exponential steps on previous machines had complexity 2^L + f(n) since I would've imagined a log_2 to show up, but I managed to figure it out (I think) for anyone interested:
Since each level grows by powers of 2 when the next machine gets added, you need to add log_2(f(n)) new machines to the search before the machine at level L will finish (2^log_2(f(n)) steps = f(n) steps). This means the total height you get to is L + log_2(f(n)) and since the total number of operations is the sum of many consecutive powers of 2
2^0 + 2^1 + 2^2 + ... + 2^(L - 1 + log_2(f(n))) =
2^(L + log_2(f(n)) ) =
2^L + f(n)
You get the final complexity in the video.
Yes! One reason why we did not go into this is that there is one subtlety lurking there that Ondřej Bouchala promptly uncovered. Check out also his comment! :) Also, there is a small mistake in your derivation: 2^(L + log(f(n))) = 2^L * f(n).
@@PolylogCS could you pin it?
This is a chronically underrated channel. Keep up the great videos!
Thanks!
8:04 It's fun to think about how no syntax errors is required for this "worst case" scenario.
Syntax errors are really the same thing as program termination. So, it isn't hard to come up with a system where any code is valid code, i.e. syntax errors are impossible.
In Brainfcuk, there exists only one syntax error: Incorrect brackets, like for example "][".
If one redefines [ and ] to have well defined behavior in these cases, any string of BF commands will be a valid program.
For example, redefine ] such that in the case of no matching [ it moves the instruction pointer to the beginning of the program, and redefine [ such that in the case of no matching ] it moves to the end of the program, i.e. just terminates it.
This is like the physics troll meme of computer science, I love it.
Great video! I'd like to add that "Polynomial time?"-level of abstraction is useful in areas like complexity theory (well, at least "old school" style sub-areas of it) since you might want to study time complexity as a property of a problem and not of an algorithm that solves it, so you would like to use a measure of complexity that depends as little as possible on computational model that you use. Different models of computations often disagree on the power of a polynomial, but very rarely disagree on polylogarithimc/polynomial/superpolynomial/subexponential/exponential/etc. levels of precision.
Generating AST instead of code would cut a lots of non runnable code. Assymptotically the same, but makes a little bit more sense.
Oh, and also its funny, that once in a while it is going to generate itself, so its a kinda quine)
Asymptotic complexity is great most of the time, you only have to not throw out the constant factor and it might not be the best idea to lexicographically order all programs by it
the moment I got my head around it my brain instantly went to a version of universal search that uses assembly commands and numbers instead of all characters, with the goal of finding the most optimal algorithms for math functions through brute force.
That actually exists. Compilers encounter multiplication by constants (e.g. size * 13) or other math functions. Compilers that generate highly optimized code have libraries of small chunks of code that compute specific math functions. They build that library by trying all possible small chunks of machine language.
This is why we don't use big O in our course about algorithms. We use something like tilde notation witch adds the coefficient to the highest exponent term and you get a way better representation to the performance of an algorithm to compare to other algorithms.
You might enjoy looking at the video description! :) turns out you can make the multiplicative constant 1.01 and hide the horrible stuff to additive constant!
Any notation that hides information has to be assumed to only be useful in certain cases. The tilde that gives you slightly more information is nice, but is a level of abstraction that you sometimes don't need when simply analyzing an algorithm; I really don't care if an algorithm is 1000(n log n), since that's worse than being n^2 regardless, and the coefficient isn't going to be higher in any reasonable algorithm.
So… it’s bogo sort for factoring algorithms… but even worse. I love it!
Well, worse bogo sort for all algorithms. It's basically a random output generator where for each output you check if it is correct.
If you get a correct output from an algorithm, it doesn't mean it will give correct output for a different input.
You could optimise it for the problem you are solving, for example generating pairs of 2 INTEGERS which would never output 'banana'
but that is no fun given the context of universal search.
This also works only for problems which have a constant time complexity for checking the solution, otherwise it wouldn't be "asymptoticaly optimal".
No, you gotta sell it. It's the All purpouse general Genetic algorithm
It's actually better than bogosort, because this algorithm is at least guaranteed to give you the correct answer in a finite amount of time while the time complexity for bogosort is unbounded, meaning it could just never land on the correct permutation if you're "lucky"
@@postmodernist1848 It's better than random bogosort, but roughly equivalent to deterministic bogosort. Deterministic bogosort works in effectively the same way as universal search, by iterating through all the possible solutions.
i used to wonder why bogo sort even had a name - now i know.
what's more; i genuinely believed that it was just the youtube channel's given name
Python obeys a grammar. You could write an algorithm to generate valid python syntax and drastically reduce the search space
Man finally got a channel which actively discusses CS problems and their algorithms passionately, your explanation of analysis is very good.
I find it weird that the most common measure of complexity in computer science is big O, ignoring the constant factor of the highest order term. In mathematics, I would only ever mean the equivalence relation "~" by the phrase "assymptotic to...", i.e. the ratio of the two functions tends to 1 in some limit (which could be n tends to infinity, as is always meant in computer science). And I think this is a much more generally useful term for algorithms where the behaviour is well understood (i.e., all the constants are easily measured and/or bounded). Even just giving an order of magnitude upper/lower bound to the coefficient of the leading term is enough information to tell you if the algorithm is practical or impractical.
For example, someone might see that heap sort and merge sort are both O(n log n) comparison sorts, so would just think "they are both optimally fast, so I'll just use heap sort because merge sort takes a higher space complexity". But this misses the point that merge sort takes ~ n*log_2(n) comparisons, while heap sort takes ~ 2n*log_2(n) comparisons, in the worst case (can be optimised, but this is the simple case to demonstrate the point). So the truth is that heap sort takes O(n log n) *more* comparisons that merge sort, which might help explain the trade off that merge sort is making with space for time.
I totally agree that this way, you get more information about the algorithm! But it also gets hard to compute the leading constant, unless you decide to measure something as simple as "number of comparisons". Also, you don't capture cache behavior (important for quicksort) etc. Finally, it turns out (check the video description) that there is a universal search with the property that if there is an f(n) algorithm, the search has complexity 1.01f(n) + O(1), so even in your way of measuring complexity, universal search looks like an amazing algorithm.
7:44 or more likely, it would terminate because of an algorithm that says something like this:
return 2,2
Ah yes, the classic case where big O hides a dependence on another quantity than the input size
A simple way to speed up the search in this specific case, is to simply start from the other end. We know that in encryption, it will be two fairly large numbers of fairly similar magnitude, there won't be a factor of, say, 17. So, just start at the sqrt(x) and work downwards. To speed it up a bit more, remove simple to detect factors (2, 3, 5...).
It won't speed up the general factorization case, but for cryptography, it'll make a big difference.
Lovely video about a fascinating concept, thanks!
You gave so little spotlight to the fact that Universal Search is an asymptotic limit to all those sequences of better and better algorithms... It is truly a wonder problems don't define inferior open boundaries in the algorithms space with the asymptotic complexity topology!
7:30 nice arpeggio
In the improved case we need to run the checking of the result log(f(n)) times. That gives log(f(n))*n. How do we know that it is O(f(n))?
Nice catch! If you want to get rid of this, you have to think of checking together with the simulated program as one program. By that I mean that your allocated time is counted also for checking, so maybe you run out of it and you have to wait until the next iteration to continue checking. If you do it this way, everything sums up nicely and you get O(f(n)). We really did not want to get into that, since anyway for f(n) > n log(n) this term is dominated by O(f(n)), so you can also think of this imperfect search as achieving O(f(n) + n log n).
So the guarantee on any problem (say P problems) should be O(f(n)*log(f(n))) right?
Checking can only be as hard as just solving the problem all over again!
@@yanntal954 We assume that, in this case, checking works in linear time and f(n) is the running time of the algorithm we compare ourselves with. Then, I claim the universal search with exponentially allocated steps with the trick I mentioned above is O(f(n)). Does it make sense?
@@PolylogCS I was talking more in general not just factoring, You should get this as a guarantee. It may not be tight but it will always be a valid upper bound!
I think assuming Factoring is Omega(n*log(n)) is pretty reasonable and thus for factoring you should get O(f(n)). In general the checking process can take at most O(f(n)) (yet can't be more than that!) thus you get the aforementioned guarantee I mentioned
Oh my god you even got a haircut to match Levin
So you can theoretically apply this idea to find the asymptotically fastest algorithm for Matrix Multiplication?
Take the input size to be N = n^2 where we are dealing with square matrices, then we know that there exists an (probabilistic) algorithm running in linear time (linear in N, the input size) that can check whether the resulting matrix C is in fact the solution to A * B (two matrices of size N. Then we can simply use this search to generate C and use this linear time check and get the "best" algorithm!
not really - what if you got lucky and found a random program that happened to find the solution for A*B, but it's useless for multiplying other matrices? you would need to run Universal Search for each new pair of matrices A and B you need to compute.
@@tantarudragos If you get lucky you still run in at most n^ω steps where ω is probably 2.
Very good video, i'm a computer science student from Brazil!!
ny favourite non-constant time complexity function is the disjoint-set data structure search, which is amortized O(inverse_Ackermaan(n))
What is the smallest number for which this algorithm ends with the result of a sub algorithm that didn't simple guessed the right result but did really use some calculation? ^^
Good question. My guess is that the naive algorithm we showed, written as concisely as possible, could be among the first nontrivial algorithms to finish.
Imo it would be fun to use Symmetric Interaction Nets (which are useful for implementing something like Lamping's Abstract Algorithm) as the basis for Universal Search.
Just out of curiosity, did your program ever give you a result?
No.
3:05 As someone who loves writing in Raku (perl 6), I hate to say it but you are correct.
Love the "lessons" at the end!
Thanks, Przemek!
"Is often obvious just by looking at it" oh I wish
Great video, love seeing accessible cs theory on yt
Glad you enjoyed!
9:19 Wouldn't you get something that's roughly O(f(n)*log(f(n)))?
Perhaps even O(f(n)*log^2(f(n))) by analogy to the previous construction, even though you won't get a triangle as before but you could perhaps rearrange the runs in such a manner?
Check out the answers to comments from TheDmviper and Ondřej Bouchala!
@@PolylogCS I am not sure why that extra n*log(n) addition came from?
but to my knowledge, log-RAM machines aren't physically possible because log(n) for every n means your machine has an unbounded word size that it can get constant time access to.
For that reason I think the O(nlogn) time you get from the Harvey and Van DER Hoeven algorithm is (currently) your best bet, meaning the assumption that multiplication takes linear time probably isn't yet realizable?
@@yanntal954 If we take this discussion further, even Harvey and and der hoeven algorithm is not physically realizable, since it assumes you can do random access, which is not possible in our 3D space! (if you put too much information to one place, it collapses to a black hole) So maybe the Turing machine after all is the right model? But for large enough n anyway the universe suffers heat death before you finish computing. What I want to say is that for every model of computer, you can make an argument that it is not physically possible. In practice, Word RAM or pointer machine (there is btw no assumption on log(n) word size in the pointer machine model) are good models and maybe just view our video as a mathematical statement about these models.
There's a paper called "A note on probabilistically verifying integer and polynomial products" by Michael Kaminski from 1989 that basically let's us test in linear time probabilistically (with high probability of course), given integers a, b and c whether a * b = c.
The reason why it's possible faster than Harvey and Van Der Hoevens algorithm is because it simply checks, it doesn't actually perform the calculation (at least not over Z).
So, if a good enough pseudo-random generator is created, you can, in linear time do this using the mutitape Turing machine model!
A problem I see with this is that as the number to be factored gets larger, any program using actual variables will start producing wrong results due to variable size limitations. So while L is a constant for each program, it might diverge towards infinity if there is no program that will accurately factor _all_ numbers at the fastest possible time complexity.
If there is a variable type that can handle any unlimited amount of digits, and that can be used in a program with the fastest possible time complexity, then that might be solved. But I don't know whether such a variable exists and can be used in the programming language this algorithm runs in
This is a great question! It ultimately leads to "what is your mathematical model of computer?" Let's say that our model is Turing machine. There, there are no "variable types" and so on, so I hope that with this model of computer, everything we say in the video makes sense and you can see that we prove a mathematically correct statement there. Turing machine is not a good model of computers, more usual model is the so-called WordRAM model, which is basically C, but you assume that if the input has n bits, the basic variables you use have log(n) bits, so that you can e.g. index your input. For this model, our statement is also correct. Your problem as I see it is basically that for very large inputs, maybe the WordRAM model is not a good model of computer. And in that you are right, similarly to asymptotic complexity, wordRAM is just another model. Not sure how much sense I make :)
@@PolylogCS yeah, if you have a variable type that can have any power of 2 as its size (and thereby display any number in a finite variable) then this works out
Python's supports arbitrarily large integers out of the box.
@ likely its not real if you can get beyond 512,1024 bit integers. Meaning that its likely in part software giving you the abstraction of super larger integers by in some form or fashion a collection of more standard 32,64bit ones.
Reason i say likely is, you likely do have some 128 bit registers now adays, theres a flag you can throw at your C compiler to see. And as well since we do have sse,avx and their variants it may be python if when its built on a system passes some flags to the C compiler to build with avx extensions and maybe you can get some 512 to 1024 ones. But typically this is something ive seen when installing pandas/numpy.
More broadly though, you cant address smaller than a byte on modern systems so an 8 bit integer. It would be technically impossible. So if you can have "odd" bit integers or anything beyond the max register size among all the avx/sse registers, then i do think it has to be a software abstraction. Probably 0 padded ints for bit sizes less than 64, and some arrays/linked lists as a software implementation with software defined operators of integer operations under the hood for crazy large beyond your max avx/sse register size.
@@stevepa3416 it doesn't matter if those numbers are not like that in hardware, as long as they are useful in software, so calling them "not real" is a disservice to them =). those numbers being able to be "arbitrary large" basically means that instead of being limited by architecture-defined word size, these numbers are limited by how much RAM you (can) give to the program. So they can be useful for universal search.
What if the algorithm ended up checking and running a dangerous program before it found the factorisation?
You definitely need to be careful with the sandboxing since you are iterating also over all computer viruses in universal search... :)
What if the algorithm, by pure chance, found the RSA factorization to the bank account with the most money in it, and then, by pure chance, moved all that money into your account, and then, by pure chance, made it seem like no money ever left (just duplicated it), and then, by pure chance, any flags that would pick up on this on your account were 'disabled,' and then, by pure chance, any possible way for this to be noticed was 'programmed out,' all via 'pure chance.'
So... what then?
@@PolylogCS Hopefully we find a 'computer virus' that hacks the world banks and "clones" money directly into your bank account... all without being traced!
@@pyropulseIXXI Then oof
How do we know this comment was not generated by Universal Search during the factorization of 42?
How much would it be possible to optimize Universal Search with AST-based code generation rather than text strings? If you start iterating over the input space of all possible grammatically valid programs rather than all possible text strings, this would cut down your constant factor by a fair degree.
Of course, Python's not a particularly metaprogramming-friendly language, but I bet it could be done with some package somewhere...
Let us know if you manage to make this work :) Perhaps you can reach the naive factoring algorithm in such a universal search, but not much farther, I guess...
This is basically what Program Synthesis has been doing since the 1980s.
How it usually works in practice is that Python is too big, you usually define a custom DSL - a specialized mini language - tailored for the algorithm you're trying to generate. For example, if you're trying to find a Sorting Algorithm, you define a mini langauge that has lists and common operations on them (swapping, iterating,...) as primitives. More importantly, the mini language *doesn't* have anything else not relevant to Sorting, no exceptions, no OOP, etc... You're essentially 'cheating', making things easier for the search algorithm by restricting its search space. Come to think of it, this bears resemblance to neutral network architectures as well, they have certain human-designed hyper parameters, and the training algorithm can focus on merely tweaking trainable parameters within those constraints.
Program Synthesis is fascinating, with plenty of applications and analogs. SQL databases already have a Program Synthesizer that millions use everyday (the query planner).
What if you find a non asymptotically optimal program so long before you find an asymptotically optimal one, that it succesfully factors your number before you have a chance to find the better one?
"... and in the remaining cases PERL programs" :D haha ❤
was waiting for it
I did a spit-take when you said to 'simply allocate exponentially more simulation steps to earlier programs'. Grotty! Imagine how slow that is!
It works! Until you hit a program that infinite loops, like +[]. (The universal search algorithm they implemented is in brainfuck...)
What's the memory complexity of this algorithm?
Let's just run this algorithm on a supercomputer with something other than python. After all we need to find an algorithm for it only once to break encryption.
I think this is analogous to asking a neural network to do it. The reward function would be checking if the output is correct and rewarding the network accordingly
yes, because the output of training is a model, and those models could represent the outputs of any function on a finite domain (to even put the input into a computer requires that the domain is finite)
Its definitely no genetic algorithm.
I thought you would continue by addressing p vs np
I feel like if you actually write this program, the simplest program to “factor” a number will literally just print its factors without actually factoring
if you wanted to generate a program which multiplies two large prime numbers, you would have to factorise each result you get, but you don't know to do itz so you can run a universal search for it
Somebody really liked bogosort and wanted to apply it to every other problem in computer science...
When you realized that Genetic Algorithm is just a special case of Universal Search 😶
Your room is so cool!
Love this stuff, but i might not be your regular audience, as this is the first video of yours I've watched.
funnily enough if i had to guess this algorithm will probably just generate a print function before actually generating something that solves the problem since the print would probably be shorter character wise
Idk what needs two videos to make and explain when all you have to do to find all the factors is this:
let num = 4;
function betterFactorFinder(n){
let factors = new Set();
for (let i=1; i
Yea it does seem slightly slow, have you tried caching?
I wonder if it would be "useful" to have some sort of hyper threaded supercomputer somewhere just constantly running universal search looking for solutions to famous problems. And it just runs forever. And spits out what it finds for us to look at.
this sounds like it comes VERY close to the halting problem.
Neat video, reminds me of the library of babel
Good stuff, my friend
Problems for which no optimal solution exists aren't necessarily bizarre/pathological: it's thought that matrix multiplication is such a problem. Strassen like algorithms have been able to push the complexity down from O(n^3) to O(n^2.3728596). However, it is know that the infimum O(n^w) complexity is not achieved by any algorithm. I don't know enough to know if this precludes optimal algorithms with complexities like O(n^a * log(n)) or others not of the form O(n^a)
Hm, you can actually check the result of matrix multiplication in quadratic time, modulo randomness, and probably throwing one log factor to the complexity. This should imply via our universal search that there exists asymptotically optimal algorithm, modulo randomness, and one log factor. Is this contradicting your claim or not? Do you have a link to a discussion about this topic?
@@PolylogCS I don't have a better source than wikipedia / Computational_complexity_of_matrix_multiplication, which references a 1981 paper by Coppersmith and Winograd. And yeah as far as I can tell, this doesn't mean there's no optimal algorithm with a log factor or something like that. There's a few ways that the impossibility of achieving the infimum w could be reconciled with the universal optimality of universal search:
- w is a more rough measure than O since it ignores log factors
- I only know of a randomized algorithm to certify matrix multiplcation in quadratic time like you alluded to, and I'm not sure how that interacts with universal search
- Universal search has the runtime of the optimal algorithm, so if there is no optimal algorithm, ie for every algorithm with a runtime of O(n^a) with a > w, there is another algorithm with runtime O(n^b) with a > b > w, then the time complexity of universal search simply wouldn't be well defined
@@hacatu Regarding your last point: the complexity of universal search is always well-defined in the sense that it is some function of n (I am not claiming it is a simple function or anything like that, just that it is well defined in the sense that it makes sense to talk about it). So, you can view the argument in our video as saying that under certain circumstances I can actually prove that there exists an asymptotically optimal algorithm for a given problem. For example, whenever you give me a sequence of algorithms with complexities n^a1, n^a2, ... with a1>a2>... for matrix multiplication, I look at my Universal search (checking by Freivald's trick which I assume should be in O(n^2 log(n)) randomized complexity) and if I then go through the argument in our video, I conclude that there exists a concrete algorithm (universal search for this problem) that has asymptotic complexity O(n^a1 + n^2 logn), O(n^a2 + n^2 logn), ... In particular, assuming that all a_i's are strictly bigger than 2, my algorithm is asymptotically as fast as any of the algorithms in the list. I conclude that there exists a "limit" algorithm at least as good as any algorithm in your list. Does it make sense?
@@PolylogCS I think that the closer an algorithm gets to the limit w, the longer the length of the shortest implementation L, and so if a program actually achieved the limit w, it would have infinite length. Universal search usually sweeps the length of the program under the rug, because it hugely affects the runtime but is independent from the input size, but it can't sweep an infinite length under the rug. But that's not a fully satisfying argument, because universal search matrix multiplication obviously has some fixed length K, so how could L(a) ever exceed K? If any program has length L ≥ K and is O(n^a), then by construction the complexity of universal search is at most O(n^a). So I'm not sure
@@hacatu Think of it in this way. Given an algorithm A, denote with A(n) the maximum time it takes to solve any instance of a problem of size n. The construction presented in the video shows that the algorithm US has the property limsup US(n)/A(n) < +oo, for every algorithm A. This implies that there cannot be any algorithm which is better than US in the sense that lim A(n)/US(n) = 0, i.e. US is optimal.
If you apply this algorithm to a case where there is no answer, (like factoring 5) your program would never finish. We know that none of its trial programs can ever output a factorization of 5, i.e. the algorithm will keep spawning new trial programs indefinitely; and it would never tell you that 5 is not factorizable.
This means that the algorithm would be limited to inputs of which we _know_ ahead of time that they can be factored into at least two integers (i.e. only non-primes). And it doesn't seem like the primality question is solvable with a universal search.
It would only work for tasks like sorting, where we know that every input has _an_ answer that satisfies the "is this sorted" test.
One further bonus problem that you would care about if you wanted to run this algorithm in practice is that the algorithm would also eventually write all _viruses_ that are possible in python or brainfuck. I.e. your computer might crash before it gets to the factorization of 4.
[Insert "In soviet Russia, factoring numbers hacks YOU" style joke.]
There are no viruses in Brainfuck, since the only operations that interact with the environment are "get a character from the input" and "write a character to the output." And remember, we can't use `eval`, since we need to run things one step at a time, so we actually need to write our own interpreter. So, just leave out all the file/network access, and everything that can be used to make a virus.
Video description contains an additional trick with which you can generalize universal search even to these cases.
My favorite universal search algorithm is very simple and very powerful and it runs in the same time complexity as it takes to check if an answer is correct.
step 1, create a universe for every possible input.
step 2, check if the answer is correct.
step 3, if not correct. Destroy the universe.
Leaving you with only one universe with the correct answer right away
that's called a nondeterministic turing machine
Amazing video!
Love the video. US seems like an extremely clever hack, I love it.
Hmm, in practice, how often does base US actually find a general algorithm? Ignoring the input and just emitting the constants that happen to be the factors for the given instance of factoring seems like it would be shorter in length + execution time than a general algorithm that reads the input case, does work to factor it, and then emits the factors. At least with the improved one and sufficiently large inputs, the actual size of an algorithm discovered by US is likely to be shorter than the length of the input, so, that is good
I'm reminded of Permutation City, which, you have almost certainly read, but, if you haven't, check it out. It is an excellent book by Greg Egan
Should have written a program that generates valid python asts to make it faster :D
well it generates valid brainfuck which is easier to generate
So, the only problem is O? You can solve that fairly easily, if you design the hardware specifically for the task at hand, even more than CPU and GPU are designed for graphics and computation, then you can cut down on O by huge amounts and even on the time complexity itself by streamlining processes. This is the exact method the NES used to avoid the lag so common on other systems of its time and it shows.
I love how it's just a matter of having an infinite time span filling an infinite memory bank processed through an infinite computanional unit amount to have the best search algorithm. 👍
This is interesting! I always thought of running some kind of Genetic Algorithm for a long time on a PC to generate a code to solve a very simple problem. I was curious how long it takes for an evolutionary process to come up with a solution (read find a species that fits the environment). Maybe that idea can be used here?!
I implemented this idea in lisp a few years ago, i can give you the github if interested. Althought it was very poorly implemented, say, for example, finding a program that could solve aquadratic equation took something like 20 minutes if i remember correctly
To avoid syntax errors being generated from your random program you simply need to define a way to store the program such that syntax errors aren't possible byte code for instance, can be good at this.
Nevermind, you used BrainF which happens to be just as effective.
wouldn't universal search in general keep finding hardcoded solutions before any actual algorith? Because if that's the case the algorithm should have the same complexity as a brute forcing algorithm
For practical purposes, yes. You have to imagine what happens when the number to factor is so large it does not fit into the universe!
Think about what happens if hardcoding the answer takes more bytes than computing the answer.
As an example, instead of thinking about factoring, think about finding the asymptotically optimal algorithm to decode an mp3 file to wav.
For the sake of argument, let's assume that the optimal algorithm takes 1 GiB to write down, then you will find it before hitting the hardcoded solution, if your output wav-file is longer than 1 GiB.
@Patrick Lenihan Well, hardcoding is just a special case of your 'incorrect algorithm'.
The only problem with this solution is that you may have a program that sends your nudes to all your friends on Facebook before you get the one you want. It's a risky game.
I wonder if you can improve Universal Search?
if you have N=p*q, then factoring N will have a program of the sort "print(p,q)" show up. Since the length of p and q is roughly the length of N, that means the order of the algorithm is faster than O(n).
(I mean, of course we knew that - even guessing each number is O(sqrt(n)) )
But if you knew that a factoring algorithm existed and ran in "K" steps, you could provide a lower bound on time, and any program running more than K steps can be stopped.
But i suppose the idea of that would be to find the holy grail factoring algorithm, and not just have to search every time.
if you set a bound with the exponential growth as mentioned previously, you would have a runtime of K*L + 2*f(n)
While L is ungodly big, as in the video, the factor in front isn't as bad.
Nice idea!
If the upper bound K is based on some known factoring algorithm (e.g. guessing each number) then K is a function of n. Let's call that K(n). Then the runtime bound you give for the search algorithm is K(n)*L + 2*f(n) = O(K(n)), i.e. this expression is actually dominated by K(n).
The algorithm with the upper bound is actually still faster than universal search, it's just that you don't actually run all those (L-1) bad programs the full K(n) steps, since the optimal algorithm L may terminate before you simulate all those extra steps in all those bad programs. This makes a big difference, if n is large enough. So the runtime is still O(f(n)) overall, and the constant factor is probably still something like 2^L if I had to guess.
yes, i think that's mostly correct. However, I don't think the constant factor is of order 2^L.
The number of operations is bounded by K(n)*L + 2*f(n), and since a simple algortihm is to guess, K(n)=n in this scenario.
So n*L + 2f(n) is the time, and of course, n*L dominates, but is far less than 2^L.
(Would it be appropriate to call this O(nL)?)
@@purplenanite n is variable, 2^L is constant, so n gets much larger than 2^L as n gets arbitrarily large (i.e. asymptotically).
In general, K(n) is asymptotically larger than f(n), so a*K(n) + b*f(n) is on the order of K(n), regardless of constants a and b. The point is that this bound isn't better than the original bound of 2^L * f(n), which still applies.
For an algorithm that works, there will be numbers such that writing print(p, q) is an even longer query thanks to how many digits it takes to represent p and q.
I didn't really understand one thing: if the universal search has time complexity O(f(n)) if there is an algorithm with time complexity f(n), why should it "pick" the fastest algorithm? Like say there are two algorithms, A and B, to solve problem P. A has time complexity O(x) and B has time complexity O(x^2). Why would universal search have a time complexity of O(x) instead of O(x^2) when used to solve problem P?
This is universal valid language generator, valid on one criteria and one leanguage set. Many program make some, but not always you think. Universal search is anyway very interesting thing. My noob opinion in searching.
confused about everything, but great video!
I've been thinking a bit about this vid and while I do appreciate the concept, I'd like to point out a flaw.
It's important to note that any brainf*ck program can be represented exactly by a one-tape turing machine (so is bound in speed). The extended church turing thesis only yields that a 1-tape turing machine can implement an algorithm in any other computational system to some polynomial slowdown, and that polynomial slowdown can be pretty trivially shown to be problematic for even basic problems like classifying palindromes where any matching 1-tape machine is at best O(n^2) where with a multi tape setup we can see a simple O(1) method.
You can say that if factoring is in P, you've found a poly time algo for it, but you can't be more fine grained than that without more care.
Small correction: the typical one-tape turing machine model involves pre-pasting the input onto the work tape, so brainf*ck could be faster in some very specific instances due to its relative freedom in i/o. We can properly bound the problem with a 3-tape turing machine with a work tape, input tape and output tape where the head on i/o tapes can only go right, though the analysis does not change (and the example of palindromes is not deprecated either due to our restriction on the input tape head movement and lack of knowledge of the string length)
What if it just finds the program:
Printf("2 2")
Wouldnt that also be considered a solution?
Or do you just have to check the program with multiple numbers so it is unlikely that the simple printf cheat would work?
This is absolutely possible and very likely. The search will just terminate there and will be happy with the valid solution. Note that we are only making a worst-case argument about the time complexity of the algorithm, but it is very possible that it will coincidentally finish faster when it encounters mostly completely wrong algorithm that accidentally produces the correct answer. The point is that in the extremely unlikely case that no such wrong algorithm that produces the correct answer exists, it will still eventually get to the 'correct' algorithm that produces the correct answer.
@@PolylogCS I see. So an upper limit to how much time is needed to find the wrong or right solution since it only has a flawed acceptance function to check. This feels a lot like training a neural net with a naive fitness function 😂
That is ... like shooting an atom bomb at sparrows. It feels absurd to even call it a solution to the problem, yet it feels absurdly funny to even think about using it
We're glad you enjoyed it and had some fun thinking about it! That was exactly the point, not to show something immediately practical, but make you think about some theoretical concepts that can in return widen and deepen your insights.
well you can check all numbers if they even have factors, by modular exponentiation algorithm of wilson theorem
you can do the factorial computation instantly
with expanded factors approximation with convergence
when you have the universal primality test, you also have the factors
logical proof programming, math statements, is simpler than programming languages, less permutations
you can check if the program runs infinitely looping, or ends
Okay but can Universal Search Algo be used to create a Universal Search Algorithm?
If we assume that this algorithm will go over all possible programs, then for each composite number there exists a program that returns its factors... thus making the asymptotic time O(1) ?
For example, in the case of 4, the algorithm might find the following program "return 2, 2"
Not quite - if your number is **really large**, the search will find a "correct" algorithm before any other one. It's not O(1) because going through all the algorithms to find it is really difficult.
simply using the fastest multiplication algorithm for checking isn't good enough, since you're not just multiplying numbers of length n.
You have to check numbers of any length that can be generated in the available number of steps, which will put the time taken around f(n)log(f(n)) (multiplication can be done in nlogn for numbers of length n) for the biggest multiplications.
To get there your checking function can just count the length of the factors, since if they're too long they can't be correct. That will make it around nlogn, rather than f(n)logf(n). doing all checks puts you around n*logn*log(f(n)) (overestimate for log(f(n)) checks of numbers with length f(n), in reality the max output length is decreasing for the later programs).
this gives you O(f(n)+n*logn*log(f(n))) in the very end, for factoring we can assume that the first term will outgrow the second, so O(f(n)).
Actually, multiplication can be done in linear time in the standard ram model or on pointer machine. But you are totally right that if you choose a more fine grained model, then we are not optimal and probably losing at least one log factor for multiplication.
@@PolylogCS Ah, I was solely thinking about realistic models of computation. Thanks for bringing that up, I have now also seen the video description.
BTW. 7:30 it's totally sound effect from The Matrix. GOTCHA!
1:55 slight error: these functions are actally *elements* of (and thus not *equal* to) O(n^2) since the O-Notation actually gives you the set of all functions with that asymptotic growth.
Thanks for pointing that out! This is actually what I don't like on this notation - so many times I want to write things like n < O(n^2) or O(n) < O(n^2) but if you want to be formal, you have to overload the meaning of "=" or write n \in O(n^2) or O(n) \subset O(n^2) which looks crazy.
@@PolylogCS yeah, it can certainly be notationally tricky^^
@Numerius Aha, I thought the error was in the audio. If Spulg argues just about what is on the screen, I also go with that it's just a different definition of "=" :)
I got used to using in/subset in school... but now whenever I'm working I always just go with =. Never considered using < before, but that notation makes sense, too.
They didn't just write the most useless answer, they used the useless language for it
But there might be a super genius algorithm that could factorize in O(log n). That case, multiplying takes O(log n log log n) and thus your algorithm will take O(log n (log log n)²) to terminate which is not the fastest.
Reminds me of fuzzing for vulnerabilities
My nondeterministic computatortron always solves problems in the least possible finite number of steps, or it runs forever. It runs all possible programs against all possible inputs in parallel. Each cycle of the computatortrob runs one cycle of each program/input. As soon as one of them spits out the solution, the computatortron halts.
*computatortron
What if it deletes your drive first?
You should probably run it in a sandbox and simulate a language that is not allowed to execute any system commands and can only access a virtual memory :-)
Okay ii lost it when you decided to use BRAINFUCK of all things XDDDD
Isn't L depends on n, therefore it isn't a constant and be factored out of the big O notation?
It isn't, L is basically 2^number of characters of your code.
@@PolylogCS it's independant only if the code represents an algorithm instead of a straight answer, tho right?
@@deltamico If an algorithm that solves the problem exists, eventually (for high enough n) the universal search will reach said algorithm and thus won't check the stuff past that point. And we don't care about small n for an asymptotic analysis.
But does it actually solves the problem? When search is performed, we search for the algorithm, which just passes one test case, but not really computes the solution. We can just run into the "print("2, 2"") program, can't we?
Indeed! Most likely the search will find print("2 2") much earlier than the real optimal algorithm and just uses that one. The point is, that for some super obscure super large input, it might not be the case. The main point is that it could find a wrong algorithm that accidentally gives the correct answer, but in the worst case it will find the optimal algorithm and never be "much" slower than that (it can be much faster by some "accident").
To accurately say, that this is asymptotically accurate algorithm, there is a need to explain, why we wouldn't stop at some solution with complexity g(n), such that O(g(n)) != O(f(n)).
Suppose L is the so called `number` of an algorithm with complexity g(n) within the list of all `compiling` programs, and L' is the similiar number of an algorithm with O(f(n)), where f is the optimal complexity.
Number of steps, performed to algorithm g(n) is actually number of steps performed to f(n), multiplied by some constant (it is somewhere near 2^(L' - L) )
So, as n approaches infinity g(n) * C > f(n), due to O(g(n)) != O(f(n)) and f(n) - is the optimal algorithm.
Also, to me it is unclear the latest statement, that there is no infinity sequence of decaying complexity functions.
We assumed, that L in analysis is a finite value, but if there is exists such sequence, wouldn't be this assumption wrong?
Correct me if I am mistaken somewhere.
Also sorry for my poor English, it is not my native :sadge:.
Great question! I agree that the argument about the infinite sequence of better and better algorithms is not explained in very much detail. Let me give you a small hint: Imagine there would be such a sequence of algorithms, our "Universal Search" algorithm is asymptotically not slower (or faster) than all of them - well, then this very algorithm, which we implemented in Python, is also somewhere in the list and that algorithm itself would be the finite one algorithm with some very specific (maybe unknown) complexity. Does that help to understand better?
I don't think I understand. With this, you only find the first program that solved your input. You do not find one which solves *all* inputs.
Also, assymptotical is (also) about running time *versus* different input size. If you have only one input it doesn't make sense.
Ok, now I understand, the algorithm is "run an universal search on whatever input you receive." It doesnt matter if it is "print 2,2" for 4 and then a complete sieve for 6, we are only talking about the aggregate algorithm, not the intermediate one, right?
You got it! That's exactly how it works, there's no guarantee that it actually finds the optimal algorithm before it finishes (most likely not, it will likely find some incorrect algorithm before that that accidentally produces the correct answer). The only promise we give is that at the very worst case it won't take much longer than eventually finding the optimal algorithm which is still at some constant index in the list of all algorithms!
There's something I'm a bit puzzled by.
In order to check whether an algorithm is a factoring algorithm, it would have to work for every input. Not just "4". If all we want is to factor 4, then the program that always print "2 2" is likely the shortest that outputs the correct answer, but is not a factoring algorithm.
Correct. This _factorization algorithm_ doesn't ever need to run a _factorization program_ to work. That's indeed counter-intuitive, but all that counts is that the _overall_ algorithm works for every input, not that it ever tries a _trial program_ that does. I would say that "a set of instructions that is guaranteed to select a correct output" is actually a perfectly valid definition for what an algorithm is.
Keep in mind that we are talking about _asymptotic_ complexity here. Short inputs (as long as it only applies to a finite number of inputs) are perfectly valid to have runtimes that don't follow the curve described by the function that describes its algorithmic complexity.
And if you think of factoring long numbers, compare about how the program shown at 7:03 that computes the factorization has a fixed length to how any program that naively outputs a hard coded solution to a sufficiently long input will have to contain that answer and is thus longer than the most concise "actual" factorization program. And the naive program will then be started only after the actual factorization program was started.
The naive one _might_ still finish first. It depends on how the optimal "actual" factorization scales and how much of a head start it has over the naive one. [edit:] In fact, for a number with d digits, going through all "naive" programs has a complexity of 10^(2d). While even the most straightforward algorithm has only complexity 10^(d/2).
For another way to think about this paradox, consider the algorithm that for any input is supposed to compute the square and give you the remainder mod 4 of that square. But we can use our brains before writing that algorithm and consider the even and odd numbers separately:
Squares of even numbers have at least 2 factors of 2 in them, or at least 1 factor of 4, which means their square mod 4 is 0.
Odd numbers are either 1 mod 4 - which squared is 1 mod 4; or they are 3 mod 4 - which squared is also 1 mod 4.
I.e. the actual program at runtime only has to check if a number is even or odd and then it can simply output the pre-computed results of 0 or 1, without ever "actually" doing the calculation of the result.
If the universal search algorithm is able to pick a program like this that "has the actual work done ahead of time"; then why shouldn't it instead be "doing the actual work" after running its little program, when it runs the output check.
@@Pystro Ah, yes. I misunderstood that step. The universal search algorithm *is* the factoring algorithm (as long as the check is correct). It does not *find* it.
And the interesting part compared to a simple exhaustive search (producing all bit strings of increasing length and testing if it's the right answer) is that it has an asymptotically optimal complexity. Assuming the problem has a finite-size optimal algorithm to solve it. Which seems to be usually the case. But is it always?
(It feels like there could be a link with undecidability but even checking an answer to the halting problem is problematic.)
Actually, it looks like the universal search algorithm is related to the Kolmogorov complexity of the solution for the given input. Except it does not exactly compute it as it prefers a longer program that runs exponentially faster.
Nice observation with the kolmogorov complexity! You might enjoy looking at the video description.
Hey, but simulating takes n log n, or I missed some big breakthrough in TCS?
What do you mean by simulating?