Joseph Van Name
Gradient descent partitions random graphs into few cliques
Suppose that G=(V,E) is a graph (the graph G is simple with no loops, directed edges, or multiple edges between nodes) and r is a positive integer. A subset A of V is said to be a clique if each pair of distinct elements in A is connected with an edge.
Then the clique partition problem is the problem of finding a partition P of V into at most r blocks where every block B in P is a clique. The clique partition problem is a well-known example of an NP-complete problem. But as with many NP-complete problems, there are heuristic algorithms that in many cases find a partition of V into at most r cliques.
For this problem, we use gradient descent to minimize a loss function.
Suppose that f is a function from V to r-dimensional Euclidean space where, for each vertex v, the entries of f(v) are non-negative and sum to 1. Then set M(f)=\lambda\sum_{v\in V}(1-||f(v)||)+\sum_{\{u,v\}\in F}dot(f(u),f(v)), where F denotes the set of unordered pairs of distinct vertices that are not joined by an edge. Here, ||f(v)|| denotes the L2 norm, and the first term vanishes precisely when each f(v) is a standard basis vector.
If f is a function from V to r-dimensional Euclidean space, then set L(f)=M(g) where g is the function defined by setting g(v)=(f(v)/||f(v)||)^2. Here, the squaring operation is applied entrywise, so each g(v) has non-negative entries that sum to 1. We then minimize the loss function L using gradient descent with a random initialization.
In the visualization, the graph (V,E) is shown in blue. The equivalence relation corresponding to the graph partition is shown in white after convergence, for the rounds in which the algorithm obtains a partition into cliques. More specifically, the visualization shows the matrix M adjoint(M), where M now denotes the matrix whose rows are the vectors g(v)=(f(v)/||f(v)||)^2 defined above. I chose the coloring so that if the gradient descent terminates with a partition of V into sets that are not all cliques, then the pairs of distinct vertices that belong to the same block of the partition but are not joined by an edge are colored red. In other words, the red cells mark exactly the portions of the graph that contribute to the loss level.
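For concreteness, here is a minimal sketch of the loss and the descent loop described above, written in PyTorch. It is my own reconstruction rather than the code used for the video: the function name clique_partition, the choice of the Adam optimizer, and the step count, learning rate, and penalty weight are all assumptions.

import torch

def clique_partition(adj, r, lam=1.0, steps=5000, lr=0.05):
    # adj: (n, n) boolean adjacency matrix of a simple graph; r: maximum number of blocks
    n = adj.shape[0]
    f = torch.randn(n, r, requires_grad=True)                 # random initialization, one row per vertex
    non_edges = (~adj) & ~torch.eye(n, dtype=torch.bool)      # the set F: pairs of distinct vertices with no edge
    opt = torch.optim.Adam([f], lr=lr)
    for _ in range(steps):
        g = (f / f.norm(dim=1, keepdim=True)) ** 2            # g(v) = (f(v)/||f(v)||)^2 is a probability vector
        onehot_penalty = lam * (1.0 - g.norm(dim=1)).sum()    # lambda * sum_v (1 - ||g(v)||), zero iff each g(v) is one-hot
        same_block = g @ g.T                                  # entry (u, v) is dot(g(u), g(v)); this is the matrix shown in the visualization
        loss = onehot_penalty + same_block[non_edges].sum() / 2   # each unordered non-edge counted once
        opt.zero_grad(); loss.backward(); opt.step()
    return g.argmax(dim=1)                                    # block assignment for each vertex

Any non-adjacent pair that ends up in the same block contributes to the remaining loss; those are the pairs drawn in red.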
Unless otherwise stated, all algorithms featured on this channel are my own. You can go to github.com/sponsors/jvanname to support my research on machine learning algorithms. I am also available to consult on the use of safe and interpretable AI for your business. I am designing machine learning algorithms for AI safety such as LSRDRs. In particular, my algorithms are designed to be more predictable and understandable to humans than other machine learning algorithms, and my algorithms can be used to interpret more complex AI systems such as neural networks. With more understandable AI, we can ensure that AI systems will be used responsibly and that we will avoid catastrophic AI scenarios. There is currently nobody else who is working on LSRDRs, so your support will ensure a unique approach to AI safety.
Views: 40

Videos

Gradient descent finds large cliques in random graphs
88 views · 20 hours ago
Suppose that (V,E) is a graph (this graph is simple, so it has no loops, multiple edges, or directed edges). Then a clique is a subset A of V such that each pair of distinct elements in A is connected with an edge. Given a graph G=(V,E), the objective of the clique problem is to find a clique in G of maximal size. The clique problem is an NP-complete problem, so there are cases of the clique pr...
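The description above is cut off before the objective is stated, so the sketch below is only an illustration of the general idea (a continuous relaxation optimized by gradient descent) and not the loss function from the video. It uses the classical Motzkin-Straus relaxation, whose maximum of x^T A x over the probability simplex is 1 - 1/w(G) for a graph with clique number w(G); the support threshold at the end is a heuristic of my own, and the recovered set should still be checked for being a clique.

import torch

def large_clique(adj, steps=5000, lr=0.05):
    # adj: (n, n) boolean adjacency matrix of a simple graph
    n = adj.shape[0]
    A = adj.float()
    p = torch.randn(n, requires_grad=True)           # softmax(p) always lies on the probability simplex
    opt = torch.optim.Adam([p], lr=lr)
    for _ in range(steps):
        x = torch.softmax(p, dim=0)
        loss = -(x @ A @ x)                          # gradient descent on the negated Motzkin-Straus objective
        opt.zero_grad(); loss.backward(); opt.step()
    x = torch.softmax(p, dim=0)
    return (x > 1.0 / (2 * n)).nonzero().flatten()   # heuristic: vertices carrying non-negligible mass

The characteristic vector of a maximum clique, scaled onto the simplex, is a global maximizer of this objective, which is what makes the relaxation reasonable.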
Gradient descent finds a 77 element subset of {1,...,240} with no length 4 arithmetic progression
728 views · 4 hours ago
In this visualization, we use gradient descent to search for a subset of {1,...,240} of maximal cardinality with no length-4 arithmetic progression. An arithmetic progression of length n is a sequence of integers of the form a, a+b, a+2b, ..., a+(n-1)b where a,b are integers and b is greater than zero. Set r=240. Define a function M by setting M(x_1,...,x_r)=[\sum_{a,b,c,d}x_a x_b x_c x_d]-lambda(x...
Gradient descent finds a 37 element subset of {1,...,240} with no length 3 arithmetic progression
835 views · 7 hours ago
In this visualization, we use gradient descent to search for a subset of {1,...,240} of maximal cardinality with no length-3 arithmetic progression. An arithmetic progression of length n is a sequence of integers of the form a, a+b, a+2b, ..., a+(n-1)b where a,b are integers and b is greater than zero. Set r=240. Define a function M by setting M(x_1,...,x_r)=[\sum_{a,b,c}x_a x_b x_c]-lambda(x_1 ...
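The definition of M above is cut off, so the sketch below is my own reconstruction of the general approach (a smooth membership relaxation, a penalty on every length-3 progression, and a reward for cardinality) rather than the exact objective from the video; the sigmoid parameterization, the penalty weight, and the optimizer settings are assumptions. The length-4 case from the previous video is analogous, with quadruples in place of triples.

import torch

def no_ap3_subset(r=240, lam=5.0, steps=20000, lr=0.02):
    p = torch.zeros(r, requires_grad=True)                     # sigmoid(p)[i] ~ membership of i+1 in the subset
    # all index triples (a, a+b, a+2b) inside {0, ..., r-1} with b > 0
    triples = [(a, a + b, a + 2 * b) for b in range(1, r) for a in range(max(r - 2 * b, 0))]
    i, j, k = (torch.tensor(t) for t in zip(*triples))
    opt = torch.optim.Adam([p], lr=lr)
    for _ in range(steps):
        x = torch.sigmoid(p)
        ap_penalty = (x[i] * x[j] * x[k]).sum()                # large if any length-3 progression is (fractionally) included
        loss = lam * ap_penalty - x.sum()                      # penalize progressions, reward cardinality
        opt.zero_grad(); loss.backward(); opt.step()
    return (torch.sigmoid(p) > 0.5).nonzero().flatten() + 1    # candidate subset of {1, ..., r}

Whatever set this returns should be verified directly, since gradient descent only drives the penalty toward zero rather than guaranteeing it.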
Using gradient ascent to satisfy a Boolean formula while minimizing the number of true variables
145 views · 9 hours ago
We say that a Boolean propositional formula is a positive (or monotone) formula if all the literals are positive. Equivalently, a positive Boolean propositional formula is a formula obtained from the variables by applying the connectives AND and OR without applying the negation NOT. A negative Boolean propositional formula is the negation of a positive Boolean propositional formula. We shall gi...
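The description is cut off before the construction is given, so the following is only an illustration of one common way to set up such a problem (a soft-logic relaxation with gradient ascent), not the method from the video; the product-based AND/OR relaxations, the penalty weight, and the example formula are all my own choices.

import torch

def soft_and(*args):
    return torch.stack(args).prod(dim=0)                        # product agrees with AND on {0, 1}

def soft_or(*args):
    return 1 - torch.stack([1 - a for a in args]).prod(dim=0)   # De Morgan dual of the product

def satisfy_with_few_true(formula, n_vars, lam=0.1, steps=3000, lr=0.05):
    p = torch.zeros(n_vars, requires_grad=True)
    opt = torch.optim.Adam([p], lr=lr)
    for _ in range(steps):
        x = torch.sigmoid(p)                                    # x[i] ~ degree to which variable i is true
        objective = formula(x) - lam * x.sum()                  # satisfy the formula while penalizing true variables
        opt.zero_grad(); (-objective).backward(); opt.step()    # gradient ascent on the objective
    return torch.sigmoid(p) > 0.5

# example positive formula: (x0 OR x1) AND (x1 OR x2) AND x3
assignment = satisfy_with_few_true(lambda x: soft_and(soft_or(x[0], x[1]), soft_or(x[1], x[2]), x[3]), 4)

For a positive formula this relaxation is monotone in each variable, so lowering a variable never helps satisfaction; the penalty term is what pushes the number of true variables down.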
This neural network quickly forgets what it was previously trained on.
689 views · 16 hours ago
This is a visualization of a neural network periodically learning and forgetting information. The network is designed to learn 500 data points, but instead of training the network on all of these training data points or a random sample of these training points for each network update, after training the network on a particular data point for some period of time, we discontinue training the netw...
I made another array where each row and column has the same number of squares of each color.
453 views · 16 hours ago
The array has 64 columns and 40 rows. We are given 8 colors. The goal is to have 5 squares of each color in each column and 8 squares of each color in each row. We then use a hill descending algorithm and repeatedly swap pairs of colors in such a way that each swap keeps the loss level the same or decreases the loss level. Eventually we achieve a loss level of zero and in that case, we have obt...
I made an array where each row and column has the same number of squares of each color.
2.3K views · 16 hours ago
The array has 64 columns and 40 rows. We are given 8 colors. The goal is to have 5 squares of each color in each column and 8 squares of each color in each row. We initialize the array with an equal number of squares of each color. We then use a hill descending algorithm and repeatedly swap pairs of colors in such a way that each swap keeps the loss level the same or decreases the loss level. E...
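For both of the array videos above, a minimal sketch of the procedure could look like the following; the absolute-deviation loss is an assumption of mine (the exact loss function is not stated), and no optimizations are applied beyond plain random swaps.

import random

ROWS, COLS, COLORS = 40, 64, 8            # targets: 8 of each color per row, 5 per column

def loss(grid):
    total = 0
    for r in range(ROWS):                  # deviation from 8 squares of each color in each row
        counts = [0] * COLORS
        for c in range(COLS):
            counts[grid[r][c]] += 1
        total += sum(abs(k - 8) for k in counts)
    for c in range(COLS):                  # deviation from 5 squares of each color in each column
        counts = [0] * COLORS
        for r in range(ROWS):
            counts[grid[r][c]] += 1
        total += sum(abs(k - 5) for k in counts)
    return total

cells = [i % COLORS for i in range(ROWS * COLS)]                # an equal number of squares of each color
random.shuffle(cells)
grid = [cells[r * COLS:(r + 1) * COLS] for r in range(ROWS)]

current = loss(grid)
while current > 0:                         # keep swapping until the loss level reaches zero
    r1, c1 = random.randrange(ROWS), random.randrange(COLS)
    r2, c2 = random.randrange(ROWS), random.randrange(COLS)
    grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
    new = loss(grid)
    if new <= current:                     # accept swaps that keep the loss the same or lower it
        current = new
    else:                                  # otherwise undo the swap
        grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]

Recomputing the full loss after every candidate swap is wasteful; as noted in the comments below, it suffices to rescore only the two rows and two columns that contain the swapped squares.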
I tried to disprove the existence of a higher infinity and produced these tables instead. Part 3
293 views · 1 day ago
Large cardinals are levels of infinity so large that their existence cannot be proven in standard ZFC set theory. While we are unable to prove the existence of large cardinals, we may be able to falsify some of these large cardinal axioms. This is a visualization of the algebras obtained in an attempt to falsify the existence of rank-into-rank cardinals. A rank-into-rank embed...
I tried to disprove the existence of a higher infinity and produced these tables instead. Part 2.
99 views · 1 day ago
Recall that Gödel's second incompleteness theorem states that a sufficiently strong mathematical theory A cannot be used to prove its own consistency con(A), and recall that Gödel's completeness theorem states that a mathematical theory T is consistent if and only if it has a model. Large cardinals are levels of infinity that are so large that they contain models of the standard axioms of set t...
I tried to disprove the existence of a higher infinity and produced these tables instead.
102 views · 1 day ago
Recall that Gödel's second incompleteness theorem states that a sufficiently strong mathematical theory A cannot be used to prove its own consistency con(A), and recall that Gödel's completeness theorem states that a mathematical theory T is consistent if and only if it has a model. Large cardinals are levels of infinity that are so large that they contain models of the standard axioms of set t...
Ways to append an ordinal to a classical Laver table of size at most 128: 2-adic ordering
38 views · 14 days ago
This is a visualization of some of the reduced Laver-like algebras (X,*) generated by two elements a,b where b*b=1 and where the algebra generated by a has cardinality 128. A reduced Laver-like algebra is an algebra (X,*,1) that satisfies the identities x*(y*z)=(x*y)*(x*z),x*1=1,1*x=x and where if x_n is in X for each natural number n, then there is some N where x_0*...*x_N=1 (the implied parenthes...
Critically simple Laver-like algebras with 2 generators
65 views · 14 days ago
In this visualization, we show all critically simple Laver-like algebras with two generators satisfying some specific conditions. A self-distributive algebra is an algebraic structure (X,*) that satisfies the identity x*(y*z)=(x*y)*(x*z). Define operations *_n for all n by setting x*_{0}y=y, x*_{n+1}y=x*(x*_{n}y). Define x*_{\infty}y=\lim_{n\rightarrow\infty}x*_{n}y whenever this limit e...
All 6854 ways to append an ordinal to a classical Laver table of size at most 64: 2-adic ordering
66 views · 14 days ago
This is a visualization of all the reduced Laver-like algebras (X,*) generated by two elements a,b where b*b=1 and where the algebra generated by a has cardinality at most 64. A reduced Laver-like algebra is an algebra (X,*,1) that satisfies the identities x*(y*z)=(x*y)*(x*z),x*1=1,1*x=x and where if x_n is in X for each natural number n, then there is some N where x_0*...*x_N=1 (the implied parent...
All 6854 ways to append an ordinal to a classical Laver table of size at most 64
48 views · 14 days ago
This is a visualization of all the reduced Laver-like algebras (X,*) generated by two elements a,b where b*b=1. A reduced Laver-like algebra is an algebra (X,*,1) that satisfies the identities x*(y*z)=(x*y)*(x*z),x*1=1,1*x=x and where if x_n is in X for each natural number n, then there is some N where x_0*...*x_N=1 (the implied parentheses are grouped on the left so that a*b*c*d=((a*b)*c)*d). ...
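These algebras are built on top of the classical Laver tables A_n, whose underlying set is {1,...,2^n}. As a point of reference, here is a minimal sketch (mine, not the code behind these videos) that computes A_n from the standard recursion a*1 = a+1 mod 2^n and a*b = (a*(b-1))*(a*1); it is fine for small n, though the recursion gets deep for large tables.

from functools import lru_cache

def laver_table(n):
    # classical Laver table A_n: the self-distributive operation on {1, ..., 2^n} with a*1 = a+1 mod 2^n
    N = 2 ** n

    @lru_cache(maxsize=None)
    def star(a, b):
        if b == 1:
            return a % N + 1                        # a*1 = a+1, and 2^n * 1 = 1
        return star(star(a, b - 1), star(a, 1))     # a*b = (a*(b-1)) * (a*1)

    return {(a, b): star(a, b) for a in range(1, N + 1) for b in range(1, N + 1)}

For example, laver_table(6) gives the 64-element classical Laver table, the largest one that these videos append ordinals to.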
Hill descending algorithm produces up to size 56 by 56 Hadamard matrices with 2 circulant cores
108 views · 14 days ago
Hill descending algorithm produces up to size 44 by 44 Hadamard matrices with 2 circulant cores
119 views · 14 days ago
All 39194 superreduced multigenic Laver tables with 2 generators of size at most 50: Over 10 hours
116 views · 14 days ago
Finding a vector on a torus that remains on the torus after a unitary transformation
189 views · 21 days ago
Artificial intelligence produces a bent vector of length 256.
113 views · 21 days ago
AI finds a bent vector of length 256 by moving along a vector field in the real projective space
172 views · 21 days ago
This AI finds a bent vector of length 64 by training 64 complex matrices.
199 views · 21 days ago
AI finds a bent vector of length 256 using a non-random complex-valued initialization
174 views · 28 days ago
AI finds a bent vector of length 256 using a non-random initialization
294 views · 28 days ago
This AI finds a bent vector by moving in the direction of a vector field that is not a gradient.
2.1K views · 28 days ago
Evolutionary computation produces a bent vector
476 views · 28 days ago
Artificial intelligence produces a 20 by 20 Hadamard matrix.
2.4K views · 28 days ago
A neural network with an unusual activation function learns the sine function.
3.7K views · 1 month ago
Training a neural network on the sine function.
56K views · 1 month ago
More machine learning recovering handwritten digits from the discrete Fourier transform magnitudes
224 views · 1 month ago

COMMENTS

  • @DeepankumarS-vh5ou
    @DeepankumarS-vh5ou 19 hours ago

    Can you also do a travelling salesman problem solver using this method, where you have a random graph of some size n and you apply this to find the edges which belong to the Hamiltonian path? I don't know what the objective function would look like, plus TSP is also an NP-hard problem. As you know, all NP-hard problems can be solved if we solve one of the problems in that set.

    • @josephvanname3377
      @josephvanname3377 14 hours ago

      I can try to use gradient descent to solve the traveling salesman problem, but I do not know how well my experiments would work. I also need to think about an objective function for this.

    • @andrewferguson6901
      @andrewferguson6901 6 hours ago

      Not that this is necessarily computationally more effective...

  • @kysjunglers
    @kysjunglers 1 day ago

    Damn that's sick

  • @VoxelMusic
    @VoxelMusic 2 days ago

    This is what happens to your spine through a lifetime of gaming

  • @aaravyadav9906
    @aaravyadav9906 3 days ago

    woah thats cool looking reminds of lava lamps

    • @josephvanname3377
      @josephvanname3377 2 days ago

      I am glad that you have found the visualization appealing. Plenty of serious mathematicians have proven many theorems about sets that avoid certain arithmetic progressions. And the combinatorics of arithmetic progressions involves some interesting mathematics including set theory; the infinite version of the van der Waerden theorem has a short proof that uses ultrafilters. And the animation shows many signs that this is an interpretable arithmetic progression (these signs include near symmetry along with self-similar fractal-like structure).

  • @andrewferguson6901
    @andrewferguson6901 3 days ago

    Euler?

    • @josephvanname3377
      @josephvanname3377 2 days ago

      What did Euler do with large sets with no arithmetic progressions?

  • @381delirius
    @381delirius 3 days ago

    I'm no neural network scientist but I wonder if it's possible to somehow create a neural network that specializes in long term memory. Like our long term memory has the advantage of being long term, but it's difficult to change and slower to access compared to short term memory. Then somehow connect it to this neural network.

    • @josephvanname3377
      @josephvanname3377 3 days ago

      I am only vaguely acquainted with papers that deal with networks that try to ameliorate the problem of catastrophic forgetting in neural networks. But I barely consider catastrophic forgetting a problem since if one still has the training data, one may continually train the network on that data. The problem of catastrophic forgetting is an artificial problem since one has to remove the training data in order for the problem of catastrophic forgetting to be a legitimate problem. Neural networks need to forget a lot of information in their initializations. I actually believe that neural networks do not go far enough in forgetting; even though the network forgets the specific data points, the cosine similarity between the weight matrices of the network before and after retraining is very high which indicates that the network forgets how to perform tasks, but the network still retains its general structure. I would be interested in constructing networks for which we can recover all the training data from the network without generating any examples that are not in the training data though, and such networks do not forget the training data.

  • @space-shuttle
    @space-shuttle 3 days ago

    Could you use gradient descent to solve a subset sum problem?

    • @josephvanname3377
      @josephvanname3377 3 days ago

      I can try, but the results may be worse than with discrete optimization techniques.

  • @Guy-wl9hu
    @Guy-wl9hu 3 days ago

    I would like to know your tool for making those animations Joseph

  • @lucyhalut4028
    @lucyhalut4028 4 days ago

    This is so coool! Nice work.

  • @xlerb2286
    @xlerb2286 4 days ago

    Reminds me of laying out Latin squares for test plots. Only there the goal was to have each symbol occur exactly once in each row and exactly once in each column. And as we started with a blank array it was simply an exercise of permutations :)

  • @falnica
    @falnica 4 days ago

    The new UI sucks #changeitback

    • @josephvanname3377
      @josephvanname3377 4 days ago

      The new UI always sucks at first until you forget how much 'better' the old one was.

  • @Krzys6301
    @Krzys6301 4 days ago

    ​@QW3RTYUU If simple neural net with dense layers is not good for fitting time series as in this sine example, then what would be better? Particularly for a data like BTC price?

  • @DaSquyd
    @DaSquyd 4 days ago

    "Mom! can we get a taylor series of sin(x)?" "We have a taylor series at home"

    • @josephvanname3377
      @josephvanname3377 4 days ago

      Thank you for this completely original and totally non-copied joke. It is very funny.

    • @DaSquyd
      @DaSquyd 4 days ago

      @@josephvanname3377 all in a day's work

  • @Guy-wl9hu
    @Guy-wl9hu 5 days ago

    is it just me, or does this algorithm stagnate at a certain point and always depend on the initial state for performance?

    • @josephvanname3377
      @josephvanname3377 4 days ago

      As time goes on, it gets more difficult for the algorithm to make improvements.

    • @Guy-wl9hu
      @Guy-wl9hu 4 days ago

      @@josephvanname3377 Man I love what you're doing, I'm so interested in learning more about you and your way of thinking, if you have any resources recommendations that would be very helpful.

  • @MrPyroguy108
    @MrPyroguy108 5 days ago

    Don't know what I'm looking at, but it seems interesting!

  • @micmal2967
    @micmal2967 5 days ago

    nice, now use that to sort an array

    • @josephvanname3377
      @josephvanname3377 5 days ago

      I can do it to sort an array, but we already have N log(N) algorithms for sorting arrays, so I don't think sorting using evolutionary computation or hill climbing would do much good.

    • @official-obama
      @official-obama 2 days ago

      @@josephvanname3377 it would look cool!

  • @jamesseto1
    @jamesseto1 5 days ago

    What algorithm did you use?

  • @gregoriusaryo3560
    @gregoriusaryo3560 6 days ago

    me too NN, me too

  • @GoldenBeholden
    @GoldenBeholden 6 days ago

    Lmao, this is exactly how I feel when my network won't fit my data.

  • @VenomousCamel
    @VenomousCamel 6 days ago

    "hill descending algorithm" -- Are you picking random points, observing whether the loss value after the swap is lower than the current value, committing the swap if true? Or are there more optimizations in place like selecting colors / rows / columns with high contributions to the loss value?

    • @josephvanname3377
      @josephvanname3377 4 days ago

      I did not bother with optimizing the hill descending algorithm. I just picked random pairs of squares. One simple optimization for this algorithm would be to only compare the loss for the rows and columns that contain the swapped squares, but this code went fast enough that I did not even need to do that.
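      For concreteness, that rescoring could look like the following sketch (my own, assuming the grid is stored as a list of lists of color indices, with 8 colors and targets of 8 per row and 5 per column):

      def line_loss(line, target):
          # deviation of one row or column from having target squares of each of the 8 colors
          counts = [0] * 8
          for color in line:
              counts[color] += 1
          return sum(abs(k - target) for k in counts)

      def swap_delta(grid, r1, c1, r2, c2):
          # loss change from swapping (r1, c1) with (r2, c2), rescoring only the affected rows and columns
          def local(g):
              return (sum(line_loss(g[r], 8) for r in {r1, r2})
                      + sum(line_loss([row[c] for row in g], 5) for c in {c1, c2}))
          before = local(grid)
          grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
          after = local(grid)
          grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]   # undo; accept the swap only when the delta is <= 0
          return after - before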

  • @jerbear97
    @jerbear97 7 days ago

    me during college

    • @josephvanname3377
      @josephvanname3377 7 days ago

      The solution is to take lots of math courses that build upon each other so that you don't forget. One needs calculus to compute the expected value and variance of a continuous random variable.

  • @jerbear97
    @jerbear97 7 days ago

    neural networkn't

    • @josephvanname3377
      @josephvanname3377 7 days ago

      The worst part is that if we take the cosine similarity between the weight matrices at different parts of the training, then the cosine similarity will be very high. The network does not change its structure during retraining, but it forgets how to do things. Neural knee twerk.

  • @LUVVEOUS
    @LUVVEOUS 7 days ago

    It looks so tired

    • @josephvanname3377
      @josephvanname3377 7 days ago

      The network is getting tired because it is getting fat. As the network goes through the data over and over again, it increases the norm of its weight matrices and bias vectors to the point where it has more trouble learning. You can tell that this is the case because the network slows down over time and takes more time to cycle through the data. Regularization and zeroing out nodes would have helped with this.

  • @LUVVEOUS
    @LUVVEOUS 7 days ago

    Is this Art?

    • @josephvanname3377
      @josephvanname3377 7 days ago

      This is not art. This is just a demonstration of the behavior of neural networks.

    • @Benevezzioficial
      @Benevezzioficial 5 days ago

      ​@@josephvanname3377 anything can be art

  • @justman017
    @justman017 7 days ago

    This is cool though! Just wondering... How do you make this visualised? I'm just a normal person taking an interest in AI but I don't know how to start... Any advice?

    • @josephvanname3377
      @josephvanname3377 7 days ago

      I make the visualization frame by frame and then I put all the frames together into an mp4 file. How to begin with AI depends on your math/CS background, amount of spare time you have, and level of interest in AI.

  • @dang-x3n0t1ct
    @dang-x3n0t1ct 7 days ago

    I, too, am a neural network that easily forgets

    • @josephvanname3377
      @josephvanname3377 7 days ago

      As long as you don't forget your cryptocurrency private key or calculus. . . But in reality, humans can generally remember how to perform tasks after learning different things, so catastrophic forgetting does not happen with humans. I have not been playing the piano much recently (though I may have to play more piano to add music to my videos), but I have not forgotten how to play the piano at all nor have I forgotten specific pieces that I have gone over in the past.

  • @Pablo-cr1rr
    @Pablo-cr1rr 7 days ago

    qr tables

    • @josephvanname3377
      @josephvanname3377 7 days ago

      QR codes are not as interpretable though. And the QR decomposition of a matrix is not very interpretable either. If you want something looking more like QR codes, you should check out my visualizations about bent functions and Hadamard matrices.

  • @marc_frank
    @marc_frank 7 days ago

    could there be a move that increases the loss momentarily, but speeds up convergence in the long run?

    • @josephvanname3377
      @josephvanname3377 7 days ago

      That is a good strategy for more difficult problems. But this is an easier problem; producing a Latin square, for example, is a more difficult problem than this one (I did not want to use too many colors, so I chose an easier problem rather than a problem where one has trouble distinguishing colors). Since this problem is easier, it is not likely that we end up in a local minimum or a region that is hard to escape. And for this problem, there are plenty of moves that change the board but which keep the loss level stationary, so we can still explore without increasing the loss level.

  • @LUVVEOUS
    @LUVVEOUS 7 days ago

    indeed

  • @TimRichardson1984
    @TimRichardson1984 10 days ago

    i am very honored and privileged to be the first person in the world to watch this work of art

    • @josephvanname3377
      @josephvanname3377 10 days ago

      I appreciate the enthusiasm. Perhaps this content can be played at airports or hotels or in the backgrounds of malls or in the background when people are listening to podcasts.

  • @r-d-v
    @r-d-v 11 days ago

    I desperately wanted to hear the waveform as it evolved

    • @josephvanname3377
      @josephvanname3377 11 days ago

      Here, the waveform only goes through a few periods. It would be better if I used a periodic activation for a longer waveform.

  • @matteopiccioni196
    @matteopiccioni196 12 days ago

    14 minutes for a 1D function come on

    • @josephvanname3377
      @josephvanname3377 11 days ago

      It takes that long to learn.

    • @matteopiccioni196
      @matteopiccioni196 11 days ago

      @@josephvanname3377 I know my friend, I would have reduced the video anyway!

    • @josephvanname3377
      @josephvanname3377 11 days ago

      @@matteopiccioni196 Ok. But a lot has happened in those 14 minutes since the network struggles so much.

  • @justman017
    @justman017 12 days ago

    How does it recover? Does it redraw or just edit the image?

    • @josephvanname3377
      @josephvanname3377 11 days ago

      It recovers the original image by minimizing the loss level by moving in the direction opposite to the gradient. The loss level is the distance between the magnitudes of the discrete Fourier transform of the current image and the given magnitudes. I gave more details in the description.

    • @justman017
      @justman017 11 days ago

      @@josephvanname3377 owh thanks for the explanation! Now it crystal clear for me😆

    • @josephvanname3377
      @josephvanname3377 11 days ago

      @@justman017 You're welcome. I am glad that it is now clear.

  • @Finn-2763
    @Finn-2763 12 days ago

    why did you torture him :( he wants to be free and not bending metal pipes to fit functions /j

    • @josephvanname3377
      @josephvanname3377 11 days ago

      Unlike the other network with the conventional activation function, it appears as if this neural network is heating the pipes up to a very high temperature before bending them. But these are my most popular visualizations. I have made other visualizations where the network was doing something really fun, and in those visualizations, people don't want to watch as much.

  • @Neomadra
    @Neomadra 12 days ago

    This video is very misleading since the sine function is a very bad example to demonstrate how the model is not able to extrapolate. Because the sine is not a trivial mathematical operation, it's an infinite series, a Taylor series. No finite neuronal network, also not your brain, can extrapolate this function on an infinite domain. It might be that the model really wants to extrapolate, but it will never have enough neurons to perform the computation. Probably that's indeed the case because looking at the plot it really looks like it's doing the Taylor series for the sine function, which is the absolutely optimal thing to do! Neuronal networks are just not suited for this, that's why we use calculators for these kinds of things. It's like asking the model to count to infinity

    • @strangeWaters
      @strangeWaters 12 days ago

      If the neural network had a periodic activation function it could fit sin perfectly though

    • @josephvanname3377
      @josephvanname3377 11 days ago

      There is a big gap between the inability to extrapolate over the entire field of real numbers and the inability to extrapolate a little bit beyond the training interval. And polynomials can only approximate the sine function on a finite interval since an nth degree polynomial has n roots (on the complex plane counting multiplicity).

  • @machine-boy
    @machine-boy 12 days ago

    my taylor series can do better trololo