This is so genius, I love this
This is amazingly relevant for my and my coworkers thesis. Truly major result.
This is amazing work. I'm very excited to see where this may lead to in future research. It potentially solves a long-standing problem in ANN. Great job!
I think this is great research but I'm missing a few braincells to understand the paper
The animation at 1:15 ("this difference is almost the same for different initializations") appears to contradict the animation at 2:58 (where all values of f(x_i) at initialization converge to f^*(x_i), so their differences are not all the same).
Is the statement at 1:15 wrong?
Btw, awesome work!
Very good observation! Actually the settings in the two examples you point to are slightly different, and this accounts for the difference you observed. In the first case, the 'push' at x_0 represented by the arrow is the same for all functions/networks, so the change is asymptotically the same. On the other hand, when the network is trained with an MSE loss at 2:58, the push at each point x_i is proportional to the distance between the value of the network function f(x_i) and the label f^*(x_i). It therefore depends on the value of the function and leads to different changes in the functions. Hope this helps!
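To make the point above concrete: under an MSE loss, the functional-gradient "push" on each output equals the residual f(x_i) - f^*(x_i). Here is a minimal NumPy sketch; the numbers are made up for illustration and are not from the video:

```python
import numpy as np

# Toy setup for an MSE loss L = 0.5 * sum_i (f(x_i) - f*(x_i))^2.
# The functional gradient is dL/df(x_i) = f(x_i) - f*(x_i), so the
# "push" on each output is proportional to its distance from the label.
f_values = np.array([0.2, -1.0, 0.7])   # current network outputs f(x_i)
targets  = np.array([1.0,  0.0, 0.5])   # labels f*(x_i)

push = f_values - targets               # per-sample gradient of the MSE loss
# Different initializations give different f_values, hence different pushes,
# so the resulting changes to the function differ from network to network.
print(push)
```

This is why the change at 2:58 depends on the initialization, while the fixed arrow at 1:15 applies the same push to every network.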
Very cool, man!
what does gamma represent on the x axis of the figures?
The data vectors are drawn from the unit circle, equivalent to an interval with the boundaries identified. You can parameterize the vector then with an angle, that angle is \gamma. The bounds on the axis confirm \gamma \in [-\pi, \pi]
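To illustrate the parameterization described above, here is a small NumPy sketch (variable names are my own, not from the paper): each data vector x on the unit circle corresponds to an angle \gamma \in [-\pi, \pi], which can be recovered with atan2:

```python
import numpy as np

# Parameterize unit-circle data vectors by an angle gamma in [-pi, pi].
gammas = np.array([-np.pi, -np.pi / 2, 0.0, np.pi / 2])
xs = np.stack([np.cos(gammas), np.sin(gammas)], axis=1)  # points on the circle

# Every such vector has norm 1, and the angle is recovered with arctan2,
# whose range (-pi, pi] is the interval with its endpoints identified.
norms = np.linalg.norm(xs, axis=1)
recovered = np.arctan2(xs[:, 1], xs[:, 0])
```

Since -\pi and \pi label the same point on the circle, the two ends of the x axis in the figures correspond to the same data vector.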
@shaunakravec1460 Thank you, I was wondering the same thing. I can see now that the x axis spans +/- pi
This video is way too short! (As of 1/30/2020 this is also the only video posted by Jacot)
Thanks for this video. I don't understand why the least-squares method was studied or applied, given that it's not an ANN.
What motivates this?