Ali is 100% correct, I've worked in academia and industry doing ML since 2007 and have recently noticed an uptrend in brittle models. Deep learning is a victim of its own success and I fear if this cycle continues we might experience another winter.
Pretty good talk! Ali Rahimi, thanks for not being afraid to speak up! Most of the things you said are true, imo. We need both empirical and theoretical science.
One thing I must add: Model building was a craft in 2007 (with a lot of bias in feature engineering and techniques), and it's a craft today. *Practical* machine learning being an alchemy isn't a new phenomenon. It was there before deep learning, as well.
More analytical rigor with methods won't solve the problems of filter bubbles, bias inherent in training data, etc.
Thanks for sharing, can't agree more about the alchemy part.
Thanks, it was a great talk. I've watched it before and wonder how he might give it differently today. I think most of the advances in the field have come not from exact theoretical proofs but rather good educated guesses plus a lot of careful methodical tool building, optimizations, and scaling. I understand there is some controversy here and different "schools of thought" on the subject.
Great talk. The way deep learning is handled by the top researchers has spawned many pseudo-researchers who have computation power at hand and try out a bunch of different things until something works (mainly because the amount of data available is far larger than what a human would need to learn). Until it works, it is fine, but when it breaks, people will not know what hit them. I am new to deep learning, and within a few months I have come to understand that researchers in this field mostly have no clue why it works, and this has made me lose all interest in the field. Thanks Ali for being brave and giving an honest opinion; hopefully things will change.
I have been that pseudo-researcher and I totally agree with Ali. While you get the task done in the short term (ship a product or publish a paper), you are left with a residual sense of incompleteness. I think it's the way we approach "deep learning" that matters. If you approach it as a software engineer looking to build a system/product, the current "throw compute at it and try until you get the results" path works. Your concerns should then be the productivity of the ML framework and DevOps. But IF you approach it as a researcher/student, you are left with intellectual dissatisfaction. I was.
" researchers in this field mostly have no clue why it works": that's *literally* the reason *my* interest in this field has exploded. What's the point of doing research in a well-understood field, anyway?
@@avimohan6594 That's not what he meant... Researchers in this field don't give a damn about why something works or why something doesn't work. If we keep doing that, then we'll face an AI winter again. You know why? Cuz we don't have anyone RESEARCHING. We have people doing ALCHEMY. Alchemy has limitations.
@@avimohan6594 It is fine to have no clue about the things you do, but then you don't publish them as a meaningful result; rather, you try to find a scientific way to explain the results. Algorithms like gradient descent and backpropagation have been mathematically explained and hence are worthy of publication, but not the ones where people make some changes to existing neural networks, get +0.5 percent accuracy, and publish it as research without explanation. I have noticed that most of the papers published even in top conferences have this pattern. I am talking about NIPS and CVPR papers, not some local conference papers.
@@siddharthkotwal8823 Yeah, the main goal of the AI researcher should be to prove why the deep neural network works or fails. Even if it takes a long time to get that answer, it is the kind of work that wins the test-of-time award Ali won, not the papers shipped six months after tweaking an existing neural network model and getting +0.5 percent accuracy. That is just a waste of resources. If your experiment doesn't produce even a small increment in your knowledge, it is not really a research experiment but an application. That is probably fine if you are an engineer trying to create an application, but such people should stop calling themselves researchers.
whoa, aged like milk
The really interesting stuff begins at 11:00
Good talk, but in 360p I can't read the screen, unfortunately. Besides, what is the current situation on the subject?
I agree with him 100%. Also, I now understand why LeCun felt attacked, prompting that response on Reddit.
Yann LeCun wrote a rebuttal, which hinges early on the following sentence: "In the history of science and technology, the engineering artifacts have almost always preceded the theoretical understanding: the lens and the telescope preceded optics theory, the steam engine preceded thermodynamics, the airplane preceded flight aerodynamics, radio and data communication preceded information theory, the computer preceded computer science."
But this misses the point entirely. How many forests were cut down in service of the early steam engines before we gained a solid understanding of thermodynamics?
So the correct formulation is this: _When_ should the rigour police follow along? And the answer is Einstein's: as soon as possible, but no sooner.
Aha! A stopping problem. Should be no problem to string together some brittle algorithms we don't entirely understand to produce an answer we don't really trust. How many unnecessary GWh are we presently pouring into gradient descent that doesn't work properly? Hard to say. If only we had some rigour to throw at the question ...
LeCun goes on to say: "But the correct attitude is to attempt to fix the situation, not to insult a whole community for not having succeeded in fixing it yet. This is like criticizing James Watt for not being Carnot or Helmholtz."
This is the second time the word "insult" came up in my travels today. LeCun apparently missed the point about the floating-point rounding mode. If James Watt observed that switching out clockwise-threaded bolts for counterclockwise-threaded bolts had suddenly reduced his steam engine's efficiency by 75%, I don't think a strident call for improved rigour would be misplaced, then or now.
Thank you for posting this!
For anyone interested to read LeCun's original post also: www2.isye.gatech.edu/~tzhao80/Yann_Response.pdf
Anyone recognize the names mentioned at 07:28 ?
I recognize Michael Jordan, Shai Ben-David and Manfred K. Warmuth, but who were the first two?
I heard Nati Srebro.
Ofer Dekel
I would like to know what "reducing internal covariate shift" means.
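In case it helps: "internal covariate shift" is the term from the batch normalization paper (Ioffe & Szegedy, 2015) for the way the distribution of each layer's inputs keeps changing during training as the earlier layers' weights update. Batch norm is claimed to "reduce" it by normalizing each layer's activations over the mini-batch and then scaling and shifting them with learned parameters (whether that explanation is the real reason it helps is exactly the kind of question Ali raises). A minimal NumPy sketch of the forward pass; the function name and the epsilon constant are my own choices for illustration:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of activations, then scale and shift.

    x:     (batch_size, features) pre-activations from some layer
    gamma: (features,) learned scale
    beta:  (features,) learned shift
    """
    mu = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta            # learned scale/shift restores flexibility

# Toy usage: activations whose distribution would otherwise drift during training
x = np.random.randn(32, 4) * 5.0 + 3.0
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))   # roughly 0 and 1 for each feature
```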
Does anyone know what the "k" stands for in k(A) = 10^20 at 14:22?
It's the condition number of matrix A.
Thanks man! Seems like the only thing I can't google these days is mathematical symbols lol.
It's a kappa.
I used to abbreviate it as cond(A), but indeed it is the condition number.
It is kappa (κ), the condition number. It means that matrix A is ill-conditioned (the ratio of its maximum singular value to its minimum singular value is very large).
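For anyone who wants to see it concretely, here is a small NumPy sketch (my own illustration, not from the talk) computing κ(A) as the ratio of the largest to the smallest singular value, and showing why a huge κ is a problem: with an ill-conditioned A, a tiny perturbation of b wildly changes the solution of Ax = b.

```python
import numpy as np

# An almost-singular 2x2 matrix: its rows are nearly parallel.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-12]])

sigma = np.linalg.svd(A, compute_uv=False)   # singular values, largest first
kappa = sigma[0] / sigma[-1]                 # ratio of max to min singular value
print(kappa)                                 # ~4e12: badly ill-conditioned
print(np.linalg.cond(A))                     # same number via the built-in helper

# Consequence: solving Ax = b is extremely sensitive to perturbations in b.
b = np.array([2.0, 2.0])
x1 = np.linalg.solve(A, b)
x2 = np.linalg.solve(A, b + np.array([0.0, 1e-9]))  # tiny change in b
print(x1, x2)                                # wildly different solutions
```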
Very entertaining talk! By the way, which tool is used for making his presentation slides? Powerpoint, Latex beamer, or something else?
Keynote. Font was "Goudy Old Style", as a nod to the theme.
LeCun is clearly offended by Ali's remark that ML is alchemy.
Ali unintentionally called out the use of machine learning by corporations to squeeze every drop of sweat and blood (in some cases literally) out of the consumer, by any means, even if that means spending many man-hours tweaking deep learning algorithms to suit a need. It is as much a call-out of corporate greed as it is an admonishment of non-scientific methods. The irony can only be seen at the end of the video when you see all the sponsors of NIPS. This is how it is, of course, but it must be really hard on hardcore researchers like Ali to know this is the landscape the research is in. Note: I like to think there is some empiricism in their efforts, but LeCun's response doesn't give me much optimism in that respect. I think the little tweaks that work could have more significance than the big ones, even.
joshcryer, LeCun's answer is a piece of shit. LeCun should apologize for his aggressiveness.
Many in the ML community did not like this and dismissed it outright. It led me to learn about evolutionary complexity in machine learning work and this is an extremely hard freaking problem. Kenneth Stanley has some good videos about this. Ali seems to have uncovered a real problem within ML groups playing with disassociated machine minds that can only do something by tweaking. We know nothing.
joshcryer, exactly... LeCun was really an ass with his answer.
Of course... LeCun is an engineer, not a mathematician or scientist. See all his past publications; it's just him trying out different stuff without putting any effort into understanding how it works. Unfortunately, he succeeded, which landed him in the spotlight and a high-ranking position at FB. To admit that what he was doing all this time is nonsense would undermine that very position.
Alchemy?
How is it that machine learning wasn't 'reproducible'? What does he mean by that statement?
No GitHub. Most of the papers were published without open-source code.
and experiments done on private datasets.
It's more than just a lack of code or documentation; the reproducibility problem in AI and ML is still very real in 2018. It hasn't stopped. What it refers to is the inability to literally reproduce the results reported in publications (by top researchers included), even when using the same source code, datasets, etc. It is evidence of a larger problem hinted at by Ali here: AI and ML as it stands today is sometimes a very fragile, empirical science. Not all of the time. Not by everyone. But more than it should be.
+Felix Sosa: You said "the lack of ability to... reproduce the results... even when using the same source code, datasets, etc." Is it because the seed of the random number generator or the precision of the computation is not reported? Otherwise, where do the discrepancies come from?
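Seeds are one source, but not the only one. Even with the seed fixed, floating-point addition is not associative, so anything that changes the accumulation order (multi-threaded BLAS, GPU atomics, non-deterministic cuDNN kernels) can change the result, and those tiny differences get amplified over many gradient steps. A small illustrative sketch of just the non-associativity part (my own example, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)             # fix the seed: same numbers every run
x = rng.standard_normal(1_000_000).astype(np.float32)

s_forward = np.sum(x)                      # one accumulation order
s_reverse = np.sum(x[::-1])                # same numbers, different order
s_sorted  = np.sum(np.sort(x))             # yet another order

print(s_forward, s_reverse, s_sorted)      # typically three slightly different values
print(s_forward == s_reverse)              # often False, despite identical inputs
```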
Rigor is good. But if you don't understand why it works, then (a) work on a theory that explains it, (b) work on a theory that shows what part is arbitrary and probably what the better technique is, or (c) wait and watch others do it, while not complaining about the lack of rigor in deep learning. Simplicity and regularity in the nature of deep learning computations is its greatest virtue for a system designer -- leading to a rare confluence of various areas of CS: high-performance architectures, high-performance software, and algorithms -- a crowning jewel and exemplar in nature (there's nothing more well-studied than optimizing a dense matrix-matrix multiplication, and nothing else as meaningful can get closer to the machine's compute peak). Instead of lecturing others about the lack of rigor, you are free to build something better with a theory behind it, free not to use deep learning, and free to enlighten others that there isn't (yet) a theory behind it. And finally, don't assume that others don't know that more theory is needed! Theoretical soundness and rigor are relative criteria. Sometimes things just work, and you then need to provide an explanation; that's also how many fields have evolved and evolve!
I strongly disagree with this comment. Ali does work on these things. That's why he was given the test-of-time award, the most prestigious honor in our field. You get this award, in a sense, for doing somewhat iconoclastic work. And in this role, giving an invited talk, as both an elder and an iconoclast (vs. as an ordinary paper presenter), it's within his rights to make a normative point, to make it through metaphor and comedy, and to issue a call to action.
Intentionally or unintentionally, the call made in this talk was to urge others to work on things that had a sound theory, as opposed to working on a theory for something that worked, didn't yet have a theory, but outclassed former techniques (this is something Ali patched up and went back on if you read his response to Yann LeCun's response). Theoreticians also said Ethernet was doomed to failure (based on wrong and simplifying assumptions in their model) before it became popular, and here we don't have a theory that says deep learning won't work. Perhaps unintended, but the talk makes the wrong call and can easily be construed as one coming out of fear, jealousy, and defensiveness arising out of years of "rigorous" "theoretical" work being tsunamied by deep learning.
> Ali does work on these things. That's why he was given the test of time award
The test-of-time award is given for *a* paper (its impact on posterity, etc.), not for other work a researcher does over a period of time. If someone works on it and can't come up with a theory, that doesn't mean others can't; finally, there are many things in applied areas that mathematicians have never come up with a theory to explain. That doesn't mean you go back to old, poorly performing, non-scalable techniques. The talk, instead of acknowledging the advances of deep learning and emphasizing the need to work on models/theory explaining it (and perhaps even improving it), is asking people to work within the constraints of existing theory or to come up with the theory at the same time as building a system. That's not how engineering works.
Nineth I don't understand why this deep learning community is so sensitive! Why so serious, mate? He is not bashing anyone. Listen to his talk carefully. Don't just be LeCun's baby! Rahimi has the right to speak, and he did not mention the word "theory" a single time in his talk. All he said was about rigour! It was a brave speech from a great researcher!
Not from the deep learning community. Shouldn't blindly go by the words uttered: rigor in his talk is equivalent to development of theory -- even those who support his view know that! And irrespective of whether you agree or disagree with the message, everyone knows that he is pointing to the lack of theory underlying deep learning. Of course, some view this talk as a call to shed light on deep learning (including Ali himself in his later response), while others view it as a self-satisfying call to continue working where there is currently light, refusing to believe that the real kick-a** stuff is in the dark -- deep learning has, I believe, pulled the rug out from under the feet of the theory over-dwellers, providing evidence that these folks should now look elsewhere if they want to be better in practice. And yet they are in denial! Fine if you want to use the theory you like -- but at least stop complaining about deep learning!
okay hear me out, what if we used machine learning to solve how machine learning works
Increasing complexity without understanding the basics will lead to more confusion. You are building a tower, but you don't understand how the foundation works; it will eventually fall and you will not even understand why.
That introduction by the lady is atrocious.