I think it is difficult to overfit and easy to train artificial neural nets for a reason. The dot product is a rough statistical filter: it responds to broad patterns of neural activation in the prior layer rather than to the exact value of any single input. If that statistical mode did not exist, neural networks of large size could not be trained; there is no known algorithm that could do it. Anyone with experience in, say, evolving engineering systems will tell you that about 100 parameters is the practical limit for finding interesting solutions. So there must be a gross simplification at work in neural networks that allows systems with millions of parameters to be trained at all. And of course you would expect statistical solutions to generalize well.
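A minimal sketch of the "rough statistical filter" point (my illustration, not from the original text, with arbitrary sizes and seed): a neuron tuned to a broad activation pattern gives almost the same response when every individual input is heavily perturbed, because the per-component noise averages out across a large fan-in.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # assumed fan-in of one neuron

# Broad activation pattern in the prior layer, and a neuron
# whose weight vector is tuned to that pattern.
pattern = rng.standard_normal(n)
weights = pattern / np.linalg.norm(pattern)

clean = weights @ pattern                              # response to the exact pattern
noisy = weights @ (pattern + rng.standard_normal(n))   # unit-variance noise on every input

# clean is ~sqrt(n) = 100; the noise contributes only ~N(0, 1),
# so the relative disturbance shrinks as the fan-in grows.
print(f"clean response: {clean:.1f}")
print(f"noisy response: {noisy:.1f}")
```

The dot product only "sees" the correlation with the whole pattern, which is what makes it a statistical summary rather than an exact match on individual activations.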