Sepp Hochreiter: Memory Architectures for Deep Learning
- Published 24 Nov 2024
- Currently, the most successful Deep Learning architecture is the transformer. The attention mechanism of the transformer is equivalent to modern Hopfield networks and is therefore an associative memory. However, this associative memory has disadvantages: quadratic complexity in the sequence length when mutually associating sequence elements, a restriction to pairwise associations, limited ability to modify the memory, and insufficient abstraction capabilities. In contrast, recurrent neural networks (RNNs) like LSTMs have linear complexity, associate each sequence element with a representation of all previous elements, can directly modify memory content, and have high abstraction capabilities. However, RNNs cannot store sequence elements that were rare in the training data, since RNNs have to learn to store. Transformers can store rare or even new sequence elements, which, besides their high parallelization, is one of the main reasons why they outperformed RNNs in language modelling. Future successful Deep Learning architectures should comprise both of these memories: attention for implementing episodic memories and RNNs for implementing short-term memories and abstraction.
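The contrast between the two memory types can be made concrete with a minimal NumPy sketch (not part of the lecture; all names and shapes are illustrative): attention builds an explicit T × T association matrix over the sequence, hence its quadratic cost and pairwise associations, while an RNN-style recurrence compresses all previous elements into a fixed-size hidden state in a single linear pass.

```python
# Illustrative sketch only: contrasts attention's pairwise (quadratic) memory
# with an RNN's fixed-size recurrent (linear) memory. Not the lecture's code.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """Associative memory: every element attends to every other element."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T, T) matrix -> quadratic in T
    return softmax(scores, axis=-1) @ V

def rnn(X, Wx, Wh, b):
    """Recurrent memory: a fixed-size state summarizes all previous elements."""
    h = np.zeros(Wh.shape[0])
    outputs = []
    for x_t in X:                              # single pass -> linear in T
        h = np.tanh(Wx @ x_t + Wh @ h + b)     # memory is modified in place
        outputs.append(h)
    return np.stack(outputs)

T, d = 6, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d))
Y_attn = attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
Y_rnn = rnn(X, rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d))
print(Y_attn.shape, Y_rnn.shape)               # (6, 4) (6, 4)
```

In the sketch, the attention output for each position depends on an explicit score against every stored element, which is why rare or new elements can still be retrieved, whereas the recurrent state must learn during training what to store.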
👉 More information about the lecture series "Machines that understand?": dm.cs.univie.a...
👉 Research Group Data Mining and Machine Learning at the University of Vienna: dm.cs.univie.a...
👉 Playlist Machines that understand? • Was bedeutet Generativ...