Recurrent Neural Networks (RNNs) and Vanishing Gradients

  • Published 28 Jun 2024
  • For one, the way a plain or vanilla RNN models sequences, by recalling information from the immediate past, allows it to capture dependencies to a certain degree. RNNs are also relatively lightweight compared to n-gram models, taking up less RAM and storage. But there are downsides: because the architecture is optimized for recalling the immediate past, RNNs struggle with longer sequences. The RNN's way of propagating information is also part of how vanishing and exploding gradients arise, both of which can cause model training to fail. These problems come from the fact that an RNN propagates information from the beginning of the sequence through to the end. Starting with the first word of the sequence, the hidden state at the far left is computed. The network then carries part of that computed information forward, takes the second word of the sequence, and computes new values, repeating this step by step through the whole sequence (see the sketch after this list).
  • Science & Technology
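
A minimal NumPy sketch of this step-by-step propagation, and of how backpropagation through time can shrink the gradient signal reaching the earliest steps, is shown below. The dimensions, weight scales, and sequence length are illustrative assumptions, not values from the video.

```python
import numpy as np

# A minimal vanilla RNN forward pass and a rough look at how the gradient
# signal decays over time. Dimensions, weight scales, and the sequence
# length below are illustrative assumptions.
rng = np.random.default_rng(0)
hidden_size, input_size, seq_len = 16, 8, 50

W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent weights
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input weights
xs = rng.normal(size=(seq_len, input_size))                    # toy input sequence

h = np.zeros(hidden_size)
hidden_states = []
for x in xs:
    # Each step mixes the current input with the previous hidden state, so
    # information from the first word must pass through every later step.
    h = np.tanh(W_hh @ h + W_xh @ x)
    hidden_states.append(h)

# Backpropagation through time multiplies one Jacobian per step:
# d h_T / d h_1 is a product of terms diag(1 - h_t**2) @ W_hh.
# Repeated products with small recurrent weights shrink the gradient
# (vanishing); large weights can blow it up instead (exploding).
jacobian = np.eye(hidden_size)
norms = []
for h_t in reversed(hidden_states):
    jacobian = jacobian @ (np.diag(1.0 - h_t ** 2) @ W_hh)
    norms.append(np.linalg.norm(jacobian))

print("gradient norm after 1 step:", norms[0])
print(f"gradient norm after {seq_len} steps:", norms[-1])  # many orders of magnitude smaller
```

With the small weight scale used here, the printed norm after 50 steps is vastly smaller than after one step, which is exactly the vanishing-gradient effect described above; a larger weight scale would instead make the norms grow, illustrating exploding gradients.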
