ICNLSP 2023: Comparing Modular and End-To-End Approaches in ASR for Well-Resourced and ...

  • Published 14 Oct 2024
  • Title of the presentation: Comparing Modular and End-To-End Approaches in ASR for Well-Resourced and Low-Resourced
    Languages.
    By: Aditya Parikh, Louis ten Bosch, Henk van den Heuvel, Cristian Tejedor-Garcia,
    Centre for Language and Speech Technology, Radboud University, Nijmegen, The Netherlands.
    6th International Conference on Natural Language and Speech Processing.
    icnlsp.org/202...
    Abstract:
    We present a comparative study of a state-of-the-art traditional modular Automatic Speech
    Recognition system (Kaldi ASR) and an end-to-end ASR system (wav2vec 2.0) for a
    well-resourced language (Spanish) and a low-resourced language (Irish). We built ASR
    systems for both languages and evaluated their performance under different update regimes.
    Our results show that the end-to-end wav2vec 2.0 outperforms the modular ASR for both
    languages in terms of Word Error Rate (WER), but performs worse in terms of real-time
    decoding. We also addressed the issue of non-lexical words in wav2vec 2.0's output. We found
    that, for wav2vec 2.0, integrating a language model (LM) via shallow fusion and increasing
    the LM weight to 0.7 for Spanish and 0.8 for Irish gave the best ASR performance, because it
    reduced the number of non-lexical words. However, this does not eliminate all non-lexical
    words. Finally, on minimal infrastructure, Kaldi ASR performed best for real-time decoding
    of longer audio inputs compared with a wav2vec 2.0 model trained on the same dataset,
    although wav2vec 2.0's decoding speed can be improved with GPU acceleration in the backend.
    These results may have significant implications for building real-time ASR services,
    especially for low-resourced languages.
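The shallow-fusion finding can be illustrated with a minimal sketch. It is not the authors' decoder: the toy unigram LM, its probabilities, the acoustic log-probabilities and the function names (lm_log_prob, fused_score, best_hypothesis) are all hypothetical. Each hypothesis y is scored as log P_AM(y|x) + λ · log P_LM(y), where λ is the LM weight. For simplicity the sketch applies this objective to a fixed list of complete hypotheses, whereas a real decoder applies it incrementally during beam search.

```python
import math

# Hypothetical toy unigram LM; real systems use n-gram or neural LMs.
TOY_LM = {"hola": 0.05, "buenos": 0.04, "días": 0.04}
# Hypothetical fallback probability for out-of-vocabulary (non-lexical) words.
OOV_PROB = 1e-6

# Hypothetical n-best list: (acoustic log-probability, word sequence).
# The non-lexical hypothesis is given a slightly better acoustic score.
HYPS = [
    (-4.0, ["buenos", "días"]),
    (-3.5, ["bueno", "sdías"]),
]


def lm_log_prob(words):
    """Sum of unigram log-probabilities; OOV words get a heavy penalty."""
    return sum(math.log(TOY_LM.get(w, OOV_PROB)) for w in words)


def fused_score(am_log_prob, words, lm_weight):
    """Shallow fusion: log P_AM(y|x) + lm_weight * log P_LM(y)."""
    return am_log_prob + lm_weight * lm_log_prob(words)


def best_hypothesis(hyps, lm_weight):
    """Return the word sequence with the highest fused score."""
    return max(hyps, key=lambda h: fused_score(h[0], h[1], lm_weight))[1]


if __name__ == "__main__":
    for weight in (0.0, 0.8):
        print(weight, " ".join(best_hypothesis(HYPS, weight)))
```

With an LM weight of 0, the acoustically stronger but non-lexical hypothesis "bueno sdías" wins; at 0.8, the LM penalty on out-of-vocabulary words makes "buenos días" win. In this toy setup, a large enough acoustic advantage could still outweigh the LM penalty.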
