ICNLSP 2023: Direct Speech to Text Translation: Bridging the Modality Gap Using SimSiam

  • Published 14 Oct 2024
  • Title of the presentation: Direct Speech to Text Translation: Bridging the Modality Gap Using SimSiam.
    By: Balaram Sarkar (Indian Institute of Technology Indore); Chandresh K Maurya (IBM Research); Anshuman Agrahri (IIT, Indore), India.
    6th International Conference on Natural Language and Speech Processing.
    icnlsp.org/202...
    Abstract:
    Learning similar representations for spoken utterances and their written text requires modeling both forms in a shared space. Developing similar representations for semantically related speech and text is essential, particularly for tasks like speech-to-text (S2T) translation. To that end, we propose a SimSiam-based S2T (S3T) model that leverages the SimSiam network, a state-of-the-art unsupervised learning architecture, to bridge the modality gap between speech and text. The proposed model does not require negative sample mining. A comparative study on four translation directions of the standard MuST-C (Di Gangi et al., 2019) dataset demonstrates that the proposed S3T translation model outperforms all existing methods, achieving an average BLEU score of 30.02. Our analysis affirms that S3T effectively bridges the representation gap between the two modalities.
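
To make the "no negative sample mining" point concrete, here is a minimal sketch of the symmetric SimSiam objective: a negative cosine similarity between one branch's predictor output and the other branch's projection, averaged over both directions. Only positive (paired) examples appear in it. Treating the two branches as a speech view and a text view is an assumption made for illustration; the abstract does not say how S3T wires the loss into its encoders, and the names `cosine`, `simsiam_loss`, `p_speech`, `z_text`, `p_text`, and `z_speech` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simsiam_loss(p_speech, z_text, p_text, z_speech):
    """Symmetric SimSiam loss for one paired (speech, text) example.

    p_* are predictor outputs and z_* are projector outputs.
    In an autodiff framework the z_* arguments would be wrapped in a
    stop-gradient; plain Python has no gradients, so that step is
    only noted here (assumption: illustration, not the authors' code).
    """
    return -0.5 * cosine(p_speech, z_text) - 0.5 * cosine(p_text, z_speech)


# Perfectly aligned representations reach the minimum of the loss.
aligned = simsiam_loss([1.0, 0.0], [2.0, 0.0], [0.0, 3.0], [0.0, 1.0])

# Orthogonal representations give a loss of zero.
orthogonal = simsiam_loss([1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0])
```

The loss ranges from -1 (both directions perfectly aligned) to 1 (both directions opposed), so training pulls paired speech and text representations together without ever comparing against unpaired examples.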
