What Can We Learn From Subtitled Sign Language Data? Gül Varol, Asst. Prof @ École des Ponts ParisTech

  • Published 3 Jun 2024
  • Gül Varol is a research faculty member on the IMAGINE team at École des Ponts ParisTech. Previously, she was a postdoctoral researcher in the Visual Geometry Group (VGG) at the University of Oxford. She obtained her PhD from the WILLOW team of Inria Paris and École Normale Supérieure. Her research focuses on human understanding in videos, specifically action recognition, body shape and motion analysis, and sign languages.
    Abstract
    In this talk, Prof. Gül Varol presents her group's work on automatic sign language analysis that leverages weakly aligned subtitles accompanying broadcast footage.
    In her own words,
    We first use subtitles to provide candidate keywords with which to search for and localise individual signs, following two different approaches: (i) using mouthing cues at the lip region and (ii) looking up videos from sign language dictionaries. We then attempt to train a direct video-to-text sign language translation Transformer on this unconstrained data. We observe that, while the translation performance is low, a sign localisation ability emerges from the attention mechanism (iii). These three approaches allow us to automatically annotate 1 million video-sign pairs, which we use to train strong sign recognition models for a vocabulary of over 1,000 signs. However, the subtitles remain noisy, especially their alignment with the interpreted signing video when they are derived from speech. Therefore, we have recently tackled the problem of automatic subtitle alignment, i.e., temporally localising a sequence of text within a long continuous sign language video (iv). I will summarise results from the papers listed below and conclude by discussing open problems.
    (i) lnkd.in/gBPySa8a
    (ii) lnkd.in/g-i2zT3N
    (iii) lnkd.in/gFNqrBnk
    (iv) lnkd.in/gF37qwjF
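
    To make approach (iii) more concrete, below is a minimal, hypothetical Python sketch of how cross-attention weights from a video-to-text translation Transformer could be turned into a sign localisation: the attention distribution of the decoder query for a subtitle keyword over the per-frame encoder outputs is read off, and the most-attended frames are taken as the candidate sign location. The tensors and names here are illustrative placeholders (random features standing in for a trained model), not the actual models or code from the papers above.

    import torch
    import torch.nn.functional as F

    T_FRAMES, D = 128, 256   # number of video frames and feature dimension
    torch.manual_seed(0)

    # Placeholders for real features: per-frame encoder outputs of a video
    # backbone, and the decoder hidden state at the step emitting the keyword.
    video_feats = torch.randn(T_FRAMES, D)   # (T, D) encoder memory
    keyword_query = torch.randn(D)           # (D,) query for e.g. the word "happy"

    def localise_from_attention(query, memory, window=8):
        # Scaled dot-product attention of one text query over all video frames.
        scores = memory @ query / D ** 0.5    # (T,) similarity per frame
        attn = F.softmax(scores, dim=0)       # attention distribution over frames
        centre = int(attn.argmax())           # most-attended frame index
        start = max(0, centre - window // 2)
        end = min(len(memory), centre + window // 2)
        return start, end, attn

    start, end, attn = localise_from_attention(keyword_query, video_feats)
    print(f"candidate sign localised to frames [{start}, {end})")

    In the work described above, the localisation ability emerges from a Transformer trained only on the translation objective; this sketch only illustrates how an attention distribution over frames can be converted into a temporal window once such a model is available.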