Deep-Sync: A novel deep learning-based tool for semantic-aware subtitling synchronisation

Subtitles are a key element in making media content accessible to people with hearing impairments and to elderly people, and they are also useful when watching TV in a noisy environment or learning a new language. Most of the time, subtitles are generated manually in advance, producing a verbatim, synchronised transcription of the audio. In live TV broadcasts, however, captions are created in real time by a re-speaker with the help of voice recognition software, which inevitably leads to delays and a lack of synchronisation. In this paper, we present Deep-Sync, a tool for aligning subtitles with audio-visual content. The architecture integrates a deep language representation model and real-time voice recognition software to build a semantic-aware alignment tool that successfully aligns most subtitles even when there is no direct correspondence between the re-speaker's output and the audio content. To avoid any kind of censorship, Deep-Sync can be deployed directly on users' TVs, introducing a small delay to perform the alignment while avoiding any delay of the signal at the broadcaster station. Deep-Sync was compared with another subtitle alignment tool, showing that our proposal improves synchronisation in all tested cases.
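The core idea of semantic-aware alignment is to match each subtitle against the segment of the live transcript it most closely corresponds to, even when the wording differs. The following minimal sketch illustrates this matching step with a toy bag-of-words similarity; the actual Deep-Sync system uses a deep language representation model on top of real-time voice recognition output, and the segment texts and function names here are purely illustrative.

```python
from collections import Counter
import math

def vectorise(text):
    # Toy bag-of-words vector. Deep-Sync instead embeds text with a
    # deep language model, which captures semantic (not just lexical)
    # similarity between the re-speaker's caption and the audio transcript.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def align_subtitle(subtitle, asr_segments):
    # Return the index (and score) of the timestamped ASR segment most
    # similar to the subtitle; that segment's timestamp would then be
    # used to re-synchronise the caption.
    sub_vec = vectorise(subtitle)
    scores = [cosine(sub_vec, vectorise(seg)) for seg in asr_segments]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Hypothetical timestamped segments from a live speech recogniser:
asr = ["good evening and welcome",
       "the weather today is sunny",
       "now over to sports"]
idx, score = align_subtitle("Today's weather will be sunny", asr)
# The subtitle is matched to the weather segment (index 1) despite
# the paraphrased wording.
```

In the real system, replacing the bag-of-words vectoriser with contextual embeddings from a deep language model is what allows alignment to succeed when the re-speaker paraphrases rather than repeats the audio verbatim.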
Keywords: TV broadcasting, synchronisation, language model, deep neural networks, machine learning
Bibliographic citation
Martín, A., González-Carrasco, I., Rodriguez-Fernandez, V. et al. Deep-Sync: A novel deep learning-based tool for semantic-aware subtitling synchronisation. Neural Comput & Applic (2021).