Publication:
An auditory saliency pooling-based LSTM model for speech intelligibility classification

dc.affiliation.dpto: UC3M. Departamento de Teoría de la Señal y Comunicaciones [es]
dc.affiliation.grupoinv: UC3M. Grupo de Investigación: Procesado Multimedia [es]
dc.contributor.author: Gallardo Antolín, Ascensión
dc.contributor.author: Montero, Juan Manuel
dc.contributor.funder: Ministerio de Economía y Competitividad (España) [es]
dc.contributor.funder: Universidad Carlos III de Madrid [es]
dc.date.accessioned: 2021-11-29T11:13:17Z
dc.date.available: 2021-11-29T11:13:17Z
dc.date.issued: 2021-09
dc.description: This article belongs to the Section Computer and Engineering Science and Symmetry/Asymmetry. [en]
dc.description.abstract: Speech intelligibility is a crucial element in oral communication that can be influenced by multiple factors, such as noise, channel characteristics, or speech disorders. In this paper, we address the task of speech intelligibility classification (SIC) in the last of these circumstances, speech disorders. Taking as a starting point our previous work, a SIC system based on an attentional long short-term memory (LSTM) network, we deal with the problem of inadequate learning of the attention weights due to training data scarcity. To overcome this issue, the main contribution of this paper is a novel type of weighted pooling (WP) mechanism, called saliency pooling, in which the WP weights are not learned automatically during network training but are obtained from an external source of information, Kalinli’s auditory saliency model. The aim is to take advantage of the apparent symmetry between the human auditory attention mechanism and the attentional models integrated into deep learning networks. The developed systems are assessed on the UA-speech dataset, which comprises speech uttered by subjects with several levels of dysarthria. Results show that all the systems with saliency pooling significantly outperform a reference support vector machine (SVM)-based system and LSTM-based systems with mean pooling and attention pooling, suggesting that Kalinli’s saliency can be successfully incorporated into the LSTM architecture as an external cue for estimating the speech intelligibility level. [en]
dc.description.sponsorship: The work leading to these results has been supported by the Spanish Ministry of Economy, Industry and Competitiveness through TEC2017-84395-P (MINECO) and TEC2017-84593-C2-1-R (MINECO) projects (AEI/FEDER, UE), and the Universidad Carlos III de Madrid under Strategic Action 2018/00071/001. [en]
dc.format.extent: 15
dc.identifier.bibliographicCitation: Gallardo-Antolín, A. & Montero, J. M. (2021a). An Auditory Saliency Pooling-Based LSTM Model for Speech Intelligibility Classification. Symmetry, 13(9), 1728. [en]
dc.identifier.doi: https://doi.org/10.3390/sym13091728
dc.identifier.issn: 2073-8994
dc.identifier.publicationfirstpage: 1728
dc.identifier.publicationissue: 9
dc.identifier.publicationtitle: Symmetry [en]
dc.identifier.publicationvolume: 13
dc.identifier.uri: https://hdl.handle.net/10016/33706
dc.identifier.uxxi: AR/0000028624
dc.language.iso: eng
dc.publisher: MDPI
dc.relation.projectID: Gobierno de España. TEC2017-84395-P [es]
dc.relation.projectID: Gobierno de España. TEC2017-84593-C2-1-R [es]
dc.rights: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. [en]
dc.rights: Atribución 3.0 España [*]
dc.rights.accessRights: open access [en]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/ [*]
dc.subject.eciencia: Telecomunicaciones [es]
dc.subject.other: Speech intelligibility [en]
dc.subject.other: LSTM [en]
dc.subject.other: Weighted pooling [en]
dc.subject.other: Attention [en]
dc.subject.other: Saliency [en]
dc.subject.other: Auditory saliency model [en]
dc.title: An auditory saliency pooling-based LSTM model for speech intelligibility classification [en]
dc.type: research article [*]
dc.type.hasVersion: VoR [*]
dspace.entity.type: Publication
Files
Original bundle
Name:
Auditory_SYMMETRY_2021.pdf
Size:
1.74 MB
Format:
Adobe Portable Document Format