Publication:
End-to-end recurrent denoising autoencoder embeddings for speaker identification

dc.affiliation.dpto: UC3M. Departamento de Teoría de la Señal y Comunicaciones [es]
dc.affiliation.grupoinv: UC3M. Grupo de Investigación: Procesado Multimedia [es]
dc.contributor.author: Rituerto González, Esther
dc.contributor.author: Peláez Moreno, Carmen
dc.contributor.funder: Comunidad de Madrid [es]
dc.date.accessioned: 2022-10-10T10:43:49Z
dc.date.available: 2022-10-10T10:43:49Z
dc.date.issued: 2021-05-10
dc.description.abstract: Speech "in the wild" is a handicap for speaker recognition systems because of the variability induced by real-life conditions, such as environmental noise and the emotional state of the speaker. Taking advantage of the principles of representation learning, we aim to design a recurrent denoising autoencoder that extracts robust speaker embeddings from noisy spectrograms to perform speaker identification. The proposed end-to-end architecture uses a feedback loop to encode information about the speaker into low-dimensional representations extracted by a spectrogram denoising autoencoder. We employ data augmentation by additively corrupting clean speech with real-life environmental noise in a database containing real stressed speech. Our study shows that jointly optimizing the denoiser and speaker identification modules outperforms both independent optimization of the two components and handcrafted features under stress and noise distortions. [en]
dc.description.sponsorship: The authors would like to thank the rest of the members of UC3M4Safety for their support and NVIDIA Corporation for the donation of a TITAN Xp. This work has been partially supported by the Dept. of Research and Innovation of Madrid Regional Authority (EMPATIA-CM Y2018/TCS-5046) and the Dept. of Education and Research of Madrid Regional Authority with a European Social Fund for the Pre-doctoral Research Staff grant for Research Activities, within the CAM Youth Employment Programme (PEJD-2019-PRE/TIC-16295). [en]
dc.format.extent: 11
dc.identifier.bibliographicCitation: Rituerto-González, E. & Peláez-Moreno, C. (2021, May 10). End-to-end recurrent denoising autoencoder embeddings for speaker identification. Neural Computing and Applications, 33(21), 14429-14439. [en]
dc.identifier.doi: https://doi.org/10.1007/s00521-021-06083-7
dc.identifier.issn: 0941-0643
dc.identifier.publicationfirstpage: 14429
dc.identifier.publicationlastpage: 14439
dc.identifier.publicationtitle: Neural Computing & Applications [en]
dc.identifier.publicationvolume: 33
dc.identifier.uri: https://hdl.handle.net/10016/35865
dc.identifier.uxxi: AR/0000030589
dc.language.iso: eng
dc.publisher: Springer [en]
dc.relation.projectID: Comunidad de Madrid. PEJD-2019-PRE/TIC-16295 [es]
dc.relation.projectID: Comunidad de Madrid. EMPATIA-CM Y2018/TCS-5046 [es]
dc.rights: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021 [en]
dc.rights.accessRights: open access [en]
dc.subject.eciencia: Telecomunicaciones [es]
dc.subject.other: Denoising autoencoder [en]
dc.subject.other: Speaker embeddings [en]
dc.subject.other: Noisy conditions [en]
dc.subject.other: Stress [en]
dc.subject.other: End-to-end model [en]
dc.subject.other: Speaker identification [en]
dc.title: End-to-end recurrent denoising autoencoder embeddings for speaker identification [en]
dc.type: research article [*]
dc.type.hasVersion: AM [*]
dspace.entity.type: Publication
Files
Original bundle
Name: End-to-end_NCAA_2021_ps.pdf
Size: 917.91 KB
Format: Adobe Portable Document Format