End-to-end recurrent denoising autoencoder embeddings for speaker identification

Rituerto González, Esther; Peláez Moreno, Carmen

dc.affiliation.dpto	UC3M. Departamento de Teoría de la Señal y Comunicaciones	es
dc.affiliation.grupoinv	UC3M. Grupo de Investigación: Procesado Multimedia	es
dc.contributor.author	Rituerto González, Esther
dc.contributor.author	Peláez Moreno, Carmen
dc.contributor.funder	Comunidad de Madrid	es
dc.date.accessioned	2022-10-10T10:43:49Z
dc.date.available	2022-10-10T10:43:49Z
dc.date.issued	2021-05-10
dc.description.abstract	Speech "in the wild" is a handicap for speaker recognition systems due to the variability induced by real-life conditions, such as environmental noise and the emotional state of the speaker. Taking advantage of the principles of representation learning, we aim to design a recurrent denoising autoencoder that extracts robust speaker embeddings from noisy spectrograms to perform speaker identification. The proposed end-to-end architecture uses a feedback loop to encode information regarding the speaker into low-dimensional representations extracted by a spectrogram denoising autoencoder. We employ data augmentation techniques by additively corrupting clean speech with real-life environmental noise in a database containing real stressed speech. Our study shows that the joint optimization of both the denoiser and speaker identification modules outperforms both independent optimization of the two components and handcrafted features under stress and noise distortions.	en
dc.description.sponsorship	The authors would like to thank the rest of the members of the UC3M4Safety for their support and NVIDIA Corporation for the donation of a TITAN Xp. This work has been partially supported by the Dept. of Research and Innovation of Madrid Regional Authority (EMPATIA-CM Y2018/TCS-5046) and the Dept. of Education and Research of Madrid Regional Authority with a European Social Fund for the Pre-doctoral Research Staff grant for Research Activities, within the CAM Youth Employment Programme (PEJD-2019-PRE/TIC-16295).	en
dc.format.extent	11
dc.identifier.bibliographicCitation	Rituerto-González, E. & Peláez-Moreno, C. (2021, May 10). End-to-end recurrent denoising autoencoder embeddings for speaker identification. Neural Computing and Applications, 33(21), 14429-14439.	en
dc.identifier.doi	https://doi.org/10.1007/s00521-021-06083-7
dc.identifier.issn	0941-0643
dc.identifier.publicationfirstpage	14429
dc.identifier.publicationlastpage	14439
dc.identifier.publicationtitle	Neural Computing & Applications	en
dc.identifier.publicationvolume	33
dc.identifier.uri	https://hdl.handle.net/10016/35865
dc.identifier.uxxi	AR/0000030589
dc.language.iso	eng
dc.publisher	Springer	en
dc.relation.projectID	Comunidad de Madrid. PEJD-2019-PRE/TIC-16295	es
dc.relation.projectID	Comunidad de Madrid. EMPATIA-CM Y2018/TCS-504	es
dc.rights	© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021	en
dc.rights.accessRights	open access	en
dc.subject.eciencia	Telecomunicaciones	es
dc.subject.other	Denoising autoencoder	en
dc.subject.other	Speaker embeddings	en
dc.subject.other	Noisy conditions	en
dc.subject.other	Stress	en
dc.subject.other	End-to-end model	en
dc.subject.other	Speaker identification	en
dc.title	End-to-end recurrent denoising autoencoder embeddings for speaker identification	en
dc.type	research article	*
dc.type.hasVersion	AM	*
dspace.entity.type	Publication

Files

Original bundle

Name: End-to-end_NCAA_2021_ps.pdf
Size: 917.91 KB
Format: Adobe Portable Document Format

Collections

DTSC - GPM - Artículos de Revistas
