Publication:
Morphologically filtered power-normalized cochleograms as robust, biologically inspired features for ASR

dc.affiliation.dpto UC3M. Department of Signal Theory and Communications
dc.affiliation.grupoinv UC3M. Research Group: Multimedia Processing
dc.contributor.author Calle Silos, Fernando de la
dc.contributor.author Valverde Albacete, Francisco José
dc.contributor.author Gallardo Antolín, Ascensión
dc.contributor.author Peláez Moreno, Carmen
dc.date.accessioned 2015-10-14T08:51:41Z
dc.date.available 2015-10-14T08:51:41Z
dc.date.issued 2015-11
dc.description.abstract In this paper, we present advances in the modeling of the masking behavior of the human auditory system (HAS) to enhance the robustness of the feature extraction stage in automatic speech recognition (ASR). The solution adopted is based on a nonlinear filtering of a spectro-temporal representation, applied simultaneously to both the frequency and time domains (as if it were an image) using mathematical morphology operations. A particularly important component of this architecture is the so-called structuring element (SE), which in the present contribution is designed as a single three-dimensional pattern based on physiological facts, in such a way that it closely resembles the masking phenomena taking place in the cochlea. A proper choice of spectro-temporal representation lends validity to the model throughout the whole frequency spectrum and intensity range, accommodating the variability of the masking properties of the HAS across these two domains. The best results were achieved with the representation introduced as part of the power-normalized cepstral coefficients (PNCC), together with a spectral subtraction step. This method has been tested on the Aurora 2, Wall Street Journal and ISOLET databases, using both classical hidden Markov model (HMM) and hybrid artificial neural network (ANN)-HMM back-ends. In these experiments, the proposed front-end analysis provides substantial and significant improvements compared to baseline techniques: up to 39.5% relative improvement over mel-frequency cepstral coefficients (MFCC) and 18.7% over PNCC on the Aurora 2 database.
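The abstract describes treating a spectro-temporal representation as an image and filtering it with gray-scale mathematical morphology and a non-flat structuring element. The Python/NumPy code below is a minimal sketch of that general idea only. It applies a gray-scale opening (erosion followed by dilation) to a frequency-by-time array. Several pieces are assumptions for illustration: the choice of an opening, the function names, and the tent-shaped SE returned by the hypothetical `hypothetical_masking_se`. The paper's SE is a single three-dimensional pattern derived from physiological masking data, and this record does not state its exact operator or parameters.

```python
import numpy as np


def hypothetical_masking_se(half_freq=1, half_time=2, freq_decay=6.0, time_decay=3.0):
    """Hypothetical non-flat SE (rows = frequency offsets, cols = time offsets).

    Heights fall off linearly with distance from the centre. Sizes and decay
    rates are illustrative assumptions, NOT the physiologically derived SE
    described in the paper.
    """
    d_freq = np.arange(-half_freq, half_freq + 1)[:, None]
    d_time = np.arange(-half_time, half_time + 1)[None, :]
    return -(freq_decay * np.abs(d_freq) + time_decay * np.abs(d_time))


def grey_erosion(img, se):
    """(f erode b)(x) = min_s [f(x + s) - b(s)], using only samples inside the image."""
    img = np.asarray(img, dtype=float)
    assert se.shape[0] % 2 == 1 and se.shape[1] % 2 == 1, "SE sides must be odd"
    h, w = img.shape
    ph, pw = se.shape[0] // 2, se.shape[1] // 2
    # +inf padding: samples outside the image never win the minimum.
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=np.inf)
    out = np.full(img.shape, np.inf)
    for i in range(se.shape[0]):
        for j in range(se.shape[1]):
            di, dj = i - ph, j - pw
            shifted = padded[ph + di:ph + di + h, pw + dj:pw + dj + w]  # f(x + s)
            out = np.minimum(out, shifted - se[i, j])
    return out


def grey_dilation(img, se):
    """(f dilate b)(x) = max_s [f(x - s) + b(s)], using only samples inside the image."""
    img = np.asarray(img, dtype=float)
    assert se.shape[0] % 2 == 1 and se.shape[1] % 2 == 1, "SE sides must be odd"
    h, w = img.shape
    ph, pw = se.shape[0] // 2, se.shape[1] // 2
    # -inf padding: samples outside the image never win the maximum.
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=-np.inf)
    out = np.full(img.shape, -np.inf)
    for i in range(se.shape[0]):
        for j in range(se.shape[1]):
            di, dj = i - ph, j - pw
            shifted = padded[ph - di:ph - di + h, pw - dj:pw - dj + w]  # f(x - s)
            out = np.maximum(out, shifted + se[i, j])
    return out


def grey_opening(img, se):
    """Opening = dilation of the erosion; removes peaks narrower than the SE."""
    return grey_dilation(grey_erosion(img, se), se)


if __name__ == "__main__":
    se = hypothetical_masking_se()
    cochleogram = np.zeros((5, 9))  # toy log-compressed array: channels x frames
    cochleogram[2, 4] = 10.0        # one isolated, narrow peak
    print(grey_opening(cochleogram, se)[2, 4])  # 3.0
```

On the toy input, the isolated peak of height 10 is clipped to 3, and the flat background is left unchanged. The value 3 is the drop of the SE between its centre and its nearest neighbours, so a non-flat SE attenuates narrow peaks rather than deleting them outright. The `+inf`/`-inf` padding restricts both operators to samples inside the array. This keeps the opening anti-extensive (it never raises a value) and idempotent (applying it twice gives the same result).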
dc.description.sponsorship This contribution has been supported by an Airbus Defense and Space grant (Open Innovation - SAVIER) and by Spanish Government CICYT projects TEC2014-53390-P and TEC2014-61729-EXP.
dc.format.extent 13
dc.format.mimetype application/pdf
dc.identifier.bibliographicCitation IEEE/ACM Transactions on Audio, Speech, and Language Processing (2015). 23(11), 2070-2080.
dc.identifier.doi 10.1109/TASLP.2015.2464691
dc.identifier.issn 2329-9290
dc.identifier.publicationfirstpage 2070
dc.identifier.publicationissue 11
dc.identifier.publicationlastpage 2080
dc.identifier.publicationtitle IEEE-ACM Transactions on audio speech and language processing
dc.identifier.publicationvolume 23
dc.identifier.uri https://hdl.handle.net/10016/21710
dc.identifier.uxxi AR/0000017269
dc.language.iso eng
dc.publisher IEEE
dc.relation.projectID Government of Spain. TEC2014-53390-P
dc.relation.projectID Government of Spain. TEC2014-61729-EXP
dc.relation.publisherversion http://dx.doi.org/10.1109/TASLP.2015.2464691
dc.rights © 2015 IEEE.
dc.rights.accessRights open access
dc.subject.eciencia Telecommunications
dc.subject.other Spectro-temporal processing
dc.subject.other Cochlear masking models
dc.subject.other Morphological filtering
dc.subject.other Automatic speech recognition
dc.subject.other Auditory-based features
dc.subject.other PNCC
dc.title Morphologically filtered power-normalized cochleograms as robust, biologically inspired features for ASR
dc.type research article
dc.type.hasVersion AM
dspace.entity.type Publication
Files
Original bundle
Name: morphologically_TASLP_2015_ps.pdf
Size: 1.02 MB
Format: Adobe Portable Document Format