Morphologically filtered power-normalized cochleograms as robust, biologically inspired features for ASR

e-Archivo Repository

dc.contributor.author Calle Silos, Fernando de la
dc.contributor.author Valverde Albacete, Francisco José
dc.contributor.author Gallardo Antolín, Ascensión
dc.contributor.author Peláez Moreno, Carmen
dc.date.accessioned 2015-10-14T08:51:41Z
dc.date.available 2015-10-14T08:51:41Z
dc.date.issued 2015-11
dc.identifier.bibliographicCitation IEEE/ACM Transactions on Audio, Speech, and Language Processing (2015). 23(11), 2070-2080.
dc.identifier.issn 2329-9290
dc.identifier.uri http://hdl.handle.net/10016/21710
dc.description.abstract In this paper, we present advances in the modeling of the masking behavior of the human auditory system (HAS) to enhance the robustness of the feature extraction stage in automatic speech recognition (ASR). The solution adopted is based on a nonlinear filtering of a spectro-temporal representation, applied simultaneously to both the frequency and time domains, as if it were an image, using mathematical morphology operations. A particularly important component of this architecture is the so-called structuring element (SE), which in the present contribution is designed as a single three-dimensional pattern using physiological facts, in such a way that it closely resembles the masking phenomena taking place in the cochlea. A proper choice of spectro-temporal representation lends validity to the model throughout the whole frequency spectrum and intensity spans, assuming the variability of the masking properties of the HAS in these two domains. The best results were achieved with the representation introduced as part of the power normalized cepstral coefficients (PNCC) together with a spectral subtraction step. This method has been tested on the Aurora 2, Wall Street Journal and ISOLET databases, including both classical hidden Markov model (HMM) and hybrid artificial neural network (ANN)-HMM back-ends. In these, the proposed front-end analysis provides substantial and significant improvements compared to baseline techniques: up to 39.5% relative improvement compared to MFCC, and 18.7% compared to PNCC in the Aurora 2 database.
dc.description.sponsorship This contribution has been supported by an Airbus Defense and Space Grant (Open Innovation - SAVIER) and Spanish Government-CICYT projects TEC2014-53390-P and TEC2014-61729-EXP
dc.format.extent 13
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher IEEE
dc.rights © 2015 IEEE.
dc.subject.other Spectro-temporal processing
dc.subject.other Cochlear masking models
dc.subject.other Morphological filtering
dc.subject.other Automatic speech recognition
dc.subject.other Auditory-based features
dc.subject.other PNCC
dc.title Morphologically filtered power-normalized cochleograms as robust, biologically inspired features for ASR
dc.type article
dc.relation.publisherversion http://dx.doi.org/10.1109/TASLP.2015.2464691
dc.subject.eciencia Telecomunicaciones
dc.identifier.doi 10.1109/TASLP.2015.2464691
dc.rights.accessRights openAccess
dc.relation.projectID Gobierno de España. TEC2014-53390-P
dc.relation.projectID Gobierno de España. TEC2014-61729-EXP
dc.type.version acceptedVersion
dc.identifier.publicationfirstpage 2070
dc.identifier.publicationissue 11
dc.identifier.publicationlastpage 2080
dc.identifier.publicationtitle IEEE/ACM Transactions on Audio, Speech, and Language Processing
dc.identifier.publicationvolume 23
dc.identifier.uxxi AR/0000017269
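
The abstract describes filtering a spectro-temporal representation "as if it were an image" with mathematical morphology and a non-flat (three-dimensional) structuring element. The following is a minimal sketch of that general idea only, not the authors' method: this record does not say which morphological operations are used, so a grayscale opening is chosen here purely for illustration. The function names (`erode`, `dilate`, `opening`), the toy 3x5 "spectrogram" and the 3x3 SE heights are all hypothetical, and the real SE is derived from physiological masking data that this record does not include.

```python
from typing import Callable, List

Matrix = List[List[float]]  # rows = frequency channels, cols = time frames (assumed layout)


def _morph(spec: Matrix, se: Matrix, sign: int, pick: Callable) -> Matrix:
    """Shared loop for grayscale erosion (sign=+1, pick=min) and dilation (sign=-1, pick=max).

    The SE is assumed to have odd dimensions with its origin at the centre.
    Positions falling outside the spectrogram are simply ignored.
    """
    rows, cols = len(spec), len(spec[0])
    origin_r, origin_c = len(se) // 2, len(se[0]) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            candidates = []
            for i, se_row in enumerate(se):
                for j, height in enumerate(se_row):
                    rr = r + sign * (i - origin_r)
                    cc = c + sign * (j - origin_c)
                    if 0 <= rr < rows and 0 <= cc < cols:
                        # erosion subtracts the SE height, dilation adds it
                        candidates.append(spec[rr][cc] - sign * height)
            out[r][c] = pick(candidates)
    return out


def erode(spec: Matrix, se: Matrix) -> Matrix:
    """Grayscale erosion: min over the SE support of spec[x + s] - se[s]."""
    return _morph(spec, se, +1, min)


def dilate(spec: Matrix, se: Matrix) -> Matrix:
    """Grayscale dilation: max over the SE support of spec[x - s] + se[s]."""
    return _morph(spec, se, -1, max)


def opening(spec: Matrix, se: Matrix) -> Matrix:
    """Grayscale opening: erosion followed by dilation with the same SE."""
    return dilate(erode(spec, se), se)


# Toy spectro-temporal patch: flat background with one isolated peak.
spec = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 5.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]

# Hypothetical non-flat SE: height 0 at the origin, -1 at the eight neighbours.
se = [
    [-1.0, -1.0, -1.0],
    [-1.0, 0.0, -1.0],
    [-1.0, -1.0, -1.0],
]

opened = opening(spec, se)
```

With this toy SE, the isolated peak is lowered until it stands exactly one unit above its neighbours, while the background is left unchanged. The point of the sketch is that the SE heights control how much a local component may exceed its spectro-temporal surroundings, which is loosely analogous to the masking threshold shape the abstract refers to.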