Publication:
ASR Feature Extraction with Morphologically-Filtered Power-Normalized Cochleograms

dc.affiliation.dpto: UC3M. Departamento de Teoría de la Señal y Comunicaciones [es]
dc.affiliation.grupoinv: UC3M. Grupo de Investigación: Procesado Multimedia [es]
dc.contributor.author: Calle Silos, Fernando de la [es]
dc.contributor.author: Valverde Albacete, Francisco José [es]
dc.contributor.author: Gallardo Antolín, Ascensión [es]
dc.contributor.author: Peláez Moreno, Carmen [es]
dc.date.accessioned: 2015-07-30T11:19:46Z
dc.date.available: 2015-07-30T11:19:46Z
dc.date.issued: 2014
dc.description: Proceedings of: 15th Annual Conference of the International Speech Communication Association. Singapore, September 14-18, 2014. [en]
dc.description.abstract: In this paper we present advances in the modeling of the masking behavior of the Human Auditory System to enhance the robustness of the feature extraction stage in Automatic Speech Recognition. The solution adopted is based on a non-linear filtering of a spectro-temporal representation applied simultaneously on both the frequency and time domains, by processing it using mathematical morphology operations as if it were an image. A particularly important component of this architecture is the so-called structuring element: biologically-based considerations are addressed in the present contribution to design an element that closely resembles the masking phenomena taking place in the cochlea. The second feature of this contribution is the choice of underlying spectro-temporal representation. The best results were achieved by the representation introduced as part of the Power Normalized Cepstral Coefficients together with a spectral subtraction step. On the Aurora 2 noisy continuous digits task, we report relative error reductions of 18.7% compared to PNCC and 39.5% compared to MFCC. [en]
dc.description.sponsorship: This contribution has been supported by an Airbus Defense and Space Grant (Open Innovation - SAVIER) and Spanish Government-CICYT project 2011-26807/TEC. [en]
dc.description.status: Publicado [es]
dc.format.extent: 5
dc.format.mimetype: application/pdf
dc.identifier.bibliographicCitation: Li, Haizhou, et al. (eds). (2014). INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014. (pp. 2430-2434). International Speech Communication Association. [en]
dc.identifier.isbn: 9781634394352
dc.identifier.publicationfirstpage: 2430
dc.identifier.publicationlastpage: 2434
dc.identifier.publicationtitle: INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014. [en]
dc.identifier.uri: https://hdl.handle.net/10016/21480
dc.identifier.uxxi: CC/0000022423
dc.language.iso: eng [en]
dc.publisher: International Speech Communication Association [en]
dc.relation.eventdate: September 14-18, 2014. [en]
dc.relation.eventnumber: 15
dc.relation.eventplace: Singapore [en]
dc.relation.eventtitle: Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) [en]
dc.relation.projectID: Gobierno de España. TEC2011-26807 [es]
dc.relation.publisherversion: http://www.isca-speech.org/archive/archive_papers/interspeech_2014/i14_2430.pdf [en]
dc.rights: © 2014 ISCA [es]
dc.rights.accessRights: open access [es]
dc.subject.eciencia: Telecomunicaciones [es]
dc.subject.other: Spectro-temporal processing [en]
dc.subject.other: Morphological filtering [en]
dc.subject.other: Automatic speech recognition [en]
dc.subject.other: Auditory-based features [en]
dc.subject.other: PNCC [en]
dc.title: ASR Feature Extraction with Morphologically-Filtered Power-Normalized Cochleograms [en]
dc.type: conference poster [*]
dc.type.hasVersion: VoR [*]
dspace.entity.type: Publication
Files
Original bundle
Name: Feature_INTERSPEECH_2014.pdf
Size: 342.09 KB
Format: Adobe Portable Document Format