Morphologically filtered power-normalized cochleograms as robust, biologically inspired features for ASR

Calle Silos, Fernando de la; Valverde Albacete, Francisco José; Gallardo Antolín, Ascensión; Peláez Moreno, Carmen

Identifiers

URI: https://hdl.handle.net/10016/21710

ISSN: 2329-9290

DOI: 10.1109/TASLP.2015.2464691

UXXI: AR/0000017269

Files

morphologically_TASLP_2015_ps.pdf (1.02 MB)

Publication date

2015-11

Authors

Calle Silos, Fernando de la

Valverde Albacete, Francisco José

Gallardo Antolín, Ascensión

Peláez Moreno, Carmen

Publisher

IEEE

Abstract

In this paper, we present advances in the modeling of the masking behavior of the human auditory system (HAS) to enhance the robustness of the feature extraction stage in automatic speech recognition (ASR). The solution adopted is based on a nonlinear filtering of a spectro-temporal representation, applied simultaneously to both the frequency and time domains (as if it were an image) using mathematical morphology operations. A particularly important component of this architecture is the so-called structuring element (SE), which in the present contribution is designed as a single three-dimensional pattern using physiological facts, in such a way that it closely resembles the masking phenomena taking place in the cochlea. A proper choice of spectro-temporal representation lends validity to the model throughout the whole frequency spectrum and intensity range, accounting for the variability of the masking properties of the HAS in these two domains. The best results were achieved with the representation introduced as part of the power-normalized cepstral coefficients (PNCC), together with a spectral subtraction step. This method has been tested on the Aurora 2, Wall Street Journal and ISOLET databases, using both classical hidden Markov model (HMM) and hybrid artificial neural network (ANN)-HMM back-ends. In these experiments, the proposed front-end analysis provides substantial and significant improvements over baseline techniques: up to 39.5% relative improvement compared to MFCC, and 18.7% compared to PNCC, on the Aurora 2 database.
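To make the idea of "filtering a spectro-temporal representation as if it were an image" concrete, the following is a minimal sketch of one generic morphological filter, a grayscale opening with a non-flat SE, applied to a 2-D time-frequency array. A non-flat SE is a height map over a small time-frequency neighbourhood, which is one reading of the "three-dimensional pattern" mentioned above. Everything here is an assumption for illustration: the helper names `grey_erode`, `grey_dilate` and `grey_open`, the choice of an opening as the operator, the toy input, and especially `se_example`, which is a placeholder and not the physiologically designed SE described in the paper.

```python
import numpy as np


def _check_se(se):
    se = np.asarray(se, dtype=float)
    if se.ndim != 2 or se.shape[0] % 2 == 0 or se.shape[1] % 2 == 0:
        raise ValueError("SE must be a 2-D array with odd side lengths")
    return se


def grey_erode(img, se):
    """Grayscale erosion with a non-flat SE (hypothetical helper).

    out[x] = min over offsets d of img[x + d] - se[d]; positions outside the
    image are padded with +inf so they never win the minimum.
    """
    img, se = np.asarray(img, dtype=float), _check_se(se)
    h, w = se.shape
    ph, pw = h // 2, w // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=np.inf)
    out = np.full(img.shape, np.inf)
    for a in range(h):
        for b in range(w):
            # Shifted view img[x + d] for offset d = (a - ph, b - pw).
            window = padded[a:a + img.shape[0], b:b + img.shape[1]]
            out = np.minimum(out, window - se[a, b])
    return out


def grey_dilate(img, se):
    """Grayscale dilation, the adjoint of grey_erode (hypothetical helper).

    out[x] = max over offsets d of img[x - d] + se[d]; padding is -inf.
    """
    img, se = np.asarray(img, dtype=float), _check_se(se)
    h, w = se.shape
    ph, pw = h // 2, w // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=-np.inf)
    out = np.full(img.shape, -np.inf)
    for a in range(h):
        for b in range(w):
            # Shifted view img[x - d] for offset d = (a - ph, b - pw).
            r, c = h - 1 - a, w - 1 - b
            window = padded[r:r + img.shape[0], c:c + img.shape[1]]
            out = np.maximum(out, window + se[a, b])
    return out


def grey_open(img, se):
    """Morphological opening: erosion followed by dilation with the same SE."""
    return grey_dilate(grey_erode(img, se), se)


# Hypothetical non-flat SE: 3 frequency channels x 5 time frames, peaking at
# the centre and dropping by 3 units (e.g. dB) per step away from it.
# This is a placeholder, NOT the paper's physiologically derived SE.
freq = np.abs(np.arange(-1, 2))[:, None]
time = np.abs(np.arange(-2, 3))[None, :]
se_example = -3.0 * (freq + time)

# Toy log-power "cochleogram": rows are frequency channels, columns are frames.
rng = np.random.default_rng(0)
cochleogram = rng.normal(loc=40.0, scale=6.0, size=(20, 50))
filtered = grey_open(cochleogram, se_example)
```

At each point, the opening keeps the highest translate of the SE that still fits underneath the time-frequency surface, so narrow peaks that cannot accommodate the SE shape are flattened, while the output never exceeds the input. In this sketch, the SE shape therefore decides which spectro-temporal detail survives. The actual operator sequence, the SE, and the PNCC-based input used in the paper are not reproduced here.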

Keywords

Spectro-temporal processing, Cochlear masking models, Morphological filtering, Automatic speech recognition, Auditory-based features, PNCC

Bibliographic citation

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2015). 23(11), 2070-2080.

Collections

DTSC - GPM - Artículos de Revistas
