RT Conference Proceedings
T1 ASR Feature Extraction with Morphologically-Filtered Power-Normalized Cochleograms
A1 Calle Silos, Fernando de la
A1 Valverde Albacete, Francisco José
A1 Gallardo Antolín, Ascensión
A1 Peláez Moreno, Carmen
AB In this paper we present advances in modeling the masking behavior of the Human Auditory System to enhance the robustness of the feature extraction stage in Automatic Speech Recognition. The solution adopted is based on non-linear filtering of a spectro-temporal representation, applied simultaneously in both the frequency and time domains, by processing it with mathematical morphology operations as if it were an image. A particularly important component of this architecture is the so-called structuring element: biologically-based considerations are addressed in the present contribution to design an element that closely resembles the masking phenomena taking place in the cochlea. The second feature of this contribution is the choice of the underlying spectro-temporal representation. The best results were achieved by the representation introduced as part of the Power Normalized Cepstral Coefficients (PNCC) together with a spectral subtraction step. On the Aurora 2 noisy continuous digits task, we report relative error reductions of 18.7% compared to PNCC and 39.5% compared to MFCC.
PB International Speech Communication Association
SN 9781634394352
YR 2014
FD 2014
LK https://hdl.handle.net/10016/21480
UL https://hdl.handle.net/10016/21480
LA eng
NO Proceedings of: 15th Annual Conference of the International Speech Communication Association. Singapore, September 14-18, 2014.
NO This contribution has been supported by an Airbus Defense and Space Grant (Open Innovation - SAVIER) and Spanish Government-CICYT project 2011-26807/TEC.
DS e-Archivo
RD 1 Jul. 2024