DSpace e-Archivo


Please use this identifier to cite or link to this item: http://hdl.handle.net/10016/2319

Files in This Item:
pelaez01.pdf (published version, 149.6 kB, Adobe PDF)
Title: Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web
Author(s): Peláez-Moreno, Carmen
Gallardo Antolín, Asunción
Díaz-de-María, Fernando
Publisher: IEEE Circuits & Systems Society : IEEE Signal Processing Society : IEEE Communications Society : IEEE Computer Society
Issued date: 2001
Citation: IEEE Transactions on Multimedia, vol. 3, no. 2, June 2001, pp. 209-218
URI: http://hdl.handle.net/10016/2319
ISSN: 1520-9210
DOI: 10.1109/TSA.2005.853210
Abstract: The Internet Protocol (IP) environment poses two relevant sources of distortion to the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant benefits. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we are avoiding the influence of other sources of distortion due to the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective since it is not constrained to the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, which is one of the most preponderant coding algorithms in voice over IP (VoIP) and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure, for a variety of simulated packet loss rates. Furthermore, the improvement is higher as network conditions worsen.
Review: PeerReviewed
Publisher version: http://dx.doi.org/10.1109/TSA.2005.853210
Keywords: Internet
Internet telephony
Decoding
Encoding
Error handling
Information resources
Protocols
Speech coding
Speech recognition
ITU G.723.1 standard codec
Rights: © IEEE
Appears in Collections: DTSC - GPM - Artículos de Revistas
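
The abstract describes deriving recognition features from the spectral-envelope parameters carried in the bit stream rather than from decoded speech. As a rough illustration only, and not necessarily the authors' procedure, the sketch below assumes the envelope has already been recovered from the bit stream as linear-prediction coefficients and converts them to cepstral coefficients with the standard all-pole recursion. The function name `lpc_to_cepstrum` and the sign convention H(z) = 1 / (1 - sum_k a_k z^-k) are assumptions made for this example.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c_1..c_n_ceps.

    Assumed convention (hypothetical, for illustration):
    H(z) = 1 / (1 - sum_{k=1..p} a_k z^-k), with a[k-1] holding a_k.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is left unused
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n is within the model order
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_{n-k}
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# First-order check: for H(z) = 1 / (1 - 0.5 z^-1), c_n = 0.5**n / n
ceps = lpc_to_cepstrum([0.5], 3)
```

Because the recursion needs only the envelope coefficients, a front-end built this way never touches the decoded waveform, which is the property the abstract relies on.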


Items in E-Archivo are protected by copyright, with all rights reserved, unless otherwise indicated.

 
