Archivo Abierto Institucional de la Universidad Carlos III de Madrid
Title: SVMs for Automatic Speech Recognition: A Survey
Author(s): Solera Ureña, R.; Padrell Sendra, J.; Martín Iglesias, D.; Díaz de María, F.
Issued date: 2007
Citation: Progress in Nonlinear Speech Processing. Springer, 2007. ISBN 978-3-540-71503-0. Pp. 190-216
ISSN: 0302-9743 (Print)
Abstract: Hidden Markov Models (HMMs) are undoubtedly the most widely employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, the preponderance of Markov models remains a fact today. During the last decade, however, a new tool appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: Support Vector Machines (SVMs). SVMs are effective discriminative classifiers with several outstanding characteristics: their solution is the one with maximum margin; they are capable of dealing with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. In the first part we review several techniques to produce the fixed-dimension vectors required by the original SVMs; afterwards we explore more sophisticated techniques based on kernels capable of dealing with sequences of different lengths. Among them is the DTAK kernel, simple and effective, which rescues an old speech recognition technique: Dynamic Time Warping (DTW). In the second part, we describe some recent approaches to tackling more complex tasks such as connected-digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research.
Series / No.: Lecture Notes in Computer Science
Publisher version: http://www.springerlink.com/content/r828226517290181/fulltext.pdf
Appears in Collections: DTSC - GPM - Capítulos de Monografías
Items in E-Archivo are protected by copyright, with all rights reserved, unless otherwise indicated.