Data Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems

García-Moral, Ana I.; Solera Ureña, R.; Peláez Moreno, Carmen; Díaz de María, Fernando

dc.affiliation.dpto	UC3M. Departamento de Teoría de la Señal y Comunicaciones	es
dc.affiliation.grupoinv	UC3M. Grupo de Investigación: Procesado Multimedia	es
dc.contributor.author	García-Moral, Ana I.
dc.contributor.author	Solera Ureña, R.
dc.contributor.author	Peláez Moreno, Carmen
dc.contributor.author	Díaz de María, Fernando
dc.date.accessioned	2012-01-25T09:20:13Z
dc.date.available	2012-01-25T09:20:13Z
dc.date.issued	2011-03
dc.description.abstract	Hybrid speech recognizers, in which Artificial Neural Networks (ANNs) replace the Gaussian Mixture Models (GMMs) usually used to estimate the emission pdfs of the states of Hidden Markov Models (HMMs), have several advantages over classical systems. However, these performance improvements come at the cost of heavily increased computational requirements, because the ANN must be trained. Starting from the observation that speech data are remarkably skewed, this paper proposes sifting the training set and balancing the number of samples per class. With this method, training time is reduced by a factor of 18 while performance remains similar to, or even better than, that obtained with the whole database, especially in noisy environments. However, applying these reduced sets is not straightforward. To avoid the mismatch between training and testing conditions created by modifying the distribution of the training data, the a posteriori probabilities obtained must be properly scaled and the context window resized, as demonstrated in the paper.
dc.description.sponsorship	This work was supported in part by the regional grant (Comunidad Autónoma de Madrid-UC3M) CCG06-UC3M/TIC-0812 and in part by a project funded by the Spanish Ministry of Science and Innovation (TEC 2008-06382).
dc.description.status	Published
dc.format.mimetype	application/pdf
dc.identifier.bibliographicCitation	IEEE Transactions on Audio, Speech, and Language Processing, 19(3), Mar. 2011, pp. 468–481
dc.identifier.doi	10.1109/TASL.2010.2050513
dc.identifier.issn	1558-7916
dc.identifier.publicationfirstpage	468
dc.identifier.publicationissue	3
dc.identifier.publicationlastpage	481
dc.identifier.publicationtitle	IEEE Transactions on Audio, Speech, and Language Processing
dc.identifier.publicationvolume	19
dc.identifier.uri	https://hdl.handle.net/10016/13074
dc.language.iso	eng
dc.publisher	IEEE
dc.relation.publisherversion	http://dx.doi.org/10.1109/TASL.2010.2050513
dc.rights	© IEEE
dc.rights.accessRights	open access
dc.subject.eciencia	Telecomunicaciones
dc.subject.other	Robust ASR
dc.subject.other	Additive noise
dc.subject.other	Machine learning
dc.subject.other	Hybrid ASR
dc.subject.other	Artificial Neural Networks
dc.subject.other	Multilayer Perceptrons
dc.subject.other	Hidden Markov Models
dc.subject.other	Active Learning
dc.subject.other	ANN/HMM
dc.subject.other	MLP/HMM
dc.title	Data Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems
dc.type	research article	*
dc.type.hasVersion	AM	*
dspace.entity.type	Publication
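
The abstract describes two steps: balancing the number of training samples per class, and scaling the a posteriori probabilities before decoding. The sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. The per-class cap, the random subsampling, and the function names `balance_per_class` and `scaled_likelihoods` are hypothetical, and the division of posteriors by class priors is the conventional hybrid ANN/HMM scaling, which may differ from the scaling the paper actually proposes.

```python
import random
from collections import Counter, defaultdict


def balance_per_class(labels, cap, seed=0):
    """Return sorted frame indices, keeping at most `cap` random frames per class.

    Assumption: random subsampling with a fixed cap; the paper's own
    selection criterion is not given in this record.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        if len(indices) > cap:
            indices = rng.sample(indices, cap)
        kept.extend(indices)
    return sorted(kept)


def scaled_likelihoods(posteriors, priors):
    """Divide each class posterior by its class prior.

    Assumption: this is the conventional hybrid ANN/HMM scaling,
    used here only as a stand-in for the paper's scaling step.
    """
    return [p / q for p, q in zip(posteriors, priors)]


# Toy, heavily skewed frame labels (hypothetical data).
labels = ["sil"] * 8 + ["a"] * 3 + ["t"] * 1
kept = balance_per_class(labels, cap=3)
balanced_counts = Counter(labels[i] for i in kept)
```

With the toy labels above, the dominant class is cut down to the cap while rarer classes are kept whole, which is the kind of reduction in training set size the abstract refers to.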

Files

Original bundle

Name: TASLP09_revised_doublecolumn.pdf
Size: 348.25 KB
Format: Adobe Portable Document Format

Collections

DTSC - GPM - Artículos de Revistas
