Publication:
Data Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems

dc.affiliation.dptoUC3M. Departamento de Teoría de la Señal y Comunicaciones
dc.affiliation.grupoinvUC3M. Grupo de Investigación: Procesado Multimedia
dc.contributor.authorGarcía-Moral, Ana I.
dc.contributor.authorSolera Ureña, R.
dc.contributor.authorPeláez Moreno, Carmen
dc.contributor.authorDíaz de María, Fernando
dc.date.accessioned2012-01-25T09:20:13Z
dc.date.available2012-01-25T09:20:13Z
dc.date.issued2011-03
dc.description.abstractHybrid speech recognizers, in which the emission pdfs of the Hidden Markov Model (HMM) states are estimated by Artificial Neural Networks (ANNs) instead of the usual Gaussian Mixture Models (GMMs), have several advantages over classical systems. However, these performance improvements come at a heavily increased computational cost because the ANN must be trained. Starting from the observation that speech data are remarkably skewed, this paper proposes sifting the training set and balancing the number of samples per class. With this method, the training time is reduced by a factor of 18 while obtaining performance similar to or even better than that obtained with the whole database, especially in noisy environments. However, applying these reduced sets is not straightforward. To avoid the mismatch between training and testing conditions created by modifying the distribution of the training data, the a posteriori probabilities obtained must be properly scaled and the context window resized, as demonstrated in the paper.
dc.description.sponsorshipThis work was supported in part by the regional grant (Comunidad Autónoma de Madrid-UC3M) CCG06-UC3M/TIC-0812 and in part by a project funded by the Spanish Ministry of Science and Innovation (TEC 2008-06382).
dc.description.statusPublished
dc.format.mimetypeapplication/pdf
dc.identifier.bibliographicCitationIEEE Transactions on Audio, Speech, and Language Processing, 19(3), Mar. 2011, pp. 468–481
dc.identifier.doi10.1109/TASL.2010.2050513
dc.identifier.issn1558-7916
dc.identifier.publicationfirstpage468
dc.identifier.publicationissue3
dc.identifier.publicationlastpage481
dc.identifier.publicationtitleIEEE Transactions on Audio, Speech, and Language Processing
dc.identifier.publicationvolume19
dc.identifier.urihttps://hdl.handle.net/10016/13074
dc.language.isoeng
dc.publisherIEEE
dc.relation.publisherversionhttp://dx.doi.org/10.1109/TASL.2010.2050513
dc.rights© IEEE
dc.rights.accessRightsopen access
dc.subject.ecienciaTelecommunications
dc.subject.otherRobust ASR
dc.subject.otherAdditive noise
dc.subject.otherMachine learning
dc.subject.otherHybrid ASR
dc.subject.otherArtificial Neural Networks
dc.subject.otherMultilayer Perceptrons
dc.subject.otherHidden Markov Models
dc.subject.otherActive Learning
dc.subject.otherANN/HMM
dc.subject.otherMLP/HMM
dc.titleData Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems
dc.typeresearch article
dc.type.hasVersionAM
dspace.entity.typePublication
Files
Original bundle
Name:
TASLP09_revised_doublecolumn.pdf
Size:
348.25 KB
Format:
Adobe Portable Document Format