A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis

e-Archivo Repository

dc.contributor.author Gallardo Antolín, Ascensión
dc.contributor.author Montero, Juan Manuel
dc.contributor.author King, Simon
dc.date.accessioned 2015-07-30T09:27:02Z
dc.date.available 2015-07-30T09:27:02Z
dc.date.issued 2014
dc.identifier.bibliographicCitation Li, Haizhou, et al. (eds). (2014). INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014. (pp. 2370-2374). International Speech Communication Association.
dc.identifier.isbn 9781634394352
dc.identifier.uri http://hdl.handle.net/10016/21478
dc.description Proceedings of: 15th Annual Conference of the International Speech Communication Association. Singapore, September 14-18, 2014.
dc.description.abstract Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, for processing this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization and music and noise detection which improve the precision and overall quality of the segmentation.
dc.description.sponsorship This work was carried out during the research stay of A. Gallardo-Antolín and J. M. Montero at the Centre for Speech Technology Research (CSTR), University of Edinburgh, supported by the Spanish Ministry of Education, Culture and Sports under the National Program of Human Resources Mobility from the I+D+i 2008-2011 National Program, extended by agreement of the Council of Ministers on October 7th, 2011. The work leading to these results has received funding from the European Union under grant agreement No 287678. It has also been supported by EPSRC Programme Grant no. EP/I031022/1 (Natural Speech Technology, NST) and Spanish Government grants TEC2011-26807 and DPI2010-21247-C02-02.
dc.format.extent 5
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher International Speech Communication Association
dc.rights © 2014 ISCA
dc.subject.other Diarization
dc.subject.other Audio segmentation
dc.subject.other Expressive text-to-speech
dc.subject.other Media recordings
dc.title A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis
dc.type bookPart
dc.type conferenceObject
dc.description.status Published
dc.relation.publisherversion http://www.isca-speech.org/archive/archive_papers/interspeech_2014/i14_2370.pdf
dc.subject.eciencia Telecommunications
dc.rights.accessRights openAccess
dc.relation.projectID Gobierno de España. TEC2011-26807
dc.type.version publishedVersion
dc.relation.eventdate September 14-18, 2014.
dc.relation.eventnumber 15
dc.relation.eventplace Singapore
dc.relation.eventtitle 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014).
dc.relation.eventtype poster
dc.identifier.publicationfirstpage 2370
dc.identifier.publicationlastpage 2374
dc.identifier.publicationtitle INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014.
dc.identifier.uxxi CC/0000022424