Publication:
A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis

dc.affiliation.dpto: UC3M. Departamento de Teoría de la Señal y Comunicaciones [es]
dc.affiliation.grupoinv: UC3M. Grupo de Investigación: Procesado Multimedia [es]
dc.contributor.author: Gallardo Antolín, Ascensión [es]
dc.contributor.author: Montero, Juan Manuel [es]
dc.contributor.author: King, Simon [es]
dc.date.accessioned: 2015-07-30T09:27:02Z
dc.date.available: 2015-07-30T09:27:02Z
dc.date.issued: 2014
dc.description: Proceedings of: 15th Annual Conference of the International Speech Communication Association. Singapore, September 14-18, 2014. [en]
dc.description.abstract: Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization and music and noise detection which improve the precision and overall quality of the segmentation. [en]
dc.description.sponsorship: This work has been carried out during the research stay of A. Gallardo-Antolín and J. M. Montero at the Centre for Speech Technology Research (CSTR), University of Edinburgh, supported by the Spanish Ministry of Education, Culture and Sports under the National Program of Human Resources Mobility from the I+D+i 2008-2011 National Program, extended by agreement of the Council of Ministers on October 7th, 2011. The work leading to these results has received funding from the European Union under grant agreement No 287678. It has also been supported by EPSRC Programme Grant no. EP/I031022/1 (Natural Speech Technology, NST) and Spanish Government grants TEC2011-26807 and DPI2010-21247-C02-02. [en]
dc.description.status: Publicado [es]
dc.format.extent: 5
dc.format.mimetype: application/pdf
dc.identifier.bibliographicCitation: Li, Haizhou, et al. (eds). (2014). INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014. (pp. 2370-2374). International Speech Communication Association. [en]
dc.identifier.isbn: 9781634394352
dc.identifier.publicationfirstpage: 2370
dc.identifier.publicationlastpage: 2374
dc.identifier.publicationtitle: INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014. [en]
dc.identifier.uri: http://hdl.handle.net/10016/21478
dc.identifier.uxxi: CC/0000022424
dc.language.iso: eng [en]
dc.publisher: International Speech Communication Association [en]
dc.relation.eventdate: September 14-18, 2014. [en]
dc.relation.eventnumber: 15
dc.relation.eventplace: Singapore [en]
dc.relation.eventtitle: 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014). [en]
dc.relation.projectID: Gobierno de España. TEC2011-26807 [es]
dc.relation.publisherversion: http://www.isca-speech.org/archive/archive_papers/interspeech_2014/i14_2370.pdf [en]
dc.rights: © 2014 ISCA [en]
dc.rights.accessRights: open access [en]
dc.subject.eciencia: Telecomunicaciones [es]
dc.subject.other: Diarization [en]
dc.subject.other: Audio segmentation [en]
dc.subject.other: Expressive text-to-speech [en]
dc.subject.other: Media recordings [en]
dc.title: A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis [en]
dc.type: conference poster [*]
dc.type.hasVersion: VoR [*]
dspace.entity.type: Publication
Files
Original bundle
Name:
comparison_INTERSPEECH_2014.pdf
Size:
403.12 KB
Format:
Adobe Portable Document Format