A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis

Gallardo Antolín, Ascensión; Montero, Juan Manuel; King, Simon

dc.affiliation.dpto	UC3M. Departamento de Teoría de la Señal y Comunicaciones	es
dc.affiliation.grupoinv	UC3M. Grupo de Investigación: Procesado Multimedia	es
dc.contributor.author	Gallardo Antolín, Ascensión	es
dc.contributor.author	Montero, Juan Manuel	es
dc.contributor.author	King, Simon	es
dc.date.accessioned	2015-07-30T09:27:02Z
dc.date.available	2015-07-30T09:27:02Z
dc.date.issued	2014
dc.description	Proceedings of: 15th Annual Conference of the International Speech Communication Association. Singapore, September 14-18, 2014.	en
dc.description.abstract	Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization with music and noise detection, which improve the precision and overall quality of the segmentation.	en
dc.description.sponsorship	This work was carried out during the research stay of A. Gallardo-Antolín and J. M. Montero at the Centre for Speech Technology Research (CSTR), University of Edinburgh, supported by the Spanish Ministry of Education, Culture and Sports under the National Program of Human Resources Mobility from the I+D+i 2008-2011 National Program, extended by agreement of the Council of Ministers on October 7th, 2011. The work leading to these results has received funding from the European Union under grant agreement No 287678. It has also been supported by EPSRC Programme Grant no. EP/I031022/1 (Natural Speech Technology, NST) and Spanish Government grants TEC2011-26807 and DPI2010-21247-C02-02.	en
dc.description.status	Published	en
dc.format.extent	5
dc.format.mimetype	application/pdf
dc.identifier.bibliographicCitation	Li, Haizhou, et al. (eds). (2014). INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014. (pp. 2370-2374). International Speech Communication Association.	en
dc.identifier.isbn	9781634394352
dc.identifier.publicationfirstpage	2370
dc.identifier.publicationlastpage	2374
dc.identifier.publicationtitle	INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014.	en
dc.identifier.uri	http://hdl.handle.net/10016/21478
dc.identifier.uxxi	CC/0000022424
dc.language.iso	eng	en
dc.publisher	International Speech Communication Association	en
dc.relation.eventdate	September 14-18, 2014.	en
dc.relation.eventnumber	15
dc.relation.eventplace	Singapore	en
dc.relation.eventtitle	15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014).	en
dc.relation.projectID	Gobierno de España. TEC2011-26807	es
dc.relation.publisherversion	http://www.isca-speech.org/archive/archive_papers/interspeech_2014/i14_2370.pdf	en
dc.rights	© 2014 ISCA	en
dc.rights.accessRights	open access	en
dc.subject.eciencia	Telecommunications	en
dc.subject.other	Diarization	en
dc.subject.other	Audio segmentation	en
dc.subject.other	Expressive text-to-speech	en
dc.subject.other	Media recordings	en
dc.title	A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis	en
dc.type	conference poster	*
dc.type.hasVersion	VoR	*
dspace.entity.type	Publication

Files

Original bundle

Name: comparison_INTERSPEECH_2014.pdf
Size: 403.12 KB
Format: Adobe Portable Document Format


Collections

DTSC - GPM - Comunicaciones en congresos y otros eventos
DTSC - GPM - Capítulos de Monografías
