Emilia: a speech corpus for Argentine Spanish text to speech synthesis

This paper introduces Emilia, a speech corpus created to build a female voice in Spanish spoken in Buenos Aires for the Aromo text-to-speech system. Aromo is a unit selection text-to-speech system, which employs diphones as units of synthesis. The key requirements and design criteria for Emilia were...

Bibliographic Details
Main authors:	Torres, H.M.; Gurlekian, J.A.; Evin, D.A.; Cossio Mercado, C.G.
Format:	JOUR
Subjects:	Argentine Spanish; Phonetic corpus; Phonetic transcription; Speech corpus design; Text-to-speech
Online access:	http://hdl.handle.net/20.500.12110/paper_1574020X_v_n_p_Torres
Contributed by:	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA), Universidad de Buenos Aires

id	todo:paper_1574020X_v_n_p_Torres
record_format	dspace
spelling	todo:paper_1574020X_v_n_p_Torres 2023-10-03T16:27:36Z JOUR info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_1574020X_v_n_p_Torres
institution	Universidad de Buenos Aires
institution_str	I-28
repository_str	R-134
collection	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic	Argentine Spanish Phonetic corpus Phonetic transcription Speech corpus design Text-to-speech
description	This paper introduces Emilia, a speech corpus created to build a female voice in the Spanish spoken in Buenos Aires for the Aromo text-to-speech system. Aromo is a unit-selection text-to-speech system that uses diphones as its synthesis units. The key design requirement for Emilia was to synthesize any Spanish text into high-quality speech with a minimum corpus size. The text corpus was designed to guarantee phonetic and prosodic coverage. A three-stage strategy was used: in the first stage, 741 sentences were designed to cover all of the syllables of Spanish as spoken in Argentina, with and without stress, and in all positions within the word; in the second stage, 852 sentences were added to balance the distribution of the diphones; and in the third and final stage, after a perceptual evaluation of the quality of the synthesized speech, 625 sentences were added to achieve the specified unit coverage and to introduce sentences with more complex syntactic and prosodic structures. Issues from all three corpus-building stages are reported. The paper also presents the results of perceptual quality evaluations of the synthesized voice. Emilia has a duration of three hours and 15 minutes; the quality of speech synthesized from it with the Aromo system is similar to that obtained with commercial systems, with a real-time ratio of less than one. © 2019, Springer Nature B.V.
format	JOUR
author	Torres, H.M. Gurlekian, J.A. Evin, D.A. Cossio Mercado, C.G.
author_sort	Torres, H.M.
title	Emilia: a speech corpus for Argentine Spanish text to speech synthesis
url	http://hdl.handle.net/20.500.12110/paper_1574020X_v_n_p_Torres
_version_	1807318626695380992
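
The figures in the abstract can be checked with a little arithmetic. The following is a minimal sketch, not the authors' tooling: the per-stage sentence counts come from the abstract, while the function names, the `#` boundary symbol, the toy phone sequences, and the definition of real-time ratio as synthesis time divided by the duration of the audio produced are illustrative assumptions.

```python
from collections import Counter

# Sentence counts per design stage, as stated in the abstract.
STAGE_SENTENCES = {
    "stage_1_syllable_coverage": 741,
    "stage_2_diphone_balancing": 852,
    "stage_3_unit_coverage": 625,
}


def total_sentences(stages):
    """Sum the sentence counts over all design stages."""
    return sum(stages.values())


def diphones(phones, boundary="#"):
    """Return adjacent phone pairs for one utterance.

    The utterance is padded with a boundary symbol on both sides
    (an assumed convention, not taken from the paper).
    """
    padded = [boundary, *phones, boundary]
    return list(zip(padded, padded[1:]))


def diphone_counts(utterances):
    """Count how often each diphone occurs across a list of utterances."""
    counts = Counter()
    for phones in utterances:
        counts.update(diphones(phones))
    return counts


def real_time_ratio(synthesis_seconds, audio_seconds):
    """Assumed definition: processing time divided by audio duration.

    A value below one means synthesis runs faster than playback.
    """
    return synthesis_seconds / audio_seconds


if __name__ == "__main__":
    print(total_sentences(STAGE_SENTENCES))
    # Toy phone sequences, purely hypothetical.
    toy = [["k", "a", "s", "a"], ["m", "a", "s"]]
    print(diphone_counts(toy).most_common(3))
    # Hypothetical timings, not measurements from the paper.
    print(real_time_ratio(1.5, 3.0))
```

Summing the three stages gives 2218 designed sentences. A diphone histogram of this kind is one simple way to see which units are under-represented before adding sentences, which is the sort of balancing the second stage describes.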
