Improving speech synthesis quality by reducing pitch peaks in the source recordings

We present a method for improving the perceived naturalness of corpus-based speech synthesizers. It consists in removing pronounced pitch peaks in the original recordings, which typically lead to noticeable discontinuities in the synthesized speech. We perceptually evaluated this method using two co...

Bibliographic Details
Main author:	Violante, L.
Other authors:	Rodríguez Zivic, P., Gravano, A., Appen ButlerHill; et al.; ETS; Google; Microsoft Research; Rakuten
Format:	Conference proceedings; Book chapter
Language:	English
Published:	Association for Computational Linguistics (ACL) 2013
Online access:	Scopus record; Handle; Digital Library record
Contributed by:	Reference record: Request the resource here. Biblioteca Central Dr. Luis F. Leloir (FCEN), Universidad de Buenos Aires


LEADER	04688caa a22005177a 4500
001	PAPER-13017
003	AR-BaUEN
005	20230518204315.0
008	190411s2013 xx \|\|\|\|fo\|\|\|\| 10\| 0 eng\|d
024	7		\|2 scopus \|a 2-s2.0-84926221891
040			\|a Scopus \|b spa \|c AR-BaUEN \|d AR-BaUEN
100	1		\|a Violante, L.
245	1	0	\|a Improving speech synthesis quality by reducing pitch peaks in the source recordings
260			\|b Association for Computational Linguistics (ACL) \|c 2013
506			\|2 openaire \|e Editorial policy
504			\|a Black, A.W., Lenzo, K.A., (2007) Building Synthetic Voices, , http://festvox.org/bsv, Language Technologies Institute, Carnegie Mellon University
504			\|a Black, A., Taylor, P., Caley, R., Clark, R., Richmond, K., King, S., Strom, V., Zen, H., (2001) The festival speech synthesis system
504			\|a Boersma, P., Weenink, D., (2012) Praat: Doing Phonetics by Computer, , http://www.praat.org/
504			\|a Gurlekian, J., Colantoni, L., Torres, H., El alfabeto fonético SAMPA y el diseño de corpora fonéticamente balanceados (2001) Fonoaudiológica, 47, pp. 58-69
504			\|a Gurlekian, J.A., Cossio-Mercado, C., Torres, H., Vaccari, M.E., Subjective evaluation of a high quality text-to-speech system for Argentine Spanish (2012) Proceedings of Iberspeech, , Madrid, Spain
504			\|a Moulines, E., Charpentier, F., Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones (1990) Speech Communication, 9 (5), pp. 453-467
504			\|a Nye, P.W., Gaitenby, J.H., The intelligibility of synthetic monosyllabic words in short, syntactically normal sentences (1974) Haskins Laboratories Status Report on Speech Research, 37 (38), pp. 169-190
504			\|a Schröder, M., Trouvain, J., The German text-to-speech synthesis system MARY: A tool for research, development and teaching (2003) International Journal of Speech Technology, 6 (4), pp. 365-377
504			\|a Torres, H.M., Gurlekian, J.A., Automatic determination of phrase breaks for Argentine Spanish (2004) Speech Prosody 2004, International Conference
504			\|a Viswanathan, M., Viswanathan, M., Measuring speech quality for text-to-speech systems: Development and assessment of a modified mean opinion score (MOS) scale (2005) Computer Speech & Language, 19 (1), pp. 55-83
520	3		\|a We present a method for improving the perceived naturalness of corpus-based speech synthesizers. It consists in removing pronounced pitch peaks in the original recordings, which typically lead to noticeable discontinuities in the synthesized speech. We perceptually evaluated this method using two concatenative and two HMM-based synthesis systems, and found that using it on the source recordings managed to improve the naturalness of the synthesizers and had no effect on their intelligibility. © 2013 Association for Computational Linguistics. \|l eng
593			\|a Departamento de Computación, FCEyN, Universidad de Buenos Aires, Argentina
690	1	0	\|a COMPUTATIONAL LINGUISTICS
690	1	0	\|a CONTINUOUS SPEECH RECOGNITION
690	1	0	\|a SPEECH SYNTHESIS
690	1	0	\|a CORPUS-BASED
690	1	0	\|a HMM-BASED
690	1	0	\|a SPEECH SYNTHESIZER
690	1	0	\|a SYNTHESIZED SPEECH
690	1	0	\|a AUDIO RECORDINGS
700	1		\|a Rodríguez Zivic, P.
700	1		\|a Gravano, A.
700	1		\|a Appen ButlerHill; et al.; ETS; Google; Microsoft Research; Rakuten
711	2		\|d 9 June 2013 through 14 June 2013 \|g Conference code: 111457
773	0		\|d Association for Computational Linguistics (ACL), 2013 \|h pp. 502-506 \|p NAACL HLT - Conf. North Am. Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol., Proc. Main Conf. \|n NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference \|z 9781937284473 \|t 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013
856	4	1	\|u https://www.scopus.com/inward/record.uri?eid=2-s2.0-84926221891&partnerID=40&md5=09042af08f13da35f68fdb8157548bc7 \|y Scopus record
856	4	0	\|u https://hdl.handle.net/20.500.12110/paper_97819372_v_n_p502_Violante \|y Handle
856	4	0	\|u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante \|y Digital Library record
961			\|a paper_97819372_v_n_p502_Violante \|b paper \|c PE
962			\|a info:eu-repo/semantics/conferenceObject \|a info:ar-repo/semantics/documento de conferencia \|b info:eu-repo/semantics/publishedVersion
999			\|c 73970
