Improving speech synthesis quality by reducing pitch peaks in the source recordings

We present a method for improving the perceived naturalness of corpus-based speech synthesizers. It consists in removing pronounced pitch peaks in the original recordings, which typically lead to noticeable discontinuities in the synthesized speech. We perceptually evaluated this method using two co...

Bibliographic Details
Main author: Violante, L.
Other authors: Rodríguez Zivic, P., Gravano, A., Appen ButlerHill; et al.; ETS; Google; Microsoft Research; Rakuten
Format: Conference proceedings, Book chapter
Language: English
Published: Association for Computational Linguistics (ACL), 2013
Online access: Scopus record
Handle
Digital Library record
Contributed by: Reference record: Request the resource here
LEADER 04688caa a22005177a 4500
001 PAPER-13017
003 AR-BaUEN
005 20230518204315.0
008 190411s2013 xx ||||fo|||| 10| 0 eng|d
024 7 |2 scopus  |a 2-s2.0-84926221891 
040 |a Scopus  |b spa  |c AR-BaUEN  |d AR-BaUEN 
100 1 |a Violante, L. 
245 1 0 |a Improving speech synthesis quality by reducing pitch peaks in the source recordings 
260 |b Association for Computational Linguistics (ACL)  |c 2013 
506 |2 openaire  |e Editorial policy 
504 |a Black, A.W., Lenzo, K.A., (2007) Building Synthetic Voices, , http://festvox.org/bsv, Language Technologies Institute, Carnegie Mellon University 
504 |a Black, A., Taylor, P., Caley, R., Clark, R., Richmond, K., King, S., Strom, V., Zen, H., (2001) The festival speech synthesis system 
504 |a Boersma, P., Weenink, D., (2012) Praat: Doing Phonetics by Computer, , http://www.praat.org/ 
504 |a Gurlekian, J., Colantoni, L., Torres, H., El alfabeto fonético SAMPA y el diseño de corpora fonéticamente balanceados (2001) Fonoaudiológica, 47, pp. 58-69 
504 |a Gurlekian, J.A., Cossio-Mercado, C., Torres, H., Vaccari, M.E., Subjective evaluation of a high quality text-to-speech system for Argentine Spanish (2012) Proceedings of Iberspeech, , Madrid, Spain 
504 |a Moulines, E., Charpentier, F., Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones (1990) Speech Communication, 9 (5), pp. 453-467 
504 |a Nye, P.W., Gaitenby, J.H., The intelligibility of synthetic monosyllabic words in short, syntactically normal sentences (1974) Haskins Laboratories Status Report on Speech Research, 37 (38), pp. 169-190 
504 |a Schröder, M., Trouvain, J., The German text-to-speech synthesis system MARY: A tool for research, development and teaching (2003) International Journal of Speech Technology, 6 (4), pp. 365-377 
504 |a Torres, H.M., Gurlekian, J.A., Automatic determination of phrase breaks for Argentine Spanish (2004) Speech Prosody 2004, International Conference 
504 |a Viswanathan, M., Viswanathan, M., Measuring speech quality for text-to-speech systems: Development and assessment of a modified mean opinion score (MOS) scale (2005) Computer Speech & Language, 19 (1), pp. 55-83 
520 3 |a We present a method for improving the perceived naturalness of corpus-based speech synthesizers. It consists in removing pronounced pitch peaks in the original recordings, which typically lead to noticeable discontinuities in the synthesized speech. We perceptually evaluated this method using two concatenative and two HMM-based synthesis systems, and found that using it on the source recordings managed to improve the naturalness of the synthesizers and had no effect on their intelligibility. © 2013 Association for Computational Linguistics.  |l eng 
593 |a Departamento de Computación, FCEyN, Universidad de Buenos Aires, Argentina 
690 1 0 |a COMPUTATIONAL LINGUISTICS 
690 1 0 |a CONTINUOUS SPEECH RECOGNITION 
690 1 0 |a SPEECH SYNTHESIS 
690 1 0 |a CORPUS-BASED 
690 1 0 |a HMM-BASED 
690 1 0 |a SPEECH SYNTHESIZER 
690 1 0 |a SYNTHESIZED SPEECH 
690 1 0 |a AUDIO RECORDINGS 
700 1 |a Rodríguez Zivic, P. 
700 1 |a Gravano, A. 
700 1 |a Appen ButlerHill; et al.; ETS; Google; Microsoft Research; Rakuten 
711 2 |d 9 June 2013 through 14 June 2013  |g Conference code: 111457 
773 0 |d Association for Computational Linguistics (ACL), 2013  |h pp. 502-506  |p NAACL HLT - Conf. North Am. Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol., Proc. Main Conf.  |n NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference  |z 9781937284473  |t 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 
856 4 1 |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-84926221891&partnerID=40&md5=09042af08f13da35f68fdb8157548bc7  |y Scopus record 
856 4 0 |u https://hdl.handle.net/20.500.12110/paper_97819372_v_n_p502_Violante  |y Handle 
856 4 0 |u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante  |y Digital Library record 
961 |a paper_97819372_v_n_p502_Violante  |b paper  |c PE 
962 |a info:eu-repo/semantics/conferenceObject  |a info:ar-repo/semantics/documento de conferencia  |b info:eu-repo/semantics/publishedVersion 
999 |c 73970