Improving speech synthesis quality by reducing pitch peaks in the source recordings
We present a method for improving the perceived naturalness of corpus-based speech synthesizers. It consists in removing pronounced pitch peaks in the original recordings, which typically lead to noticeable discontinuities in the synthesized speech. We perceptually evaluated this method using two co...
| Main author: | Violante, L. |
|---|---|
| Other authors: | Rodríguez Zivic, P.; Gravano, A. |
| Format: | Conference proceedings; book chapter |
| Language: | English |
| Published: | Association for Computational Linguistics (ACL), 2013 |
| Online access: | Scopus record; Handle; Digital Library record |
| Contributed by: | Reference record: request the resource here |
| LEADER | 04688caa a22005177a 4500 | ||
|---|---|---|---|
| 001 | PAPER-13017 | ||
| 003 | AR-BaUEN | ||
| 005 | 20230518204315.0 | ||
| 008 | 190411s2013 xx ||||fo|||| 10| 0 eng|d | ||
| 024 | 7 | |2 scopus |a 2-s2.0-84926221891 | |
| 040 | |a Scopus |b spa |c AR-BaUEN |d AR-BaUEN | ||
| 100 | 1 | |a Violante, L. | |
| 245 | 1 | 0 | |a Improving speech synthesis quality by reducing pitch peaks in the source recordings |
| 260 | |b Association for Computational Linguistics (ACL) |c 2013 | ||
| 506 | |2 openaire |e Editorial policy | ||
| 504 | |a Black, A.W., Lenzo, K.A., (2007) Building Synthetic Voices, , http://festvox.org/bsv, Language Technologies Institute, Carnegie Mellon University | ||
| 504 | |a Black, A., Taylor, P., Caley, R., Clark, R., Richmond, K., King, S., Strom, V., Zen, H., (2001) The festival speech synthesis system | ||
| 504 | |a Boersma, P., Weenink, D., (2012) Praat: Doing Phonetics by Computer, , http://www.praat.org/ | ||
| 504 | |a Gurlekian, J., Colantoni, L., Torres, H., El alfabeto fonético SAMPA y el diseño de corpora fonéticamente balanceados (2001) Fonoaudiológica, 47, pp. 58-69 | ||
| 504 | |a Gurlekian, J.A., Cossio-Mercado, C., Torres, H., Vaccari, M.E., Subjective evaluation of a high quality text-to-speech system for Argentine Spanish (2012) Proceedings of Iberspeech, , Madrid, Spain | ||
| 504 | |a Moulines, E., Charpentier, F., Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones (1990) Speech Communication, 9 (5), pp. 453-467 | ||
| 504 | |a Nye, P.W., Gaitenby, J.H., The intelligibility of synthetic monosyllabic words in short, syntactically normal sentences (1974) Haskins Laboratories Status Report on Speech Research, 37 (38), pp. 169-190 | ||
| 504 | |a Schröder, M., Trouvain, J., The German text-to-speech synthesis system MARY: A tool for research, development and teaching (2003) International Journal of Speech Technology, 6 (4), pp. 365-377 | ||
| 504 | |a Torres, H.M., Gurlekian, J.A., Automatic determination of phrase breaks for Argentine Spanish (2004) Speech Prosody 2004, International Conference | ||
| 504 | |a Viswanathan, M., Viswanathan, M., Measuring speech quality for text-to-speech systems: Development and assessment of a modified mean opinion score (MOS) scale (2005) Computer Speech & Language, 19 (1), pp. 55-83 | ||
| 520 | 3 | |a We present a method for improving the perceived naturalness of corpus-based speech synthesizers. It consists in removing pronounced pitch peaks in the original recordings, which typically lead to noticeable discontinuities in the synthesized speech. We perceptually evaluated this method using two concatenative and two HMM-based synthesis systems, and found that using it on the source recordings managed to improve the naturalness of the synthesizers and had no effect on their intelligibility. © 2013 Association for Computational Linguistics. |l eng | |
| 593 | |a Departamento de Computación, FCEyN, Universidad de Buenos Aires, Argentina | ||
| 690 | 1 | 0 | |a COMPUTATIONAL LINGUISTICS |
| 690 | 1 | 0 | |a CONTINUOUS SPEECH RECOGNITION |
| 690 | 1 | 0 | |a SPEECH SYNTHESIS |
| 690 | 1 | 0 | |a CORPUS-BASED |
| 690 | 1 | 0 | |a HMM-BASED |
| 690 | 1 | 0 | |a SPEECH SYNTHESIZER |
| 690 | 1 | 0 | |a SYNTHESIZED SPEECH |
| 690 | 1 | 0 | |a AUDIO RECORDINGS |
| 700 | 1 | |a Rodríguez Zivic, P. | |
| 700 | 1 | |a Gravano, A. | |
| 700 | 1 | |a Appen ButlerHill; et al.; ETS; Google; Microsoft Research; Rakuten | |
| 711 | 2 | |d 9 June 2013 through 14 June 2013 |g Conference code: 111457 | |
| 773 | 0 | |d Association for Computational Linguistics (ACL), 2013 |h pp. 502-506 |p NAACL HLT - Conf. North Am. Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol., Proc. Main Conf. |n NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference |z 9781937284473 |t 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 | |
| 856 | 4 | 1 | |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-84926221891&partnerID=40&md5=09042af08f13da35f68fdb8157548bc7 |y Scopus record |
| 856 | 4 | 0 | |u https://hdl.handle.net/20.500.12110/paper_97819372_v_n_p502_Violante |y Handle |
| 856 | 4 | 0 | |u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante |y Digital Library record |
| 961 | |a paper_97819372_v_n_p502_Violante |b paper |c PE | ||
| 962 | |a info:eu-repo/semantics/conferenceObject |a info:ar-repo/semantics/documento de conferencia |b info:eu-repo/semantics/publishedVersion | ||
| 999 | |c 73970 | ||