Exploring the role of phonetic bottleneck features for speaker and language recognition

Using bottleneck features extracted from a deep neural network (DNN) trained to predict senone posteriors has resulted in new, state-of-the-art technology for language and speaker identification. For language identification, the features' dense phonetic information is believed to enable improve...

Bibliographic Details
Main author:	McLaren, M.
Other authors:	Ferrer, L., Lawson, A., The Institute of Electrical and Electronics Engineers Signal Processing Society
Format:	Conference proceedings, Book chapter
Language:	English
Published:	Institute of Electrical and Electronics Engineers Inc. 2016
Online access:	Scopus record, DOI, Handle, Digital Library record
Contributed by:	Reference record: Request the resource here. Biblioteca Central Dr. Luis F. Leloir (FCEN), Universidad de Buenos Aires


LEADER	07494caa a22006497a 4500
001	PAPER-15997
003	AR-BaUEN
005	20230518204651.0
008	190411s2016 xx \|\|\|\|fo\|\|\|\| 10\| 0 eng\|d
024	7		\|2 scopus \|a 2-s2.0-84973343060
040			\|a Scopus \|b spa \|c AR-BaUEN \|d AR-BaUEN
030			\|a IPROD
100	1		\|a McLaren, M.
245	1	0	\|a Exploring the role of phonetic bottleneck features for speaker and language recognition
260			\|b Institute of Electrical and Electronics Engineers Inc. \|c 2016
506			\|2 openaire \|e Editorial policy
504			\|a Lei, Y., Scheffer, N., Ferrer, L., McLaren, M., A novel scheme for speaker recognition using a phonetically aware deep neural network (2014) Proc. ICASSP
504			\|a Ferrer, L., Lei, Y., McLaren, M., Study of senone-based deep neural network approaches for spoken language recognition (2015) Submitted to IEEE Trans. Audio Speech and Language Processing
504			\|a Richardson, F., Reynolds, D., Dehak, N., A unified deep neural network for speaker and language recognition (2015) Proc. Interspeech
504			\|a McLaren, M., Lei, Y., Ferrer, L., Advances in deep neural network approaches to speaker recognition (2015) Proc. IEEE ICASSP
504			\|a Ferrer, L., Lei, Y., McLaren, M., Scheffer, N., Language identification based on senone posteriors (2014) Proc. Interspeech
504			\|a Song, Y., Jiang, B., Bao, Y., Wei, S., Dai, L., I-Vector representation based on bottleneck features for language identification (2013) Electronics Letters, 49 (24), pp. 1569-1570
504			\|a Matejka, P., Zhang, L., Ng, T., Mallidi, S.H., Glembek, O., Ma, J., Zhang, B., Neural network bottleneck features for language identification (2014) Proc. Speaker Odyssey
504			\|a Lei, Y., Ferrer, L., Lawson, A., McLaren, M., Scheffer, N., Application of convolutional neural networks to language identification in noisy conditions (2014) Proc. Speaker Odyssey
504			\|a Matejka, P., Schwarz, P., Cernocky, J., Chytil, P., Phonotactic language identification using high-quality phoneme recognition (2005) Proc. Interspeech
504			\|a Shen, W., Campbell, W., Gleason, T., Reynolds, D., Singer, E., Experiments with lattice-based PPRLM language identification (2006) Proc. Odyssey
504			\|a Stolcke, A., Akbacak, M., Ferrer, L., Kajarekar, S., Richey, C., Scheffer, N., Shriberg, E., Improving language recognition with multilingual phone recognition and speaker adaptation transforms (2010) Proc. Odyssey
504			\|a Fernando D'Haro Enríquez, L., Glembek, O., Plchot, O., Matejka, P., Soufifar, M., De Córdoba Herralde, R., Černocký, J., Phonotactic language recognition using i-vectors and phoneme posteriogram counts (2012) Proc. Interspeech
504			\|a Penagarikano, M., Varona, A., Diez, M., Rodriguez-Fuentes, L.J., Bordel, G., Study of different backends in a state-of-the-art language recognition system (2012) Proc. Interspeech
504			\|a Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P., Front-end factor analysis for speaker verification (2011) IEEE Trans. on Speech and Audio Processing, 19, pp. 788-798
504			\|a Ferrer, L., Bratt, H., Burget, L., Cernocky, H., Glembek, O., Graciarena, M., Lawson, A., Scheffer, N., Promoting robustness for speaker modeling in the community: The PRISM evaluation set (2011) Proc. NIST 2011 Workshop
504			\|a Lei, Y., Burget, L., Ferrer, L., Graciarena, M., Scheffer, N., Towards noise-robust speaker recognition using probabilistic linear discriminant analysis (2012) Proc. ICASSP, pp. 4253-4256
504			\|a Larcher, A., Lee, K., Ma, B., Li, H., RSR2015: Database for text-dependent speaker verification using multiple pass-phrases (2012) Proc. Interspeech
504			\|a McLaren, M., Lawson, A., Ferrer, L., Scheffer, N., Lei, Y., Trial-based calibration for speaker recognition in unseen conditions (2014) Odyssey 2014: The Speaker and Language Recognition Workshop
504			\|a (2009) The 2009 NIST Language Recognition Evaluation Plan, http://www.itl.nist.gov/iad/mig/tests/lre/2009/
504			\|a Lawson, A., McLaren, M., Lei, Y., Mitra, V., Scheffer, N., Ferrer, L., Graciarena, M., Improving language identification robustness to highly channel-degraded speech through multiple system fusion (2013) Proc. Interspeech
504			\|a Walker, K., Strassel, S., The RATS radio traffic collection system (2012) Proc. Odyssey
504			\|a Stafylakis, T., Kenny, P., Ouellet, P., Perez, J., Kockmann, M., Dumouchel, P., Text-dependent speaker recognition using PLDA with uncertainty propagation (2013) Proc. Interspeech, pp. 3684-3688
520	3		\|a Using bottleneck features extracted from a deep neural network (DNN) trained to predict senone posteriors has resulted in new, state-of-the-art technology for language and speaker identification. For language identification, the features' dense phonetic information is believed to enable improved performance by better representing language-dependent phone distributions. For speaker recognition, the role of these features is less clear, given that a bottleneck layer near the DNN output layer is thought to contain limited speaker information. In this article, we analyze the role of bottleneck features in these identification tasks by varying the DNN layer from which they are extracted, under the hypothesis that speaker information is traded for dense phonetic information as the layer moves toward the DNN output layer. Experiments support this hypothesis under certain conditions, and highlight the benefit of using a bottleneck layer close to the DNN output layer when DNN training data is matched to the evaluation conditions, and a layer more central to the DNN otherwise. © 2016 IEEE. \|l eng
593			\|a Speech Technology and Research Laboratory, SRI International, CA, United States
593			\|a Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina
690	1	0	\|a BOTTLENECK FEATURES
690	1	0	\|a DEEP NEURAL NETWORKS
690	1	0	\|a LANGUAGE RECOGNITION
690	1	0	\|a SPEAKER RECOGNITION
700	1		\|a Ferrer, L.
700	1		\|a Lawson, A.
700	1		\|a The Institute of Electrical and Electronics Engineers Signal Processing Society
711	2		\|d 20 March 2016 through 25 March 2016 \|g Conference code: 121667
773	0		\|d Institute of Electrical and Electronics Engineers Inc., 2016 \|g v. 2016-May \|h pp. 5575-5579 \|p ICASSP IEEE Int Conf Acoust Speech Signal Process Proc \|n ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings \|x 15206149 \|z 9781479999880 \|t 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
856	4	1	\|u https://www.scopus.com/inward/record.uri?eid=2-s2.0-84973343060&doi=10.1109%2fICASSP.2016.7472744&partnerID=40&md5=a9bb7fa9c0296bd4ee8f95f07d0d04aa \|y Scopus record
856	4	0	\|u https://doi.org/10.1109/ICASSP.2016.7472744 \|y DOI
856	4	0	\|u https://hdl.handle.net/20.500.12110/paper_15206149_v2016-May_n_p5575_McLaren \|y Handle
856	4	0	\|u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15206149_v2016-May_n_p5575_McLaren \|y Digital Library record
961			\|a paper_15206149_v2016-May_n_p5575_McLaren \|b paper \|c PE
962			\|a info:eu-repo/semantics/conferenceObject \|a info:ar-repo/semantics/documento de conferencia \|b info:eu-repo/semantics/publishedVersion
999			\|c 76950
