A deep neural network speaker verification system targeting microphone speech


Saved in:
Bibliographic Details
Main Author: Lei, Y.
Other Authors: Ferrer, L., McLaren, M., Scheffer, N., Chng E.S, Li H., Meng H., Ma B., Xie L., Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat
Format: Conference proceeding; Book chapter
Language: English
Published: International Speech Communication Association, 2014
Online Access: Scopus record
Handle
Record in the Digital Library
Contributed by: Referential record: request the resource here
LEADER 06250caa a22007457a 4500
001 PAPER-14655
003 AR-BaUEN
005 20230518204513.0
008 190411s2014 xx ||||fo|||| 10| 0 eng|d
024 7 |2 scopus  |a 2-s2.0-84910072392 
040 |a Scopus  |b spa  |c AR-BaUEN  |d AR-BaUEN 
100 1 |a Lei, Y. 
245 1 2 |a A deep neural network speaker verification system targeting microphone speech 
260 |b International Speech Communication Association  |c 2014 
270 1 0 |m Lei, Y.; Speech Technology and Research Laboratory, SRI International, United States 
506 |2 openaire  |e Política editorial 
504 |a Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P., Front-end factor analysis for speaker verification (2010) IEEE Trans. ASLP, 19, pp. 788-798. , May 
504 |a Reynolds, D.A., Quatieri, T.F., Speaker verification using adapted Gaussian mixture models (2000) Digital Signal Processing, 10, pp. 19-41 
504 |a Prince, S., Probabilistic linear discriminant analysis for inferences about identity (2007) ICCV-2007, pp. 1-8. , IEEE 
504 |a Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Kingsbury, B., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups (2012) Signal Processing Magazine, IEEE, 29 (6), pp. 82-97 
504 |a Lei, Y., Scheffer, N., Ferrer, L., Mclaren, M., A novel scheme for speaker recognition using a phonetically-aware deep neural network (2014) ICASSP-2014, , IEEE 
504 |a Kenny, P., Ouellet, P., Dehak, N., Gupta, V., Dumouchel, P., A study of inter-speaker variability in speaker verification (2008) IEEE Trans. ASLP, 16, pp. 980-988. , July 
504 |a Dahl, G., Yu, D., Deng, L., Acero, A., Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition (2012) IEEE Trans. ASLP, 20, pp. 30-42 
504 |a Ferrer, L., Mclaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V., A noise-robust system for NIST 2012 speaker recognition evaluation (2013) Interspeech-2013, pp. 1981-1985 
504 |a Young, S.J., Odell, J.J., Woodland, P.C., Tree-based state tying for high accuracy acoustic modelling (1994) HLT '94 Proceedings of the Workshop on Human Language Technology, pp. 307-312 
504 |a NIST SRE12 Evaluation Plan, , http://www.nist.gov/itl/iad/mig/upload/NISTSRE12evalplanv11-r0.pdf 
504 |a Stolcke, A., Anguera, X., Boakye, K., Cetin, O., Janin, A., Magimai-Doss, M., Wooters, C., Zheng, J., The SRI-ICSI spring 2007 meeting and lecture recognition system (2008) Proc. NIST Rich Transcription Workshop, pp. 450-463. , Springer Lecture Notes in Computer Science 
504 |a Deng, L., Li, J., Huang, J., Yao, K., Yu, D., Seide, F., Seltzer, M., Acero, A., Recent advances in deep learning for speech research at Microsoft (2013) ICASSP-2013, pp. 8604-8608. , IEEE 
520 3 |a We recently proposed the use of deep neural networks (DNN) in place of Gaussian mixture models (GMM) in the i-vector extraction process for speaker recognition. We have shown significant accuracy improvements on the 2012 NIST speaker recognition evaluation (SRE) telephone conditions. This paper explores how this framework can be effectively used on the microphone speech conditions of the 2012 NIST SRE. In this new framework, the verification performance greatly depends on the data used for training the DNN. We show that training the DNN using both telephone and microphone speech data can yield significant improvements. An in-depth analysis of the influence of telephone speech data on the microphone conditions is also shown for both the DNN and GMM systems. We conclude by showing that the GMM system is always outperformed by the DNN system on the telephone-only and microphone-only conditions, and that the new DNN / i-vector framework can be successfully used provided there is a good match in the training data. Copyright © 2014 ISCA.  |l eng 
593 |a Speech Technology and Research Laboratory, SRI International, CA, United States 
593 |a Departamento de Computación, FCEN, Universidad de Buenos Aires, Argentina 
690 1 0 |a DEEP NEURAL NETWORKS 
690 1 0 |a I-VECTORS 
690 1 0 |a MICROPHONE DATA 
690 1 0 |a SPEAKER RECOGNITION 
690 1 0 |a MICROPHONES 
690 1 0 |a SPEECH 
690 1 0 |a SPEECH COMMUNICATION 
690 1 0 |a TELEPHONE SETS 
690 1 0 |a ACCURACY IMPROVEMENT 
690 1 0 |a GAUSSIAN MIXTURE MODEL 
690 1 0 |a IN-DEPTH ANALYSIS 
690 1 0 |a SPEAKER RECOGNITION EVALUATIONS 
690 1 0 |a SPEAKER VERIFICATION SYSTEM 
690 1 0 |a SPEECH RECOGNITION 
700 1 |a Ferrer, L. 
700 1 |a McLaren, M. 
700 1 |a Scheffer, N. 
700 1 |a Chng, E.S. 
700 1 |a Li, H. 
700 1 |a Meng, H. 
700 1 |a Ma, B. 
700 1 |a Xie, L. 
700 1 |a Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat 
711 2 |d 14 September 2014 through 18 September 2014  |g Código de la conferencia: 108771 
773 0 |d International Speech Communication Association, 2014  |h pp. 681-685  |p Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH  |n Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH  |x 2308457X  |t 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 
856 4 1 |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-84910072392&partnerID=40&md5=3d7cfc95ca9afff86dfe60130dd52891  |y Registro en Scopus 
856 4 0 |u https://hdl.handle.net/20.500.12110/paper_2308457X_v_n_p681_Lei  |y Handle 
856 4 0 |u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p681_Lei  |y Registro en la Biblioteca Digital 
961 |a paper_2308457X_v_n_p681_Lei  |b paper  |c PE 
962 |a info:eu-repo/semantics/conferenceObject  |a info:ar-repo/semantics/documento de conferencia  |b info:eu-repo/semantics/publishedVersion 
999 |c 75608