A deep neural network speaker verification system targeting microphone speech


Saved in:
Bibliographic Details
Main Author: Lei, Y.
Other Authors: Ferrer, L., McLaren, M., Scheffer, N., Chng E.S, Li H., Meng H., Ma B., Xie L., Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat
Format: Conference proceeding; Book chapter
Language: English
Published: International Speech Communication Association, 2014
Online Access: Scopus record
Handle
Record in the Digital Library
Contributed by: Referential record: request the resource here
LEADER 06250caa a22007457a 4500
001 PAPER-14655
003 AR-BaUEN
005 20230518204513.0
008 190411s2014 xx ||||fo|||| 10| 0 eng|d
024 7 |2 scopus  |a 2-s2.0-84910072392 
040 |a Scopus  |b spa  |c AR-BaUEN  |d AR-BaUEN 
100 1 |a Lei, Y. 
245 1 2 |a A deep neural network speaker verification system targeting microphone speech 
260 |b International Speech Communication Association  |c 2014 
270 1 0 |m Lei, Y.; Speech Technology and Research Laboratory, SRI International, United States 
506 |2 openaire  |e Política editorial 
504 |a Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P., Front-end factor analysis for speaker verification (2010) IEEE Trans. ASLP, 19, pp. 788-798. , May 
504 |a Reynolds, D.A., Quatieri, T.F., Speaker verification using adapted Gaussian mixture models (2000) Digital Signal Processing, 10, pp. 19-41 
504 |a Prince, S., Probabilistic linear discriminant analysis for inferences about identity (2007) ICCV-2007, pp. 1-8. , IEEE 
504 |a Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Kingsbury, B., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups (2012) Signal Processing Magazine, IEEE, 29 (6), pp. 82-97 
504 |a Lei, Y., Scheffer, N., Ferrer, L., Mclaren, M., A novel scheme for speaker recognition using a phonetically-aware deep neural network (2014) ICASSP-2014, , IEEE 
504 |a Kenny, P., Ouellet, P., Dehak, N., Gupta, V., Dumouchel, P., A study of inter-speaker variability in speaker verification (2008) IEEE Trans. ASLP, 16, pp. 980-988. , July 
504 |a Dahl, G., Yu, D., Deng, L., Acero, A., Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition (2012) IEEE Trans. ASLP, 20, pp. 30-42 
504 |a Ferrer, L., Mclaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V., A noise-robust system for NIST 2012 speaker recognition evaluation (2013) Interspeech-2013, pp. 1981-1985 
504 |a Young, S.J., Odell, J.J., Woodland, P.C., Tree-based state tying for high accuracy acoustic modelling (1994) HLT '94 Proceedings of the Workshop on Human Language Technology, pp. 307-312 
504 |a NIST SRE12 Evaluation Plan, , http://www.nist.gov/itl/iad/mig/upload/NISTSRE12evalplanv11-r0.pdf 
504 |a Stolcke, A., Anguera, X., Boakye, K., Cetin, O., Janin, A., Magimai-Doss, M., Wooters, C., Zheng, J., The SRI-ICSI spring 2007 meeting and lecture recognition system (2008) Proc. NIST Rich Transcription Workshop, pp. 450-463. , Springer Lecture Notes in Computer Science 
504 |a Deng, L., Li, J., Huang, J., Yao, K., Yu, D., Seide, F., Seltzer, M., Acero, A., Recent advances in deep learning for speech research at Microsoft (2013) ICASSP-2013, pp. 8604-8608. , IEEE 
520 3 |a We recently proposed the use of deep neural networks (DNN) in place of Gaussian mixture models (GMM) in the i-vector extraction process for speaker recognition. We have shown significant accuracy improvements on the 2012 NIST speaker recognition evaluation (SRE) telephone conditions. This paper explores how this framework can be effectively used on the microphone speech conditions of the 2012 NIST SRE. In this new framework, the verification performance greatly depends on the data used for training the DNN. We show that training the DNN using both telephone and microphone speech data can yield significant improvements. An in-depth analysis of the influence of telephone speech data on the microphone conditions is also shown for both the DNN and GMM systems. We conclude by showing that the GMM system is always outperformed by the DNN system on the telephone-only and microphone-only conditions, and that the new DNN / i-vector framework can be successfully used provided there is a good match in the training data. Copyright © 2014 ISCA.  |l eng 
593 |a Speech Technology and Research Laboratory, SRI International, CA, United States 
593 |a Departamento de Computación, FCEN, Universidad de Buenos Aires, Argentina 
690 1 0 |a DEEP NEURAL NETWORKS 
690 1 0 |a I-VECTORS 
690 1 0 |a MICROPHONE DATA 
690 1 0 |a SPEAKER RECOGNITION 
690 1 0 |a MICROPHONES 
690 1 0 |a SPEECH 
690 1 0 |a SPEECH COMMUNICATION 
690 1 0 |a TELEPHONE SETS 
690 1 0 |a ACCURACY IMPROVEMENT 
690 1 0 |a GAUSSIAN MIXTURE MODEL 
690 1 0 |a IN-DEPTH ANALYSIS 
690 1 0 |a SPEAKER RECOGNITION EVALUATIONS 
690 1 0 |a SPEAKER VERIFICATION SYSTEM 
690 1 0 |a SPEECH RECOGNITION 
700 1 |a Ferrer, L. 
700 1 |a McLaren, M. 
700 1 |a Scheffer, N. 
700 1 |a Chng, E.S. 
700 1 |a Li, H. 
700 1 |a Meng, H. 
700 1 |a Ma, B. 
700 1 |a Xie, L. 
700 1 |a Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat 
711 2 |d 14 September 2014 through 18 September 2014  |g Código de la conferencia: 108771 
773 0 |d International Speech Communication Association, 2014  |h pp. 681-685  |p Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH  |n Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH  |x 2308457X  |t 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 
856 4 1 |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-84910072392&partnerID=40&md5=3d7cfc95ca9afff86dfe60130dd52891  |y Registro en Scopus 
856 4 0 |u https://hdl.handle.net/20.500.12110/paper_2308457X_v_n_p681_Lei  |y Handle 
856 4 0 |u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p681_Lei  |y Registro en la Biblioteca Digital 
961 |a paper_2308457X_v_n_p681_Lei  |b paper  |c PE 
962 |a info:eu-repo/semantics/conferenceObject  |a info:ar-repo/semantics/documento de conferencia  |b info:eu-repo/semantics/publishedVersion 
999 |c 75608