Advances in deep neural network approaches to speaker recognition

The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over current state-of-the-art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches...

Bibliographic Details
Main author: McLaren, M.
Other authors: Lei, Y.; Ferrer, L.; The Institute of Electrical and Electronics Engineers Signal Processing Society
Format: Conference proceedings; Book chapter
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2015
Online access: Scopus record
DOI
Handle
Digital Library record
Contributed by: Reference record: Request the resource here
LEADER 06993caa a22006257a 4500
001 PAPER-13921
003 AR-BaUEN
005 20230518204421.0
008 190411s2015 xx ||||fo|||| 10| 0 eng|d
024 7 |2 scopus  |a 2-s2.0-84946053838 
040 |a Scopus  |b spa  |c AR-BaUEN  |d AR-BaUEN 
030 |a IPROD 
100 1 |a McLaren, M. 
245 1 0 |a Advances in deep neural network approaches to speaker recognition 
260 |b Institute of Electrical and Electronics Engineers Inc.  |c 2015 
506 |2 openaire  |e Editorial policy 
504 |a Lei, Y., Scheffer, N., Ferrer, L., McLaren, M., A novel scheme for speaker recognition using a phonetically-aware deep neural network (2014) Proc. ICASSP 
504 |a Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., A deep neural network speaker verification system targeting microphone speech (2014) Proc. Interspeech 
504 |a Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P., Front-end factor analysis for speaker verification (2011) IEEE Trans. on Audio, Speech, and Language Processing, 19, pp. 788-798 
504 |a Song, Y., Jiang, B., Bao, Y., Wei, S., Dai, L., I-vector representation based on bottleneck features for language identification (2013) Electronics Letters, 49 (24), pp. 1569-1570 
504 |a Ferrer, L., Lei, Y., McLaren, M., Study of senone-based deep neural network approaches for spoken language recognition (2014) Submitted to IEEE Trans. ASLP 
504 |a Ferrer, L., Lei, Y., McLaren, M., Scheffer, N., Spoken language recognition based on senone posteriors (2014) Proc. Interspeech 
504 |a McLaren, M., Lei, Y., Scheffer, N., Ferrer, L., Application of convolutional neural networks to speaker recognition in noisy conditions (2014) Proc. Interspeech 
504 |a Matejka, P., Zhang, L., Ng, T., Mallidi, S.H., Glembek, O., Ma, J., Zhang, B., Neural network bottleneck features for language identification (2014) Proc. Speaker Odyssey 
504 |a Lei, Y., Ferrer, L., Lawson, A., McLaren, M., Scheffer, N., Application of convolutional neural networks to language identification in noisy conditions (2014) Proc. Speaker Odyssey 
504 |a Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., Comparative study on the use of senone-based deep neural networks for speaker recognition (2014) Submitted to IEEE Trans. ASLP 
504 |a Pelecanos, J., Sridharan, S., Feature warping for robust speaker verification (2001) Proc. Speaker Odyssey 
504 |a Young, S.J., Odell, J.J., Woodland, P.C., Tree-based state tying for high accuracy acoustic modelling (1994) Proc. Workshop on Human Language Technology, pp. 307-312 
504 |a McLaren, M., Scheffer, N., Ferrer, L., Lei, Y., Effective use of DCTs for contextualizing features for speaker recognition (2014) Proc. ICASSP 
504 |a McLaren, M., Lei, Y., Improved speaker recognition using DCT coefficients as features (2015) Proc. ICASSP (Submitted) 
504 |a Prince, S.J.D., Elder, J.H., Probabilistic linear discriminant analysis for inferences about identity (2007) Proc. ICCV. IEEE, pp. 1-8 
504 |a Ferrer, L., McLaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V., A noise-robust system for NIST 2012 speaker recognition evaluation (2013) Proc. Interspeech 
504 |a (2012), http://www.nist.gov/itl/iad/mig/upload/NIST_SRE12_evalplan-v17-r1.pdf 
504 |a Ferrer, L., Bratt, H., Burget, L., Cernocky, H., Glembek, O., Graciarena, M., Lawson, A., Plchot, O., Promoting robustness for speaker modeling in the community: The PRISM evaluation set (2011) Proc. NIST 2011 Workshop 
504 |a Senoussaoui, M., Kenny, P., Brummer, N., De Villiers, E., Dumouchel, P., Mixture of PLDA models in i-vector space for gender independent speaker recognition (2011) Proc. Speech Communication and Technology 
504 |a Lei, Y., Burget, L., Ferrer, L., Graciarena, M., Scheffer, N., Towards noise-robust speaker recognition using probabilistic linear discriminant analysis (2012) Proc. ICASSP, pp. 4253-4256 
520 3 |a The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over current state-of-the-art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches to DNN-based SID: one that uses the DNN to extract features, and another that uses the DNN during feature modeling. Modeling is conducted using the DNN/i-vector framework, in which the traditional universal background model is replaced with a DNN. The recently proposed use of bottleneck features extracted from a DNN is also evaluated. Systems are first compared with a conventional universal background model (UBM) Gaussian mixture model (GMM) i-vector system on the clean conditions of the NIST 2012 speaker recognition evaluation corpus, where a lack of robustness to microphone speech is found. Several methods of DNN feature processing are then applied to bring significantly greater robustness to microphone speech. To direct future research, the DNN-based systems are also evaluated in the context of audio degradations including noise and reverberation. © 2015 IEEE.  |l eng 
593 |a Speech Technology and Research Laboratory, SRI International, California, United States 
593 |a Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina 
690 1 0 |a BOTTLENECK FEATURES 
690 1 0 |a CHANNEL MISMATCH 
690 1 0 |a DEEP NEURAL NETWORKS 
690 1 0 |a NORMALIZATION 
690 1 0 |a SPEAKER RECOGNITION 
700 1 |a Lei, Y. 
700 1 |a Ferrer, L. 
700 1 |a The Institute of Electrical and Electronics Engineers Signal Processing Society 
711 2 |d 19 April 2015 through 24 April 2015  |g Conference code: 116006 
773 0 |d Institute of Electrical and Electronics Engineers Inc., 2015  |g v. 2015-August  |h pp. 4814-4818  |p ICASSP IEEE Int Conf Acoust Speech Signal Process Proc  |n ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings  |x 15206149  |z 9781467369978  |t 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 
856 4 1 |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-84946053838&doi=10.1109%2fICASSP.2015.7178885&partnerID=40&md5=f1c8dc1793bd7ea2758f140f80a896b3  |y Scopus record 
856 4 0 |u https://doi.org/10.1109/ICASSP.2015.7178885  |y DOI 
856 4 0 |u https://hdl.handle.net/20.500.12110/paper_15206149_v2015-August_n_p4814_McLaren  |y Handle 
856 4 0 |u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15206149_v2015-August_n_p4814_McLaren  |y Digital Library record 
961 |a paper_15206149_v2015-August_n_p4814_McLaren  |b paper  |c PE 
962 |a info:eu-repo/semantics/conferenceObject  |a info:ar-repo/semantics/documento de conferencia  |b info:eu-repo/semantics/publishedVersion 
999 |c 74874