Advances in deep neural network approaches to speaker recognition

The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over current state-of-the-art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches...

Descripción completa

Guardado en:
Detalles Bibliográficos
Autor principal: McLaren, M.
Otros Autores: Lei, Y., Ferrer, L., The Institute of Electrical and Electronics Engineers Signal Processing Society
Formato: Acta de conferencia Capítulo de libro
Lenguaje:Inglés
Publicado: Institute of Electrical and Electronics Engineers Inc. 2015
Acceso en línea:Registro en Scopus
DOI
Handle
Registro en la Biblioteca Digital
Aporte de:Registro referencial: Solicitar el recurso aquí
Descripción
Sumario:The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over current state-of-the-art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches to DNN-based SID: one that uses the DNN to extract features, and another that uses the DNN during feature modeling. Modeling is conducted using the DNN/i-vector framework, in which the traditional universal background model is replaced with a DNN. The recently proposed use of bottleneck features extracted from a DNN is also evaluated. Systems are first compared with a conventional universal background model (UBM) Gaussian mixture model (GMM) i-vector system on the clean conditions of the NIST 2012 speaker recognition evaluation corpus, where a lack of robustness to microphone speech is found. Several methods of DNN feature processing are then applied to bring significantly greater robustness to microphone speech. To direct future research, the DNN-based systems are also evaluated in the context of audio degradations including noise and reverberation. © 2015 IEEE.
Bibliografía:Lei, Y., Scheffer, N., Ferrer, L., McLaren, M., A novel scheme for speaker recognition using a phonetically-aware deep neural network (2014) Proc. ICASSP
Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., A deep neural network speaker verification system targeting microphone speech (2014) Proc. Interspeech
Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P., Front-end factor analysis for speaker verification (2011) IEEE Trans. on Speech and Audio Processing, 19, pp. 788-798
Song, Y., Jiang, B., Bao, Y., Wei, S., Dai, L., I-vector representation based on bottleneck features for language identification (2013) Electronics Letters, 49 (24), pp. 1569-1570
Ferrer, L., Lei, Y., McLaren, M., Study of senone-based deep neural network approaches for spoken language recognition (2014) Submitted to IEEE Trans. ASLP
Ferrer, L., Lei, Y., McLaren, M., Scheffer, N., Spoken language recognition based on senone posteriors (2014) Proc. Interspeech
McLaren, M., Lei, Y., Scheffer, N., Ferrer, L., Application of convolutional neural networks to speaker recognition in noisy conditions (2014) Proc Interspeech
Matejka, P., Zhang, L., Ng, T., Mallidi, S.H., Glembek, O., Ma, J., Zhang, B., Neural network bottleneck features for language identification (2014) Proc. Speaker Odyssey
Lei, Y., Ferrer, L., Lawson, A., McLaren, M., Scheffer, N., Application of convolutional neural networks to language identification in noisy conditions (2014) Proc. Speaker Odyssey
Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., Comparative study on the use of senone-based deep neural networks for speaker recognition (2014) Submitted to IEEE Trans. ASLP
Pelecanos, J., Sridharan, S., Feature warping for robust speaker verification (2001) Proc. Speaker Odyssey
Young, S.J., Odell, J.J., Woodland, P.C., Tree-based state tying for high accuracy acoustic modelling (1994) Proc.Workshop on Human Language Technology, pp. 307-312
McLaren, M., Scheffer, N., Ferrer, L., Lei, Y., Effective use of DCTs for contextualizing features for speaker recognition (2014) Proc. ICASSP
McLaren, M., Lei, Y., Improved speaker recognition using DCT coefficients as features (2015) Proc. ICASSP (Submitted)
Prince, S.J.D., Elder, J.H., Probabilistic linear discriminant analysis for inferences about identity (2007) Proc. ICCV. IEEE, pp. 1-8
Ferrer, L., McLaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V., A noise-robust system for NIST 2012 speaker recognition evaluation (2013) Proc. Interpseech
(2012), http://www.nist.gov/itl/iad/mig/upload/NIST_SRE12_evalplan-v17-r1.pdf; Ferrer, L., Bratt, H., Burget, L., Cernocky, H., Glembek, O., Graciarena, M., Lawson, A., Plchot, O., Promoting robustness for speaker modeling in the community: The PRISM evaluation set (2011) Proc. NIST 2011 Workshop
Senoussaoui, M., Kenny, P., Brummer, N., De Villiers, E., Dumouchel, P., Mixture of PLDA models in i-vector space for gender independent speaker recognition (2011) Proc. Speech Communication and Technology
Lei, Y., Burget, L., Ferrer, L., Graciarena, M., Scheffer, N., Towards noise-robust speaker recognition using probabilistic linear discriminant analysis (2012) Proc. ICASSP, pp. 4253-4256A4 - The Institute of Electrical and Electronics Engineers Signal Processing Society
ISBN:9781467369978
ISSN:15206149
DOI:10.1109/ICASSP.2015.7178885