Advances in deep neural network approaches to speaker recognition
The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over current state-of-the-art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches...
Guardado en:
| Autor principal: | |
|---|---|
| Otros Autores: | , , |
| Formato: | Acta de conferencia Capítulo de libro |
| Lenguaje: | Inglés |
| Publicado: |
Institute of Electrical and Electronics Engineers Inc.
2015
|
| Acceso en línea: | Registro en Scopus DOI Handle Registro en la Biblioteca Digital |
| Aporte de: | Registro referencial: Solicitar el recurso aquí |
| Sumario: | The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over current state-of-the-art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches to DNN-based SID: one that uses the DNN to extract features, and another that uses the DNN during feature modeling. Modeling is conducted using the DNN/i-vector framework, in which the traditional universal background model is replaced with a DNN. The recently proposed use of bottleneck features extracted from a DNN is also evaluated. Systems are first compared with a conventional universal background model (UBM) Gaussian mixture model (GMM) i-vector system on the clean conditions of the NIST 2012 speaker recognition evaluation corpus, where a lack of robustness to microphone speech is found. Several methods of DNN feature processing are then applied to bring significantly greater robustness to microphone speech. To direct future research, the DNN-based systems are also evaluated in the context of audio degradations including noise and reverberation. © 2015 IEEE. |
|---|---|
| Bibliografía: | Lei, Y., Scheffer, N., Ferrer, L., McLaren, M., A novel scheme for speaker recognition using a phonetically-aware deep neural network (2014) Proc. ICASSP Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., A deep neural network speaker verification system targeting microphone speech (2014) Proc. Interspeech Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P., Front-end factor analysis for speaker verification (2011) IEEE Trans. on Speech and Audio Processing, 19, pp. 788-798 Song, Y., Jiang, B., Bao, Y., Wei, S., Dai, L., I-vector representation based on bottleneck features for language identification (2013) Electronics Letters, 49 (24), pp. 1569-1570 Ferrer, L., Lei, Y., McLaren, M., Study of senone-based deep neural network approaches for spoken language recognition (2014) Submitted to IEEE Trans. ASLP Ferrer, L., Lei, Y., McLaren, M., Scheffer, N., Spoken language recognition based on senone posteriors (2014) Proc. Interspeech McLaren, M., Lei, Y., Scheffer, N., Ferrer, L., Application of convolutional neural networks to speaker recognition in noisy conditions (2014) Proc Interspeech Matejka, P., Zhang, L., Ng, T., Mallidi, S.H., Glembek, O., Ma, J., Zhang, B., Neural network bottleneck features for language identification (2014) Proc. Speaker Odyssey Lei, Y., Ferrer, L., Lawson, A., McLaren, M., Scheffer, N., Application of convolutional neural networks to language identification in noisy conditions (2014) Proc. Speaker Odyssey Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., Comparative study on the use of senone-based deep neural networks for speaker recognition (2014) Submitted to IEEE Trans. ASLP Pelecanos, J., Sridharan, S., Feature warping for robust speaker verification (2001) Proc. Speaker Odyssey Young, S.J., Odell, J.J., Woodland, P.C., Tree-based state tying for high accuracy acoustic modelling (1994) Proc.Workshop on Human Language Technology, pp. 307-312 McLaren, M., Scheffer, N., Ferrer, L., Lei, Y., Effective use of DCTs for contextualizing features for speaker recognition (2014) Proc. ICASSP McLaren, M., Lei, Y., Improved speaker recognition using DCT coefficients as features (2015) Proc. ICASSP (Submitted) Prince, S.J.D., Elder, J.H., Probabilistic linear discriminant analysis for inferences about identity (2007) Proc. ICCV. IEEE, pp. 1-8 Ferrer, L., McLaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V., A noise-robust system for NIST 2012 speaker recognition evaluation (2013) Proc. Interpseech (2012), http://www.nist.gov/itl/iad/mig/upload/NIST_SRE12_evalplan-v17-r1.pdf; Ferrer, L., Bratt, H., Burget, L., Cernocky, H., Glembek, O., Graciarena, M., Lawson, A., Plchot, O., Promoting robustness for speaker modeling in the community: The PRISM evaluation set (2011) Proc. NIST 2011 Workshop Senoussaoui, M., Kenny, P., Brummer, N., De Villiers, E., Dumouchel, P., Mixture of PLDA models in i-vector space for gender independent speaker recognition (2011) Proc. Speech Communication and Technology Lei, Y., Burget, L., Ferrer, L., Graciarena, M., Scheffer, N., Towards noise-robust speaker recognition using probabilistic linear discriminant analysis (2012) Proc. ICASSP, pp. 4253-4256A4 - The Institute of Electrical and Electronics Engineers Signal Processing Society |
| ISBN: | 9781467369978 |
| ISSN: | 15206149 |
| DOI: | 10.1109/ICASSP.2015.7178885 |