Advances in deep neural network approaches to speaker recognition
The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over the current state of the art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches...
| Main author: | McLaren, M. |
|---|---|
| Other authors: | Lei, Y., Ferrer, L., The Institute of Electrical and Electronics Engineers Signal Processing Society |
| Format: | Conference proceedings; Book chapter |
| Language: | English |
| Published: | Institute of Electrical and Electronics Engineers Inc., 2015 |
| Online access: | Scopus record; DOI; Handle; Digital Library record |
| Contributed by: | Reference record |
| LEADER | 06993caa a22006257a 4500 | ||
|---|---|---|---|
| 001 | PAPER-13921 | ||
| 003 | AR-BaUEN | ||
| 005 | 20230518204421.0 | ||
| 008 | 190411s2015 xx ||||fo|||| 10| 0 eng|d | ||
| 024 | 7 | |2 scopus |a 2-s2.0-84946053838 | |
| 040 | |a Scopus |b spa |c AR-BaUEN |d AR-BaUEN | ||
| 030 | |a IPROD | ||
| 100 | 1 | |a McLaren, M. | |
| 245 | 1 | 0 | |a Advances in deep neural network approaches to speaker recognition |
| 260 | |b Institute of Electrical and Electronics Engineers Inc. |c 2015 | ||
| 506 | |2 openaire |e Editorial policy | ||
| 504 | |a Lei, Y., Scheffer, N., Ferrer, L., McLaren, M., A novel scheme for speaker recognition using a phonetically-aware deep neural network (2014) Proc. ICASSP | ||
| 504 | |a Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., A deep neural network speaker verification system targeting microphone speech (2014) Proc. Interspeech | ||
| 504 | |a Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P., Front-end factor analysis for speaker verification (2011) IEEE Trans. on Audio, Speech, and Language Processing, 19, pp. 788-798 | ||
| 504 | |a Song, Y., Jiang, B., Bao, Y., Wei, S., Dai, L., I-vector representation based on bottleneck features for language identification (2013) Electronics Letters, 49 (24), pp. 1569-1570 | ||
| 504 | |a Ferrer, L., Lei, Y., McLaren, M., Study of senone-based deep neural network approaches for spoken language recognition (2014) Submitted to IEEE Trans. ASLP | ||
| 504 | |a Ferrer, L., Lei, Y., McLaren, M., Scheffer, N., Spoken language recognition based on senone posteriors (2014) Proc. Interspeech | ||
| 504 | |a McLaren, M., Lei, Y., Scheffer, N., Ferrer, L., Application of convolutional neural networks to speaker recognition in noisy conditions (2014) Proc. Interspeech | ||
| 504 | |a Matejka, P., Zhang, L., Ng, T., Mallidi, S.H., Glembek, O., Ma, J., Zhang, B., Neural network bottleneck features for language identification (2014) Proc. Speaker Odyssey | ||
| 504 | |a Lei, Y., Ferrer, L., Lawson, A., McLaren, M., Scheffer, N., Application of convolutional neural networks to language identification in noisy conditions (2014) Proc. Speaker Odyssey | ||
| 504 | |a Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., Comparative study on the use of senone-based deep neural networks for speaker recognition (2014) Submitted to IEEE Trans. ASLP | ||
| 504 | |a Pelecanos, J., Sridharan, S., Feature warping for robust speaker verification (2001) Proc. Speaker Odyssey | ||
| 504 | |a Young, S.J., Odell, J.J., Woodland, P.C., Tree-based state tying for high accuracy acoustic modelling (1994) Proc. Workshop on Human Language Technology, pp. 307-312 | ||
| 504 | |a McLaren, M., Scheffer, N., Ferrer, L., Lei, Y., Effective use of DCTs for contextualizing features for speaker recognition (2014) Proc. ICASSP | ||
| 504 | |a McLaren, M., Lei, Y., Improved speaker recognition using DCT coefficients as features (2015) Proc. ICASSP (Submitted) | ||
| 504 | |a Prince, S.J.D., Elder, J.H., Probabilistic linear discriminant analysis for inferences about identity (2007) Proc. ICCV. IEEE, pp. 1-8 | ||
| 504 | |a Ferrer, L., McLaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V., A noise-robust system for NIST 2012 speaker recognition evaluation (2013) Proc. Interspeech | ||
| 504 | |a (2012) http://www.nist.gov/itl/iad/mig/upload/NIST_SRE12_evalplan-v17-r1.pdf | ||
| 504 | |a Ferrer, L., Bratt, H., Burget, L., Cernocky, H., Glembek, O., Graciarena, M., Lawson, A., Plchot, O., Promoting robustness for speaker modeling in the community: The PRISM evaluation set (2011) Proc. NIST 2011 Workshop | ||
| 504 | |a Senoussaoui, M., Kenny, P., Brummer, N., De Villiers, E., Dumouchel, P., Mixture of PLDA models in i-vector space for gender independent speaker recognition (2011) Proc. Speech Communication and Technology | ||
| 504 | |a Lei, Y., Burget, L., Ferrer, L., Graciarena, M., Scheffer, N., Towards noise-robust speaker recognition using probabilistic linear discriminant analysis (2012) Proc. ICASSP, pp. 4253-4256 | ||
| 520 | 3 | |a The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over the current state of the art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches to DNN-based SID: one that uses the DNN to extract features, and another that uses the DNN during feature modeling. Modeling is conducted using the DNN/i-vector framework, in which the traditional universal background model (UBM) is replaced with a DNN. The recently proposed use of bottleneck features extracted from a DNN is also evaluated. Systems are first compared with a conventional UBM Gaussian mixture model (GMM) i-vector system on the clean conditions of the NIST 2012 speaker recognition evaluation corpus, where a lack of robustness to microphone speech is found. Several methods of DNN feature processing are then applied to bring significantly greater robustness to microphone speech. To direct future research, the DNN-based systems are also evaluated in the context of audio degradations including noise and reverberation. © 2015 IEEE. |l eng | |
| 593 | |a Speech Technology and Research Laboratory, SRI International, California, United States | ||
| 593 | |a Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina | ||
| 690 | 1 | 0 | |a BOTTLENECK FEATURES |
| 690 | 1 | 0 | |a CHANNEL MISMATCH |
| 690 | 1 | 0 | |a DEEP NEURAL NETWORKS |
| 690 | 1 | 0 | |a NORMALIZATION |
| 690 | 1 | 0 | |a SPEAKER RECOGNITION |
| 700 | 1 | |a Lei, Y. | |
| 700 | 1 | |a Ferrer, L. | |
| 700 | 1 | |a The Institute of Electrical and Electronics Engineers Signal Processing Society | |
| 711 | 2 | |d 19 April 2015 through 24 April 2015 |g Conference code: 116006 | |
| 773 | 0 | |d Institute of Electrical and Electronics Engineers Inc., 2015 |g v. 2015-August |h pp. 4814-4818 |p ICASSP IEEE Int Conf Acoust Speech Signal Process Proc |n ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |x 15206149 |z 9781467369978 |t 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 | |
| 856 | 4 | 1 | |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-84946053838&doi=10.1109%2fICASSP.2015.7178885&partnerID=40&md5=f1c8dc1793bd7ea2758f140f80a896b3 |y Scopus record |
| 856 | 4 | 0 | |u https://doi.org/10.1109/ICASSP.2015.7178885 |y DOI |
| 856 | 4 | 0 | |u https://hdl.handle.net/20.500.12110/paper_15206149_v2015-August_n_p4814_McLaren |y Handle |
| 856 | 4 | 0 | |u https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15206149_v2015-August_n_p4814_McLaren |y Digital Library record |
| 961 | |a paper_15206149_v2015-August_n_p4814_McLaren |b paper |c PE | ||
| 962 | |a info:eu-repo/semantics/conferenceObject |a info:ar-repo/semantics/documento de conferencia |b info:eu-repo/semantics/publishedVersion | ||
| 999 | |c 74874 | ||
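The abstract describes the DNN/i-vector framework, in which a DNN takes over the role of the UBM. The following is a minimal sketch that assumes the formulation of the cited Lei et al. (2014) work. In that formulation, per-frame senone posteriors from the DNN replace the UBM Gaussian component posteriors when accumulating the zeroth- and first-order Baum-Welch statistics that feed i-vector extraction. The helper names (`baum_welch_stats`, `centered_first_order`), the sizes, and the random stand-in posteriors and features are hypothetical. This is an illustration, not the paper's implementation.

```python
import numpy as np


def baum_welch_stats(posteriors: np.ndarray, features: np.ndarray):
    """Accumulate zeroth- and first-order statistics for one utterance.

    posteriors: (T, C) per-frame class posteriors. In the DNN/i-vector
        framework these are the DNN's senone outputs rather than UBM
        Gaussian component posteriors.
    features:   (T, D) acoustic features modeled by the i-vector system.
    Returns n with shape (C,) and f with shape (C, D).
    """
    if posteriors.shape[0] != features.shape[0]:
        raise ValueError("posteriors and features must have the same number of frames")
    # Zeroth order: soft count of frames assigned to each class.
    n = posteriors.sum(axis=0)
    # First order: posterior-weighted sum of feature vectors per class.
    f = posteriors.T @ features
    return n, f


def centered_first_order(n: np.ndarray, f: np.ndarray, means: np.ndarray) -> np.ndarray:
    """Center first-order statistics around per-class means."""
    return f - n[:, None] * means


# Hypothetical sizes: frames, senones, feature dimension.
T, C, D = 200, 8, 5
rng = np.random.default_rng(0)

# Softmax over random logits stands in for DNN senone posteriors.
logits = rng.normal(size=(T, C))
post = np.exp(logits - logits.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)

feats = rng.normal(size=(T, D))
n, f = baum_welch_stats(post, feats)

# Class means estimated from the same frames, so centering yields ~0.
means = f / n[:, None]
f_centered = centered_first_order(n, f, means)
```

The class means here are estimated from the same frames, so the centered statistics are zero up to rounding. This only serves as a sanity check. In a real system the means would be estimated on separate training data with the same posteriors, and the resulting statistics would then be passed to a total-variability i-vector extractor.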