A deep neural network speaker verification system targeting microphone speech

We recently proposed the use of deep neural networks (DNN) in place of Gaussian Mixture models (GMM) in the i-vector extraction process for speaker recognition. We have shown significant accuracy improvements on the 2012 NIST speaker recognition evaluation (SRE) telephone conditions. This paper expl...

Bibliographic Details
Main authors:	Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., Chng E.S., Li H., Meng H., Ma B., Xie L., Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat
Format:	CONF
Subjects:	Deep neural networks I-vectors Microphone data Speaker recognition Microphones Speech Speech communication Telephone sets Accuracy Improvement Gaussian Mixture Model In-depth analysis Speaker recognition evaluations Speaker verification system Speech recognition
Online access:	http://hdl.handle.net/20.500.12110/paper_2308457X_v_n_p681_Lei
Contributed by:	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) of Universidad de Buenos Aires

id	todo:paper_2308457X_v_n_p681_Lei
record_format	dspace
spelling	todo:paper_2308457X_v_n_p681_Lei2023-10-03T16:40:55Z A deep neural network speaker verification system targeting microphone speech Lei, Y. Ferrer, L. McLaren, M. Scheffer, N. Chng E.S. Li H. Meng H. Ma B. Xie L. Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat Deep neural networks I-vectors Microphone data Speaker recognition Microphones Speech Speech communication Telephone sets Accuracy Improvement Deep neural networks Gaussian Mixture Model I-vectors In-depth analysis Speaker recognition Speaker recognition evaluations Speaker verification system Speech recognition We recently proposed the use of deep neural networks (DNN) in place of Gaussian Mixture models (GMM) in the i-vector extraction process for speaker recognition. We have shown significant accuracy improvements on the 2012 NIST speaker recognition evaluation (SRE) telephone conditions. This paper explores how this framework can be effectively used on the microphone speech conditions of the 2012 NIST SRE. In this new framework, the verification performance greatly depends on the data used for training the DNN. We show that training the DNN using both telephone and microphone speech data can yield significant improvements. An in-depth analysis of the influence of telephone speech data on the microphone conditions is also shown for both the DNN and GMM systems. We conclude by showing that the GMM system is always outperformed by the DNN system on the telephone-only and microphone-only conditions, and that the new DNN / i-vector framework can be successfully used providing a good match in the training data. Copyright © 2014 ISCA. CONF info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_2308457X_v_n_p681_Lei
institution	Universidad de Buenos Aires
institution_str	I-28
repository_str	R-134
collection	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic	Deep neural networks I-vectors Microphone data Speaker recognition Microphones Speech Speech communication Telephone sets Accuracy Improvement Deep neural networks Gaussian Mixture Model I-vectors In-depth analysis Speaker recognition Speaker recognition evaluations Speaker verification system Speech recognition
spellingShingle	Deep neural networks I-vectors Microphone data Speaker recognition Microphones Speech Speech communication Telephone sets Accuracy Improvement Deep neural networks Gaussian Mixture Model I-vectors In-depth analysis Speaker recognition Speaker recognition evaluations Speaker verification system Speech recognition Lei, Y. Ferrer, L. McLaren, M. Scheffer, N. Chng E.S. Li H. Meng H. Ma B. Xie L. Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat A deep neural network speaker verification system targeting microphone speech
topic_facet	Deep neural networks I-vectors Microphone data Speaker recognition Microphones Speech Speech communication Telephone sets Accuracy Improvement Deep neural networks Gaussian Mixture Model I-vectors In-depth analysis Speaker recognition Speaker recognition evaluations Speaker verification system Speech recognition
description	We recently proposed the use of deep neural networks (DNN) in place of Gaussian Mixture models (GMM) in the i-vector extraction process for speaker recognition. We have shown significant accuracy improvements on the 2012 NIST speaker recognition evaluation (SRE) telephone conditions. This paper explores how this framework can be effectively used on the microphone speech conditions of the 2012 NIST SRE. In this new framework, the verification performance greatly depends on the data used for training the DNN. We show that training the DNN using both telephone and microphone speech data can yield significant improvements. An in-depth analysis of the influence of telephone speech data on the microphone conditions is also shown for both the DNN and GMM systems. We conclude by showing that the GMM system is always outperformed by the DNN system on the telephone-only and microphone-only conditions, and that the new DNN / i-vector framework can be successfully used providing a good match in the training data. Copyright © 2014 ISCA.
format	CONF
author	Lei, Y. Ferrer, L. McLaren, M. Scheffer, N. Chng E.S. Li H. Meng H. Ma B. Xie L. Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat
author_facet	Lei, Y. Ferrer, L. McLaren, M. Scheffer, N. Chng E.S. Li H. Meng H. Ma B. Xie L. Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat
author_sort	Lei, Y.
title	A deep neural network speaker verification system targeting microphone speech
title_short	A deep neural network speaker verification system targeting microphone speech
title_full	A deep neural network speaker verification system targeting microphone speech
title_fullStr	A deep neural network speaker verification system targeting microphone speech
title_full_unstemmed	A deep neural network speaker verification system targeting microphone speech
title_sort	deep neural network speaker verification system targeting microphone speech
url	http://hdl.handle.net/20.500.12110/paper_2308457X_v_n_p681_Lei
work_keys_str_mv	AT leiy adeepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT ferrerl adeepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT mclarenm adeepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT scheffern adeepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT chnges adeepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT lih adeepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT mengh adeepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT mab adeepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT xiel adeepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT amazonbaiduetalgoogletemaseklaboratoriesatnanyangtechnologicaluniversitytlatntuwechat adeepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT leiy deepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT ferrerl deepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT mclarenm deepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT scheffern deepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT chnges deepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT lih deepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT mengh deepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT mab deepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT xiel deepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech AT amazonbaiduetalgoogletemaseklaboratoriesatnanyangtechnologicaluniversitytlatntuwechat deepneuralnetworkspeakerverificationsystemtargetingmicrophonespeech
_version_	1807318003294928896
