A deep neural network speaker verification system targeting microphone speech

Bibliographic details
Main authors: Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., Chng, E.S., Li, H., Meng, H., Ma, B., Xie, L., Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat
Format: CONF
Online access: http://hdl.handle.net/20.500.12110/paper_2308457X_v_n_p681_Lei
id todo:paper_2308457X_v_n_p681_Lei
record_format dspace
timestamp 2023-10-03T16:40:55Z
rights info:eu-repo/semantics/openAccess
license http://creativecommons.org/licenses/by/2.5/ar
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic Deep neural networks
I-vectors
Microphone data
Speaker recognition
Microphones
Speech
Speech communication
Telephone sets
Accuracy Improvement
Gaussian Mixture Model
In-depth analysis
Speaker recognition evaluations
Speaker verification system
Speech recognition
description We recently proposed the use of deep neural networks (DNNs) in place of Gaussian mixture models (GMMs) in the i-vector extraction process for speaker recognition, and showed significant accuracy improvements on the telephone conditions of the 2012 NIST speaker recognition evaluation (SRE). This paper explores how this framework can be used effectively on the microphone speech conditions of the 2012 NIST SRE. In this framework, verification performance depends strongly on the data used to train the DNN. We show that training the DNN on both telephone and microphone speech data can yield significant improvements. We also present an in-depth analysis of the influence of telephone speech data on the microphone conditions for both the DNN and GMM systems. Finally, we show that the DNN system always outperforms the GMM system on the telephone-only and microphone-only conditions, and that the new DNN/i-vector framework can be used successfully provided the training data are well matched to the evaluation conditions. Copyright © 2014 ISCA.
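The central idea in the abstract, swapping the source of the frame-level posteriors used in i-vector extraction, can be illustrated with a minimal sketch. It assumes the common formulation of this framework, in which per-frame class posteriors (from a GMM universal background model in the baseline, or from a DNN's output layer in the proposed system) weight the zeroth- and first-order Baum-Welch statistics that the i-vector extractor consumes. The function names, the toy numbers, and the pure-Python representation are illustrative assumptions, not the authors' code; DNN training and the total-variability model itself are out of scope.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# Hypothetical helper names; a sketch, not the authors' implementation.


def gmm_posteriors(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> Matrix:
    """Per-frame component posteriors from a diagonal-covariance GMM (the UBM)."""
    posteriors = []
    for x in features:
        # Log of w_c * N(x; mu_c, diag(var_c)) for every component c.
        log_joint = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            log_joint.append(ll)
        # Normalize with log-sum-exp for numerical stability.
        peak = max(log_joint)
        exps = [math.exp(v - peak) for v in log_joint]
        total = sum(exps)
        posteriors.append([e / total for e in exps])
    return posteriors


def baum_welch_stats(
    posteriors: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
) -> Tuple[List[float], Matrix]:
    """Zeroth-order N_c = sum_t g_t(c) and first-order F_c = sum_t g_t(c) x_t.

    The posteriors g_t(c) may come from gmm_posteriors() or, in the DNN
    framework, from a DNN's per-frame class outputs (assumed interface:
    one row of class probabilities per frame).
    """
    num_classes = len(posteriors[0])
    dim = len(features[0])
    n = [0.0] * num_classes
    f = [[0.0] * dim for _ in range(num_classes)]
    for gamma_t, x_t in zip(posteriors, features):
        for c, g in enumerate(gamma_t):
            n[c] += g
            for d, x in enumerate(x_t):
                f[c][d] += g * x
    return n, f


def center_stats(n: Sequence[float], f: Matrix, means: Matrix) -> Matrix:
    """Centered first-order statistics F_c - N_c * m_c for per-class means m_c."""
    return [
        [f_cd - n_c * m_cd for f_cd, m_cd in zip(f_c, m_c)]
        for n_c, f_c, m_c in zip(n, f, means)
    ]


if __name__ == "__main__":
    feats = [[0.0, 1.0], [2.0, 3.0]]
    # Made-up posteriors standing in for DNN outputs over 2 classes.
    dnn_post = [[0.75, 0.25], [0.5, 0.5]]
    n, f = baum_welch_stats(dnn_post, feats)
    print(n)  # [1.25, 0.75]
```

Because `baum_welch_stats` only sees a posterior matrix, replacing the output of `gmm_posteriors` with DNN outputs leaves the downstream statistics unchanged in form; consistent with the abstract, what then matters most is the data used to train the model that produces those posteriors.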
format CONF
author Lei, Y.
Ferrer, L.
McLaren, M.
Scheffer, N.
Chng, E.S.
Li, H.
Meng, H.
Ma, B.
Xie, L.
Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat
title A deep neural network speaker verification system targeting microphone speech