A deep neural network speaker verification system targeting microphone speech
We recently proposed the use of deep neural networks (DNNs) in place of Gaussian mixture models (GMMs) in the i-vector extraction process for speaker recognition. We have shown significant accuracy improvements on the 2012 NIST speaker recognition evaluation (SRE) telephone conditions. This paper expl...
| Published: | 2014 |
|---|---|
| Subjects: | |
| Online access: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p681_Lei http://hdl.handle.net/20.500.12110/paper_2308457X_v_n_p681_Lei |
| Contributed by: | |
| id | paper:paper_2308457X_v_n_p681_Lei |
|---|---|
| record_format | dspace |
| institution | Universidad de Buenos Aires |
| institution_str | I-28 |
| repository_str | R-134 |
| collection | Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
| topic | Deep neural networks; I-vectors; Microphone data; Speaker recognition; Microphones; Speech; Speech communication; Telephone sets; Accuracy Improvement; Gaussian Mixture Model; In-depth analysis; Speaker recognition evaluations; Speaker verification system; Speech recognition |
| description | We recently proposed the use of deep neural networks (DNNs) in place of Gaussian mixture models (GMMs) in the i-vector extraction process for speaker recognition. We have shown significant accuracy improvements on the 2012 NIST speaker recognition evaluation (SRE) telephone conditions. This paper explores how this framework can be effectively used on the microphone speech conditions of the 2012 NIST SRE. In this new framework, the verification performance greatly depends on the data used for training the DNN. We show that training the DNN using both telephone and microphone speech data can yield significant improvements. An in-depth analysis of the influence of telephone speech data on the microphone conditions is also shown for both the DNN and GMM systems. We conclude by showing that the GMM system is always outperformed by the DNN system on the telephone-only and microphone-only conditions, and that the new DNN / i-vector framework can be used successfully provided there is a good match in the training data. Copyright © 2014 ISCA. |
| title | A deep neural network speaker verification system targeting microphone speech |
| publishDate | 2014 |
| url | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p681_Lei http://hdl.handle.net/20.500.12110/paper_2308457X_v_n_p681_Lei |
| _version_ | 1840324521256550400 |